Re: [Wikitech-l] Getting the list of Page Titles and Redirects of Wikipedia

2009-03-18 Thread Aryeh Gregor
On Wed, Mar 18, 2009 at 6:18 AM, Petr Kadlec petr.kad...@gmail.com wrote:
> page_title does not contain the full title, only its
> namespace-relative part. You need to use
> select page_namespace, page_title from wikidb.page
> Only this whole tuple (page_namespace, page_title) is a unique
> identifier of a page (this is true for the whole of MediaWiki).

And note that the namespace is stored as a number.  You'll need to
refer to a list of the namespace numbers on the specific wiki you're
dealing with to translate it into the appropriate prefix.  There's a
way to get the list from the API, but I don't know it offhand.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] LocalSettings.php can set messages too?

2009-03-18 Thread Daniel Kinzler
jida...@jidanni.org schrieb:
> Reading http://meta.wikimedia.org/wiki/Help:System_messages, one
> wonders if one can set messages in LocalSettings.php instead of
> editing MediaWiki:Copyrightpage. However I just get 'Call to a member
> function addMessages() on a non-object'.

That's because the object isn't there yet when LocalSettings.php is read. The
config is read *before* initialization, of course. You can register a function
in $wgExtensionFunctions that will get called after setup; you can inject
messages there. But editing the system message on the wiki seems the much nicer
solution, don't you think?
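As a sketch, the approach looks roughly like this in LocalSettings.php (assuming a MediaWiki of this era, where messages live in $wgMessageCache; the function name, message key and text are illustrative):

```php
<?php
// In LocalSettings.php: defer the message injection until after setup,
// when $wgMessageCache actually exists (avoiding the "non-object" error).
$wgExtensionFunctions[] = 'wfInjectCustomMessages';

function wfInjectCustomMessages() {
	global $wgMessageCache;
	// Override the copyright-page message; key and text here are illustrative.
	$wgMessageCache->addMessages( array(
		'copyrightpage' => 'Project:Copyrights',
	) );
}
```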

-- daniel


[Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Andrew Garrett
I am pleased to announce that the Abuse Filter [1] has been activated
on English Wikipedia!

The Abuse Filter is an extension to the MediaWiki [2] software that
powers Wikipedia, allowing automatic "filters" or "rules" to be run
against every edit, and to take actions if any of those rules are
triggered. It is designed to combat vandalism which is simple and
pattern-based, from blanking pages to complicated evasive page-move
vandalism.

We've already seen some pretty cool uses for the Abuse Filter. While
there are filters for the obvious personal attacks [3], many of our
filters are there just to identify common newbie mistakes such as
page-blanking [4], give the users a friendly warning [5] and ask them
if they really want to submit their edits.

The best part is that these friendly soft warning messages seem to
work in passively changing user behaviour. Just the suggestion that we
frown on page-blanking was enough to stop 56 of the 78 matches [6] of
that filter when I checked. If you look closely, you'll even find that
many of the users took our advice and redirected the page or did
something else more constructive instead.

I'm very pleased at my work being used so well on English Wikipedia,
and I'm looking forward to seeing some quality filters in the near
future! While at the moment, some of the harsher actions such as
blocking are disabled on Wikimedia, we're hoping that the filters
developed will be good enough that we can think about activating them
in the future.

If anybody has any questions or concerns about the Abuse Filter, feel
free to file a bug [7], contact me on IRC (werdna on
irc.freenode.net), post on my user talk page, or send me an email at
agarrett at wikimedia.org

[1] http://www.mediawiki.org/wiki/Extension:AbuseFilter
[2] http://www.mediawiki.org
[3] http://en.wikipedia.org/wiki/Special:AbuseFilter/9
[4] http://en.wikipedia.org/wiki/Special:AbuseFilter/3
[5] http://en.wikipedia.org/wiki/MediaWiki:Abusefilter-warning-blanking
[6] http://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=3
[7] http://bugzilla.wikimedia.org

-- 
Andrew Garrett


Re: [Wikitech-l] Getting the list of Page Titles and Redirects of Wikipedia

2009-03-18 Thread Roan Kattouw
Aryeh Gregor schreef:
> On Wed, Mar 18, 2009 at 6:18 AM, Petr Kadlec petr.kad...@gmail.com wrote:
>> page_title does not contain the full title, only its
>> namespace-relative part. You need to use
>> select page_namespace, page_title from wikidb.page
>> Only this whole tuple (page_namespace, page_title) is a unique
>> identifier of a page (this is true for the whole of MediaWiki).
>
> And note that the namespace is stored as a number.  You'll need to
> refer to a list of the namespace numbers on the specific wiki you're
> dealing with to translate it into the appropriate prefix.  There's a
> way to get the list from the API, but I don't know it offhand.
>
http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces

Note that namespaces with an ID of 100 or higher are specific to enwiki
and may have different names or not be used at all on other wikis. To
get an accurate list for another wiki, ask that wiki's api.php.
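A minimal sketch of turning that API response into an ID-to-prefix map (assuming PHP with allow_url_fopen and the json extension enabled; error handling omitted):

```php
<?php
// Fetch the namespace list for a wiki and build an id => prefix map.
// The URL and response fields follow the siteinfo API; format=json is assumed.
$url = 'http://en.wikipedia.org/w/api.php'
     . '?action=query&meta=siteinfo&siprop=namespaces&format=json';
$data = json_decode( file_get_contents( $url ), true );

$namespaces = array();
foreach ( $data['query']['namespaces'] as $id => $ns ) {
	// The '*' key holds the localized namespace name; '' for the main namespace.
	$namespaces[ (int)$id ] = $ns['*'];
}

// Prefix a page title from the database with its namespace name,
// e.g. namespace 1 + "Foo" => "Talk:Foo".
function wfFullTitle( $namespaces, $nsId, $title ) {
	$prefix = $namespaces[$nsId];
	return $prefix === '' ? $title : "$prefix:$title";
}
```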

As for redirects: yes, you'll want to do something like:

SELECT page_namespace, page_title, rd_namespace, rd_title
FROM page LEFT JOIN redirect ON rd_from=page_id;

This'll list all page titles and their redirect targets, with 
rd_namespace and rd_title set to NULL for pages that aren't redirects. 
Note that the redirect table doesn't handle section redirects (like 
redirects to [[Foo#Bar]], which are stored as redirects to [[Foo]]) and 
interwiki redirects (like redirects to [[wikt:dog]], which are stored as 
redirects to [[dog]]) too well and that some redirects may be missing 
from it entirely (IIRC about half a million redirects are missing from 
enwiki's redirect table). Even worse, the data dump you downloaded might 
not even contain the redirect table. You can rebuild the redirect table 
with:

php maintenance/refreshLinks.php --redirects-only

(Use --old-redirects-only to only add missing entries rather than 
checking existing entries for validity as well.)

Roan Kattouw (Catrope)


Re: [Wikitech-l] LocalSettings.php can set messages too?

2009-03-18 Thread Roan Kattouw
Daniel Kinzler schreef:
> jida...@jidanni.org schrieb:
>> Reading http://meta.wikimedia.org/wiki/Help:System_messages, one
>> wonders if one can set messages in LocalSettings.php instead of
>> editing MediaWiki:Copyrightpage. However I just get 'Call to a member
>> function addMessages() on a non-object'.
>
> That's because the object isn't there yet when LocalSettings.php is read. The
> config is read *before* initialization, of course. You can register a function
> in $wgExtensionFunctions that will get called after setup; you can inject
> messages there. But editing the system message on the wiki seems the much nicer
> solution, don't you think?
>
Yeah, why would you want that? You can protect MediaWiki: pages too 
(although editing them is already restricted to sysops by default), so 
you could protect this particular page so that only you can edit it (add 
a right to $wgRestrictionLevels and make sure only you have it).
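A sketch of that locking-down approach in LocalSettings.php (the right name 'founder-edit' and the group name 'founder' are made up for illustration):

```php
<?php
// Define a custom protection level and grant the matching right
// to a single group; the names here are illustrative.
$wgRestrictionLevels[] = 'founder-edit';
$wgGroupPermissions['founder']['founder-edit'] = true;

// Put only your own account in the 'founder' group (via Special:UserRights),
// then protect the page at the 'founder-edit' level so that only members
// of that group can edit it.
```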

Roan Kattouw (Catrope)


Re: [Wikitech-l] Problems with the recent version of Cite Extension

2009-03-18 Thread Roan Kattouw
Gerard Meijssen schreef:
> Hoi,
> What revision number does the working version for REL1_14_0 of Cite have?
/branches/REL1_14_0/extensions/Cite was touched last in r45574, and the
function call the error complains about was added in r46271 according to
Brad's post, so at least this particular error cannot be happening with
REL1_14 (I've checked that in r45574, the Cite_body.php file where the
error occurs doesn't even contain the word "preview").

Roan Kattouw (Catrope)


Re: [Wikitech-l] Problems with the recent version of Cite Extension

2009-03-18 Thread O. O.
Roan Kattouw wrote:
> Gerard Meijssen schreef:
>> Hoi,
>> What revision number does the working version for REL1_14_0 of Cite have?
> /branches/REL1_14_0/extensions/Cite was touched last in r45574, and the
> function call the error complains about was added in r46271 according to
> Brad's post, so at least this particular error cannot be happening with
> REL1_14 (I've checked that in r45574, the Cite_body.php file where the
> error occurs doesn't even contain the word "preview").
>
> Roan Kattouw (Catrope)
Thanks Roan. I don't think I understood any of the above. I guess I am
not into the lingo as yet.
O. O.



Re: [Wikitech-l] Problems with the recent version of Cite Extension

2009-03-18 Thread Aryeh Gregor
On Wed, Mar 18, 2009 at 11:15 AM, Roan Kattouw roan.katt...@home.nl wrote:
> What revision number does the working version for REL1_14_0 of Cite have?

How is he supposed to answer that?  ExtensionDistributor doesn't give
you a .svn folder.  (Although that's a kind of cool idea.  :) )


Re: [Wikitech-l] Getting the list of Page Titles and Redirects of Wikipedia

2009-03-18 Thread Petr Kadlec
2009/3/18 O. O. olson...@yahoo.com:
> This is fine, but where can I find information on custom namespaces i.e.
> those that lie above 100.

In $wgExtraNamespaces (see
http://www.mediawiki.org/wiki/Manual:Using_custom_namespaces)
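For reference, defining such a namespace in LocalSettings.php looks roughly like this (the ID 100 and the name 'Portal' are illustrative):

```php
<?php
// Custom namespaces start at 100; by convention even IDs are subject
// namespaces and odd IDs their talk counterparts. Names are illustrative.
define( 'NS_PORTAL', 100 );
define( 'NS_PORTAL_TALK', 101 );

$wgExtraNamespaces[NS_PORTAL] = 'Portal';
$wgExtraNamespaces[NS_PORTAL_TALK] = 'Portal_talk';
```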

-- [[cs:User:Mormegil | Petr Kadlec]]


[Wikitech-l] [SPAM] Re: Problems with the recent version of Cite Extension

2009-03-18 Thread Michael Daly
Aryeh Gregor wrote:
> On Wed, Mar 18, 2009 at 11:15 AM, Roan Kattouw roan.katt...@home.nl wrote:
>> What revision number does the working version for REL1_14_0 of Cite have?
>
> How is he supposed to answer that?  ExtensionDistributor doesn't give
> you a .svn folder.  (Although that's a kind of cool idea.  :) )
>

Could someone figure out a way of stuffing this into the extension's 
description automatically so that it shows up in the wiki's Version page?

Mike



Re: [Wikitech-l] developer meet-up is out of room

2009-03-18 Thread Roan Kattouw
Manuel Schneider schreef:
> ACK Lars.
>
> There are also people who - before they could register - first had to arrange
> their vacation at the office and after that tried to get tickets, and just on
> the day when the vacation was fixed and the tickets bought they were told that
> this was all worth nothing.
>
... and there are people who get a ride from someone who lives nearby, 
only to hear that that person was rejected.

To get back on topic: if the problem is financial, get more money.
Asking participants to pay a small fee is quite normal and would raise
quite a bit of money with so many attendees (10 euros would be a
reasonable fee and raise about 1000 euros, assuming the "100 developers"
comment was to be taken literally). I don't see how physical space would
be a problem: I've never been to the c-base of course, but it looks like
it'll easily fit a hundred people.

It'd be nice if we could find a solution to fit at least everybody who 
registered before registration closed, and if that involves paying a 
small fee I'd happily do so.

Roan Kattouw (Catrope)


Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Brion Vibber
On 3/18/09 5:34 AM, Andrew Garrett wrote:
> I am pleased to announce that the Abuse Filter [1] has been activated
> on English Wikipedia!

I've temporarily disabled it as we're seeing some performance problems 
saving edits at peak time today. Need to make sure there's functional 
per-filter profiling before re-enabling so we can confirm if one of the 
55 active filters (!) is particularly bad or if we need to do overall 
optimization.

-- brion


Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Robert Rohde
On Wed, Mar 18, 2009 at 12:43 PM, Brion Vibber br...@wikimedia.org wrote:
> On 3/18/09 5:34 AM, Andrew Garrett wrote:
>> I am pleased to announce that the Abuse Filter [1] has been activated
>> on English Wikipedia!
>
> I've temporarily disabled it as we're seeing some performance problems
> saving edits at peak time today. Need to make sure there's functional
> per-filter profiling before re-enabling so we can confirm if one of the
> 55 active filters (!) is particularly bad or if we need to do overall
> optimization.

For a 45-minute window one specific filter was timing out the server
every time someone tried to save a large page like WP:AN/I.

We found and disabled that one, but more detailed load stats would
definitely be useful.

-Robert Rohde


Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Tim Starling
Brion Vibber wrote:
> On 3/18/09 5:34 AM, Andrew Garrett wrote:
>> I am pleased to announce that the Abuse Filter [1] has been activated
>> on English Wikipedia!
>
> I've temporarily disabled it as we're seeing some performance problems
> saving edits at peak time today. Need to make sure there's functional
> per-filter profiling before re-enabling so we can confirm if one of the
> 55 active filters (!) is particularly bad or if we need to do overall
> optimization.

Done, took less than five minutes. Re-enabled.

We're still profiling at ~700ms CPU time per page save, with no
particular rule dominant. Disabling 20 of them would help.

-- Tim Starling



Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Robert Rohde
On Wed, Mar 18, 2009 at 12:59 PM, Tim Starling tstarl...@wikimedia.org wrote:
> Brion Vibber wrote:
>> On 3/18/09 5:34 AM, Andrew Garrett wrote:
>>> I am pleased to announce that the Abuse Filter [1] has been activated
>>> on English Wikipedia!
>>
>> I've temporarily disabled it as we're seeing some performance problems
>> saving edits at peak time today. Need to make sure there's functional
>> per-filter profiling before re-enabling so we can confirm if one of the
>> 55 active filters (!) is particularly bad or if we need to do overall
>> optimization.
>
> Done, took less than five minutes. Re-enabled.
>
> We're still profiling at ~700ms CPU time per page save, with no
> particular rule dominant. Disabling 20 of them would help.

For Andrew or anyone else who knows: can we assume that the filter is
smart enough that if the first part of an AND clause fails then the
other parts don't run (or similarly if the first part of an OR
succeeds)?  If so, we can probably optimize rules by doing easy checks
first before complex ones.

-Robert Rohde


Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Brion Vibber
On 3/18/09 12:59 PM, Tim Starling wrote:
> Brion Vibber wrote:
>> On 3/18/09 5:34 AM, Andrew Garrett wrote:
>>> I am pleased to announce that the Abuse Filter [1] has been activated
>>> on English Wikipedia!
>> I've temporarily disabled it as we're seeing some performance problems
>> saving edits at peak time today. Need to make sure there's functional
>> per-filter profiling before re-enabling so we can confirm if one of the
>> 55 active filters (!) is particularly bad or if we need to do overall
>> optimization.
>
> Done, took less than five minutes. Re-enabled.
>
> We're still profiling at ~700ms CPU time per page save, with no
> particular rule dominant. Disabling 20 of them would help.

Not bad for a first production pass on the madness that is enwiki! :D

-- brion


Re: [Wikitech-l] developer meet-up is out of room

2009-03-18 Thread Micke Nordin
Roan Kattouw roan.kattouw at home.nl writes:

> To get back on topic: if the problem is financial, get more money.
> Asking participants to pay a small fee is quite normal and would raise
> quite a bit of money with so many attendees
> [...]
> It'd be nice if we could find a solution to fit at least everybody who
> registered before registration closed, and if that involves paying a
> small fee I'd happily do so.
>
> Roan Kattouw (Catrope)
>

+1 

I wouldn't mind a small fee and I'm sure it would be possible to find a bigger
place if the one we've got is too small.

What kind of help would you need to fit us all in? Money? More people? A bigger
place? I'm sure we can fix whatever the problem is if we try.

/Micke




Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Tim Starling
Robert Rohde wrote:
> For Andrew or anyone else who knows: can we assume that the filter is
> smart enough that if the first part of an AND clause fails then the
> other parts don't run (or similarly if the first part of an OR
> succeeds)?  If so, we can probably optimize rules by doing easy checks
> first before complex ones.

No, everything will be evaluated.

Note that the problem with rule 48 was that added_links triggers a
complete parse of the pre-edit page text. It could be replaced by a
check against the externallinks table. No amount of clever shortcut
evaluation would have made it fast.

-- Tim Starling



Re: [Wikitech-l] developer meet-up is out of room

2009-03-18 Thread Daniel Kinzler
Hello all

I understand that you are disappointed. And I'm sad about having to reject
people myself. Which is one reason I hesitated - and made things worse. Do I
suck at organizing a conference? Quite possibly. It's my first time, and I never
asked for the job.

Anyway, I'll try to answer you as well as I can.

Lars Aronsson wrote:
> The organizing of this meet-up follows the same communication
> disaster as the toolserver project and also the WMF tech staff.
> First we register, then we hear nothing, now we're told it's
> impossible. You can't get there from here. Can't be done.

The time it took for registration to go from a few people to more than had ever
said they would be interested was so short, it completely took me by
surprise.  By the time we had decided to stop registration early (which was too
late), we were already badly overbooked.

So, if I ever get into the position again of organizing something like this,
there's one round of registration, at least two months before the event, and
mandatory. That way, we know exactly what to plan for.

> I asked earlier how the Swedish chapter could help the toolserver
> project, and all I got was that 8000 euro (which we don't have
> available) could help to buy a new server, but smaller amounts
> like 80 or 800 euro could not be useful at all.

Well, whom did you ask? I'd be the person to talk to.

But it's true that we can't handle small purpose-bound donations; they'd need to
go to WMDE in general. Contributing money to the toolserver as such isn't
possible, because the toolserver is not an organization and has no account.
Contributing hardware would be great, but that usually requires bigger sums. Do
we need a "Toolserver Inc." to solve this problem? Well, I fear it would create
more problems than it solves. But it's worth considering.

> Now you're out of
> funds for buying food. Maybe 800 euro could be useful after all?

We did get additional funds for food and travel from three sources, on very
short notice, to be able to allow more people to come. A contribution to the
travel budget would still be welcome.

> Maybe people can pay for their own food? No, you're not asking for
> that, instead you're telling people (who might already have bought
> air tickets) to stay home. Incredible! Berlin is full. Go home!

If money was the only problem, that would have been an option, though I would
have felt a bit bad about this too. The main problem is scaling the event as
such - making it be more than a bunch of people in a room. Maybe there are
people who can do that on their own, for 200 people, on short notice. I can't.

Also... who has bought airline tickets and is told to stay home?

> I have made it clear that I'm arriving in Berlin already on
> Thursday April 2. I know Berlin. I speak German. Maybe I can help
> to receive people and distribute info packages?  No, all you can
> do is say you're overbooked. You never asked for volunteers.

We are organizing volunteers currently. And I did get someone to help me with
moderating the actual event, also on short notice, which I'm very grateful for.
Otherwise we would have had to reject more people.

> Guess what, Jens Frank has been promised 15,000 euro from WMDE to
> organize a map toolserver. One very important item on his agenda
> is to host a map toolserver meet-up as part of this developer
> meet-up. Maybe some of his money can be used to help you. Did you
> ask him?

Jens appears to be a bit hard to reach, but I did coordinate with the OSM
people, yes. Including their meeting added some more people - I'm very happy to
have them, but it made it even harder for me to fit everything in. But I did
indeed organize some additional funding for getting the OSM people to Berlin.

> We're not out of resources and Berlin is big.

Yes, Berlin is big. So, what do you suggest?

Roan Kattouw wrote:
> I don't see how physical space would
> be a problem: I've never been to the c-base of course, but it looks like
> it'll easily fit a hundred people.

Standing up and drinking beer, yes. Having a conference and getting results, no.
I expected max 40 people. That number was, and still is, exceeded by far.

> It'd be nice if we could find a solution to fit at least everybody who
> registered before registration closed, and if that involves paying a
> small fee I'd happily do so.

Somehow I have the feeling that the outcry that would cause wouldn't be much
smaller than now. But as I said: money is a problem, but not the problem.
Ultimately, the limit is our (and especially my own) ability to manage and run
the event, as well as the venue.  The c-base is great, because it's a good place
for working *and* for the party after. But for a conference with more than 50
people, I wouldn't have chosen it.

Micke Nordin wrote:
> What kind of help would you need to fit us all in? Money? More people? A
> bigger place? I'm sure we can fix whatever the problem is if we try.

All of the above, but more importantly: more time.

We have committed to the venue, 

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread jidanni
AG> frown on page-blanking

For now I just stop them on my wikis with
$wgSpamRegex=array('/^\B$/');
I haven't tried fancier solutions yet.


Re: [Wikitech-l] developer meet-up is out of room

2009-03-18 Thread Brion Vibber
On 3/18/09 1:42 PM, Daniel Kinzler wrote:
> I understand that you are disappointed. And I'm sad about having to reject
> people myself. Which is one reason I hesitated - and made things worse. Do I
> suck at organizing a conference? Quite possibly. It's my first time, and I
> never asked for the job.

I think it's going pretty well for a first time with an unexpectedly 
high response. :) Let us know if we can help...

-- brion


Re: [Wikitech-l] Did we just borkenate the secure server?

2009-03-18 Thread Brion Vibber
On 3/17/09 12:56 AM, George Herbert wrote:
> I stopped getting page load replies for about 4-5 min, and they're coming
> through now but taking several minutes...
>
> Main unencrypted site is unusually slow right now, but working.
>
> DB issue?  Network?  Secure server cough up a hairball?

Seems fine presently; don't see anything suspicious in admin logs or 
that server's load graphs. Probably a temporary hiccup.

-- brion


Re: [Wikitech-l] developer meet-up is out of room

2009-03-18 Thread Daniel Kinzler
Brion Vibber schrieb:
> On 3/18/09 1:42 PM, Daniel Kinzler wrote:
>> I understand that you are disappointed. And I'm sad about having to reject
>> people myself. Which is one reason I hesitated - and made things worse. Do I
>> suck at organizing a conference? Quite possibly. It's my first time, and I
>> never asked for the job.
>
> I think it's going pretty well for a first time with an unexpectedly
> high response. :) Let us know if we can help...

Thanks.

Until a few hours ago, it went a bit *too* well, I guess. Which was, in fact,
the problem.

Life is so odd.

-- daniel


Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Brian
This extension is very important for training  machine learning
vandalism detection bots. Recently published systems use only hundreds
of examples of vandalism in training - not nearly enough to
distinguish between the variety found in Wikipedia or generalize to
new, unseen forms of vandalism. A large set of human created rules
could be run against all previous edits in order to create a massive
vandalism dataset. If one includes positive and negative types of
vandalism in training practically the entire text of the history of
wikipedia can be used in the training set, possibly creating a
remarkable bot.

On Wed, Mar 18, 2009 at 6:34 AM, Andrew Garrett agarr...@wikimedia.org wrote:
> I am pleased to announce that the Abuse Filter [1] has been activated
> on English Wikipedia!
> [...]


Re: [Wikitech-l] Google Summer of Code 2009

2009-03-18 Thread Brion Vibber
On 3/10/09 5:17 PM, Brion Vibber wrote:
> I’ve just put in Wikimedia’s org application for Google Summer of Code
> 2009… Hopefully we’ll get in. :)
>
> http://www.mediawiki.org/wiki/Summer_of_Code_2009

We're officially in! Woo!

Student applications will open *starting* March 23, *ending* April 3. 
This intermediate week before applications open is a good time to chat 
us up with your ideas and try to pair up with potential mentors. :)

http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline

-- brion


Re: [Wikitech-l] developer meet-up is out of room

2009-03-18 Thread Manuel Schneider
Well, as I already bought the ticket - though it didn't make sense to register
after this news - I will be in Berlin anyway. I'm taking care of
participants of the chapters meeting the week before (showing them around in
Germany) and will be in Berlin on Thursday afternoon. I know the locations
and the city quite well, so if I can help somewhere I am happy to do so.


-- 
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch


Re: [Wikitech-l] developer meet-up is out of room

2009-03-18 Thread Daniel Kinzler
Hi Manuel

Manuel Schneider schrieb:
> Well, as I already bought the ticket - though it didn't make sense to register
> after this news - I will be in Berlin anyway. I'm taking care of
> participants of the chapters meeting the week before (showing them around in
> Germany) and will be in Berlin on Thursday afternoon. I know the locations
> and the city quite well, so if I can help somewhere I am happy to do so.

Too bad. I guess with Wikimedia, the usual "only half the people who say they
are interested actually show up" has to be replaced by "be prepared for twice
as many". I don't think I'll forget that lesson.

Anyway, thanks for the offer, and I hope you'll come to the party anyway. And
as to showing people around town: I suppose a lot of people would like to do
that; I think something is being planned for Friday evening. But Guillom
organizes all that, as far as I know. We'll meet tomorrow to discuss details.

-- daniel


Re: [Wikitech-l] LocalSettings.php can set messages too?

2009-03-18 Thread Ilmari Karonen
Roan Kattouw wrote:

> Yeah, why would you want that? You can protect MediaWiki: pages too
> (although editing them is already restricted to sysops by default), so
> you could protect this particular page so that only you can edit it (add
> a right to $wgRestrictionLevels and make sure only you have it).

Presumably he's not using memcached, and would like to avoid the extra 
database hits.  For such uses, it'd be kind of nice to have a simple way 
of customizing system messages via LocalSettings.php.

-- 
Ilmari Karonen

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] LocalSettings.php can set messages too?

2009-03-18 Thread Daniel Kinzler
Ilmari Karonen schrieb:
 Roan Kattouw wrote:
 Yeah, why would you want that? You can protect MediaWiki: pages too 
 (although editing them is already restricted to sysops by default), so 
 you could protect this particular page so that only you can edit it (add 
 a right to $wgRestrictionLevels and make sure only you have it).
 
 Presumably he's not using memcached, and would like to avoid the extra 
 database hits.  For such uses, it'd be kind of nice to have a simple way 
 of customizing system messages via LocalSettings.php.

The DB will be hit anyway; it needs to check whether the message was overridden by
editing the wiki page.

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Problems with the recent version of Cite Extension

2009-03-18 Thread O. O.
Aryeh Gregor wrote:
 On Mon, Mar 16, 2009 at 7:47 PM, O. O. olson...@yahoo.com wrote:
 I have installed Mediawiki 1.14.0 http://www.mediawiki.org/wiki/Download
 and am trying to get the  Cite Extension
 http://www.mediawiki.org/wiki/Extension:Cite version 1.14.0 to work.

 When accessing the Main_Page I get the error:

 Fatal error: Call to undefined method
 ParserOptions::getIsSectionPreview() in
 /var/www/wiki2/extensions/Cite/Cite_body.php on line 699

 I can however get the 1.13.0 version of the Cite Extension to work.
 
 When I download it from
 http://www.mediawiki.org/wiki/Special:ExtensionDistributor/Cite,
 selecting 1.14, that line is blank.  When you download 1.14, what's on
 line 699 of Cite_body.php where the error occurs?
 

Hi Aryeh.

I think I made some big mistakes in my posts for this entire thread. My 
mistake lay in assuming that, on the Extension Distributor 
http://www.mediawiki.org/wiki/Special:ExtensionDistributor/Cite, the 
drop-down entry “Current version (trunk)” was the same as “1.14.x”. It 
is not, and I unfortunately did not realize this.

So, when I was downloading the “Current version (trunk)” I was 
downloading r48452, and I got the same error that I was reporting 
before, i.e.

Fatal error: Call to undefined method 
ParserOptions::getIsSectionPreview() in 
/var/www/wiki2/extensions/Cite/Cite_body.php on line 699

For me, line 699 of Cite_body.php is:

if ( $parser->getOptions()->getIsSectionPreview() ) return true;


It is, however, not true that 1.14.x gives this error; it does not. When I 
downloaded it, it reported r45577 as opposed to r48452. Previously I simply 
ignored these numbers. Sorry for the confusion. So this works now.

I started this thread because Robert mentioned in my previous thread 
“HTML not Rendered correctly after Import of Wikipedia” that some of the 
HTML may be badly formatted because I did not have the correct version 
of the Cite Extension. Now, with 1.14.x, I still get the same problems 
with the HTML. Anyway, that's the subject of the other thread, so I 
don't want to continue it here. Also, for the most part, Tidy cleans the 
HTML up if it is enabled.

Thanks again guys,
O. O.



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Soxred93
However, that simply disallows them all. On enwiki, the blanking  
filter warns the user, and lets them go through with it after  
confirmation.

X!

On Mar 18, 2009, at 4:51 PM, jida...@jidanni.org wrote:

 AG frown on page-blanking

 For now I just stop them on my wikis with
 $wgSpamRegex=array('/^\B$/');
 I haven't tried fancier solutions yet.

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Getting the list of Page Titles and Redirects of Wikipedia

2009-03-18 Thread Aryeh Gregor
On Wed, Mar 18, 2009 at 11:06 AM, Roan Kattouw roan.katt...@home.nl wrote:
 http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces

 Note that namespaces with an ID of 100 or higher are specific to enwiki
 and may have different names or not be used at all on other wikis. To
 get an accurate list for another wiki, ask that wiki's api.php .

The same is pretty much true for all namespaces.  There's no guarantee
that any namespaces other than main will have the same names on other
wikis.  To ensure that, you need to use the canonical name if one
exists (it's helpfully provided in the API result . . . actually, what
does it mean that "Portal" and "Portal talk" are canonical? Shouldn't
there be no "canonical" attribute if the namespace is custom?).

In particular, "Wikipedia" and "Wikipedia talk" will likely not work
on most other wikis.

On Wed, Mar 18, 2009 at 11:54 AM, O. O. olson...@yahoo.com wrote:
 Thanks Petr and Aryeh for getting back. From the Documentation at
 http://www.mediawiki.org/wiki/Page_table and
 http://meta.wikimedia.org/wiki/Help:Namespace you can get the names of
 the Real and Virtual Namespaces in includes/Defines.php and then get
 what text they convert to in English using
 languages/messages/MessagesEn.php.

 This is fine, but where can I find information on custom namespaces i.e.
 those that lie above 100.

Use Roan's link:

http://en.wikibooks.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces

You might prefer this to Defines.php/MessagesEn.php.  Those will give
you the canonical names, which will always work, but which might not
be the ones used on Wikipedia.  For instance, namespace 4 is
canonically "Project", but on Wikipedia the normal name for it is
"Wikipedia".  "Project:" URLs will work on Wikipedia, but automatically
redirect to "Wikipedia:".  E.g.,

http://en.wikipedia.org/wiki/Project:WikiProject_Dorset
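
To tie the two halves of this thread together: once you have fetched the
namespace map from siteinfo, reassembling the familiar prefixed title from
a (page_namespace, page_title) pair is a one-liner. A hedged Python sketch;
full_title and the hand-picked namespace dict below are illustrative, not
anything MediaWiki ships:

```python
def full_title(ns_names, page_namespace, page_title):
    """Join a (page_namespace, page_title) pair into a prefixed title.

    ns_names maps namespace IDs to their local names, as returned by
    api.php?action=query&meta=siteinfo&siprop=namespaces.  The main
    namespace (ID 0) has an empty name and therefore no prefix.
    """
    prefix = ns_names.get(page_namespace, '')
    return prefix + ':' + page_title if prefix else page_title

# A hand-picked subset of enwiki's namespace map, for illustration only:
ENWIKI_NS = {0: '', 1: 'Talk', 4: 'Wikipedia', 5: 'Wikipedia talk'}
```

For example, full_title(ENWIKI_NS, 4, 'WikiProject_Dorset') reconstructs
the title behind the URL above.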

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] LocalSettings.php can set messages too?

2009-03-18 Thread Aryeh Gregor
On Wed, Mar 18, 2009 at 6:22 PM, Daniel Kinzler dan...@brightbyte.de wrote:
 That db will be hit anyway, it needs to see if the message was overridden by
 editing the wiki page.

Not if you disable $wgUseDatabaseMessages.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Problems with the recent version of Cite Extension

2009-03-18 Thread Aryeh Gregor
On Wed, Mar 18, 2009 at 6:51 PM, O. O. olson...@yahoo.com wrote:
        I think I made some big mistakes in my posts for this entire Thread. My
 mistake lied in the fact that I assumed on the Extension Distributor
 http://www.mediawiki.org/wiki/Special:ExtensionDistributor/Cite  in the
 Drop Down “Current version (trunk)” was the same as the “1.14.x”. This
 is not the same and I unfortunately did not realize this.

This is our fault for sloppy wording.  In r48552 I renamed "Current
version" to "Development version".  I didn't immediately see how to
make 1.14.x the default download, but it should be.  Most people do
not want trunk.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Getting the list of Page Titles and Redirects of Wikipedia

2009-03-18 Thread Platonides
Aryeh Gregor wrote:
 On Wed, Mar 18, 2009 at 11:06 AM, Roan Kattouw roan.katt...@home.nl wrote:
 http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces

 Note that namespaces with an ID of 100 or higher are specific to enwiki
 and may have different names or not be used at all on other wikis. To
 get an accurate list for another wiki, ask that wiki's api.php .
 
 The same is pretty much true for all namespaces.  There's no guarantee
 that any namespaces other than main will have the same names on other
 wikis.  To ensure that, you need to use the canonical name if one
 exists (it's helpfully provided in the API result . . . actually, what
 does it mean that Portal and Portal talk are canonical? shouldn't
 there be no canonical attribute if the namespace is custom?).

Agreed. "Portal" and "Portal talk" could still be acceptable, since the
namespace IDs 100-101 are more or less reserved for portals across the
wikis.
What is scarier is seeing ns id="102" canonical="Cookbook" on
enwikibooks, whereas the same ns 102 means Wikiproject on some pedias.

Since the API provides namespace aliases linked to the ID, not to the
informal canonical name, I see no reason to keep the "canonical"
parameter on the extra namespaces.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Platonides
Tim Starling wrote:
 Robert Rohde wrote:
 For Andrew or anyone else that knows, can we assume that the filter is
 smart enough that if the first part of an AND clause fails then the
 other parts don't run (or similarly if the first part of an OR
 succeeds)?  If so, we can probably optimize rules by doing easy checks
 first before complex ones.
 
 No, everything will be evaluated.
 
 Note that the problem with rule 48 was that added_links triggers a
 complete parse of the pre-edit page text. It could be replaced by a
 check against the externallinks table. No amount of clever shortcut
 evaluation would have made it fast.
 
 -- Tim Starling

With branch optimization, placing the checks !(autoconfirmed in
USER_GROUPS) and the namespace test at the beginning would avoid
evaluating added_links at all (and thus the parse).

Another option could be to automatically optimize based on the cost of
each rule.
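
The gain from putting cheap checks first is easy to see in any language
with short-circuit evaluation. A Python sketch of the idea (all the names
here are made up; the real filter language and its costs are AbuseFilter's
own):

```python
# Counters so we can observe which checks actually ran.
calls = {"cheap": 0, "expensive": 0}

def is_autoconfirmed(user_groups):
    calls["cheap"] += 1
    return "autoconfirmed" in user_groups

def added_links(text):
    # Stands in for the expensive pre-edit parse of the page text.
    calls["expensive"] += 1
    return ["http://example.com/"]

def filter_matches(user_groups, text):
    # With short-circuit (branch-optimised) evaluation, the expensive
    # added_links check never runs for autoconfirmed users: the left
    # operand of `and` is False, so the right side is skipped.
    return (not is_autoconfirmed(user_groups)) and bool(added_links(text))
```

Calling filter_matches({"autoconfirmed"}, "...") returns False without ever
touching added_links; only for non-autoconfirmed users does the costly
check run.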

PS: Why isn't there a link to Special:AbuseFilter/history/$id on the
filter view?


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Andrew Garrett
Tim Starling wrote:
 Robert Rohde wrote:
 For Andrew or anyone else that knows, can we assume that the filter is
 smart enough that if the first part of an AND clause fails then the
 other parts don't run (or similarly if the first part of an OR
 succeeds)?  If so, we can probably optimize rules by doing easy checks
 first before complex ones.

 No, everything will be evaluated.

I've written and deployed branch optimisation code, which reduced
run-time by about one third.

 Note that the problem with rule 48 was that added_links triggers a
 complete parse of the pre-edit page text. It could be replaced by a
 check against the externallinks table. No amount of clever shortcut
 evaluation would have made it fast.

I've fixed this to use the DB instead for that particular context.

On Thu, Mar 19, 2009 at 11:54 AM, Platonides platoni...@gmail.com wrote:
 PS: Why there isn't a link to Special:AbuseFilter/history/$id on the
 filter view?

There is.

I've disabled a filter or two which were taking well in excess of
150ms to run, and seemed to be targeted at specific vandals, without
any hits. The culprit seemed to be running about 20 regexes to
determine whether an IP is in a particular range, where one call to
ip_in_range would suffice. Of course, this is also a documentation
issue, which I'm working on.
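
For comparison, the single-containment-test approach looks like this in
Python's standard library (this only illustrates the idea; AbuseFilter's
ip_in_range has its own syntax and semantics):

```python
import ipaddress

def ip_in_range(ip, cidr):
    """One containment test instead of ~20 per-prefix regexes."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
```

E.g. ip_in_range('192.0.2.7', '192.0.2.0/24') is True, and the check costs
a couple of integer comparisons rather than a regex scan per prefix.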

To help a bit more with performance, I've also added a profiler within
the interface itself. Hopefully this will encourage self-policing with
regard to filter performance.

-- 
Andrew Garrett

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

2009-03-18 Thread Robert Rohde
On Wed, Mar 18, 2009 at 8:00 PM, Andrew Garrett and...@epstone.net wrote:
snip
 I've disabled a filter or two which were taking well in excess of
 150ms to run, and seemed to be targetted at specific vandals, without
 any hits. The culprit seemed to be running about 20 regexes to
 determine if an IP is in a particular range, where one call to
 ip_in_range would suffice. Of course, this is also a documentation
 issue which I'm working on.
snip

ip_in_range
rmwhitespace
rmspecials
? :
if then else end
contains

and probably some others appear in SVN but not in the drop-down list
that I assume most people are using to discover the available functions.

-Robert Rohde

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Serving as xhtml+xml

2009-03-18 Thread lee worden
I'm at work on a MW extension that, among other things, uses LaTeXML [1] 
to make XHTML from full LaTeX documents.  One feature is the option to 
render the equations in MathML, which requires the skins to be patched so 
that they output the page as Content-type: application/xhtml+xml instead 
of text/html.


Attached is a patch for the skins directory that allows changing the 
Content-type dynamically.  After applying this patch, if any code sets the 
global $wgServeAsXHTML to true, the page will be output with the xhtml+xml 
content type.  This seems to work fine with the existing MW XHTML pages.


This has been done before, for instance in the ASCIIMath4Wiki extension 
[2].  I don't want to change the Content-type unconditionally, though, 
only some of the time, so that we can serve texvc-style images to browsers 
or users that don't like the modified content type.
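
One conventional way to do the "only some of the time" part is to key off
the client's Accept header. A rough Python sketch of the decision
(prefers_xhtml and content_type are hypothetical names; serve_as_xhtml
stands in for the patch's $wgServeAsXHTML global, and a real implementation
should also honour q-values):

```python
def prefers_xhtml(accept_header):
    """Crude check: the client explicitly lists application/xhtml+xml.

    Splits the Accept header into media types, discarding parameters
    such as q-values; a production version should weigh q-values too.
    """
    types = [part.split(';')[0].strip() for part in accept_header.split(',')]
    return 'application/xhtml+xml' in types

def content_type(accept_header, serve_as_xhtml):
    """Pick the response Content-Type for a page with MathML in it."""
    if serve_as_xhtml and prefers_xhtml(accept_header):
        return 'application/xhtml+xml'
    return 'text/html'
```

Browsers that don't advertise XHTML support would then keep getting
text/html (and could be served texvc-style images instead).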


It should be possible to use this patch without breaking any existing 
systems (unless someone else's extension happens to use the same global 
variable name, I guess).


The patch is made on the 1:1.13.3-1ubuntu1 mediawiki package (from Ubuntu 
9.04), and only modifies Monobook.php and Modern.php.  There are other 
skins in my installation here, but they don't seem to work very well and I 
didn't see where to make the change.


Is there a better way to make MathML work in MW?  Might this option be 
included in a future MW release?  Any feedback or alternative suggestions 
is welcome.


Lee Worden
McMaster University Dept of Biology

[1] http://dlmf.nist.gov/LaTeXML/
[2] http://www.mediawiki.org/wiki/Extension:ASCIIMath4Wiki

ps. I'm not sure if this list accepts attachments - if not I'll be happy 
to send it to people on request.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Serving as xhtml+xml

2009-03-18 Thread Andrew Garrett
2009/3/19 lee worden won...@riseup.net:
 I'm at work on a MW extension that, among other things, uses LaTeXML [1] to
 make XHTML from full LaTeX documents.  One feature is the option to render
 the equations in MathML, which requires the skins to be patched so that they
 output the page as Content-type: application/xhtml+xml instead of text/html.

 Attached is a patch for the skins directory that allows changing the
 Content-type dynamically.  After applying this patch, if any code sets the
 global $wgServeAsXHTML to true, the page will be output with the xhtml+xml
 content type.  This seems to work fine with the existing MW XHTML pages.

It should be done with the Parser Output instead.

This mailing list does not accept attachments.

-- 
Andrew Garrett

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l