Re: [Wikitech-l] Extension:OpenID 3.00 - Security Release

2013-03-07 Thread Petr Bena
This is indeed a problem, but given that rename permissions are granted
by default to bureaucrats, who are the most trusted users, and on small
wikis are typically sysadmins with shell access, this shouldn't be very
dangerous. A sysadmin with shell access would be able to steal your
identity anyway.

It is a problem for large wikis like those run by the WMF, though.

On Fri, Mar 8, 2013 at 2:19 AM, Ryan Lane  wrote:
> Marc-Andre Pelletier discovered a vulnerability in the MediaWiki OpenID
> extension for the case that MediaWiki is used as a “provider” and the wiki
> allows renaming of users.
>
> All previous versions of the OpenID extension used user-page URLs as
> identity URLs. On wikis that use the OpenID extension as “provider” and
> allow user renames, an attacker with rename privileges could rename a user
> and could then create an account with the same name as the victim. This
> would have allowed the attacker to steal the victim’s OpenID identity.
>
> Version 3.00 fixes the vulnerability by using Special:OpenIDIdentifier/<userid>
> as the user’s identity URL, <userid> being the immutable MediaWiki-internal
> userid of the user. The user’s old identity URL, based on the user’s
> user-page URL, will no longer be valid.
>
> The user’s user page can still be used as an OpenID identity URL, but will
> delegate to the special page.
>
> This is a breaking change, as it changes all user identity URLs. Providers
> are urged to upgrade and notify users, or to disable user renaming.
>
> Respectfully,
>
> Ryan Lane
>
> https://gerrit.wikimedia.org/r/#/c/52722
> Commit: f4abe8649c6c37074b5091748d9e2d6e9ed452f2
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Redis with SSDs

2013-03-07 Thread George Herbert
On Thu, Mar 7, 2013 at 2:16 PM, Tyler Romeo  wrote:
> Interesting article I found about Redis and its poor performance with SSDs
> as a swap medium. For whoever might be interested.
>
> http://antirez.com/news/52

This was not particularly insightful or useful; Redis swapping is
known to be poor, and swapping to SSD is only slightly less bad than
swapping to spinning hard drives.

It's also horrible for SSD longevity: lots of small random writes
scattered throughout the disk.

This would be the "wrong tool" test.


-- 
-george william herbert
george.herb...@gmail.com

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Extension:OpenID 3.00 - Security Release

2013-03-07 Thread Ryan Lane
Marc-Andre Pelletier discovered a vulnerability in the MediaWiki OpenID
extension for the case that MediaWiki is used as a “provider” and the wiki
allows renaming of users.

All previous versions of the OpenID extension used user-page URLs as
identity URLs. On wikis that use the OpenID extension as “provider” and
allow user renames, an attacker with rename privileges could rename a user
and could then create an account with the same name as the victim. This
would have allowed the attacker to steal the victim’s OpenID identity.

Version 3.00 fixes the vulnerability by using Special:OpenIDIdentifier/<userid>
as the user’s identity URL, <userid> being the immutable MediaWiki-internal
userid of the user. The user’s old identity URL, based on the user’s
user-page URL, will no longer be valid.

The user’s user page can still be used as an OpenID identity URL, but will
delegate to the special page.

This is a breaking change, as it changes all user identity URLs. Providers
are urged to upgrade and notify users, or to disable user renaming.

Respectfully,

Ryan Lane

https://gerrit.wikimedia.org/r/#/c/52722
Commit: f4abe8649c6c37074b5091748d9e2d6e9ed452f2
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread bawolff
On 2013-03-07 4:06 PM, "Matthew Flaschen"  wrote:
>
> On 03/07/2013 12:00 PM, Antoine Musso wrote:
> > Le 06/03/13 23:58, Federico Leva (Nemo) a écrit :
> >> There's slow-parse.log, but it's private unless a solution is found for
> >> https://gerrit.wikimedia.org/r/#/c/49678/
> >> https://wikitech.wikimedia.org/wiki/Logs
> >
> > And slow-parse.log is probably going to be kept private unless proven it
> > is not harmful =)
>
> Why would it be harmful for public wikis?  Anyone can do this on an
> article-by-article basis by copying the source to their own MediaWiki
> instances.
>
> But it ends up being repeated work.
>
> Matt
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

+1. I have trouble imagining how making this public could be harmful.
There are plenty of well-known slow-to-parse pages already. There are also
more than a couple of ways to convince MW to run slow queries (longer than
the PHP time limit), we publicly release detailed profiling data, etc.
While that sort of thing isn't exactly proclaimed to the world, it's also not
a secret. If someone wanted to find slow points in MediaWiki, there are far
worse things floating around the internet than a list of slow-to-parse
pages.

-bawolff
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)

2013-03-07 Thread Asher Feldman
On Thu, Mar 7, 2013 at 3:57 PM, Tim Starling wrote:

> On 07/03/13 12:12, Asher Feldman wrote:
> > Ori - I think this has been discussed but automated xhprof configuration
> as
> > part of the vagrant dev env setup would be amazing :)
>
> I don't think xhprof is the best technology for PHP profiling. I
> reported a bug a month ago which causes the times it reports to be
> incorrect by a random factor, often 4 or so. No response so far. And
> its web interface is packed full of XSS vulnerabilities. XDebug +
> KCacheGrind is quite nice.


That's disappointing; I wonder if xhprof has become abandonware since
Facebook moved away from Zend.  Have you looked at Webgrind
(http://code.google.com/p/webgrind/)?  If not, I'd love to see it at least
get a security review.  KCacheGrind is indeed super powerful and nice, and
well suited to a dev VM.  I'm still interested in this sort of profiling
for a very small percentage of production requests though, such as 0.1% of
requests hitting a single server.  Copying around cachegrind files and
using KCacheGrind wouldn't be very practical for that.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)

2013-03-07 Thread Tim Starling
On 07/03/13 12:12, Asher Feldman wrote:
> Ori - I think this has been discussed but automated xhprof configuration as
> part of the vagrant dev env setup would be amazing :)

I don't think xhprof is the best technology for PHP profiling. I
reported a bug a month ago which causes the times it reports to be
incorrect by a random factor, often 4 or so. No response so far. And
its web interface is packed full of XSS vulnerabilities. XDebug +
KCacheGrind is quite nice.

-- Tim Starling



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits

2013-03-07 Thread Chris Steipp
On Thu, Mar 7, 2013 at 1:34 PM, Platonides  wrote:
> On 07/03/13 21:03, anubhav agarwal wrote:
>> Hey Chris
>>
>> I was exploring the SpamBlacklist extension. I have some doubts I hope you could
>> clear them.
>>
>> Is there any place I can get documentation of
>> Class SpamBlacklist in the file SpamBlacklist_body.php. ?

There really isn't any documentation besides the code, but there are a
couple more things you should look at. Notice that in SpamBlacklist.php
there is the line "$wgHooks['EditFilterMerged'][] =
'SpamBlacklistHooks::filterMerged';", which is how
SpamBlacklist registers itself with MediaWiki core to filter edits. So
when MediaWiki core runs the EditFilterMerged hooks (which it does in
includes/EditPage.php, line 1287), all of the extensions that have
registered a function for that hook are run with the passed-in
arguments, and SpamBlacklistHooks::filterMerged is one of them.
SpamBlacklistHooks::filterMerged then just sets up and calls
SpamBlacklist::filter. So that is where you can start tracing what is
actually in the variables, in case Platonides' summary wasn't enough.
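To make the shape of that concrete, here is a minimal sketch of the pattern
described above. The registration line is the one quoted from
SpamBlacklist.php; the handler body, the static call into
SpamBlacklist::filter() (argument list simplified) and the error message are
illustrations only, not the extension's actual code:

    <?php
    // In the extension setup file (SpamBlacklist.php): register the handler,
    // exactly as in the line quoted above.
    $wgHooks['EditFilterMerged'][] = 'SpamBlacklistHooks::filterMerged';

    class SpamBlacklistHooks {
        /**
         * Run by MediaWiki core for every normal page save, with the merged text.
         * The body below is a simplified illustration, not the extension's code.
         *
         * @param EditPage $editPage the edit being saved
         * @param string $text the full text being saved
         * @param string &$hookError set this to block the save with an error
         * @param string $summary the edit summary
         * @return bool true to let the save (and other handlers) proceed
         */
        public static function filterMerged( $editPage, $text, &$hookError, $summary ) {
            $title = $editPage->getArticle()->getTitle();

            // Hand off to the filter, which extracts the external links from
            // the parsed text, drops whitelisted ones, and matches the rest
            // against the blacklist regexes. (Shown as a static call for brevity.)
            $matches = SpamBlacklist::filter( $title, $text, '', $editPage );

            if ( $matches !== false ) {
                // A blacklisted link was found; report it back to the edit form.
                $hookError = wfMessage( 'spamprotectiontext' )->text();
            }

            return true;
        }
    }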


>>
>> In function filter what does the following variables represent ?
>>
>> $title
> Title object (includes/Title.php). This is the page being saved.
>
>> $text
> Text being saved in the page/section
>
>> $section
> Name of the section or ''
>
>> $editpage
> EditPage object if EditFilterMerged was called, null otherwise
>
>> $out
>
> A ParserOutput class (actually, this variable name was a bad choice, it
> looks like an OutputPage), see includes/parser/ParserOutput.php
>
>
>> I have understood the following things from the code, please correct me if
>> I am wrong. It extracts the edited text, and parse it to find the links.
>
> Actually, it uses the fact that the parser will have processed the
> links, so in most cases just obtains that information.
>
>
>> It then replaces the links which match the whitelist regex,
> This doesn't make sense as you explain it. It builds a list of links,
> and replaces whitelisted ones with '', ie. removes whitelisted links
> from the list.
>
>> and then checks if there are some links that match the blacklist regex.
> Yes
>
>> If the check is greater you return the content matched.
>
> Right, $check will be non-0 if the links matched the blacklist.
>
>> it already enters in the debuglog if it finds a match
>
> Yes, but that is a private log.
> Bug 1542 talks about making that accessible in the wiki.

Yep. For example, see
* https://en.wikipedia.org/wiki/Special:Log
* https://en.wikipedia.org/wiki/Special:AbuseLog

>
>
>> I guess the bug aims at creating a sql table.
>> I was thinking of the following fields to log.
>> Title, Text, User, URLs, IP. I don't understand why you denied it.
>
> Because we don't like to publish the IPs *in the wiki*.

The WMF privacy policy also discourages us from keeping IP addresses
longer than 90 days, so if you do keep IPs, you need a way to
hide or purge them; and if they let someone see what IP address a
particular username was using, then only users with checkuser
permissions are allowed to see that. So it would be easier for you not
to include them, but if they're desired, you'll just have to build
those protections out too.
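As a rough sketch of the purge side of that (the spam_blacklist_log table and
sbl_* columns are hypothetical, since no such table exists yet), a periodic
maintenance job could be as simple as:

    <?php
    // Hypothetical: delete log rows older than 90 days so no IP (or other
    // personal data) outlives the retention window. Table/column names invented.
    $dbw = wfGetDB( DB_MASTER );

    // MediaWiki-format timestamp for "90 days ago".
    $cutoff = $dbw->timestamp( time() - 90 * 24 * 3600 );

    $dbw->delete(
        'spam_blacklist_log',
        array( 'sbl_timestamp < ' . $dbw->addQuotes( $cutoff ) ),
        __METHOD__
    );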

>
> I think the approach should be to log matches using abusefilter
> extension if that one is loaded.

The AbuseFilter log format has a lot of data in it specific to
AbuseFilter, and is used to re-test abuse filters, so adding these
hits to that log might cause some issues. I think either the general
log or a separate, new log table would be best. Just for some
numbers: in the first 7 days of this month, we've had an average of
27,000 hits each day, so if this goes into an existing log, it's going
to generate a significant amount of data.
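For reference, writing a hit into a dedicated table could look roughly like
the sketch below; the spam_blacklist_log table and sbl_* fields are again
invented for illustration, and no IP address is stored, per the concerns above:

    <?php
    // Hypothetical sketch of recording one blacklist hit in its own table.
    function logSpamBlacklistHit( Title $title, User $user, array $matchedUrls ) {
        $dbw = wfGetDB( DB_MASTER );

        $dbw->insert(
            'spam_blacklist_log',
            array(
                'sbl_timestamp' => $dbw->timestamp( wfTimestampNow() ),
                'sbl_namespace' => $title->getNamespace(),
                'sbl_title'     => $title->getDBkey(),
                'sbl_user'      => $user->getId(),
                // One matched URL per line; deliberately no IP address.
                'sbl_urls'      => implode( "\n", $matchedUrls ),
            ),
            __METHOD__
        );
    }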

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Redis with SSDs

2013-03-07 Thread Tyler Romeo
Interesting article I found about Redis and its poor performance with SSDs
as a swap medium. For whoever might be interested.

http://antirez.com/news/52
*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Indexing non-text content in LuceneSearch

2013-03-07 Thread Munagala Ramanath
(1) seems like the right way to go to me too.

There may be other ways but puppet/files/lucene/lucene.jobs.sh has a
function called
import-db() which creates a dump like this:

   php $MWinstall/common/multiversion/MWScript.php dumpBackup.php $dbname
--current > $dumpfile

Ram


On Thu, Mar 7, 2013 at 1:05 PM, Daniel Kinzler  wrote:

> On 07.03.2013 20:58, Brion Vibber wrote:
> >> 3) The indexer code (without plugins) should not know about Wikibase, but
> >> it may have hard coded knowledge about JSON. It could have a special
> >> indexing mode for JSON, in which the structure is deserialized and
> >> traversed, and any values are added to the index (while the keys used in
> >> the structure would be ignored). We may still be indexing useless
> >> internals from the JSON, but at least there would be a lot fewer false
> >> negatives.
> >
> > Indexing structured data could be awesome -- again I think of file
> > metadata as well as wikidata-style stuff. But I'm not sure how easy
> > that'll be. Should probably be in addition to the text indexing,
> > rather than replacing.
>
> Indeed, but option 3 is about *blindly* indexing *JSON*. We definitely want
> indexed structured data, the question is just how to get that into the
> LSearch
> infrastructure.
>
> -- daniel
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Thomas Gries
Am 07.03.2013 21:09, schrieb Petr Bena:
> ah ok I was confused by it being flagged stable
>
>
Yes. It *is* stable, at least since I took over the maintenance a long
time ago.

That does not mean it cannot be further improved.

Currently I am very busy adding necessary new features to the user
interface (preferences),
which can already be seen at
http://openid-wiki.instance-proxy.wmflabs.org/wiki/ .
Some new patches are in the pipeline and will be published in the next few days.

The manual page fully reflects the current status.
I am always looking for developers who install the extension on their
wikis and send us feedback - and file bug reports if needed.

Tom
Maintainer of E:OpenID



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits

2013-03-07 Thread Platonides
On 07/03/13 21:03, anubhav agarwal wrote:
> Hey Chris
> 
> I was exploring the SpamBlacklist extension. I have some doubts I hope you could
> clear them.
> 
> Is there any place I can get documentation of
> Class SpamBlacklist in the file SpamBlacklist_body.php. ?
> 
> In function filter what does the following variables represent ?
> 
> $title
Title object (includes/Title.php). This is the page being saved.

> $text
Text being saved in the page/section

> $section
Name of the section or ''

> $editpage
EditPage object if EditFilterMerged was called, null otherwise

> $out

A ParserOutput class (actually, this variable name was a bad choice, it
looks like an OutputPage), see includes/parser/ParserOutput.php


> I have understood the following things from the code, please correct me if
> I am wrong. It extracts the edited text, and parse it to find the links.

Actually, it uses the fact that the parser will have processed the
links, so in most cases just obtains that information.


> It then replaces the links which match the whitelist regex, 
This doesn't make sense as you explain it. It builds a list of links,
and replaces whitelisted ones with '', ie. removes whitelisted links
from the list.

> and then checks if there are some links that match the blacklist regex.
Yes

> If the check is greater you return the content matched. 

Right, $check will be non-0 if the links matched the blacklist.

> it already enters in the debuglog if it finds a match

Yes, but that is a private log.
Bug 1542 talks about making that accessible in the wiki.


> I guess the bug aims at creating a sql table.
> I was thinking of the following fields to log.
> Title, Text, User, URLs, IP. I don't understand why you denied it.

Because we don't like to publish the IPs *in the wiki*.

I think the approach should be to log matches using abusefilter
extension if that one is loaded.
I concur that it seems too hard to begin with.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Indexing non-text content in LuceneSearch

2013-03-07 Thread Daniel Kinzler
On 07.03.2013 20:58, Brion Vibber wrote:
>> 3) The indexer code (without plugins) should not know about Wikibase, but it
>> may have hard coded knowledge about JSON. It could have a special indexing
>> mode for JSON, in which the structure is deserialized and traversed, and any
>> values are added to the index (while the keys used in the structure would be
>> ignored). We may still be indexing useless internals from the JSON, but at
>> least there would be a lot fewer false negatives.
> 
> Indexing structured data could be awesome -- again I think of file
> metadata as well as wikidata-style stuff. But I'm not sure how easy
> that'll be. Should probably be in addition to the text indexing,
> rather than replacing.

Indeed, but option 3 is about *blindly* indexing *JSON*. We definitely want
indexed structured data, the question is just how to get that into the LSearch
infrastructure.

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Ryan Lane
On Thu, Mar 7, 2013 at 12:10 PM, Tyler Romeo  wrote:

> On Thu, Mar 7, 2013 at 3:05 PM, Antoine Musso  wrote:
>
> > We still have to figure out which account will be used, the URL, whether
> > we want a dedicated wiki etc...
> >
>
> Those discussions are unrelated to using OpenID as a client, though.
>
>
As I've mentioned before, I'm the one championing OpenID support on the
sites, and I have no current plans to enable OpenID as a consumer. Making
authentication changes is difficult. We're focusing on OpenID as a provider
and OAuth support right now, and that's way more than enough to try to do
this quarter.

- Ryan
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Chad
Those tags are arbitrary :(

-Chad
On Mar 7, 2013 12:09 PM, "Petr Bena"  wrote:

> ah ok I was confused by it being flagged stable
>
> On Thu, Mar 7, 2013 at 8:35 PM, Tyler Romeo  wrote:
> > On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena  wrote:
> >
> >> I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID
> >>
> >>
> >> why we don't have it on production? :)
> >>
> >
> > Just last week there was a thread about this. Extension:OpenID is under
> > active development, but I think it could be ready for deployment in the
> > near future (if not right now).
> >
> > *--*
> > *Tyler Romeo*
> > Stevens Institute of Technology, Class of 2015
> > Major in Computer Science
> > www.whizkidztech.com | tylerro...@gmail.com
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread Jeremy Baron
On Thu, Mar 7, 2013 at 8:06 PM, Matthew Flaschen
 wrote:
> Why would it be harmful for public wikis?  Anyone can do this on an
> article-by-article basis by copying the source to their own MediaWiki
> instances.

That user would have to pick which articles to copy and test (or test them all).

The log doesn't contain (I guess?) all articles. Only slow articles.

-Jeremy

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Tyler Romeo
On Thu, Mar 7, 2013 at 3:05 PM, Antoine Musso  wrote:

> We still have to figure out which account will be used, the URL, whether
> we want a dedicated wiki etc...
>

Those discussions are unrelated to using OpenID as a client, though.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Petr Bena
ah ok I was confused by it being flagged stable

On Thu, Mar 7, 2013 at 8:35 PM, Tyler Romeo  wrote:
> On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena  wrote:
>
>> I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID
>>
>>
>> why we don't have it on production? :)
>>
>
> Just last week there was a thread about this. Extension:OpenID is under
> active development, but I think it could be ready for deployment in the
> near future (if not right now).
>
> *--*
> *Tyler Romeo*
> Stevens Institute of Technology, Class of 2015
> Major in Computer Science
> www.whizkidztech.com | tylerro...@gmail.com
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread Matthew Flaschen
On 03/07/2013 12:00 PM, Antoine Musso wrote:
> Le 06/03/13 23:58, Federico Leva (Nemo) a écrit :
>> There's slow-parse.log, but it's private unless a solution is found for
>> https://gerrit.wikimedia.org/r/#/c/49678/
>> https://wikitech.wikimedia.org/wiki/Logs
> 
> And slow-parse.log is probably going to be kept private unless proven it
> is not harmful =)

Why would it be harmful for public wikis?  Anyone can do this on an
article-by-article basis by copying the source to their own MediaWiki
instances.

But it ends up being repeated work.

Matt

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Antoine Musso
Le 07/03/13 11:32, Petr Bena wrote:
> I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID
> why we don't have it on production? :)

As far as I know, that extension is pending a full review before it
lands on the Wikimedia cluster.

Ryan Lane wrote about it:
 http://lists.wikimedia.org/pipermail/wikitech-l/2013-March/067124.html

There is a draft document at:
 https://www.mediawiki.org/wiki/OpenID_Provider

We still have to figure out which account will be used, the URL, whether
we want a dedicated wiki etc...


-- 
Antoine "hashar" Musso

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread Antoine Musso
Le 06/03/13 22:05, Robert Rohde a écrit :
> On enwiki we've already made Lua conversions with most of the string
> templates, several formatting templates (e.g. {{rnd}}, {{precision}}),
> {{coord}}, and a number of others.  And there is work underway on a
> number of the more complex overhauls (e.g. {{cite}}, {{convert}}).
> However, it would be nice to identify problematic templates that may
> be less obvious.

You can get in touch with Brad Jorsch and Tim Starling. They most
probably have a list of templates that should be quickly converted to Lua
modules.

If we got {{cite}} out, that would already be a nice improvement :-]

-- 
Antoine "hashar" Musso


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Bug 1542 - Log spam blacklist hits

2013-03-07 Thread anubhav agarwal
Hey Chris

I was exploring the SpamBlacklist extension. I have some doubts I hope you could
clear them.

Is there any place I can get documentation of
Class SpamBlacklist in the file SpamBlacklist_body.php. ?

In function filter what does the following variables represent ?

$title
$text
$section
$editpage
$out

I have understood the following things from the code; please correct me if
I am wrong.
It extracts the edited text and parses it to find the links. It then
replaces the links which match the whitelist regex, and then checks if
there are some links that match the blacklist regex.
If the check is greater, you return the content matched. It already enters
in the debug log if it finds a match.

I guess the bug aims at creating an SQL table.
I was thinking of logging the following fields:
Title, Text, User, URLs, IP. I don't understand why you denied it.


On Tue, Feb 26, 2013 at 1:25 AM, Chris Steipp  wrote:

> That's an ambitious first bug, Anubhav!
>
> Since this is an extension, it plugs into MediaWiki core using hooks.
> So periodically, the core code will run all of the functions
> registered for a particular hook, so the extensions can interact with
> the logic. In this case, SpamBlacklist has registered
> SpamBlacklistHooks::filterMerged to run whenever an editor attempts to
> save a page, or SpamBlacklistHooks::filterAPIEditBeforeSave if the
> edit came in through the api. So that is where you will want to log.
>
> Although MediaWiki has a logging feature, it sounds like you may want
> to add your own logging table (like the AbuseFilter extension). If you
> do that, make sure that you're only storing data that you really need,
> and is ok with our privacy policy (so no ip addresses!).
>
> Feel free to add me as a reviewer when you submit your code to gerrit.
>
> Chris
>
> On Mon, Feb 25, 2013 at 11:21 AM, Tyler Romeo 
> wrote:
> > Hey,
> >
> > I don't know much about that, or how much you know, but at the very
> least I
> > can tell you that the bug is in Extension:SpamBlacklist, which can be
> found
> > at http://www.mediawiki.org/wiki/Extension:SpamBlacklist. From what I
> can
> > see from the code, it seems to just use various Hooks in MediaWiki in
> order
> > to stop editing, e-mailing, etc. if the request matches a parsed
> blacklist
> > it has.
> >
> > *--*
> > *Tyler Romeo*
> > Stevens Institute of Technology, Class of 2015
> > Major in Computer Science
> > www.whizkidztech.com | tylerro...@gmail.com
> >
> >
> > On Mon, Feb 25, 2013 at 2:17 PM, anubhav agarwal  >wrote:
> >
> >> Hi Guys,
> >>
> >> I was trying to fix this bug. I am a newbie to MediaWiki and it's the
> >> first bug I'm trying to solve, so I don't know much.
> >> I want to know about the spam blacklist: how it works, how it triggers
> >> the action, and its logging mechanism.
> >> It would be great if someone could help me fix this bug.
> >>
> >> Cheers,
> >> Anubhav
> >>
> >>
> >> Anubhav Agarwal| 4rth Year  | Computer Science & Engineering | IIT
> Roorkee
> >> ___
> >> Wikitech-l mailing list
> >> Wikitech-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Cheers,
Anubhav


Anubhav Agarwal| 4rth Year  | Computer Science & Engineering | IIT Roorkee
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Identifying pages that are slow to render

2013-03-07 Thread Antoine Musso
Le 06/03/13 23:58, Federico Leva (Nemo) a écrit :
> There's slow-parse.log, but it's private unless a solution is found for
> https://gerrit.wikimedia.org/r/#/c/49678/
> https://wikitech.wikimedia.org/wiki/Logs

And slow-parse.log is probably going to be kept private unless it is proven
not to be harmful =)

-- 
Antoine "hashar" Musso

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Indexing non-text content in LuceneSearch

2013-03-07 Thread Brion Vibber
On Thu, Mar 7, 2013 at 11:45 AM, Daniel Kinzler  wrote:
> 1) create a specialized XML dump that contains the text generated by
> getTextForSearchIndex() instead of actual page content.

That probably makes the most sense; alternately, make a dump that
includes both "raw" data and "text for search". This also allows for
indexing extra stuff for files -- such as extracted text from a PDF or
DjVu, or metadata from a JPEG -- if the dump process etc. can produce
appropriate indexable data.
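As a very rough sketch of that idea (a real implementation would plug into
dumpBackup.php / WikiExporter rather than loop by hand, and the XML below is
deliberately minimal):

    <?php
    // Sketch: emit a search-oriented dump where each page's <text> holds the
    // output of getTextForSearchIndex() instead of the raw (JSON) content.
    $dbr = wfGetDB( DB_SLAVE );
    $res = $dbr->select( 'page', array( 'page_id' ), array(), __METHOD__ );

    foreach ( $res as $row ) {
        $page = WikiPage::newFromID( $row->page_id );
        $content = $page ? $page->getContent() : null;
        if ( !$content ) {
            continue;
        }

        // This is where the JSON-vs-search-text difference happens: for
        // Wikibase entities this returns labels and aliases, not the
        // serialized structure.
        $searchText = $content->getTextForSearchIndex();

        printf(
            "<page><title>%s</title><text>%s</text></page>\n",
            htmlspecialchars( $page->getTitle()->getPrefixedText() ),
            htmlspecialchars( $searchText )
        );
    }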

> However, that only works
> if the dump is created using the PHP dumper. How are the regular dumps
> currently generated on WMF infrastructure? Also, would it be feasible to
> make an extra dump just for LuceneSearch (at least for wikidata.org)?

The dumps are indeed created via MediaWiki. I think Ariel or someone
can comment with more detail on how it currently runs; it's been a
while since I was in the thick of it.

> 2) We could re-implement the ContentHandler facility in Java, and require
> extensions that define their own content types to provide a Java based handler
> in addition to the PHP one. That seems like a pretty massive undertaking of
> dubious value. But it would allow maximum control over what is indexed how.

No don't do it :)

> 3) The indexer code (without plugins) should not know about Wikibase, but it
> may have hard coded knowledge about JSON. It could have a special indexing
> mode for JSON, in which the structure is deserialized and traversed, and any
> values are added to the index (while the keys used in the structure would be
> ignored). We may still be indexing useless internals from the JSON, but at
> least there would be a lot fewer false negatives.

Indexing structured data could be awesome -- again I think of file
metadata as well as wikidata-style stuff. But I'm not sure how easy
that'll be. Should probably be in addition to the text indexing,
rather than replacing.


-- brion

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Indexing non-text content in LuceneSearch

2013-03-07 Thread Daniel Kinzler
Hi all!

I would like to ask for your input on how non-wikitext content can
be indexed by LuceneSearch.

The background is that full text search (Special:Search) is nearly useless
on wikidata.org at the moment.

The reason for the problem appears to be that when rebuilding a Lucene index
from scratch, using an XML dump of wikidata.org, the raw JSON structure used by
Wikibase gets indexed. The indexer is blind; it just takes whatever "text" it
finds in the dump. Indexing JSON does not work at all for full text search,
especially not when non-ASCII characters are represented as Unicode escape
sequences.

Inside MediaWiki, in PHP, this works like this:

* wikidata.org (or rather, the Wikibase extension) stores non-text content in
wiki pages, using a ContentHandler that manages a JSON structure.
* Wikibase's EntityContent class implements Content::getTextForSearchIndex() so
it returns the labels and aliases of an entity. Data items thus get indexed by
their labels and aliases.
* getTextForSearchIndex() is used by the default MySQL search to build an index.
It's also (ab)used by things that can only operate on flat text, like the
AbuseFilter extension.
* The LuceneSearch index gets updated live using the OAI extension, which in
turn knows to use getTextForSearchIndex() to get the text for indexing.
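To make the EntityContent bullet concrete, the idea is roughly the following;
this is a simplified, hypothetical content class whose data layout is assumed,
and the real Wikibase code differs:

    <?php
    // Simplified illustration of Content::getTextForSearchIndex() for an
    // entity whose data is an array of labels and aliases (structure assumed).
    class ExampleEntityContent {
        /** @var array e.g. array( 'label' => array( 'en' => 'Berlin', ... ),
         *                         'aliases' => array( 'en' => array( ... ) ) ) */
        private $data;

        public function __construct( array $data ) {
            $this->data = $data;
        }

        public function getTextForSearchIndex() {
            $terms = array();

            if ( isset( $this->data['label'] ) ) {
                foreach ( $this->data['label'] as $label ) {
                    $terms[] = $label;          // one label per language
                }
            }
            if ( isset( $this->data['aliases'] ) ) {
                foreach ( $this->data['aliases'] as $aliasGroup ) {
                    foreach ( (array)$aliasGroup as $alias ) {
                        $terms[] = $alias;      // aliases, per language
                    }
                }
            }

            // Flat text: this is what MySQL search, AbuseFilter and the
            // OAI-fed live Lucene index see, instead of the raw JSON.
            return implode( ' ', $terms );
        }
    }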

So, for anything indexed live, this works, but for rebuilding the search index
from a dump, it doesn't - because the Java indexer knows nothing about content
types, and has no interface for an extension to register additional content 
types.


To improve this, I can think of a few options:

1) create a specialized XML dump that contains the text generated by
getTextForSearchIndex() instead of actual page content. However, that only works
if the dump is created using the PHP dumper. How are the regular dumps currently
generated on WMF infrastructure? Also, would it be feasible to make an extra
dump just for LuceneSearch (at least for wikidata.org)?

2) We could re-implement the ContentHandler facility in Java, and require
extensions that define their own content types to provide a Java based handler
in addition to the PHP one. That seems like a pretty massive undertaking of
dubious value. But it would allow maximum control over what is indexed how.

3) The indexer code (without plugins) should not know about Wikibase, but it may
have hard coded knowledge about JSON. It could have a special indexing mode for
JSON, in which the structure is deserialized and traversed, and any values are
added to the index (while the keys used in the structure would be ignored). We
may still be indexing useless internals from the JSON, but at least there would be
a lot fewer false negatives.
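The traversal in option 3 itself is trivial; it is sketched here in PHP just to
show the idea (the actual LuceneSearch indexer would of course do this in Java):

    <?php
    // Collect all scalar string values from a decoded JSON structure,
    // ignoring the keys, and join them into one indexable blob.
    function collectSearchableValues( $node, array &$values ) {
        if ( is_array( $node ) || is_object( $node ) ) {
            foreach ( (array)$node as $child ) {
                collectSearchableValues( $child, $values );
            }
        } elseif ( is_string( $node ) ) {
            $values[] = $node;
        }
    }

    // Sample input; in practice $json would be the page's raw content.
    $json = '{"label":{"en":"Berlin"},"aliases":{"en":["Berlin, Germany"]}}';

    $values = array();
    collectSearchableValues( json_decode( $json, true ), $values );
    $indexText = implode( ' ', $values );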


I personally would prefer 1) if dumps are created with PHP, and 3) otherwise. 2)
looks nice, but it would be hard to keep the Java and PHP versions from diverging.

So, how would you fix this?

thanks
daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-07 Thread MZMcBride
Andreas Nüßlein wrote:
>so I need to set up a local instance of the dewiki- and enwiki-DB with all
>revisions.. :-D
>
>I know it's rather a mammoth project so I was wondering if somebody could
>give me some pointers?
>
>First of all, I would need to know what kind of hardware I should get. Is
>it possible/smart to have it all in two ginormous MySQL instances (one for
>each of the languages) or will I need to do sharding?
>
>I don't need it to run smoothly. I only need to be able to query the
>database (and I know some of these queries can run for days)
>
>I will probably have access to some rather powerful machines here at the
>university and I have also quite a few workstation-machines on which I
>could theoretically do the sharding.

Ryan L. or Marc P.: I routed Andreas to this list (from
#wikimedia-toolserver), as I figured these questions related to the work
that you all have been doing for Wikimedia Labs. Or at least I figured you
all probably had some kind of formula for hardware provisioning that might
be reusable here. Any pointers would be great. :-)

MZMcBride



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Tyler Romeo
On Thu, Mar 7, 2013 at 2:32 PM, Petr Bena  wrote:

> I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID
>
>
> why we don't have it on production? :)
>

Just last week there was a thread about this. Extension:OpenID is under
active development, but I think it could be ready for deployment in the
near future (if not right now).

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerro...@gmail.com
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Petr Bena
I just discovered this: http://www.mediawiki.org/wiki/Extension:OpenID
why we don't have it on production? :)

On Thu, Mar 7, 2013 at 8:30 PM, Petr Bena  wrote:
> Hi,
>
> we discussed OAuth many times... but - what's the current status?
>
> Do we have working extensions which support using OpenID in order to
> login to mediawiki, or OAuth? So that you can login using your google
> account or such? I believe that WMF is working on this, so can we have
> some update?
>
> I know that english wikipedia community hates facebook and basically
> anything new :P but if not wikipedia at least many small wikis could
> use it.
>
> Thanks

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Editing wikipedia using google, openID or facebook

2013-03-07 Thread Petr Bena
Hi,

we discussed OAuth many times... but - what's the current status?

Do we have working extensions which support using OpenID in order to
login to mediawiki, or OAuth? So that you can login using your google
account or such? I believe that WMF is working on this, so can we have
some update?

I know that english wikipedia community hates facebook and basically
anything new :P but if not wikipedia at least many small wikis could
use it.

Thanks

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Replicating enwiki and dewiki for research purposes

2013-03-07 Thread Andreas Nüßlein
Hey Quim, hey Maria,

thank you for your replies!
I actually knew where to find the XML dumps, but that pointer about the new
XML import tools is really helpful.


So eventually, I was able to acquire a Xeon 8-core machine with 32 GB RAM and
6 TB of SAS storage to start my experiments on :)
Let's see what this baby can do * http://i.imgur.com/J47GJ.gif *

Thanks again
Andreas



On Tue, Mar 5, 2013 at 3:33 PM, Maria Miteva wrote:

> Hi,
>
> You might also try the following mailing list:
> *XML Data Dumps mailing list*
>
> Here is some info on importing XML dumps ( not sure what tools work well
> but probably the mailing list can help with that)
> http://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing
>
> Also, Ariel Glenn recently announced two new tools for importing dumps on
> the XML list:
>
> http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-February/000701.html
>
> Mariya
>
>
>
> On Tue, Mar 5, 2013 at 4:15 PM, Quim Gil  wrote:
>
> > On 03/05/2013 02:54 AM, Andreas Nüßlein wrote:
> >
> >> Hi list,
> >>
> >> so I need to set up a local instance of the dewiki- and enwiki-DB with
> all
> >> revisions.. :-D
> >>
> >
> > Just in case:
> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
> >
> > Also, you might want to ask / discuss at
> >
> > https://lists.wikimedia.org/mailman/listinfo/offline-l
> >
> > Good luck with this interesting project!
> >
> > --
> > Quim Gil
> > Technical Contributor Coordinator @ Wikimedia Foundation
> > http://www.mediawiki.org/wiki/User:Qgil
> >
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)

2013-03-07 Thread Luke Welling WMF
The advice on
https://wikitech.wikimedia.org/wiki/Query_profiling_for_features_developers
sounds
good.

Is there more detail somewhere on how to do this part "Test your query
against production slaves prior to full deployment"?

Luke


On Wed, Mar 6, 2013 at 8:14 PM, Matthew Flaschen wrote:

> On 03/06/2013 04:36 PM, Sumana Harihareswara wrote:
> > If you want your code merged, you need to keep your database queries
> > efficient.  How can you tell if a query is inefficient? How do you write
> > efficient queries, and avoid inefficient ones?  We have some resources
> > around:
> >
> > Roan Kattouw's
> >
> https://www.mediawiki.org/wiki/Manual:Database_layout/MySQL_Optimization/Tutorial
> > -- slides at
> >
> https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf
> >
> > Asher Feldman's
> > https://www.mediawiki.org/wiki/File:MediaWiki_Performance_Profiling.ogv
> > -- slides at
> https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf
>
> And
> https://wikitech.wikimedia.org/wiki/Query_profiling_for_features_developers
>
> Matt Flaschen
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Query performance - run code faster, merge code faster :-)

2013-03-07 Thread Nischay Nahata
I found EXPLAIN (http://dev.mysql.com/doc/refman/5.0/en/using-explain.html)
pretty useful during my project; rather than theories, it shows exactly how
the query is being resolved and whether the indexes are being used
correctly.
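If you are going through MediaWiki's database layer anyway, one convenient way
to do this (the query below is only a placeholder) is to build the SQL text and
prepend EXPLAIN:

    <?php
    // Build the SQL MediaWiki would run, then ask MySQL how it will execute it.
    $dbr = wfGetDB( DB_SLAVE );

    $sql = $dbr->selectSQLText(
        'page',
        array( 'page_id', 'page_title' ),
        array( 'page_namespace' => 0, "page_title LIKE 'Berlin%'" ),
        __METHOD__
    );

    $res = $dbr->query( 'EXPLAIN ' . $sql, __METHOD__ );
    foreach ( $res as $row ) {
        // key/rows/Extra show which index is used and how many rows are scanned.
        print_r( $row );
    }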

On Thu, Mar 7, 2013 at 6:06 AM, Sumana Harihareswara
wrote:

> If you want your code merged, you need to keep your database queries
> efficient.  How can you tell if a query is inefficient? How do you write
> efficient queries, and avoid inefficient ones?  We have some resources
> around:
>
> Roan Kattouw's
>
> https://www.mediawiki.org/wiki/Manual:Database_layout/MySQL_Optimization/Tutorial
> -- slides at
> https://commons.wikimedia.org/wiki/File:MediaWikiPerformanceProfiling.pdf
>
> Asher Feldman's
> https://www.mediawiki.org/wiki/File:MediaWiki_Performance_Profiling.ogv
> -- slides at https://www.mediawiki.org/wiki/File:SQL_indexing_Tutorial.pdf
>
> More hints:
> http://lists.wikimedia.org/pipermail/toolserver-l/2012-June/005075.html
>
> When you need to ask for a performance review, you can check out
> https://www.mediawiki.org/wiki/Developers/Maintainers#Other_Areas_of_Focus
> which suggests Tim Starling, Asher Feldman, and Ori Livneh.  I also
> BOLDly suggest Nischay Nahata, who worked on Semantic MediaWiki's
> performance for his GSoC project in 2012.
>
> --
> Sumana Harihareswara
> Engineering Community Manager
> Wikimedia Foundation
>



-- 
Cheers,

Nischay Nahata
nischayn22.in
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Indexing structures for Wikidata

2013-03-07 Thread Denny Vrandečić
As you probably know, the search in Wikidata sucks big time.

Until we have created a proper Solr-based search and deployed it on that
infrastructure, we would like to implement and set up a reasonable stopgap
solution.

The simplest and most obvious signal for sorting the items would be to
1) make a prefix search
2) weight all results by the number of Wikipedias the item links to

This should usually provide the item you are looking for. Currently, the
search order is random. Good luck with finding items like California,
Wellington, or Berlin.
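In MediaWiki terms, the query such an index has to serve would look roughly
like the sketch below. Column names other than the proposed weight field follow
the existing wb_terms layout as I understand it, so treat them as assumptions:

    <?php
    // Prefix search over labels/aliases in one language, best-linked items
    // first. "term_weight" is the proposed new column (e.g. sitelink count).
    $dbr = wfGetDB( DB_SLAVE );

    $res = $dbr->select(
        'wb_terms',
        array( 'term_entity_id', 'term_entity_type', 'term_text', 'term_weight' ),
        array(
            'term_language' => 'en',
            'term_type' => array( 'label', 'alias' ),
            'term_text' . $dbr->buildLike( 'Berl', $dbr->anyString() ), // prefix
        ),
        __METHOD__,
        array(
            'ORDER BY' => 'term_weight DESC',
            'LIMIT' => 10,
        )
    );

    // For this to be fast, the index needs to cover the equality columns plus
    // the prefix column, e.g. ( term_language, term_type, term_text ); the
    // open question is where (or whether) term_weight can usefully fit in.
    foreach ( $res as $row ) {
        echo "Q{$row->term_entity_id}: {$row->term_text} ({$row->term_weight})\n";
    }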

Now, what I want to ask is, what would be the appropriate index structure
for that table. The data is saved in the wb_terms table, which would need
to be extended by a "weight" field. There is already a suggestion (based on
discussions between Tim and Daniel K if I understood correctly) to change
the wb_terms table index structure (see here <
https://bugzilla.wikimedia.org/show_bug.cgi?id=45529> ), but since we are
changing the index structure anyway it would be great to get it right this
time.

Anyone who can jump in? (Looking especially at Asher and Tim)

Any help would be appreciated.

Cheers,
Denny

-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] IRC office hour on Tue March 19th, 1700 UTC, about Bug management

2013-03-07 Thread Andre Klapper
Hi everybody,

on Tuesday 19th 17:00 UTC[1], there will be an IRC Office Hour in
#wikimedia-office about Wikimedia's issue tracker[2] and Bug
management[3]. 

Add it to your calendar and come to ask how to better find information
in Bugzilla that interests you, and to share ideas and criticism how to
make Bugzilla better.

andre

[1] https://meta.wikimedia.org/wiki/IRC_office_hours
[2] https://bugzilla.wikimedia.org
[3] https://www.mediawiki.org/wiki/Bug_management
-- 
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] How do MS SQL users install MediaWiki?

2013-03-07 Thread Marcin Cieslak
>> Mark A. Hershberger  wrote:
> On 03/04/2013 01:34 AM, Chad wrote:
>> However, we do
>> have people who want/use MSSQL, so I think taking the effort to
>> keep it working is worthwhile--if someone's willing to commit.
>
> Since Danny Bauch has been using MSSQL and modifying MW for his needs,
> I'll work with him to get the necessary changes committed.
>
> Danny, if you could commit your changes into Gerrit, I'd be happy to
> test them.

I'll be happy to come back to my PostgreSQL work and I'd be happy to
talk to other RDBMS people to coordinate some stuff (like getting
unit tests to work, or getting some abstractions right: transactions,
schema management, etc.).

//Saper


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Seemingly proprietary Javascript

2013-03-07 Thread Platonides
On 06/03/13 16:28, Jay Ashworth wrote:
>>> To “convey” a work means any kind of propagation that enables other
>>> parties to make or receive copies. Mere interaction with a user
>>> through a computer network, with no transfer of a copy, is not
>>> conveying.
>>
>> As javascript is executed in the client, it probably is.
> 
> Perhaps.  But HTML is also "executed" in the client, and some legal
> decisions have gone each way on whether the mere viewing of a page 
> constitutes "copying" in violation of copyright (the trend is towards
> "no", thankfully. :-)
> 
> Cheers,
> -- jra

Interesting. HTML is presentational, though, while JS is executable.

I wouldn't consider most of our JavaScript "significant" (even though
we have plenty of usages considered non-trivial by [1]), since it is
highly based on MediaWiki classes and ids. However, we also have some
big JavaScript programs (WikiEditor, VisualEditor...).

@Alexander: I would consider something like
> <script
>   src="//bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki%2CSpinner%7Cjquery.triggerQueueCallback%2CloadingSpinner%2CmwEmbedUtil%7Cmw.MwEmbedSupport&only=scripts&skin=vector&version=20130304T183632Z"
>   license="//bits.wikimedia.org/www.mediawiki.org/load.php?debug=false&lang=en&modules=jquery%2Cmediawiki%2CSpinner%7Cjquery.triggerQueueCallback%2CloadingSpinner%2CmwEmbedUtil%7Cmw.MwEmbedSupport&only=scripts&skin=vector&version=20130304T183632Z&mode=license">
> </script>

with the license attribute pointing to a JavaScript License Web Labels page
for that script (yes, that would have to go up to the WHATWG).

Another, easier option would be for LibreJS to detect the "debug=false"
in the URL and change it to debug=true, expecting to find the license
information there.
It's also a natural change for people intending to reuse such
JavaScript, even if they were unaware of the convention.

@Chad: We use free licenses because we care about the freedom of our code
to be reused, but if the license is not appropriate to what we really
intend, or even worse, places such a burden that even we aren't
presenting the license notices properly, that's something very much worth
discussing. Up to the point where we could end up relicensing the code to
better reflect our intention, as was done from GFDL to CC-BY-SA with
Wikipedia content.


1- http://www.gnu.org/philosophy/javascript-trap.html


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l