On Thu, Mar 7, 2013 at 1:34 PM, Platonides <platoni...@gmail.com> wrote:
> On 07/03/13 21:03, anubhav agarwal wrote:
>> Hey Chris
>>
>> I was exploring the SpamBlacklist extension. I have some questions; I hope
>> you can clear them up.
>>
>> Is there any documentation for the SpamBlacklist class in
>> SpamBlacklist_body.php?

There really isn't any documentation besides the code, but there are a
couple more things you should look at. Notice that SpamBlacklist.php
contains the line "$wgHooks['EditFilterMerged'][] =
'SpamBlacklistHooks::filterMerged';", which is how SpamBlacklist
registers itself with MediaWiki core to filter edits. When MediaWiki
core runs the EditFilterMerged hooks (which it does in
includes/EditPage.php, line 1287), every function that an extension
has registered for that hook is called with the passed-in arguments,
so SpamBlacklistHooks::filterMerged runs. SpamBlacklistHooks::filterMerged
then just sets up and calls SpamBlacklist::filter. That is where you
can start tracing what is actually in the variables, in case
Platonides' summary wasn't enough.
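To make the registration-and-dispatch flow concrete, here is a minimal Python model of it. It is not MediaWiki's PHP. `hooks`, `register_hook`, `run_hooks` and the simplified `filter_merged` handler are hypothetical stand-ins for `$wgHooks`, core's hook runner and `SpamBlacklistHooks::filterMerged`. The model also assumes MediaWiki's hook convention that a handler returning false stops processing.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical stand-in for $wgHooks: hook name -> list of handler callables.
hooks: Dict[str, List[Callable[..., bool]]] = defaultdict(list)


def register_hook(name: str, handler: Callable[..., bool]) -> None:
    """Models `$wgHooks['Name'][] = handler;` in an extension's setup file."""
    hooks[name].append(handler)


def run_hooks(name: str, *args) -> bool:
    """Call every handler registered for `name`, in registration order.

    Returns False as soon as one handler returns False (the edit is
    blocked), otherwise True.
    """
    for handler in hooks[name]:
        if handler(*args) is False:
            return False
    return True


def filter_merged(title: str, text: str) -> bool:
    """Hypothetical handler; the real one sets up and calls SpamBlacklist::filter."""
    blocked = "spam.example" in text  # placeholder check, not the real filter
    return not blocked


# Roughly what SpamBlacklist.php does when the extension is loaded:
register_hook("EditFilterMerged", filter_merged)

# Roughly what EditPage.php does when an edit is saved:
print(run_hooks("EditFilterMerged", "Sandbox", "see http://spam.example/"))  # False
print(run_hooks("EditFilterMerged", "Sandbox", "plain text"))                # True
```

Core never calls the extension directly. It only walks whatever is registered under the hook name, which is why the setup file is the place to start.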


>>
>> In the function filter, what do the following variables represent?
>>
>> $title
> Title object (includes/Title.php): the page the edit is being saved to.
>
>> $text
> Text being saved in the page/section
>
>> $section
> Name of the section or ''
>
>> $editpage
> EditPage object if EditFilterMerged was called, null otherwise
>
>> $out
>
> A ParserOutput object (actually, this variable name was a bad choice; it
> looks like an OutputPage), see includes/parser/ParserOutput.php
>
>
>> I have understood the following from the code; please correct me if
>> I am wrong. It extracts the edited text and parses it to find the links.
>
> Actually, it uses the fact that the parser will have processed the
> links, so in most cases just obtains that information.
>
>
>> It then replaces the links which match the whitelist regex,
> That doesn't quite make sense as you describe it. It builds a list of
> links and replaces the whitelisted ones with '', i.e. it removes
> whitelisted links from the list.
>
>> and then checks if there are some links that match the blacklist regex.
> Yes
>
>> If $check is greater than zero, it returns the matched content.
>
> Right, $check will be non-0 if the links matched the blacklist.
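The steps Platonides confirms (take the parser's link list, drop whitelisted links, test the rest against the blacklist) can be sketched in Python. This is not the extension's code. The sketch assumes `links` already holds the URLs the parser found, for example the keys of the ParserOutput's external-link list. It also assumes the whitelist and blacklist are plain lists of regex strings; the real extension builds its regexes from the blacklist pages. `find_blacklisted` is a hypothetical name.

```python
import re
from typing import Iterable, List, Optional


def find_blacklisted(links: Iterable[str],
                     whitelist: List[str],
                     blacklist: List[str]) -> Optional[List[str]]:
    """Return the blacklisted URLs among `links`, or None if there are none."""
    # Step 1: remove whitelisted links (the PHP code blanks them to '').
    remaining = [url for url in links
                 if not any(re.search(rx, url) for rx in whitelist)]
    # Step 2: test what is left against the blacklist regexes.
    hits = [url for url in remaining
            if any(re.search(rx, url) for rx in blacklist)]
    # A non-empty `hits` plays the role of a non-zero $check.
    return hits or None


links = [
    "http://good.example.org/page",
    "http://spam.example.com/buy",
    "http://spam.example.com/allowed",
]
whitelist = [r"spam\.example\.com/allowed"]
blacklist = [r"spam\.example\.com"]
print(find_blacklisted(links, whitelist, blacklist))
# ['http://spam.example.com/buy']
```

Filtering the whitelist first means a whitelisted URL can never be reported, even when a broader blacklist entry also matches it.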
>
>> It already writes an entry to the debug log if it finds a match.
>
> Yes, but that is a private log.
> Bug 1542 talks about making it accessible in the wiki.

Yep. For example, see
* https://en.wikipedia.org/wiki/Special:Log
* https://en.wikipedia.org/wiki/Special:AbuseLog

>
>
>> I guess the bug aims at creating an SQL table.
>> I was thinking of logging the following fields:
>> Title, Text, User, URLs, IP. I don't understand why you rejected that.
>
> Because we don't like to publish the IPs *in the wiki*.

The WMF privacy policy also discourages us from keeping IP addresses
longer than 90 days, so if you do keep IPs, you need a way to hide or
purge them. And if they let someone see which IP address a particular
username was using, then only users with checkuser permissions may see
them. So it would be easier for you not to include IPs, but if they
are wanted, you'll have to build those protections as well.
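Here is one way the purge requirement could look, as a Python sketch. It assumes a log record carrying the fields proposed earlier in the thread (title, text, user, URLs, IP). It assumes IPs are blanked in place rather than whole rows being deleted. `SpamLogEntry`, `purge_old_ips` and `RETENTION` are hypothetical names, and the checkuser visibility restriction is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

RETENTION = timedelta(days=90)  # the 90-day guideline mentioned above


@dataclass
class SpamLogEntry:
    """Hypothetical log row using the fields proposed in the thread."""
    title: str
    text: str
    user: str
    urls: List[str]
    timestamp: datetime
    ip: Optional[str] = None  # should only ever be shown to checkusers


def purge_old_ips(entries: List[SpamLogEntry], now: datetime) -> int:
    """Blank the IP of every entry older than RETENTION; return how many changed."""
    purged = 0
    for entry in entries:
        if entry.ip is not None and now - entry.timestamp > RETENTION:
            entry.ip = None
            purged += 1
    return purged


now = datetime(2013, 3, 7)
log = [
    SpamLogEntry("Sandbox", "buy now", "Alice", ["http://spam.example.com/"],
                 now - timedelta(days=120), "192.0.2.1"),
    SpamLogEntry("Sandbox", "buy now", "Bob", ["http://spam.example.com/"],
                 now - timedelta(days=10), "192.0.2.2"),
]
print(purge_old_ips(log, now))      # 1
print([entry.ip for entry in log])  # [None, '192.0.2.2']
```

Blanking only the IP keeps the rest of the record (title, user, URLs) available for spam analysis after the retention window.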

>
> I think the approach should be to log matches using the AbuseFilter
> extension, if it is loaded.

The AbuseFilter log format contains a lot of AbuseFilter-specific data
and is used to re-test abuse filters, so adding these hits to that log
might cause problems. I think either the general log or a separate,
new log table would be best. For some numbers: in the first 7 days of
this month, we averaged 27,000 hits per day. So if this goes into an
existing log, it's going to generate a significant amount of data.
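To put the 27,000-hits-per-day figure in perspective, here is a quick back-of-the-envelope calculation. It assumes the daily rate stays constant and that every hit becomes one log row; neither is guaranteed.

```python
HITS_PER_DAY = 27_000  # average quoted above; assumed constant

rows_per_week = HITS_PER_DAY * 7      # 189,000
rows_in_90_days = HITS_PER_DAY * 90   # 2,430,000: rows inside the 90-day IP window
rows_per_year = HITS_PER_DAY * 365    # 9,855,000

for label, rows in [("week", rows_per_week),
                    ("90 days", rows_in_90_days),
                    ("year", rows_per_year)]:
    print(f"{label:>8}: {rows:,} rows")
```

Nearly ten million rows a year is the main reason a dedicated table, with its own purging and indexing, looks more attractive than sharing an existing log.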

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
