Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

Delirium Thu, 19 Mar 2009 05:04:56 -0700

Brian wrote:
> This extension is very important for training machine learning
> vandalism detection bots. Recently published systems use only hundreds
> of examples of vandalism in training - not nearly enough to
> distinguish between the variety found in Wikipedia or generalize to
> new, unseen forms of vandalism. A large set of human-created rules
> could be run against all previous edits in order to create a massive
> vandalism dataset.

As a machine-learning person, I find this idea somewhat problematic:
generating training examples *from a rule set* and then learning on them
is just a very roundabout way of reconstructing that rule set. What you
really want is a large dataset of human-labeled examples of vandalism /
non-vandalism that *can't* currently be distinguished reliably by rules,
so you can throw a machine-learning algorithm at the problem of finding
rules that do distinguish them.
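
To make the point concrete, here is a minimal sketch in Python. Everything
in it is an assumption for illustration: the single feature (characters
removed by an edit), the 500-character rule, and the synthetic edits are
made up, and the learner is a bare threshold search rather than any
published vandalism detector. It only shows that a learner trained on
rule-generated labels ends up agreeing with the rule, blind spots included.

```python
import random

# Hypothetical hand-written filter rule: flag edits removing > 500 characters.
def rule(chars_removed):
    return chars_removed > 500

def fit_stump(xs, ys):
    """Return the threshold t (and its error count) that minimises
    training errors for the classifier 'x > t'."""
    best_t, best_err = None, len(xs) + 1
    for t in sorted(set(xs)):
        err = sum((x > t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

random.seed(0)
# Synthetic feature values standing in for past edits (not real data).
edits = [random.randint(0, 2000) for _ in range(1000)]
# The "training labels" are produced by the rule itself.
labels = [rule(x) for x in edits]

threshold, errors = fit_stump(edits, labels)

def learned(chars_removed):
    return chars_removed > threshold

# Fraction of training edits on which the learner and the rule agree.
agreement = sum(learned(x) == rule(x) for x in edits) / len(edits)

# A hypothetical subtle vandal edit that removes only 20 characters:
# the rule does not flag it, and neither does the learner trained on the rule.
subtle_vandalism = 20
print(threshold, errors, agreement, rule(subtle_vandalism), learned(subtle_vandalism))
```

The learner fits its labels perfectly, but only because it has recovered
the rule's own threshold; it has learned nothing about vandalism the rule
does not already catch. Human-labeled examples that the rules get wrong are
what would give it something new to learn.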


-Mark


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
