Ultimately we need a system that integrates information from multiple sources, such as WikiTrust, AbuseFilter and the Wikipedia Editorial Team.
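Just to make that concrete, here is a very rough sketch of what combining those signals could look like. The function names, weights, and inputs are all made up for illustration; they do not reflect the actual WikiTrust or AbuseFilter interfaces.

    # Hypothetical sketch: fold per-edit signals from several sources into
    # a single vandalism score. The signal names and weights are invented.
    def combined_score(wikitrust_reputation, abusefilter_hit, editorial_flag):
        """wikitrust_reputation is assumed to be in [0, 1]; the two booleans
        stand in for an AbuseFilter match and an editorial-team flag."""
        signals = {
            "wikitrust": 1.0 - wikitrust_reputation,        # low author reputation -> higher risk
            "abusefilter": 1.0 if abusefilter_hit else 0.0,  # any filter hit
            "editorial": 1.0 if editorial_flag else 0.0,     # page flagged by editorial review
        }
        weights = {"wikitrust": 0.5, "abusefilter": 0.3, "editorial": 0.2}  # made-up weights
        return sum(weights[name] * value for name, value in signals.items())

    # e.g. combined_score(0.2, True, False) -> 0.7

In practice the weights would be learned rather than hand-set, but the point is just that the signals are complementary and worth combining.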
A general point: edits contain a *lot* of information that AbuseFilter cannot practically characterize, because of the complexity of language and the subtlety of certain types of abuse. A system with access to natural-language features (and wikitext features) could in principle detect them. My quality research group considered including features based on the [[Thematic relation]]s found in an article (we have access to a thematic role parser), which could potentially be used to detect bad writing, itself an indicator that an edit contains vandalism. (A rough sketch of what such features might look like is at the end of this message.)

On Thu, Mar 19, 2009 at 3:17 PM, Delirium <[email protected]> wrote:
> But if your training data is the output of the previous rule set, you
> aren't going to be able to *improve* on its performance without some
> additional information (or built-in inductive bias).
>
> -Mark
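For what it's worth, here is the sketch of the kind of natural-language/wikitext features I mean above. Everything here is invented for illustration and is not taken from any existing tool; the thematic-role features would come from the parser mentioned above.

    import re

    # Hypothetical feature extractor over the text inserted by an edit.
    # These are the sorts of signals a statistical classifier could weigh
    # against each other, but which a regex-based filter handles poorly.
    def edit_features(inserted_text):
        words = re.findall(r"[A-Za-z']+", inserted_text)
        letters = [c for c in inserted_text if c.isalpha()]
        return {
            "num_words": len(words),
            "uppercase_ratio": (sum(c.isupper() for c in letters) / len(letters)) if letters else 0.0,
            "repeated_char_runs": len(re.findall(r"(.)\1{3,}", inserted_text)),  # e.g. "!!!!" or "aaaa"
            "avg_word_length": (sum(len(w) for w in words) / len(words)) if words else 0.0,
            "exclamation_density": inserted_text.count("!") / max(len(inserted_text), 1),
            # Thematic-role features (who did what to whom, produced by a
            # thematic role parser) would be added here to capture "bad
            # writing" in a way surface statistics cannot.
        }

And on Mark's point: he is right that a classifier trained only on the previous rule set's output can at best reproduce that rule set. Features like these are exactly the "additional information" that would let a learned model generalize beyond the existing rules.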
