Ultimately we need a system that integrates information from multiple
sources, such as WikiTrust, AbuseFilter and the Wikipedia Editorial
Team.

A general point - there is a *lot* of information contained in edits
that AbuseFilter cannot practically characterize, due to the complexity
of language and the subtlety of certain types of abuse. A system with
access to natural language features (and wikitext features) could
theoretically detect them. My quality research group considered
including features relating to the [[Thematic relation]]s found in an
article (we have access to a thematic role parser), which could
potentially be used to detect bad writing - itself indicative of an
edit containing vandalism.
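To make the idea concrete, here is a minimal, purely hypothetical sketch of the kind of shallow text features such a system might extract from an edit - the function name and feature set are illustrative, not part of any existing system:

```python
# Hypothetical sketch: a few shallow text features one might feed to a
# vandalism classifier, complementing AbuseFilter-style regex rules.
# (Illustrative only; not AbuseFilter's or WikiTrust's actual feature set.)
import re

def edit_features(old_text: str, new_text: str) -> dict:
    """Compute simple language-level features of an edit."""
    # Crude notion of "added text": the suffix if the edit is a pure
    # append, otherwise fall back to the whole new revision.
    added = new_text[len(old_text):] if new_text.startswith(old_text) else new_text
    words = re.findall(r"[A-Za-z']+", added)
    upper = sum(1 for w in words if w.isupper() and len(w) > 1)
    return {
        "chars_added": max(len(new_text) - len(old_text), 0),
        "words_added": len(words),
        # Shouting is a common vandalism signal.
        "all_caps_ratio": upper / len(words) if words else 0.0,
        # Runs of 5+ identical characters ("!!!!!", "aaaaa").
        "repeated_chars": 1 if re.search(r"(.)\1{4,}", added) else 0,
    }
```

Deeper features - thematic roles, parse quality, topical coherence - would sit on top of the same interface, scoring the added text rather than just pattern-matching it.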

On Thu, Mar 19, 2009 at 3:17 PM, Delirium <[email protected]> wrote:
> But if your training data is
> the output of the previous rule set, you aren't going to be able to
> *improve* on its performance without some additional information (or
> built-in inductive bias).
>
> -Mark
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
