FYI, these new models are now live.  We'll be running some maintenance
scripts for the ORES review tool to update the recent historical scores.
Otherwise, you should expect to see ORES producing scores with updated
model version numbers.

On Sat, Aug 20, 2016 at 1:39 AM, Amir Ladsgroup <[email protected]> wrote:

> One small note: This will cause the ORES review tool to invalidate its db
> cache. So we will probably need to run some maintenance scripts here and
> there. You might feel a few bumps in the tool in Wikipedia. We will let you
> know beforehand :)
>
> Best
>
> On Sat, Aug 20, 2016 at 3:10 AM Aaron Halfaker <[email protected]>
> wrote:
>
>> Hey folks,
>>
>> We've been working on generating some updated models for ORES.  These
>> models will behave slightly differently from the models that we currently
>> have deployed.  This is a natural artifact of retraining: the learning
>> algorithms have some random properties, so even training on the *exact
>> same data* again produces slightly different scores.  So, for the most
>> part, this should be a non-issue for any
>> tools that use ORES.  However, I wanted to take this opportunity to
>> highlight some of the facilities ORES provides to help automatically detect
>> and adjust for these types of changes.
>>
>> *== Versions ==*
>> ORES provides information about all of the models.  This information
>> includes a model version number.  If you are caching ORES scores locally,
>> we recommend invalidating old scores whenever this version number changes.
>> For example,
>> https://ores.wikimedia.org/v2/scores/enwiki/damaging/12345678
>> currently returns:
>>
>> {
>>   "scores": {
>>     "enwiki": {
>>       "damaging": {
>>         "scores": {
>>           "12345678": {
>>             "prediction": false,
>>             "probability": {
>>               "false": 0.7141333465390294,
>>               "true": 0.28586665346097057
>>             }
>>           }
>>         },
>>         "version": "0.1.1"
>>       }
>>     }
>>   }
>> }
>>
>> This score was generated with the "0.1.1" version of the model.  But once
>> we deploy the new models, the same request will return:
>>
>> {
>>   "scores": {
>>     "enwiki": {
>>       "damaging": {
>>         "scores": {
>>           "12345678": {
>>             "prediction": false,
>>             "probability": {
>>               "false": 0.8204647324045306,
>>               "true": 0.17953526759546945
>>             }
>>           }
>>         },
>>         "version": "0.1.2"
>>       }
>>     }
>>   }
>> }
>>
>> Note that the version number changes to "0.1.2" and the probabilities
>> change slightly.  In this case, we're essentially re-training the same
>> model in a similar way, so we increment the "patch" number.
>>
>> However, we're switching modeling strategies for the article quality
>> models (enwiki-wp10, frwiki-wp10 & ruwiki-wp10), so those versions
>> increment the minor version from "0.3.2" to "0.4.0".  Prediction
>> probabilities may change more with those models than with a simple
>> retrain, but quick spot-checking suggests that the changes are still small.
>>
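
As a rough illustration of the caching advice above, here is a minimal
Python sketch that drops locally cached scores when the reported model
version changes.  It assumes the v2 response shape shown in the examples
above.  ScoreCache and its methods are hypothetical names made up for this
sketch; they are not part of ORES.

```python
class ScoreCache:
    """Hypothetical local cache of ORES scores, keyed by (wiki, model)."""

    def __init__(self):
        self._versions = {}  # (wiki, model) -> last seen version string
        self._scores = {}    # (wiki, model) -> {rev_id string: score dict}

    def update(self, response, wiki, model):
        """Store scores from a parsed v2 response.

        Returns True if older cached scores were invalidated because the
        model version differs from the one seen previously.
        """
        model_doc = response["scores"][wiki][model]
        version = model_doc["version"]
        key = (wiki, model)

        invalidated = key in self._versions and self._versions[key] != version
        if invalidated:
            # Scores from the old model version are no longer comparable.
            self._scores[key] = {}

        self._versions[key] = version
        self._scores.setdefault(key, {}).update(model_doc["scores"])
        return invalidated

    def get(self, wiki, model, rev_id):
        """Return the cached score for a revision, or None if absent."""
        return self._scores.get((wiki, model), {}).get(str(rev_id))
```

A tool would call update() with each parsed response and re-request any
revisions it still needs whenever update() returns True.
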
>> *== Test statistics and thresholding ==*
>> Many tools that use our edit quality models (reverted, damaging and
>> goodfaith) set thresholds for flagging edits for review.  In order to
>> support these tools, we produce test statistics that suggest useful
>> thresholds.
>>
>> https://ores.wmflabs.org/v2/scores/enwiki/damaging/?model_info=test_stats
>> produces:
>>
>>       ...
>>             "filter_rate_at_recall(min_recall=0.75)": {
>>               "filter_rate": 0.869,
>>               "recall": 0.752,
>>               "threshold": 0.492
>>             },
>>             "filter_rate_at_recall(min_recall=0.9)": {
>>               "filter_rate": 0.753,
>>               "recall": 0.902,
>>               "threshold": 0.173
>>             },
>>       ...
>>
>> These two statistics show useful thresholds for detecting damaging
>> edits.  E.g. if you want to be sure that you catch nearly all vandalism
>> (recall of about 0.9) and are OK with a higher false-positive rate, set
>> the threshold at 0.173, but if you'd like to catch most vandalism (recall
>> of about 0.75) with far fewer false positives, set the threshold at
>> 0.492.  These fields can be read automatically by
>> tools so that they do not need to be manually updated every time that we
>> deploy a new model.
>>
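
To illustrate reading these fields automatically, here is a small Python
sketch.  It assumes you have already pulled the dictionary of test
statistics out of the model_info response (the surrounding structure is
elided above, so the sketch does not guess at it), and that an edit is
flagged when its "true" probability is at or above the threshold.
threshold_for and needs_review are made-up helper names, not ORES API
calls.

```python
def threshold_for(test_stats, min_recall):
    """Look up the suggested threshold for a given minimum recall.

    test_stats is assumed to map statistic names such as
    "filter_rate_at_recall(min_recall=0.9)" to dicts with a "threshold" key.
    """
    key = "filter_rate_at_recall(min_recall={})".format(min_recall)
    return test_stats[key]["threshold"]


def needs_review(score, threshold):
    """Flag an edit when its "true" probability reaches the threshold.

    Using >= here is an assumption of this sketch.
    """
    return score["probability"]["true"] >= threshold
```

Because the threshold is looked up at run time, a tool written this way
picks up new values after each model deployment without a manual update.
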
>> Let me know if you have any questions and happy hacking!
>>
>> -Aaron
>> _______________________________________________
>> AI mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/ai
>>
>
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
