See https://phabricator.wikimedia.org/T347344
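(For context, the failure reported in the quoted message below can be reproduced from the dev console on any Wikipedia page; a minimal sketch using fetch, with the same ORES URL as in the report:)

// If ORES does not send an Access-Control-Allow-Origin header, the browser
// blocks the response and the catch branch fires.
fetch("https://ores.wikimedia.org/v3/scores/", { mode: "cors" })
    .then(function (response) { return response.json(); })
    .then(function (data) { console.log(data); })
    .catch(function (err) { console.error("Blocked or failed:", err); });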
On Mon, Sep 25, 2023 at 12:26 PM Aaron Halfaker <[email protected]> wrote:

> It looks like user-scripts running on Wikipedia can no longer use ORES. I'm getting a CORS error. You can test this by trying to run the following in the JS dev console on a Wikimedia page:
>
> $.ajax({url: "https://ores.wikimedia.org/v3/scores/"}).done(function(response){console.log(response)})
>
> This is what I see:
>
> Access to XMLHttpRequest at 'https://ores.wikimedia.org/v3/scores/' from origin 'https://en.wikipedia.org' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
>
> GET https://ores.wikimedia.org/v3/scores/ net::ERR_FAILED 307
>
> I'll file a bug, but I thought elevating this to the migration thread was a good idea.
>
> On Mon, Sep 25, 2023 at 7:37 AM Chris Albon <[email protected]> wrote:
>
>> Hey SJ!
>>
>> > Is there a reason to think that separate models for each wiki are more effective than one general model that sees the name of the wiki as part of its context?
>>
>> Intuitively, one model per wiki has a lot of merit: the training data comes from the community that is impacted by the model, etc. However, there are scale and equity issues we wrestled with. One lesson we learned training ~300+ models for the Add-A-Link project <https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Structured_tasks/Add_a_link> is that if we continued down that path, Lift Wing would eventually be hosting 3000+ models (i.e. 330 models per new feature) pretty quickly and overwhelm any ability of our small team to maintain, quality-control, support, and improve them over their lifespan. Regarding equity, even with a multi-year effort the one-model-per-wiki RevScoring models only covered ~33 out of 330 wikis. The communities we didn't reach didn't get the benefit of those models. But with language-agnostic models we can make the same model available to all communities. For example, the language-agnostic revert risk model will likely be the model selected for the automoderator project <https://www.mediawiki.org/wiki/Moderator_Tools/Automoderator>, which means hundreds more wikis get access to the tool compared to a one-model-per-wiki approach.
>>
>> > I'd love to read more about the cost of training and updating current models, how much material they are trained on, and how others w/ their own GPUs can contribute to updates.
>>
>> The training data information should be available in the model cards <https://meta.wikimedia.org/wiki/Machine_learning_models>. If it isn't, let me know so we can change it. Regarding GPUs and contributions, we are still working out what a good training environment will be. Our initial idea was a Kubeflow-based cluster we called Train Wing, but we've had to put that on hold for resource reasons (i.e. we couldn't build Lift Wing, deprecate ORES, and build Train Wing all at the same time). More on that soon, after the Research-ML offsite where we'll have those conversations.
>>
>> All this said, one thing we do want to support is hosting community-created models. So, if a community has a model they want to host, we can load it into Lift Wing and host it for them at scale. We have a lot of details to work out (e.g. community consensus, human rights review as part of being a Very Large Online Platform, etc.) as to what that would look like, but that is the goal.
>>
>> Chris
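(For readers who want to try the language-agnostic revert risk model Chris mentions, a minimal sketch of a browser-side query, assuming the endpoint and payload format documented at https://api.wikimedia.org/wiki/Lift_Wing_API; the revision ID and language code are placeholders, and an API token may be needed for higher rate limits:)

// Minimal sketch: ask the language-agnostic revert risk model for a score.
// The revision ID and language code below are placeholders.
fetch("https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ rev_id: 12345, lang: "en" })
})
    .then(function (response) { return response.json(); })
    .then(function (data) { console.log(data); });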
>> On Fri, Sep 22, 2023 at 5:42 PM Samuel Klein <[email protected]> wrote:
>>
>>> Luca writes:
>>>
>>> > Managing several hundred models for goodfaith and damaging is not very scalable in a modern micro-service architecture like Lift Wing (since we have a model for each supported wiki). We (both Research and ML) are oriented towards having fewer models that handle more languages at the same time,
>>>
>>> Is there a reason to think that separate models for each wiki are more effective than one general model that sees the name of the wiki as part of its context? I'd love to read more about the cost of training and updating current models, how much material they are trained on, and how others w/ their own GPUs can contribute to updates.
>>>
>>> Personally I wouldn't mind a single model that can suggest multiple properties of an edit, including goodfaith, damaging, and likelihood of reversion. They are different if related concepts -- the first deals with the intent and predicted further editing history of the editor, the second with article accuracy and quality, and the third with the size + activity + norms of the other editors...
>>>
>>> SJ
>>>
>>> On Fri, Sep 22, 2023 at 5:34 PM Aaron Halfaker <[email protected]> wrote:
>>>
>>>> All fine points. As you can see, I've filed some phab tasks where I saw a clear opportunity to do so.
>>>>
>>>> > as mentioned before all the models that currently run on ORES are available in both ores-legacy and Lift Wing.
>>>>
>>>> I thought I read that the damaging and goodfaith models are going to be replaced. Should I instead read that they are likely to remain available for the foreseeable future? When I asked about a community discussion about the transition from damaging/goodfaith to revertrisk, I was imagining that many people who use those predictions might have an opinion about them going away, e.g. people who use the relevant filters in RecentChanges. Maybe I missed the discussions about that.
>>>>
>>>> I haven't seen a mention of the article quality or article topic models in the docs. Are those also going to remain available? I have some user scripts that use these models and are relatively widely used. I didn't notice anyone reaching out. ... So I checked, and setting a User-Agent on my user scripts doesn't actually change the User-Agent. I've read that you need to set "Api-User-Agent" instead, but that causes a CORS error when querying ORES. I'll file a bug.
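(Regarding the User-Agent note above: browsers do not let page scripts override the User-Agent header, which is why Wikimedia APIs conventionally accept an Api-User-Agent header instead. A minimal sketch of how a user script might set it with $.ajax follows; the script name, URL, and revision ID are placeholders, and whether a given service such as ORES allows the header in its CORS configuration is exactly the open question in the report above:)

// Minimal sketch: identify a user script via the Api-User-Agent header.
// Attempts to set User-Agent itself from page JavaScript are ignored by browsers.
$.ajax({
    url: "https://ores.wikimedia.org/v3/scores/enwiki/12345/damaging",
    headers: {
        "Api-User-Agent": "ExampleUserScript/1.0 (https://en.wikipedia.org/wiki/User:Example)"
    }
}).done(function (response) {
    console.log(response);
});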
>>>> On Fri, Sep 22, 2023 at 1:22 PM Luca Toscano <[email protected]> wrote:
>>>>
>>>>> On Fri, Sep 22, 2023 at 8:59 PM Aaron Halfaker <[email protected]> wrote:
>>>>>
>>>>>> We could definitely file a task. However, it does seem like highlighting the features that will no longer be available is an appropriate topic for a discussion about migration in a technical mailing list.
>>>>>
>>>>> A specific question about a piece of functionality is a topic for a task; I don't think we should discuss every detail that differs from the ORES API here (Wikitech-l doesn't seem a good medium for it). We are already following up on Phabricator, so let's use tasks where possible to keep the conversation as light and targeted as possible.
>>>>>
>>>>>> Is there a good reference for which features have been excluded from ores-legacy? It looks like https://wikitech.wikimedia.org/wiki/ORES covers some of the excluded features/models, but not all of them.
>>>>>
>>>>> We have spent the last months helping the community migrate away from the ORES API (to use Lift Wing instead); the remaining traffic comes from only a few low-traffic IPs that we have not been able to contact. We didn't add feature injection or threshold optimization to ores-legacy, for example, since there was no indication in our logs that users were relying on them. We have always stated everywhere (including in all emails sent to this mailing list) that we are 100% open to adding functionality if it is backed by a valid use case.
>>>>>
>>>>>> I see now that it looks like the RevertRisk model will be replacing the *damaging* and *goodfaith* models that differentiate intentional damage from unintentional damage. There's a large body of research on why this is valuable and important to the social functioning of the wikis. This literature also discusses why being reverted is not a very good signal for damage/vandalism and can lead to problems when used as a signal for patrolling. Was there a community discussion about this deprecation that I missed? I have some preliminary results (in press) that demonstrate that the RevertRisk model performs significantly worse than the damaging and goodfaith models in English Wikipedia for patrolling work. Do you have documentation for how you evaluated this model and compared it to damaging/goodfaith?
>>>>>
>>>>> We have model cards for both Revert Risk models, all linked from the API portal docs (more info: https://api.wikimedia.org/wiki/Lift_Wing_API). All the community folks who migrated their bots/tools/etc. to Revert Risk were very happy about the change, and we haven't had any requests to switch back since then.
>>>>>
>>>>> The ML team provides all the models deployed on ORES on Lift Wing, so every damaging and goodfaith variant is available in the new API. We chose not to pursue further development of those models for several reasons:
>>>>>
>>>>> - We haven't had any indication/request from the community about those models in almost two years, except for a few Phabricator updates that we followed up on.
>>>>>
>>>>> - Managing several hundred models for goodfaith and damaging is not very scalable in a modern micro-service architecture like Lift Wing (since we have a model for each supported wiki). We (both Research and ML) are oriented towards having fewer models that handle more languages at the same time, and this is the direction we are following at the moment. It may not be the perfect one, but so far it seems a good choice. If you want to chime in and provide your input, we are 100% open to hearing suggestions/concerns/doubts/recommendations/etc.; please follow up in any of our channels (IRC, mailing lists, or Phabricator, for example).
>>>>>
>>>>> - Last but not least, most of the damaging/goodfaith models were trained on data from years ago and never re-trained. The effort of keeping several hundred models up to date with recent data, versus doing the same for a few models (like revert risk), weighs in favor of the latter for a relatively small team of engineers like ours.
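(A minimal sketch of what this looks like in practice for one of the per-wiki models discussed above -- the same enwiki goodfaith score requested first in the old ORES style (answered by ores-legacy during the transition) and then from Lift Wing. The endpoint formats follow the API portal docs; the model name enwiki-goodfaith and the revision ID are assumptions/placeholders:)

// Old-style ORES request (served by ores-legacy during the transition):
fetch("https://ores.wikimedia.org/v3/scores/enwiki/12345/goodfaith")
    .then(function (response) { return response.json(); })
    .then(function (data) { console.log("ores-legacy:", data); });

// Lift Wing equivalent, assuming the enwiki-goodfaith model name:
fetch("https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ rev_id: 12345 })
})
    .then(function (response) { return response.json(); })
    .then(function (data) { console.log("Lift Wing:", data); });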
>>>>>> FWIW, from my reading of these announcement threads, I believed that generally functionality and models would be preserved in ores-legacy/Lift Wing. This is the first time I've realized the scale of what will become unavailable.
>>>>>
>>>>> This is the part that I don't get, since as mentioned before all the models that currently run on ORES are available in both ores-legacy and Lift Wing. What changes is that we no longer expose functionality that the logs clearly show is not used, and that would need to be maintained and improved over time. We are open to improving and adding anything the community needs; the only thing we ask is a valid use case to support it.
>>>>>
>>>>> I do think that Lift Wing is a great improvement for the community. We have been working with all the folks who reached out to us, without hiding anything (including deprecation plans and paths forward).
>>>>>
>>>>> Thanks for following up!
>>>>>
>>>>> Luca
>>>
>>> --
>>> Samuel Klein @metasj w:user:sj +1 617 529 4266
_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
