It looks like user scripts running on Wikipedia can no longer use ORES.
I'm getting a CORS error. You can test this by running the following in
the JS dev console on a Wikimedia page:
> $.ajax({url: "https://ores.wikimedia.org/v3/scores/"}).done(function(response){console.log(response)})
This is what I see:
> Access to XMLHttpRequest at 'https://ores.wikimedia.org/v3/scores/' from
origin 'https://en.wikipedia.org' has been blocked by CORS policy: No
'Access-Control-Allow-Origin' header is present on the requested resource.
> GET https://ores.wikimedia.org/v3/scores/ net::ERR_FAILED 307
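For what it's worth, the same failure reproduces without jQuery; here is a
minimal sketch using plain fetch (the .catch is only there to surface the
rejection):
> fetch("https://ores.wikimedia.org/v3/scores/")
>   .then(function (r) { return r.json(); })
>   .then(function (data) { console.log(data); })
>   .catch(function (err) { console.error(err); }); // rejects once CORS blocks the request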
I'll file a bug, but I thought raising this in the migration thread was a
good idea.
On Mon, Sep 25, 2023 at 7:37 AM Chris Albon <[email protected]> wrote:
> Hey SJ!
>
> > Is there a reason to think that separate models for each wiki are more
> effective than one general model that sees the name of the wiki as part of
> its context?
>
> Intuitively, one model per wiki has a lot of merit: the training data comes
> from the community that is impacted by the model, etc. However, there are
> scale and equity issues we wrestled with. One lesson we learned training
> 300+ models for the Add-A-Link
> <https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Structured_tasks/Add_a_link>
> project is that if we continued down that path, Lift Wing would quickly be
> hosting 3000+ models (i.e. ~330 models per new feature), overwhelming our
> small team's ability to maintain, quality-control, support, and improve
> them over their lifespan. Regarding equity, even with a multi-year effort
> the one-model-per-wiki RevScoring models only covered ~33 out of 330 wikis.
> The communities we didn't reach didn't get the benefit of those models, but
> with a language-agnostic model we can make the same model available to all
> communities. For example, the language-agnostic revert risk model will
> likely be the model selected for the Automoderator project
> <https://www.mediawiki.org/wiki/Moderator_Tools/Automoderator>,
> which means hundreds more wikis get access to the tool compared to a
> one-model-per-wiki approach.
>
> > I'd love to read more about the cost of training and updating current
> models, how much material they are trained on, and how others w/ their own
> GPUs can contribute to updates.
>
> The training data information should be available in the model cards
> <https://meta.wikimedia.org/wiki/Machine_learning_models>. If it isn't,
> let me know so we can change it. Regarding GPUs and contributions, we are
> still working out what a good training environment will look like. Our
> initial idea was a Kubeflow-based cluster we called Train Wing, but we've
> had to put it on hold for resource reasons (i.e. we couldn't build Lift
> Wing, deprecate ORES, and build Train Wing all at the same time). More on
> that soon, after the Research-ML offsite where we'll have those
> conversations.
>
> All this said, one thing we do want to support is hosting community-created
> models. So, if a community has a model they want to host, we can load it
> into Lift Wing and host it for them at scale. We have a lot of details to
> work out (e.g. community consensus, human rights review as part of being a
> Very Large Online Platform, etc.) as to what that would look like, but that
> is the goal.
>
> Chris
>
> On Fri, Sep 22, 2023 at 5:42 PM Samuel Klein <[email protected]> wrote:
>
>> Luca writes:
>>
>> > Managing several hundred models for goodfaith and damaging is not very
>> > scalable in a modern micro-service architecture like Lift Wing (since we
>> > have a model for each supported wiki). We (both Research and ML) are
>> > oriented toward having fewer models that handle more languages at the
>> > same time,
>>
>> Is there a reason to think that separate models for each wiki are more
>> effective than one general model that sees the name of the wiki as part of
>> its context?
>> I'd love to read more about the cost of training and updating current
>> models, how much material they are trained on, and how others w/ their own
>> GPUs can contribute to updates.
>>
>> Personally I wouldn't mind a single model that can suggest multiple
>> properties of an edit, including goodfaith, damaging, and likelihood of
>> reversion. They are different if related concepts -- the first deals with
>> the intent and predicted further editing history of the editor, the second
>> with article accuracy and quality, and the third with the size +
>> activity + norms of the other editors...
>>
>> SJ
>>
>>
>>
>>
>> On Fri, Sep 22, 2023 at 5:34 PM Aaron Halfaker <[email protected]>
>> wrote:
>>
>>> All fine points. As you can see, I've filed some phab tasks where I saw
>>> a clear opportunity to do so.
>>>
>>> > as mentioned before all the models that currently run on ORES are
>>> available in both ores-legacy and Lift Wing.
>>>
>>> I thought I read that damaging and goodfaith models are going to be
>>> replaced. Should I instead read that they are likely to remain available
>>> for the foreseeable future? When I asked about a community discussion
>>> about the transition from damaging/goodfaith to revertrisk, I was imagining
>>> that many people who use those predictions might have an opinion about them
>>> going away, e.g. people who use the relevant filters in RecentChanges.
>>> Maybe I missed the discussions about that.
>>>
>>> I haven't seen a mention of the article quality or article topic models
>>> in the docs. Are those also going to remain available? I have some user
>>> scripts that use these models and that are relatively widely used. I
>>> didn't notice anyone reaching out. ... So I checked, and setting a
>>> User-Agent in my user scripts doesn't actually change the User-Agent
>>> header. I've read that you need to set "Api-User-Agent" instead, but that
>>> causes a CORS error when querying ORES. I'll file a bug.
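>>> To illustrate, this is roughly what I'm trying (the header value is just
>>> a hypothetical identifier, and enwiki is an arbitrary context):
>>> $.ajax({
>>>   url: "https://ores.wikimedia.org/v3/scores/enwiki/",
>>>   // Hypothetical identifier; any custom header makes the request
>>>   // non-simple, so the browser sends a CORS preflight first.
>>>   headers: { "Api-User-Agent": "example-user-script/0.1" }
>>> }).done(function (response) { console.log(response); });
>>> The preflight appears to be what fails with the CORS error I mentioned.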
>>>
>>> On Fri, Sep 22, 2023 at 1:22 PM Luca Toscano <[email protected]>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Sep 22, 2023 at 8:59 PM Aaron Halfaker <
>>>> [email protected]> wrote:
>>>>
>>>>> We could definitely file a task. However, it does seem like
>>>>> highlighting the features that will no longer be available is an
>>>>> appropriate topic for a discussion about migration in a technical mailing
>>>>> list.
>>>>>
>>>>
>>>> A specific question about a piece of functionality is a topic for a
>>>> task; I don't think we should discuss every detail that differs from the
>>>> ORES API here (Wikitech-l doesn't seem like a good medium for it). We are
>>>> already following up on Phabricator, so let's use tasks where possible to
>>>> keep the conversation as light and targeted as possible.
>>>>
>>>>> Is there a good reference for which features have been excluded from
>>>>> ores-legacy? It looks like https://wikitech.wikimedia.org/wiki/ORES
>>>>> covers some of the excluded features/models, but not all of them.
>>>>>
>>>>
>>>> We have spent the last months helping the community migrate away from
>>>> the ORES API (to use Lift Wing instead); the remaining traffic comes only
>>>> from a few low-traffic IPs that we have not been able to contact. We
>>>> didn't add feature injection or threshold optimization to ores-legacy,
>>>> for example, since there was no indication in our logs that users were
>>>> relying on them. We have always stated everywhere (including in all
>>>> emails sent to this mailing list) that we are 100% open to adding a
>>>> functionality if it is backed by a valid use case.
>>>>
>>>>
>>>>> I see now that it looks like the RevertRisk model will be replacing
>>>>> the *damaging* and *goodfaith* models that differentiate intentional
>>>>> damage from unintentional damage. There's a large body of research on
>>>>> why this is valuable and important to the social functioning of the
>>>>> wikis. This literature also discusses why being reverted is not a very
>>>>> good signal for damage/vandalism and can lead to problems when used as
>>>>> a signal for patrolling. Was there a community discussion about this
>>>>> deprecation that I missed? I have some preliminary results (in press)
>>>>> that demonstrate that the RevertRisk model performs significantly worse
>>>>> than the damaging and goodfaith models in English Wikipedia for
>>>>> patrolling work. Do you have documentation for how you evaluated this
>>>>> model and compared it to damaging/goodfaith?
>>>>>
>>>>
>>>> We have model cards for both Revert Risk models, all of them linked in
>>>> the API portal docs (more info:
>>>> https://api.wikimedia.org/wiki/Lift_Wing_API). All the community folks
>>>> who migrated their bots/tools/etc. to Revert Risk were very happy about
>>>> the change, and we haven't had any requests to switch back since then.
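>>>> For reference, a Lift Wing request for the language-agnostic revert risk
>>>> model looks roughly like this (the rev_id is just an example; see the
>>>> API portal docs above for the exact model names and payloads):
>>>> fetch("https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict", {
>>>>   method: "POST",
>>>>   headers: { "Content-Type": "application/json" },
>>>>   // Example payload: a revision on English Wikipedia.
>>>>   body: JSON.stringify({ rev_id: 12345, lang: "en" })
>>>> }).then(function (r) { return r.json(); })
>>>>   .then(function (scores) { console.log(scores); });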
>>>>
>>>> The ML team provides all the models deployed on ORES on Lift Wing, so
>>>> every damaging and goodfaith variant is available in the new API. We
>>>> chose not to pursue further development of those models for several
>>>> reasons:
>>>> - We haven't had any indication/request from the community about those
>>>> models in almost two years, except for a few Phabricator updates that we
>>>> followed up on.
>>>> - Managing several hundred models for goodfaith and damaging is not very
>>>> scalable in a modern micro-service architecture like Lift Wing (since we
>>>> have a model for each supported wiki). We (both Research and ML) are
>>>> oriented toward having fewer models that handle more languages at the
>>>> same time, and this is the direction that we are following at the moment.
>>>> It may not be the perfect one, but so far it seems a good choice. If you
>>>> want to chime in and provide your input, we are 100% open to hearing
>>>> suggestions/concerns/doubts/recommendations/etc.; please follow up in any
>>>> of our channels (IRC, mailing lists, or Phabricator, for example).
>>>> - Last but not least, most of the damaging/goodfaith models were trained
>>>> on data from years ago and have never been re-trained. The effort to keep
>>>> several hundred models up to date with recent data, versus doing the same
>>>> for a few models (like revert risk), weighs in favor of the latter for a
>>>> relatively small team of engineers like ours.
>>>>
>>>>
>>>>> FWIW, from my reading of these announcement threads, I believed that
>>>>> generally functionality and models would be preserved in
>>>>> ores-legacy/LiftWing. This is the first time I've realized the scale of
>>>>> what will become unavailable.
>>>>>
>>>>
>>>> This is the part that I don't get, since as mentioned before all the
>>>> models that currently run on ORES are available in both ores-legacy and
>>>> Lift Wing. What changes is that we no longer expose functionality that
>>>> the logs clearly show is not used, and that would need to be maintained
>>>> and improved over time. We are open to improving and adding anything that
>>>> the community needs; the only thing we ask is a valid use case to support
>>>> it.
>>>>
>>>> I do think that Lift Wing is a great improvement for the community. We
>>>> have been working with all the folks who reached out to us, without
>>>> hiding anything (including deprecation plans and paths forward).
>>>>
>>>> Thanks for following up!
>>>>
>>>> Luca
>>
>>
>>
>> --
>> Samuel Klein @metasj w:user:sj +1 617 529 4266
_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/