Addshore added a comment.
I do not think we want to use the wb_changes table.
wmf.mediawiki_history in Hadoop is probably the right way to go if we are
figuring this out from edit summaries.
> (2) use the API to collect the JSON representation of the revised entity by
revision-id,
Note, this will have to be done using Special:EntityData and the revision
parameter (wbgetentities doesn't have this functionality)
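For illustration, a minimal sketch of fetching an entity at a given revision this way. The helper names and the User-Agent string are made up for this example, and it assumes the public wikidata.org host:

```python
import json
import urllib.request

# Assumption: public Wikidata host; another Wikibase would use its own base URL.
BASE_URL = "https://www.wikidata.org/wiki/Special:EntityData"


def entity_data_url(entity_id, revision_id):
    """Build the Special:EntityData URL for one entity at one revision."""
    return f"{BASE_URL}/{entity_id}.json?revision={revision_id}"


def fetch_entity_at_revision(entity_id, revision_id):
    """Return the 'entities' mapping from the JSON for the given revision.

    Performs a network request; the User-Agent below is a placeholder.
    """
    request = urllib.request.Request(
        entity_data_url(entity_id, revision_id),
        headers={"User-Agent": "tainted-refs-sketch/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["entities"]
```

Calling this once with the revision id and once with its parent revision id would give the entity either side of an edit.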
> Wikibase does not help much to identifying triggering and cleaning edits
It could do though?
Without adding anything to Wikibase, I guess the general approach has to be:
- Find revisions that touch statement mainsnak values and/or references
  - Considerations:
    - These values can be touched using a variety of different API modules
      and with a variety of different summaries, so not just
      wbsetclaim-update. If anything, working with a blacklist of summaries
      might be easier (eliminate things that only touch terms, for example)
    - This could be simplified if the definition of a tainted reference had
      something to do with being done by a real user via our UI, but maybe
      we don't want to say that.
- Fetch the entity either side of the change, see what happened, and
  classify that?
- Once this has been done for a window of data, try to figure out exactly
  what is happening to the statements based on the classifications?
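A rough sketch of the filter and classify steps, to make the idea concrete. Everything here is an assumption for discussion: the list of term-only summary prefixes is a guess and not exhaustive, the labels are invented, and statements are matched by their "id" (GUID) in the entity JSON:

```python
import re

# Assumed blacklist: auto-summary prefixes of edits that only touch terms.
TERM_ONLY_PREFIXES = ("wbsetlabel", "wbsetdescription", "wbsetaliases")

# Auto-summaries start like "/* wbsetclaim-update:2||1 */ ...".
_AUTOSUMMARY = re.compile(r"^/\*\s*([a-z]+)")


def is_term_only_summary(comment):
    """True if the edit summary looks like a term-only edit."""
    match = _AUTOSUMMARY.match(comment or "")
    return bool(match) and match.group(1).startswith(TERM_ONLY_PREFIXES)


def _statements_by_id(entity):
    # "claims" maps property id -> list of statements; an entity without
    # statements may serialize it as an empty list instead of an object.
    claims = entity.get("claims") or {}
    return {s["id"]: s for group in claims.values() for s in group}


def classify_statement_changes(before, after):
    """Label each statement that differs between two entity revisions."""
    old, new = _statements_by_id(before), _statements_by_id(after)
    labels = {}
    for sid in old.keys() | new.keys():
        if sid not in old:
            labels[sid] = "added"
        elif sid not in new:
            labels[sid] = "removed"
        else:
            main = old[sid].get("mainsnak") != new[sid].get("mainsnak")
            refs = old[sid].get("references", []) != new[sid].get("references", [])
            if main and refs:
                labels[sid] = "mainsnak_and_references_changed"
            elif main:
                labels[sid] = "mainsnak_changed"
            elif refs:
                labels[sid] = "references_changed"
    return labels
```

Unchanged statements are left out of the result, so an empty dict means the revision did not touch any mainsnak or reference.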
I would be in favour of a call to discuss this.
TASK DETAIL
https://phabricator.wikimedia.org/T240466