Hi Jérôme,

I wrote a little overview of this a while back that might be of use: https://meta.wikimedia.org/wiki/User:Isaac_(WMF)/Analysis_gotchas#Reverts_(Patrolling_and_Vandalism)

Essentially, the library that Nathan suggested (mwreverts) is great for the shasum-based approach, and you'll need to use edit tags <https://en.wikipedia.org/wiki/Wikipedia:Tags> to check for additional tool-based reverts such as mw-undo, mw-rollback, etc. I think combining the two approaches makes the most sense, and you can see many more details on their overlap for English Wikipedia in this task: https://phabricator.wikimedia.org/T266374

It sounds like you're collecting specific edits, so this is probably less relevant, but I'll also highlight the excellent public dataset put together by the Wikimedia Foundation Data Engineering team. It has the full edit history for each language edition and includes metadata such as whether the edit was a revert based on shasums, as well as the edit tags. If you were processing a very large number of edits, I'd suggest starting with this dataset, as it would have all the information you need in one place.

- More details: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history
- You can see an example of how to access and process these dumps on Wikimedia-hosted Jupyter notebooks (PAWS <https://wikitech.wikimedia.org/wiki/PAWS>) here: https://public-paws.wmcloud.org/User:Isaac_(WMF)/Denormalized%20Edit%20History%20Dumps.ipynb

Hope that helps!

Best,
Isaac

On Fri, Apr 14, 2023 at 11:33 AM J. Nathan Matias <[email protected]> wrote:

> Hi Jérôme,
>
> Have you looked at python-mwreverts
> <https://github.com/mediawiki-utilities/python-mwreverts>? This library has
> been used by many researchers who are studying reverted edits, and it may
> be useful for your work as well.
>
> All the best,
>
> --Nathan
>
> On Fri, Apr 14, 2023 at 11:15 AM <[email protected]> wrote:
>
> > Sending this again from my current address. Left Gmail a long time ago --
> > not sure the redirect still works... My apologies if this is hitting your
> > inbox twice!
> >
> > _ _ _ _
> >
> > Dear Wikimedia research community,
> >
> > I have a question for the data-savvy people on this list :)
> >
> > My goal is simple: for a sample of English Wikipedia editors, I'm trying
> > to identify which of their edits were reverted. I can see two possible
> > ways of doing this:
> >
> > 1. Identify the reverts using the SHA1 values. (A revert happens when the
> > edit exactly restores the page to its previous state.)
> >
> > 2. Identify the reverts using the "undo" button.
> >
> > As I see it, solution 2 is less "precise" (you'll miss some reverts, e.g.,
> > those performed manually). However, it would also be less computationally
> > intensive, and I don't see that it would introduce any bias (results can be
> > compared across editors in a statistical model).
> >
> > However, I do not see the information about whether a revision was
> > reverted using the “undo” button in the enwiki database:
> > https://www.mediawiki.org/w/index.php?title=Manual:Database_layout/diagram&action=render
> >
> > I find this surprising. Am I missing something? (And if so, how do you
> > personally feel about strategy 1 vs. strategy 2?)
> >
> > Thank you so much for any insight you might be willing to provide! :D
> >
> > Sincerely,
> >
> > Jérôme
> >
> > _______________________________________________
> > Wiki-research-l mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
>
> --
> J. Nathan Matias <http://natematias.com/> : Center for Advanced Study in
> the Behavioral Sciences : Cornell University : Citizens and Technology Lab
> <https://citizensandtech.org> : social.coop/@natematias : blog
> <https://natematias.com/external-posts/> : daylight time photos
> <https://social.coop/@natematias/109423664679446879>

--
Isaac Johnson (he/him/his) -- Senior Research Scientist -- Wikimedia Foundation
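
For illustration only, here is a minimal sketch of the two signals discussed in this thread: checksum-based (SHA1) identity reverts and tag-based reverts. It is a simplified stand-in written in plain Python, not the mwreverts API. The record layout (`rev_id`, `sha1`, `tags`), the function names, the toy `history`, and the window size of 15 revisions are all assumptions made for the example.

```python
from collections import deque

# Edit tags for tool-based reverts named in the thread; extend per wiki as needed.
REVERT_TAGS = {"mw-undo", "mw-rollback"}


def find_reverted(revisions, radius=15):
    """Map each reverted rev_id to the rev_id that reverted it (one page).

    `revisions` must be in chronological order. Each item is a dict with the
    hypothetical keys "rev_id", "sha1" and "tags".
    """
    recent = deque(maxlen=radius)  # (rev_id, sha1) of the last `radius` revisions
    reverted = {}
    for rev in revisions:
        # Look for the most recent earlier revision with identical content.
        match = None
        for idx in range(len(recent) - 1, -1, -1):
            if recent[idx][1] == rev["sha1"]:
                match = idx
                break
        if match is not None:
            # Every revision between the restored one and this one was undone.
            for idx in range(match + 1, len(recent)):
                reverted.setdefault(recent[idx][0], rev["rev_id"])
        recent.append((rev["rev_id"], rev["sha1"]))
    return reverted


def tagged_reverting(revisions):
    """Return rev_ids whose edit tags mark them as tool-based reverts."""
    return {r["rev_id"] for r in revisions if REVERT_TAGS & set(r.get("tags", ()))}


# Toy page history (made-up checksums and tags).
history = [
    {"rev_id": 1, "sha1": "aaa", "tags": []},
    {"rev_id": 2, "sha1": "bbb", "tags": []},
    {"rev_id": 3, "sha1": "aaa", "tags": ["mw-rollback"]},  # restores rev 1
    {"rev_id": 4, "sha1": "ccc", "tags": []},
    {"rev_id": 5, "sha1": "ddd", "tags": ["mw-undo"]},  # undo that yields new content
]

print(find_reverted(history))     # checksum signal: which edits were reverted
print(tagged_reverting(history))  # tag signal: which edits are reverts
```

In this toy history the checksum pass catches revision 2 being reverted by revision 3, while revision 5 shows up only through its tag because the resulting content matches no earlier revision. Note that a tag marks the reverting edit, not the edits it reverted, which is one reason to combine the two signals rather than rely on either alone.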
