Hi both,

Many thanks for all the help! All the information was really helpful.

Best wishes,
Elisavet

On Mon, 14 Sep 2020 at 3:32 PM, Isaac Johnson <[email protected]> wrote:

> Hi Elisavet,
>
> Reverts:
> Sumit pointed you in the right direction. In case you're curious about
> types of reverts beyond identity reverts, this phab task has an excellent
> write-up / analysis of how to detect the different types:
> https://phabricator.wikimedia.org/T252366
>
> Deletions:
> I'm not fully sure what you are referring to here, but if it's:
> * revision deletion
> <https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Revision_deletion>:
> edits that have been deleted will show up as deleted in the XML history
> dumps. You can also find more details in the deletion log (table details
> <https://www.mediawiki.org/wiki/Manual:Logging_table>; example XML dump
> <https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-pages-logging.xml.gz>
> )
> * blanking of content: there is no fool-proof automatic way to detect when
> editors are deleting full sections/pages, but there are a few options. You
> can inspect edit comments and page diffs for evidence of blanking, but an
> easier place to start is probably with automatic edit tags (e.g.,
> mw-blank). Here's the full list of tags
> <https://www.wikidata.org/wiki/Special:Tags>, the dump of tag types (table
> <https://www.mediawiki.org/wiki/Manual:Change_tag_def_table>; example SQL
> dump
> <https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag_def.sql.gz>)
> and the dump of all tags applied (table
> <https://www.mediawiki.org/wiki/Manual:Change_tag_table>; example SQL dump
> <https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag.sql.gz>).
> I'm not familiar with Wikidata tags, so you probably want to examine what
> they're actually detecting to make sure it's what you are looking for
> before you rely on them for analysis.
>
> Best,
> Isaac
>
> On Fri, Sep 11, 2020 at 8:48 PM Sumit Asthana <[email protected]> wrote:
>
>> Hi Elisavet,
>>
>> You can identify reverts using the sha1 checksum of revisions. You can use
>> the mwreverts library[0] to do that in the dump. The Editquality[1]
>> repository has such a use case for detecting reverts. You will not be able
>> to detect partial reverts, but it will detect identity reverts, which form
>> the majority of reverts.
>>
>> - Regards
>> Sumit Asthana
>>
>> [0] - https://pythonhosted.org/mwreverts/
>> [1] -
>> https://github.com/wikimedia/editquality/blob/master/editquality/utilities/extract_damaging.py#L160
>>
>> On Fri, Sep 11, 2020 at 2:55 AM Elisavet Koutsiana <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I wanted to ask if there is any canonical way to identify deletions,
>>> reverts, etc. in the edit history XML files. I understand that the action
>>> of every revision is described in the "comment" element of the XML format,
>>> but is there a code name or number or anything else that will help me to
>>> identify a revision as, for example, a deletion?
>>>
>>> Thank you,
>>> Elisavet
>>> _______________________________________________
>>> Wikidata mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
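
A note on the identity-revert approach Sumit describes: the idea is that a revision whose sha1 checksum equals that of a recent earlier revision has restored the page to that earlier state, so every revision in between counts as reverted. The sketch below illustrates this in plain Python. It is a simplified stand-in and not the mwreverts implementation; the function name, the `Revert` tuple, the `radius` default, and the checksum strings are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical result type: which revision reverted, what it undid,
# and which earlier revision it restored.
Revert = namedtuple("Revert", ["reverting", "reverted", "reverted_to"])


def detect_identity_reverts(revisions, radius=15):
    """Yield a Revert for each revision that restores an earlier state.

    revisions: iterable of (rev_id, sha1) pairs for ONE page, oldest first.
    radius: how many preceding revisions to search for a matching checksum
            (an assumed default, chosen only for this sketch).
    """
    recent = []  # up to `radius` preceding (rev_id, sha1) pairs, oldest first
    for rev_id, sha1 in revisions:
        # Search backwards so the most recent matching state wins.
        for i in range(len(recent) - 1, -1, -1):
            if recent[i][1] == sha1:
                reverted = [r for r, _ in recent[i + 1:]]
                if reverted:  # an immediate repeat undoes nothing
                    yield Revert(rev_id, reverted, recent[i][0])
                break
        recent.append((rev_id, sha1))
        if len(recent) > radius:
            del recent[0]


# Placeholder checksums; real dumps carry a base-36 sha1 per revision.
history = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "aaa"), (5, "ddd")]
for revert in detect_identity_reverts(history):
    print(revert)
```

Here revision 4 has the same checksum as revision 1, so it is reported as reverting revisions 2 and 3. Partial reverts change the content without reproducing an earlier checksum, which is why this kind of check cannot see them.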
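
On Isaac's point that deleted edits show up as deleted in the XML history dumps: to the best of my knowledge, the MediaWiki export format flags a suppressed field with a `deleted="deleted"` attribute on the `text`, `comment`, or `contributor` element of a revision. The sketch below assumes that convention and runs on a small hand-written sample rather than a real dump; the sample content and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT taken from a real dump. Real dumps wrap pages in
# a namespaced <mediawiki> root, which local_name() below tolerates.
SAMPLE = """<page>
  <title>Q42</title>
  <revision>
    <id>101</id>
    <comment>first edit</comment>
    <text>{"type": "item"}</text>
  </revision>
  <revision>
    <id>102</id>
    <comment deleted="deleted" />
    <text deleted="deleted" />
  </revision>
</page>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that real dump elements carry."""
    return tag.rsplit("}", 1)[-1]


def deleted_fields(revision):
    """Return names of direct children flagged deleted="deleted"."""
    return [local_name(child.tag) for child in revision
            if child.get("deleted") == "deleted"]


def find_revision_deletions(root):
    """Map revision id -> list of suppressed fields, affected revisions only."""
    result = {}
    for elem in root.iter():
        if local_name(elem.tag) != "revision":
            continue
        fields = deleted_fields(elem)
        if fields:
            # The first direct <id> child is the revision id.
            rev_id = next(c.text for c in elem if local_name(c.tag) == "id")
            result[int(rev_id)] = fields
    return result


print(find_revision_deletions(ET.fromstring(SAMPLE)))
```

For a full history dump, loading the whole file with `ET.fromstring` would not be practical; streaming with `ET.iterparse` and clearing each page after processing it is the usual alternative.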
