Hi Elisavet,

Reverts:
Sumit pointed you in the right direction. In case you're curious about
types of reverts beyond identity reverts, this phab task has an excellent
write-up / analysis of how to detect the different types:
https://phabricator.wikimedia.org/T252366
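
For a rough idea of what identity-revert detection does, here is a minimal
plain-Python sketch. It is not the mwreverts implementation; the function
name, the (rev_id, sha1) input format, and the default radius of 15 are
assumptions made for illustration. An edit counts as an identity revert
when its checksum matches a recent earlier revision of the same page and
at least one revision sits in between.

```python
from collections import deque


def find_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_to_id, reverted_ids) for one page.

    `revisions` is an iterable of (rev_id, sha1) pairs in chronological
    order. `radius` is a hypothetical look-back window: only the last
    `radius` revisions are searched for a matching checksum.
    """
    recent = deque(maxlen=radius)
    for rev_id, sha1 in revisions:
        # Search backwards so the most recent matching revision wins.
        match = None
        for idx in range(len(recent) - 1, -1, -1):
            if recent[idx][1] == sha1:
                match = idx
                break
        if match is not None:
            # Everything between the match and this edit was undone.
            reverted = [r for r, _ in list(recent)[match + 1:]]
            if reverted:  # an empty list means a null edit, not a revert
                yield rev_id, recent[match][0], reverted
        recent.append((rev_id, sha1))


if __name__ == "__main__":
    # Made-up history: revision 3 restores the content of revision 1.
    history = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]
    print(list(find_identity_reverts(history)))  # [(3, 1, [2])]
```

As Sumit notes, this only catches exact restorations; partial reverts
change the checksum and will be missed.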

Deletions:
I'm not fully sure what you are referring to here, but if it's:
* revision deletion
<https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Revision_deletion>:
edits that have been deleted will show up as deleted in the XML history
dumps. You can also find more details in the deletion log (table details
<https://www.mediawiki.org/wiki/Manual:Logging_table>; example XML dump
<https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-pages-logging.xml.gz>
)
* blanking of content: there is no foolproof automatic way to detect when
editors are deleting full sections/pages, but there are a few options. You can
inspect edit comments and page diffs for evidence of blanking, but an
easier place to start is probably with automatic edit tags (e.g.,
mw-blank). Here's the full list of tags
<https://www.wikidata.org/wiki/Special:Tags>, a dump of tag types (table
<https://www.mediawiki.org/wiki/Manual:Change_tag_def_table>; example SQL
dump
<https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag_def.sql.gz>),
and a dump of all tags applied (table
<https://www.mediawiki.org/wiki/Manual:Change_tag_table>; example SQL dump
<https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag.sql.gz>).
I'm not familiar with Wikidata tags so you probably want to do some
examination of what they're actually detecting to make sure it's what you
are looking for before you rely on them for analysis.
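
For the revision-deletion case, here is a small sketch of scanning a
history dump for hidden fields. It assumes that suppressed parts of a
revision appear as elements carrying a deleted="deleted" attribute (e.g.,
<text deleted="deleted" />), which is worth verifying against your own
dump. The sample XML and the function name are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample in the general shape of a history dump (not real data).
SAMPLE = """<mediawiki>
  <page>
    <title>Q42</title>
    <revision>
      <id>100</id>
      <comment>normal edit</comment>
      <text bytes="12">{"id":"Q42"}</text>
    </revision>
    <revision>
      <id>101</id>
      <comment deleted="deleted" />
      <text deleted="deleted" />
    </revision>
  </page>
</mediawiki>"""


def deleted_parts(xml_source):
    """Yield (rev_id, [names of hidden fields]) for affected revisions."""
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        # Real dumps declare an XML namespace; strip it if present.
        if elem.tag.rsplit("}", 1)[-1] != "revision":
            continue
        rev_id, hidden = None, []
        for child in elem:  # direct children of <revision> only
            name = child.tag.rsplit("}", 1)[-1]
            if name == "id":
                rev_id = int(child.text)
            elif child.get("deleted") == "deleted":
                hidden.append(name)
        if hidden:
            yield rev_id, hidden
        elem.clear()  # release memory; full dumps are very large


if __name__ == "__main__":
    source = io.BytesIO(SAMPLE.encode("utf-8"))
    print(list(deleted_parts(source)))  # [(101, ['comment', 'text'])]
```

For a real dump you would pass an opened (decompressed) file object
instead of the in-memory sample.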

Best,
Isaac

On Fri, Sep 11, 2020 at 8:48 PM Sumit Asthana <[email protected]>
wrote:

> Hi Elisavet,
>
> You can identify reverts using the sha1 checksum of revisions. You can use
> the mwreverts library[0] to do that in the dump. The Editquality[1]
> repository has such a use case for detecting reverts. You will not be able
> to detect partial reverts, but it will detect identity reverts, which form
> the majority of reverts.
>
> - Regards
> Sumit Asthana
>
> [0] - https://pythonhosted.org/mwreverts/
> [1] -
> https://github.com/wikimedia/editquality/blob/master/editquality/utilities/extract_damaging.py#L160
>
>
> On Fri, Sep 11, 2020 at 2:55 AM Elisavet Koutsiana <
> [email protected]> wrote:
>
>> Hello,
>>
>> I wanted to ask if there is any canonical way to identify deletion,
>> reverts etc in the edit history xml files. I can understand that the action
>> of every revision is described in the "comment" element of the xml format,
>> but is there a code name or number or anything else that will help me to
>> identify one revision for example as deletion?
>>
>> Thank you,
>> Elisavet
>> _______________________________________________
>> Wikidata mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>