Hi both,

Many thanks for all the help!
All the information was really helpful.

Best wishes,
Elisavet

On Mon, 14 Sep 2020 at 3:32 PM, Isaac Johnson <[email protected]>
wrote:

> Hi Elisavet,
>
> Reverts:
> Sumit pointed you in the right direction. In case you're curious about
> types of reverts beyond identity reverts, this phab task has an excellent
> write-up / analysis of how to detect the different types:
> https://phabricator.wikimedia.org/T252366
>
> Deletions:
> I'm not fully sure what you are referring to here, but if it's:
> * revision deletion
> <https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Revision_deletion>:
> edits that have been deleted will show up as deleted in the XML history
> dumps. You can also find more details in the deletion log (table details
> <https://www.mediawiki.org/wiki/Manual:Logging_table>; example XML dump
> <https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-pages-logging.xml.gz>
> )
> * blanking of content: no fool-proof automatic way to detect when editors
> are deleting full sections/pages, but there are a few options. You can
> inspect edit comments and page diffs for evidence of blanking, but an
> easier place to start is probably with automatic edit tags (e.g.,
> mw-blank). Here's the full list of tags
> <https://www.wikidata.org/wiki/Special:Tags> and dump of tag types (table
> <https://www.mediawiki.org/wiki/Manual:Change_tag_def_table>; example SQL
> dump
> <https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag_def.sql.gz>)
> and all tags applied (table
> <https://www.mediawiki.org/wiki/Manual:Change_tag_table>; example SQL dump
> <https://dumps.wikimedia.org/wikidatawiki/20200701/wikidatawiki-20200701-change_tag.sql.gz>).
> I'm not familiar with Wikidata tags so you probably want to do some
> examination of what they're actually detecting to make sure it's what you
> are looking for before you rely on them for analysis.
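For the revision-deletion case above, here is a minimal sketch of how hidden fields could be spotted while streaming a history dump. It assumes that suppressed elements carry a deleted="deleted" attribute, which is my understanding of the MediaWiki export schema and worth checking against a real dump. The function names and the SAMPLE document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def deleted_fields_by_revision(source):
    """Yield (rev_id, hidden_fields) for revisions with suppressed parts.

    `source` is a filename or a binary file-like object holding export XML.
    Assumption: hidden fields are marked with deleted="deleted".
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "revision":
            continue
        rev_id = None
        hidden = []
        # Only direct children are inspected, so <contributor><id> is ignored.
        for child in elem:
            name = local_name(child.tag)
            if name == "id":
                rev_id = int(child.text)
            elif child.get("deleted") == "deleted":
                hidden.append(name)
        if hidden:
            yield rev_id, hidden
        elem.clear()  # release memory; real history dumps are very large


# Hypothetical two-revision sample, not taken from a real dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Q42</title>
    <revision>
      <id>101</id>
      <contributor><username>Example</username><id>7</id></contributor>
      <comment>first edit</comment>
      <text>{}</text>
    </revision>
    <revision>
      <id>102</id>
      <contributor deleted="deleted" />
      <comment deleted="deleted" />
      <text deleted="deleted" />
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for rev_id, hidden in deleted_fields_by_revision(io.BytesIO(SAMPLE)):
        print(rev_id, hidden)
```

For a real dump you would pass an opened gzip or bz2 stream instead of the in-memory sample. Blanking is a separate case and would still need the tag tables or a diff-based check.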
>
> Best,
> Isaac
>
> On Fri, Sep 11, 2020 at 8:48 PM Sumit Asthana <[email protected]>
> wrote:
>
>> Hi Elisavet,
>>
>> You can identify reverts using the sha1 checksum of revisions. You can use
>> the mwreverts library[0] to do that in the dump. The Editquality[1] repository
>> has such a use case for detecting reverts. You will not be able to detect
>> partial reverts, but it will detect identity reverts, which form the majority of
>> the reverts.
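To make the checksum idea concrete, here is a small hand-rolled sketch of identity-revert detection. It is not the mwreverts implementation or its API, only an illustration of the principle: if a revision's sha1 equals that of a recent earlier revision of the same page, the revisions in between were reverted. The function name, the (rev_id, sha1) input shape, and the default radius of 15 are assumptions chosen for the example.

```python
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_ids, reverted_to_id) tuples.

    `revisions` is an iterable of (rev_id, sha1) pairs for ONE page,
    in chronological order. `radius` caps how many revisions a single
    revert may undo.
    """
    # Recent (rev_id, sha1) pairs, oldest first.
    window = deque(maxlen=radius + 1)
    for rev_id, sha1 in revisions:
        recent = list(window)
        # Walk backwards to find the most recent revision with the same content.
        for i in range(len(recent) - 1, -1, -1):
            if recent[i][1] == sha1:
                reverted = [r for r, _ in recent[i + 1:]]
                # An empty list means the content did not change at all
                # (e.g. a null edit), which is not counted as a revert.
                if reverted:
                    yield rev_id, reverted, recent[i][0]
                break
        window.append((rev_id, sha1))


if __name__ == "__main__":
    # Made-up history: revision 4 restores the content of revision 1.
    history = [(1, "a"), (2, "b"), (3, "c"), (4, "a"), (5, "d")]
    for reverting, reverted, reverted_to in detect_identity_reverts(history):
        print(reverting, "reverted", reverted, "back to", reverted_to)
```

As noted above, this only catches identity reverts. A partial revert produces a new checksum and passes through undetected.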
>>
>> - Regards
>> Sumit Asthana
>>
>> [0] - https://pythonhosted.org/mwreverts/
>> [1] -
>> https://github.com/wikimedia/editquality/blob/master/editquality/utilities/extract_damaging.py#L160
>>
>>
>> On Fri, Sep 11, 2020 at 2:55 AM Elisavet Koutsiana <
>> [email protected]> wrote:
>>
>>> Hello,
>>>
>>> I wanted to ask if there is any canonical way to identify deletions,
>>> reverts, etc. in the edit history XML files. I understand that the action
>>> of every revision is described in the "comment" element of the XML format,
>>> but is there a code name, number, or anything else that would help me
>>> identify a revision as, for example, a deletion?
>>>
>>> Thank you,
>>> Elisavet
>>> _______________________________________________
>>> Wikidata mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
