I am not a MediaWiki developer, but shouldn't sha1 be moved rather than
either kept in place or dropped? Moved to the content table, that is,
so that it is kept unaltered.

That way it can still be used for all the goals that have been
discussed (detecting reverts, XML dumps, etc.), and the values are not
altered, just moved elsewhere, which is more compatible. Structure
compatibility is not going to be kept anyway: many fields are going to
be "moved" there, so code using the tables directly has to change
regardless. But if the actual content is not altered, the sha field can
keep the same value as before. It would also allow detecting a "partial
revert", meaning that the wikitext is set back to the same text as in a
previous revision, which is what I assume the field is used for now.
However, there will now be other content that can be reverted
individually.

I do not know exactly what MCR (multi-content revisions) is going to be
used for, but suppose (silly idea) that the main article text and the
categories are two different contents of an article. If user A edits
both and user B reverts the text only, the result would get a different
revision sha1 value; however, most of the use cases listed here would
want to detect the revert by checking the sha of the text only (that
is, of the content). Equally, for backwards compatibility, storing it
on content would avoid having to recalculate it for all existing
values, reducing this to a "trivial" code change while keeping all old
data valid. Keeping the field as is, on revision, would mean all
historical data and old dumps become invalid. Full-revision reverts, if
needed, can be detected by comparing each individual content sha or the
linked content ids.
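
To make the example concrete, here is a rough Python sketch of
per-content revert detection. It is not MediaWiki code: the slot names,
the helper functions, and the way a whole-revision hash is combined are
all made up for illustration, and the hashes are plain hex rather than
the encoding MediaWiki stores in rev_sha1.

```python
import hashlib


def content_sha1(text):
    """Hex SHA-1 of one content blob (hypothetical helper)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def revision_sha1(slots):
    """Hash over all contents of a revision (assumed combining scheme)."""
    joined = "\n".join(
        f"{name}:{content_sha1(text)}" for name, text in sorted(slots.items())
    )
    return content_sha1(joined)


def find_content_reverts(history):
    """Return (revision index, slot, index reverted to) for each slot
    whose content changes back to a value seen in an earlier revision.

    history is a list of dicts mapping slot name -> text, oldest first.
    """
    first_seen = {}  # (slot, sha1) -> index of first revision with it
    previous = {}    # slot -> sha1 in the preceding revision
    reverts = []
    for index, slots in enumerate(history):
        for slot, text in slots.items():
            digest = content_sha1(text)
            key = (slot, digest)
            # A revert is a change that lands on an already known hash.
            if previous.get(slot) != digest and key in first_seen:
                reverts.append((index, slot, first_seen[key]))
            first_seen.setdefault(key, index)
            previous[slot] = digest
    return reverts


history = [
    {"main": "Hello", "categories": "Greetings"},
    {"main": "Hello, vandalized", "categories": "Spam"},  # user A edits both
    {"main": "Hello", "categories": "Spam"},              # user B reverts text
]

reverts = find_content_reverts(history)
```

With this history, the per-content check reports that revision 2
restored the "main" content of revision 0, while the whole-revision
hashes of revisions 0 and 2 differ because the categories did not go
back.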

If, on the other hand, revision should be kept completely backwards
compatible, some helper views can be created on the cloud
wikireplicas, but other than that, MCR would not be possible.

If at a later time text with the same hash is detected (and the content
is double-checked), content could be normalized by assigning the same
id to identical content?
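
A rough Python sketch of what such a normalization pass could look
like, assuming a hypothetical in-memory mapping of content id to text
(the function name and data layout are mine, not anything that exists
in MediaWiki). The hash is only used to find candidates; the text is
compared as well, so a hash collision does not merge different content.

```python
import hashlib


def normalize_content_ids(contents):
    """Map every content id to a canonical id for identical content.

    contents is a dict of content id -> text (hypothetical layout).
    The lowest id among identical contents becomes the canonical one.
    """
    candidates = {}  # sha1 -> canonical ids that share this hash
    mapping = {}     # content id -> canonical content id
    for content_id in sorted(contents):
        text = contents[content_id]
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        bucket = candidates.setdefault(digest, [])
        for canonical_id in bucket:
            # Double check: the same hash alone is not trusted.
            if contents[canonical_id] == text:
                mapping[content_id] = canonical_id
                break
        else:
            bucket.append(content_id)
            mapping[content_id] = content_id
    return mapping


mapping = normalize_content_ids({1: "Hello", 2: "World", 3: "Hello"})
```

Here ids 1 and 3 end up pointing at the same canonical id, and id 2
keeps its own.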

On Mon, Sep 18, 2017 at 8:25 PM, Danny B. <wikipedia.dann...@email.cz> wrote:
>
> ---------- Original e-mail ----------
> From: Dan Andreescu <dandree...@wikimedia.org>
> To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
> Date: 18. 9. 2017 16:26:18
> Subject: Re: [Wikitech-l] Can we drop revision hashes (rev_sha1)?
> "So, as things stand, rev_sha1 in the database is used for:
>
> 1. the XML dumps process and all the researchers depending on the XML dumps
> (probably just for revert detection)
> 2. revert detection for libraries like python-mwreverts [1]
> 3. revert detection in mediawiki history reconstruction processes in Hadoop
> (Wikistats 2.0)
> 4. revert detection in Wikistats 1.0
> 5. revert detection for tools that run on labs, like Wikimetrics
> ?. I think Aaron also uses rev_sha1 in ORES, but I can't seem to find the
> latest code for that service
>
> If you think about this list above as a flow of data, you'll see that
> rev_sha1 is replicated to xml, labs databases, hadoop, ML models, etc. So
> removing it and adding it back downstream from the main mediawiki database
> somewhere, like in XML, cuts off the other places that need it. That means
> it must be available either in the mediawiki database or in some other
> central database which all those other consumers can pull from.
> "
>
> I use rev_sha1 on replicas to check the consistency of modules, templates or
> other pages (typically help) which should be the same between projects (either
> within one language or even cross-language, if the page is not language
> dependent). In other words, to detect possible changes in them and sync
> them.
>
> Also, I haven't noticed it mentioned in the thread: Flow also notifies users
> on reverts, but IDK whether it uses rev_sha1 or not. So I'm mentioning it
> just in case.
>
> Kind regards
>
> Danny B.
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Jaime Crespo
<http://wikimedia.org>
