ArielGlenn added a comment.

...

Anything using the existing revision-level sha1 for revert detection will misdetect a revert (or a null edit) for *all* revisions that did not affect the main slot. While analysis on the slot level may be useful, existing analysis is on the revision level (by definition - slots are new). So it seems reasonable to keep revision-level semantics intact.

Whatever we do, we should definitely include both hashes (main slot and revision), to make the distinction obvious, and the path forward clear, and give consumers the option to change their code to consistently do the thing they want. Only if both hashes are available can we support both kinds of consumers - those that focus on the revision as a whole, and those that focus on individual slots.

Yes, I think we do need and want both sha1 values in the output.
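
To make the difference concrete, here is a minimal sketch of hash-based revert detection run twice over the same toy history, once keyed on the main slot hash and once on a revision-level hash. Everything in it is an assumption for illustration: the slot names, the toy history, and in particular the way `revision_sha1` combines the per-slot hashes, which is not necessarily how MediaWiki computes it.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Revision = Dict[str, bytes]  # hypothetical shape: slot role -> slot content


def main_slot_sha1(rev: Revision) -> str:
    """Hash of the main slot only."""
    return hashlib.sha1(rev["main"]).hexdigest()


def revision_sha1(rev: Revision) -> str:
    """Hash over all slots. The combination scheme here is an assumption."""
    h = hashlib.sha1()
    for role in sorted(rev):
        slot_hash = hashlib.sha1(rev[role]).hexdigest()
        h.update(role.encode() + b"\0" + slot_hash.encode() + b"\n")
    return h.hexdigest()


def find_reverts(history: List[Revision],
                 key: Callable[[Revision], str]) -> List[Tuple[int, int]]:
    """Return (index, index of earliest revision with the same hash) pairs."""
    first_seen: Dict[str, int] = {}
    reverts = []
    for i, rev in enumerate(history):
        k = key(rev)
        if k in first_seen:
            reverts.append((i, first_seen[k]))
        else:
            first_seen[k] = i
    return reverts


# Toy history: revision 1 changes only the (hypothetical) "mediainfo" slot,
# revision 3 restores the full content of revision 1.
HISTORY = [
    {"main": b"A", "mediainfo": b"x"},
    {"main": b"A", "mediainfo": b"y"},
    {"main": b"B", "mediainfo": b"y"},
    {"main": b"A", "mediainfo": b"y"},
]

by_main = find_reverts(HISTORY, main_slot_sha1)
by_revision = find_reverts(HISTORY, revision_sha1)
```

Keyed on the main slot, revision 1 is flagged as matching revision 0 even though another slot changed; keyed on the revision-level hash, only revision 3 is flagged. Which of the two answers is "right" depends on what the consumer is studying, which is why having both values in the output matters.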

A more philosophical reason to maintain the revision-level logic: the "content of the revision" is the combined content of all slots. The intended semantics is not "slots are optional miscellaneous bits and pieces". The intended semantics is "the revision's content consists of multiple slots".

So, in essence: we would break the assumption that two revisions that have the same hash have the same content. We'd also break consistency with revision hashes reported by the API.

I understand and agree with the idea that a revision consists of the content of all of its slots.
What I'm getting at is that folks have until now been studying article or other page content. Sure, there hasn't been other content available for them to examine, but I imagine that the vast majority of folks will still be interested primarily in article content and how it changes over time, as opposed to, say, also considering reverts of various structured data entries for media files. And folks looking at article reverts and expecting to just pick those up will get a bunch of extra entries if they rely on the rev sha1 once other slots have content in them. It's not unreasonable to ask folks who are *also* interested in the content of other slots to check those sha1s specifically, or to look at a new entry which contains *specifically* the rev sha1.

The problem is that there's no nested level where we can put another sha1 tag. We could introduce a new element revsha1, though I don't like it much. We could move both sha1s into attributes of their respective tags (text and revision), which I like even less.
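
For illustration only, this is roughly what the first option would look like from a consumer's side. The fragment below is a made-up sample, not real dump output: it assumes the existing sha1 element carries the main slot hash and a new, hypothetical revsha1 element carries the revision-level hash, and the hash values are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <revsha1> does not exist in any current schema,
# and the hash values are placeholders rather than real sha1 output.
SAMPLE = """
<revision>
  <id>2</id>
  <text bytes="5">hello</text>
  <sha1>MAIN-SLOT-HASH-PLACEHOLDER</sha1>
  <revsha1>REVISION-HASH-PLACEHOLDER</revsha1>
</revision>
""".strip()


def read_hashes(revision_xml: str) -> dict:
    """Pull both hashes out of one <revision> element."""
    rev = ET.fromstring(revision_xml)
    return {
        "main": rev.findtext("sha1"),
        # findtext returns None if the element is missing, so older
        # dumps without <revsha1> would still parse.
        "revision": rev.findtext("revsha1"),
    }


hashes = read_hashes(SAMPLE)
```

A consumer that only cares about article text would keep reading sha1 as before; one that wants whole-revision semantics would switch to the new element.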


TASK DETAIL
https://phabricator.wikimedia.org/T199121
