ArielGlenn added a comment.

That was very helpful, thanks.

Okay, here's my take. At some point in the future (unknown when), we might lose the text table; we'd then need someplace for third-party installations to store their revision text, and whatever that mechanism is (not necessarily an external store) would be supported along with the external store. We also need to support installations that do not enable MCR. What that all looks like should be dealt with then. Until then, it's ok to continue to use the text ids as is.

Because there's some doubt about which element a given parser may grab if it expects the element to be unique (see T199121#4494050), I'd rather avoid having multiple text elements. For now I'd use the safer, if less satisfying, mixed format: an unwrapped text element formatted the same old way, plus new content tags, which existing tools will not be looking for and will hopefully ignore. This also avoids duplicating revision information into the text attributes for the main slot for the sake of backwards compatibility. I agree that when we come up to another breaking change in the future, we should revisit the mixed format.
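To illustrate the parser concern, here is a minimal sketch in Python. It assumes a tool that takes the first text child of a revision. Both XML samples are made up for illustration (element order, ids and content are hypothetical, and the export namespace is omitted), so this is not a description of any particular existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical revision carrying two <text> elements (the layout to avoid).
TWO_TEXTS = """<revision>
  <id>1</id>
  <text role="wd_entity">slot content</text>
  <text>main content</text>
</revision>"""

# Hypothetical mixed layout: one plain <text>, extra slots in <content>.
MIXED = """<revision>
  <id>1</id>
  <content role="wd_entity">slot content</content>
  <text>main content</text>
</revision>"""


def naive_main_text(revision_xml):
    """Mimic a tool that assumes <text> is unique within a revision."""
    revision = ET.fromstring(revision_xml)
    # find() returns the first matching child, whichever slot that is.
    return revision.find("text").text


print(naive_main_text(TWO_TEXTS))  # picks up the non-main slot
print(naive_main_text(MIXED))      # picks up the main slot
```

With two text elements, what such a tool sees depends on element order; with the mixed format it never sees the extra slot at all.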

It's true that most of the information we provide about the content element really consists of attributes of the content in a given slot, not pieces of data somehow bundled up as part of the content, so it would more properly be represented as attributes rather than as revision child elements. We could move the attributes into the content tag thus:

<content origin="308722098" role="wd_entity" model="metadata" format="text/json" text_id="305112983" sha1="..." bytes="xxx" />

for first-pass ('stub') dumps, or like this:

<content xml:space="preserve" origin="308722098" role="wd_entity" model="metadata" format="text/json" sha1="...">stuff goes here...
... 
</content>

for second-pass (revision content) dumps.
In the case that the revision has been suppressed/deleted, we could produce the following:

<content role="wd_entity" model="metadata" format="text/json" deleted="deleted" />

for either pass of the dumps. We would do this for each slot (except 'main', which would be formatted as a plain old boring text element in the way currently done), since suppression/deletion is at the revision level only, not per slot.
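As a rough sketch of how a consumer might read the extra slots from a stub revision in this layout, here is some Python. The function name, the sample revisions and all attribute values are hypothetical; only the attribute names come from the examples above, and the export namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Made-up stub revision: plain <text> for 'main', <content> for the extra slot.
STUB_REVISION = """<revision>
  <id>308722099</id>
  <text id="305112984" bytes="120" />
  <content origin="308722098" role="wd_entity" model="metadata"
           format="text/json" text_id="305112983" sha1="abc" bytes="42" />
</revision>"""

# Made-up suppressed/deleted revision: the slot carries only the marker.
DELETED_REVISION = """<revision>
  <id>308722100</id>
  <text deleted="deleted" />
  <content role="wd_entity" model="metadata" format="text/json"
           deleted="deleted" />
</revision>"""


def read_extra_slots(revision_xml):
    """Return {role: attributes} for every <content> child of a revision."""
    revision = ET.fromstring(revision_xml)
    slots = {}
    for content in revision.findall("content"):
        attrs = dict(content.attrib)
        role = attrs.pop("role")
        # Normalise the deleted="deleted" marker to a boolean.
        attrs["deleted"] = attrs.get("deleted") == "deleted"
        slots[role] = attrs
    return slots


print(read_extra_slots(STUB_REVISION))
print(read_extra_slots(DELETED_REVISION))
```

A tool that only looks for the text element would skip the content tags entirely, which is the behaviour the mixed format counts on.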

What do folks think about the above?


TASK DETAIL
https://phabricator.wikimedia.org/T199121