On Monday, July 11, 2016, Daniel Kinzler <[email protected]>
wrote:
> It seems there is disagreement about what the correct interpretation of
NULL in
> the rev_content_model column is. Should NULL there mean
>
> (a) "the current page content model, as recorded in page_content_model"
>
> or should it mean
>
> (b) "the default for this title, no matter what page_content_model says"?
>
>
> Kunal and I have had an unintentional edit war about this question in
Revision.php:
>
> Kunal changed it from (a) to (b) in
https://gerrit.wikimedia.org/r/#/c/222043/
> I later changed it from (b) to (a) in
https://gerrit.wikimedia.org/r/#/c/297787/
> Kunal reverted me from (a) to (b) in
https://gerrit.wikimedia.org/r/#/c/298239/
>
>
> So, which way do we want it?
>
>
> The conflict seems to arise from (at least) three competing use cases:
>
> I) re-interpreting page content. For instance, a user may move a misnamed
> User:Foo.jss to User:Foo.js. In this case, the content should be
re-interpreted
> as JavaScript, including all old revisions. This would be in favor of
behavior
> (a), though it still works with (b), because the default model changes
based on
> the suffix ".js". I think it would however be better to only rely on title
> parsing magic once, when creating the page, not later, when rendering old
revisions.
>
> II) converting page content. For instance, if a talk page gets converted
to
> using Flow, new revisions (and page_content_model) will have the Flow
model,
> while old revisions need to keep their original wikitext model (even
though
> their rev_content_model is null). That would need behavior (b).
>
> III) changing a namespace's default content model. E.g. when installing an
> extension that changes the default content model of a namespace (such as
> Wikibase with Items in the main namespace, or Flow-per-default for Talk
pages),
> existing pages that were already in that namespace should still be
readable.
> With (b), this would fail: even though page_content_model has the correct
model
> for reading the page, rev_content_model is null, so the new namespace
default is
> used, which will fail. With (a), this would simply work: the page will be
> rendered according to page_content_model.
>
>
> In all cases it's possible to resolve the issue by replacing the NULL
entries
> for all revisions of a page with the current model id. The question is
just when
> and how we do that, and when and how we can even detect that this needs
doing.
>
> There is also an in-between option, let's call it a/b: fall back to
> page_content_model for the latest revision (that should *always* be
right), but
> to ignore page_content_model for older revisions. That would cater to use
case
> III at least in so far as it would be possible to view the "misplaced"
pages.
> But viewing old revisions or diffs would still fail with a nasty error.
This
> option may look better on the surface, but I fear it will just add to the
confusion.
>
> There's another fix: never write null into rev_content_model. Always put
the
> actual model ID there. That's pretty wasteful, but it's robust and
reliable.
> When we decided to use null as a placeholder for the default, we assumed
the
> default would never change. But as we now see, it sometimes does...
>
>
> So, what should it be, option (a) or (b)? And how do we address the use
case
> that is then broken? What should we write into rev_content_model in the
future?
>
> I personally think that option (a) makes more sense, because the
resolutions of
> defaults is then local to the database. It could even be done within the
SQL
> query. It's easier to maintain consistency that way. For use case II,
that would
> require us to "fill in" all the rev_content_model fields in old revisions
when
> converting a page. I think it would be a good thing to do that. If we
have the
> content model change between revisions, it seems prudent to record it
explicitly.
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

To me, (b) makes more sense, as all the other fields in page represent the
info for the current revision. Additionally all the fields in revision
(except rev_deleted) are immutable and never change, and definitely dont
change interpretation based on other db fields. Having old revisions have a
dependency on the page table (especially a dependency going in the
direction revision->page) seems wrong to me.

--
bawolff
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Reply via email to