On Monday, July 11, 2016, Daniel Kinzler <[email protected]> wrote: > It seems there is disagreement about what the correct interpretation of NULL in > the rev_content_model column is. Should NULL there mean > > (a) "the current page content model, as recorded in page_content_model" > > or should it mean > > (b) "the default for this title, no matter what page_content_model says"? > > > Kunal and I have had an unintentional edit war about this question in Revision.php: > > Kunal changed it from (a) to (b) in https://gerrit.wikimedia.org/r/#/c/222043/ > I later changed it from (b) to (a) in https://gerrit.wikimedia.org/r/#/c/297787/ > Kunal reverted me from (a) to (b) in https://gerrit.wikimedia.org/r/#/c/298239/ > > > So, which way do we want it? > > > The conflict seems to arise from (at least) three competing use cases: > > I) re-interpreting page content. For instance, a user may move a misnamed > User:Foo.jss to User:Foo.js. In this case, the content should be re-interpreted > as JavaScript, including all old revisions. This would be in favor of behavior > (a), though it still works with (b), because the default model changes based on > the suffix ".js". I think it would however be better to only rely on title > parsing magic once, when creating the page, not later, when rendering old revisions. > > II) converting page content. For instance, if a talk page gets converted to > using Flow, new revisions (and page_content_model) will have the Flow model, > while old revisions need to keep their original wikitext model (even though > their rev_content_model is null). That would need behavior (b). > > III) changing a namespace's default content model. E.g. when installing an > extension that changes the default content model of a namespace (such as > Wikibase with Items in the main namespace, or Flow-per-default for Talk pages), > existing pages that were already in that namespace should still be readable. > With (b), this would fail: even though page_content_model has the correct model > for reading the page, rev_content_model is null, so the new namespace default is > used, which will fail. With (a), this would simply work: the page will be > rendered according to page_content_model. > > > In all cases it's possible to resolve the issue by replacing the NULL entries > for all revisions of a page with the current model id. The question is just when > and how we do that, and when and how we can even detect that this needs doing. > > There is also an in-between option, let's call it a/b: fall back to > page_content_model for the latest revision (that should *always* be right), but > to ignore page_content_model for older revisions. That would cater to use case > III at least in so far as it would be possible to view the "misplaced" pages. > But viewing old revisions or diffs would still fail with a nasty error. This > option may look better on the surface, but I fear it will just add to the confusion. > > There's another fix: never write null into rev_content_model. Always put the > actual model ID there. That's pretty wasteful, but it's robust and reliable. > When we decided to use null as a placeholder for the default, we assumed the > default would never change. But as we now see, it sometimes does... > > > So, what should it be, option (a) or (b)? And how do we address the use case > that is then broken? What should we write into rev_content_model in the future? > > I personally think that option (a) makes more sense, because the resolutions of > defaults is then local to the database. It could even be done within the SQL > query. It's easier to maintain consistency that way. For use case II, that would > require us to "fill in" all the rev_content_model fields in old revisions when > converting a page. I think it would be a good thing to do that. If we have the > content model change between revisions, it seems prudent to record it explicitly. > > -- > Daniel Kinzler > Senior Software Developer > > Wikimedia Deutschland > Gesellschaft zur Förderung Freien Wissens e.V. > > _______________________________________________ > Wikitech-l mailing list > [email protected] > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
To me, (b) makes more sense, as all the other fields in page represent the info for the current revision. Additionally all the fields in revision (except rev_deleted) are immutable and never change, and definitely dont change interpretation based on other db fields. Having old revisions have a dependency on the page table (especially a dependency going in the direction revision->page) seems wrong to me. -- bawolff _______________________________________________ Wikitech-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikitech-l
