Addendum, after sleeping on this:

Do we really want to manage something that is essentially configuration, namely
the set of available content models and formats, in a database table? How is it
maintained?

For context:
* As per T113034, we are moving away from managing interwiki prefixes in the
database, in favor of configuration files.
* Namespace IDs are defined in LocalSettings.php.

The original design of ContentHandler used integer IDs for content models and
formats in the DB. A mapping to human readable names is only needed for logging
and error messages anyway. Such a mapping could be maintained in
LocalSettings.php, just like we do for namespaces. This would also serve to
avoid ID clashes. My idea back then was to have a sort of registry on
mediawiki.org where extensions could reserve an ID for themselves, so that the
same ID would stand for the same model everywhere.
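
To make the LocalSettings.php idea concrete, here is a rough sketch of what
such a mapping might look like. The variable names $wgContentModelNames and
$wgContentFormatNames, the integer IDs, and the extension model name are all
made up for illustration; nothing like this exists in core. Only the core model
and format names on the right-hand side are the ones ContentHandler already
uses.

```php
<?php
// Hypothetical sketch: these settings do not exist in MediaWiki core.
// Keys are the integer IDs stored in the DB (assumed values); the names
// are only needed for logging and error messages.
$wgContentModelNames = [
    1 => 'wikitext',
    2 => 'javascript',
    3 => 'css',
    4 => 'text',
    // An extension would use an ID it reserved in the registry on
    // mediawiki.org, so the same ID means the same model on every wiki.
    1000 => 'example-extension-model', // hypothetical extension model
];

// Same idea for serialization formats (IDs are again assumed).
$wgContentFormatNames = [
    1 => 'text/x-wiki',
    2 => 'text/javascript',
    3 => 'text/css',
    4 => 'text/plain',
];
```

This mirrors how extra namespaces are declared: a constant integer ID on the
left, a human-readable name on the right.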

The disadvantage is of course that the model and format are not obvious when
eyeballing the result of an SQL query. It also makes database dumps more
brittle, since they cannot be interpreted without knowledge of the format and
model identifiers. That's an argument for having these in the DB.

Still... configuration in the database is nasty to maintain by hand, and also
annoying for extensions that define content models. Do we introduce a simple
hook that makes sure the content model and format get registered in the
database?


On 11.07.2016 at 21:26, Daniel Kinzler wrote:
> Hi Jaime, thanks for the pointer! I had completely forgotten about that.
> 
> A few thoughts about that RFC:
> 
> * I have long thought that content_format is pretty pointless and redundant. I
> haven't seen any content model that uses different serialization formats (I
> wrote a few that support two, but only ever used one). If the serialization
> does need to change for some reason, it's usually easy to detect from the
> first few bytes.
> 
> * What we need instead is versioning on the content model. It happens quite
> often that the data structure you store changes slightly. Knowing what version
> you are dealing with is quite helpful when deserializing and processing. These
> differences are much harder to auto-detect than the serialization format,
> 
> * Per-page and per-revision content model will become redundant with
> Multi-Content-Revisions. We will instead have this info in the revision_slot
> table (multiple per revision). The same design still applies, but changing the
> page and revision table would be pointless. We would just ignore the content
> model (and format) in the page and revision table, and rely on the info for
> the slot table instead. At some point, we can then drop this info from page
> and revision.
> 
> I propose to introduce the content_model (and maybe also content_format)
> tables, but not touch the page and revision table for now. Instead, we
> introduce revision_slots for Multi-Content-Revisions first, using the
> content_model table, and introduce model versioning; maybe drop the format in
> the process.
> 
> What do you think?
> 
> On 11.07.2016 at 14:27, Jaime Crespo wrote:
>> On Mon, Jul 11, 2016 at 2:07 PM, Daniel Kinzler
>> <[email protected]> wrote:
>>> It seems there is disagreement about what the correct interpretation of
>>> NULL in the rev_content_model column is. Should NULL there mean
>>
>>> What should we write into rev_content_model in the future
>>
>> Content model handling is pending a refactoring:
>> <https://www.mediawiki.org/wiki/Requests_for_comment/Content_model_storage>
>> Once that happens, they should never be NULL.
>>
>> _______________________________________________
>> Wikitech-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
> 
> 


