jcrespo added a comment.

Title is already a first-class entity, that's what the page table is.

No, right now, page is an entity that has a series of properties: a title, a text, etc. By making title a strong entity, it has meaning on its own: a page has exactly one title (1:1), and a title has zero or one pages (1:0). There are some issues with the namespace handling, but I will ignore those for now.

But yes, the idea is that a title can exist independently of whether a page exists. Titles are created the first time they are referenced. While that can create a lot of garbage, I believe it will not be worse than the amount of duplicate references we have to non-existent page titles in the link* tables (e.g. red links), while being more efficient in space. E.g. linking to the template "Template:WPBannerMeta/importancescale" takes 28 bytes (I think the template namespace is not used for templatelinks), and it is being used 6 million times (that is about 168 MB, without compression), while an identifier would only take 4 or 8 bytes. I do not have an opinion on the hashing, although I find "I found that using a 64 bit hash of the title is a better solution than auto-increment" strange, as in both cases the mapping would be one-way and impossible to reverse. I do not intend to manage the ids; we store them forever (append-only table).
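To make the space estimate concrete, here is a rough back-of-the-envelope sketch. It only multiplies the bytes per reference by the number of references; it ignores row overhead, indexes and compression, and the 6 million figure is the approximate usage count quoted above. The helper name `storage_bytes` is made up for illustration.

```python
# Title text as it would be stored per reference (without the namespace prefix).
TITLE = "WPBannerMeta/importancescale"
USES = 6_000_000  # approximate number of references, taken from the comment above


def storage_bytes(bytes_per_reference: int, references: int = USES) -> int:
    """Raw payload size: bytes per reference times number of references."""
    return bytes_per_reference * references


title_bytes = len(TITLE.encode("utf-8"))   # 28 bytes per reference
as_strings = storage_bytes(title_bytes)    # 168_000_000 bytes
as_int32 = storage_bytes(4)                # 24_000_000 bytes
as_int64 = storage_bytes(8)                # 48_000_000 bytes

for label, size in [("title string", as_strings),
                    ("4-byte id", as_int32),
                    ("8-byte id", as_int64)]:
    print(f"{label:>12}: {size / 1_000_000:.0f} MB")
```

Even with 8-byte identifiers, the payload for this one template drops to less than a third of the string version.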

What I mean is we could apply such a patch to core, and you can immediately use it for your own needs, and I would take care of other people using it (any help is welcome). For example, page assessment has its own duplicate table (for projects namespace only) where normalization happens for titles, and it is not an uncommon request for other extensions, so duplicating efforts is a waste. I also predict it would halve our database storage requirements and I/O if it were applied to the link* tables. That is instantly double the db resources!

However, it is not an immediate change: there are cases where references to titles are actually references to pages, denormalized. In some cases, references should change on page rename; in others they shouldn't (*links). Also, simple selects now become joins, and many people apparently don't like joins, even when in this case they would be faster, because they would run over smaller tables.
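As a minimal sketch of what I have in mind (not a proposed schema): an append-only title table where a row is created the first time a title is referenced, a link table that stores only the identifier, and the join needed to get the text back. All table, column and function names here (`title`, `links`, `get_title_id`, etc.) are hypothetical, and SQLite is used only so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title (                       -- hypothetical append-only title table
    title_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    title_namespace INTEGER NOT NULL,
    title_text      TEXT    NOT NULL,
    UNIQUE (title_namespace, title_text)
);
CREATE TABLE links (                       -- hypothetical stand-in for a link* table
    from_page       INTEGER NOT NULL,
    target_title_id INTEGER NOT NULL REFERENCES title (title_id)
);
""")


def get_title_id(namespace: int, text: str) -> int:
    """Return the id for a title, creating the row on first reference."""
    conn.execute(
        "INSERT OR IGNORE INTO title (title_namespace, title_text) VALUES (?, ?)",
        (namespace, text),
    )
    row = conn.execute(
        "SELECT title_id FROM title WHERE title_namespace = ? AND title_text = ?",
        (namespace, text),
    ).fetchone()
    return row[0]


def add_link(from_page: int, namespace: int, text: str) -> None:
    """Store a link as (source page, title id) instead of repeating the text."""
    conn.execute("INSERT INTO links VALUES (?, ?)",
                 (from_page, get_title_id(namespace, text)))


def links_from(from_page: int) -> list:
    """What used to be a simple select now needs a join to recover the text."""
    return conn.execute(
        """SELECT t.title_namespace, t.title_text
           FROM links AS l JOIN title AS t ON t.title_id = l.target_title_id
           WHERE l.from_page = ?
           ORDER BY t.title_id""",
        (from_page,),
    ).fetchall()


# 10 is the Template namespace; the target needs no page row to be linkable.
add_link(1, 10, "WPBannerMeta/importancescale")
add_link(2, 10, "WPBannerMeta/importancescale")
print(links_from(1))
```

Both links share a single title row, whether or not a page with that title exists. What this sketch does not address is the rename question above: whether a given reference should follow the page or stay with the title.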

Let's create a separate task for this.



To: Addshore, jcrespo
Cc: jcrespo, Meno25, gerritbot, Darkdadaah, WMDE-leszek, Lydia_Pintscher, gabriel-wmde, JAnD, daniel, Addshore, Aklapper, Lewizho99, Maathavan, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331