Addshore added a comment.
In T148988#2762557, @jcrespo wrote:
> CREATE TABLE IF NOT EXISTS cognate_titles (
> You separate the trio site, namespace, title- ok, but you do not give that a unique identifier, so this table is useless?
Any sort of unique identifier would be useless and never used, as can be seen in the sample queries above.
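To illustrate why a surrogate id would go unused, here is a minimal sketch using SQLite from Python. The column names, the column order of the composite primary key and the sample rows are assumptions for illustration, not the actual Cognate schema. The point is that lookups filter on the title itself, so an auto-increment id column would never appear in a query.

```python
import sqlite3

# Hypothetical schema: the (site, namespace, title) trio itself is the key, so
# no surrogate id column exists. The key is ordered (namespace, title, site) so
# that lookups by (namespace, title) can use the prefix of the key's index.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cognate_titles (
    site      TEXT    NOT NULL,
    namespace INTEGER NOT NULL,
    title     TEXT    NOT NULL,
    PRIMARY KEY (namespace, title, site)
)
"""


def sites_with_title(conn: sqlite3.Connection, namespace: int, title: str) -> list[str]:
    """Return every site that has a page with exactly this namespace and title."""
    rows = conn.execute(
        "SELECT site FROM cognate_titles WHERE namespace = ? AND title = ? ORDER BY site",
        (namespace, title),
    )
    return [site for (site,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO cognate_titles (site, namespace, title) VALUES (?, ?, ?)",
    [("enwiktionary", 0, "cat"), ("dewiktionary", 0, "cat"), ("frwiktionary", 0, "chat")],
)
print(sites_with_title(conn, 0, "cat"))  # ['dewiktionary', 'enwiktionary']
```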
> CREATE TABLE IF NOT EXISTS cognate_normalizations (
> This is the only table creating a relationship, but it goes against everything we talked before- you use full titles (so no space is saved), and it is impossible, as it is now, to establish any kind of relationship between actual titles in this title column.
> I am not saying this is wrong, but if it is right, you have changed completely the model you explained to me before, and I no longer understand it. There is an n:n relationship between raw title and cognate titles, and that makes no sense, because it is cross-wiki.
There is a 1:n relationship between the normalizations and the titles.
For example, the normalization "Ellipsis..." (three dots) relates to both the title "Ellipsis..." (three dots) and the title "Ellipsis…" (the single-character ellipsis).
Of the 27 million rows in the titles table, probably only a few thousand (definitely under 100k) will have a normalized form that differs from the original title, and only those need a row in the normalizations table.
Also, no matter which wiki a title comes from, the normalization process is always the same.
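A minimal Python sketch of these two properties. The only rule shown, mapping the single-character ellipsis (U+2026) to three dots, comes from the example above; the full set of normalization rules is not spelled out here, so `NORMALIZATION_RULES`, `normalize_title` and `normalization_rows` are hypothetical names. The function takes no site argument, which is what makes the result identical on every wiki, and only titles whose normalized form differs produce a row.

```python
# Hypothetical rule table: only the ellipsis rule from the example is included;
# any further normalization rules would be added here.
NORMALIZATION_RULES = {
    "\u2026": "...",  # single-character ellipsis -> three ASCII dots
}


def normalize_title(title: str) -> str:
    """Normalize a title. There is no site parameter, so every wiki gets the same result."""
    for raw, replacement in NORMALIZATION_RULES.items():
        title = title.replace(raw, replacement)
    return title


def normalization_rows(titles: list[str]) -> list[tuple[str, str]]:
    """Return unique (raw, normalized) pairs, only for titles that normalization changes."""
    rows = set()
    for title in titles:
        normalized = normalize_title(title)
        if normalized != title:
            rows.add((title, normalized))
    return sorted(rows)


# Titles gathered from several wikis collapse into a single normalization row.
print(normalization_rows(["Ellipsis\u2026", "Ellipsis\u2026", "Ellipsis...", "cat"]))
```

Because the same raw title on different wikis yields the same pair, the normalizations table stays small: it grows with the number of distinct titles that normalization changes, not with the number of wikis.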
If the en and de sites have the title "Ellipsis…" (single-character ellipsis), but the fr and pt sites have the title "Ellipsis..." (three dots), then the data in the tables would look like this:
titles
site         | ns | title
enwiktionary |    | Ellipsis…
dewiktionary |    | Ellipsis…
frwiktionary |    | Ellipsis...
ptwiktionary |    | Ellipsis...

normalizations
raw       | normalized
Ellipsis… | Ellipsis...
I think the key here that may have been missed in our previous chat is that the normalization step is the same for all wikis.
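The example data can be loaded into SQLite to show how a lookup follows the 1:n relationship: normalize the requested title, then find every page whose title is either that normalized form or a raw form mapping to it. This is a sketch under stated assumptions: the schema, the namespace value 0 (the example does not give one), the `linked_sites` helper and its query are hypothetical and are not the queries Cognate actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cognate_titles (
        site TEXT NOT NULL, namespace INTEGER NOT NULL, title TEXT NOT NULL,
        PRIMARY KEY (namespace, title, site)
    );
    CREATE TABLE cognate_normalizations (
        raw TEXT PRIMARY KEY,      -- title as it appears on some wiki
        normalized TEXT NOT NULL   -- normalized form, identical for every wiki
    );
    """
)
# Rows from the example; namespace 0 is an assumption because the example omits it.
conn.executemany(
    "INSERT INTO cognate_titles (site, namespace, title) VALUES (?, 0, ?)",
    [
        ("enwiktionary", "Ellipsis\u2026"),
        ("dewiktionary", "Ellipsis\u2026"),
        ("frwiktionary", "Ellipsis..."),
        ("ptwiktionary", "Ellipsis..."),
    ],
)
conn.execute(
    "INSERT INTO cognate_normalizations (raw, normalized) VALUES (?, ?)",
    ("Ellipsis\u2026", "Ellipsis..."),
)


def linked_sites(conn: sqlite3.Connection, namespace: int, title: str) -> list[tuple[str, str]]:
    """Return (site, title) for every page whose title shares a normalized form with `title`."""
    # Step 1: normalize. Titles with no row are treated as already normalized.
    row = conn.execute(
        "SELECT normalized FROM cognate_normalizations WHERE raw = ?", (title,)
    ).fetchone()
    normalized = row[0] if row else title
    # Step 2: match the normalized title itself plus every raw form that maps to it.
    return conn.execute(
        """
        SELECT site, title FROM cognate_titles
        WHERE namespace = ?
          AND (title = ?
               OR title IN (SELECT raw FROM cognate_normalizations WHERE normalized = ?))
        ORDER BY site
        """,
        (namespace, normalized, normalized),
    ).fetchall()


for site, page_title in linked_sites(conn, 0, "Ellipsis\u2026"):
    print(site, page_title)
```

Looking up either spelling returns the same four sites, and the lookup never needs to know which wiki a title came from.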
Cc: hoo, Aklapper, jcrespo, Addshore, Marostegui, Minhnv-2809, D3r1ck01, Izno, Luke081515, Wikidata-bugs, aude, Darkdadaah, Mbch331, Jay8g, Krenair
