Lucas_Werkmeister_WMDE added a comment.
term_type: a small table holding the types of strings to be indexed in the DB. Right now this would be labels, descriptions and aliases, but it would scale to allow more similar terms into the index (if desired). It might be better not to have a table here and just keep INT IDs in code.
There may be little point in normalizing the term_type for example. This thing only has 3 rows. (languages is also pretty small)
We could also turn term types and languages into short IDs via a hash function: as far as I’m aware, Wikibase only needs the string→ID direction (hash function), and if we need the ID→string direction (e. g. during manual investigation) we can hash all the known term types / language codes and look for the value we have.
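A minimal sketch of the hash idea above, in Python. The function names, the choice of SHA-256, and the 4-byte ID width are assumptions made for illustration, not anything Wikibase is known to use:

```python
import hashlib
from typing import Iterable, List

def short_id(value: str, num_bytes: int = 4) -> int:
    """string -> ID: derive a small integer from a term type or language code.

    SHA-256 truncated to num_bytes is an illustrative choice only.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:num_bytes], "big")

def reverse_lookup(target_id: int, known_values: Iterable[str]) -> List[str]:
    """ID -> string: hash every known value and keep the ones that match."""
    return [v for v in known_values if short_id(v) == target_id]

# The three term types named in the thread.
KNOWN_TERM_TYPES = ["label", "description", "alias"]

if __name__ == "__main__":
    alias_id = short_id("alias")
    print(alias_id, reverse_lookup(alias_id, KNOWN_TERM_TYPES))
```

The reverse direction returns a list because a truncated hash can collide in principle. With only a handful of term types and a few hundred language codes, that could be checked once up front.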
Why not combine strings, langstring and langstringtype in the same table?
For certain common types of items – especially people, but also e. g. cities – it is common to have the same label in a lot of different languages (see also T188992#4026839), so I think a strings table without a language code should help a lot. I’m not sure about the distinction between langstring and langstringtype though.
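To illustrate why a language-free strings table helps, here is a rough sketch using SQLite from Python. The table names strings and langstring come from the thread. The column names, the store_label helper, and the sample label are hypothetical, and langstringtype is left out because its role is the open question here:

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE strings (
            id   INTEGER PRIMARY KEY,
            text TEXT NOT NULL UNIQUE      -- each distinct string stored once
        );
        CREATE TABLE langstring (
            id        INTEGER PRIMARY KEY,
            string_id INTEGER NOT NULL REFERENCES strings(id),
            language  TEXT NOT NULL,
            UNIQUE (string_id, language)
        );
    """)
    return conn

def store_label(conn: sqlite3.Connection, text: str, language: str) -> int:
    """Insert the string if it is new, then link it to the language."""
    conn.execute("INSERT OR IGNORE INTO strings (text) VALUES (?)", (text,))
    (string_id,) = conn.execute(
        "SELECT id FROM strings WHERE text = ?", (text,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO langstring (string_id, language) VALUES (?, ?)",
        (string_id, language),
    )
    return string_id

def count_rows(conn: sqlite3.Connection, table: str) -> int:
    # The table name is interpolated directly; only pass trusted names.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

if __name__ == "__main__":
    conn = make_db()
    # The same label text in several languages, as is common for people.
    for lang in ["en", "de", "fr", "es"]:
        store_label(conn, "Douglas Adams", lang)
    print(count_rows(conn, "strings"), count_rows(conn, "langstring"))
```

In this sketch the label text is stored once, and each extra language only adds a narrow row of integers plus a language code.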
I guess the easiest approach is to migrate things while writing to both, and at some point, once both sets of tables are in sync, switch the writes to the new tables only?
Do the DB servers have enough storage space for this?
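For reference, the dual-write migration quoted above could look roughly like the following Python sketch. The DualWriteStore class and its in-memory dicts are hypothetical stand-ins for the old and new sets of tables, not Wikibase code. Note that both copies exist at the same time during the migration, which is where the storage question comes from:

```python
class DualWriteStore:
    """Toy model of a dual-write migration; dicts stand in for tables."""

    def __init__(self, existing_rows: dict):
        self.old = dict(existing_rows)  # old tables, already populated
        self.new = {}                   # new tables, initially empty
        self.write_old = True           # still writing to both

    def write(self, key: str, value: str) -> None:
        """Write to the new store always, and to the old one until the switch."""
        if self.write_old:
            self.old[key] = value
        self.new[key] = value

    def backfill(self) -> None:
        """Copy rows that predate dual writing into the new store."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def in_sync(self) -> bool:
        return self.old == self.new

    def switch_to_new_only(self) -> None:
        """Stop writing to the old store, but only once both agree."""
        if not self.in_sync():
            raise RuntimeError("old and new stores differ; backfill first")
        self.write_old = False

if __name__ == "__main__":
    store = DualWriteStore({"Q42:en:label": "Douglas Adams"})
    store.write("Q1:en:label", "universe")  # lands in both stores
    store.backfill()                        # bring over the older row
    store.switch_to_new_only()
    store.write("Q2:en:label", "Earth")     # lands in the new store only
    print(sorted(store.new), sorted(store.old))
```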
Cc: Lucas_Werkmeister_WMDE, alaa_wmde, JeroenDeDauw, Ladsgroup, Marostegui, Aklapper, Addshore, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
