Marostegui added a comment.
In T215902#4950653, @Lucas_Werkmeister_WMDE wrote: term_type: a small table holding the types of strings to be indexed in the DB. Right now this would be labels, descriptions and aliases, but it would scale to allowing more similar terms into the index (if desired). It might be better not to have a table here and just keep INT IDs in code.
There may be little point in normalizing the term_type for example. This thing only has 3 rows. (languages is also pretty small)
We could also turn term types and languages into short IDs via a hash function: as far as I’m aware, Wikibase only needs the string→ID direction (hash function), and if we need the ID→string direction (e. g. during manual investigation) we can hash all the known term types / language codes and look for the value we have.
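To make the hashing idea concrete, here is a minimal sketch. It assumes a truncated SHA-256 as the hash and a 4-byte ID width; neither is specified in the discussion, and the names `short_id`, `reverse_lookup`, `KNOWN_TERM_TYPES` and `KNOWN_LANGUAGES` are hypothetical, not anything Wikibase actually defines.

```python
import hashlib

# Hypothetical lists; a real deployment would enumerate every known value.
KNOWN_TERM_TYPES = ["label", "description", "alias"]
KNOWN_LANGUAGES = ["en", "de", "fr"]


def short_id(value: str, num_bytes: int = 4) -> int:
    """string -> ID: take the first num_bytes of a SHA-256 digest as an integer.

    Assumption: 4 bytes is wide enough; a narrower ID raises the collision risk.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:num_bytes], "big")


def reverse_lookup(target_id: int, known_values: list) -> "str | None":
    """ID -> string: hash every known value and return the one that matches."""
    for value in known_values:
        if short_id(value) == target_id:
            return value
    return None  # the ID belongs to a value that is not in the known list


if __name__ == "__main__":
    alias_id = short_id("alias")
    print(alias_id, reverse_lookup(alias_id, KNOWN_TERM_TYPES))
```

The reverse direction only works for values that are already in the known list, and a truncated hash can in principle collide, so any real scheme would need a check for that.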
Why not combine strings, langstring and langstringtype in the same table?
For certain common types of items – especially people, but also e. g. cities – it is common to have the same label in a lot of different languages (see also T188992#4026839), so I think a strings table without a language code should help a lot. I’m not sure about the distinction between langstring and langstringtype though.
I guess the easiest is to migrate things while writing to both, and at some point, once both sets of tables are in sync, switch the writes to the new tables only?
Do the DB servers have enough storage space for this?
The old ones certainly do not (i.e. the master or codfw). We are now discussing how to proceed with regard to those old servers, but we are at an early stage of this.
I should have been clearer, though: my point was that, while you move rows from one table to the other(s), you keep deleting the moved rows from wb_terms. We might need to optimize the table along the way to actually reclaim all that disk space on the servers.
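As a rough illustration of the move-and-delete idea, here is a sketch using Python's built-in sqlite3 module as a stand-in for the production databases. The target table `new_terms`, the two-column layout of `wb_terms`, and the function name `migrate_batch` are simplifying assumptions for the example, not the real schema or tooling.

```python
import sqlite3


def migrate_batch(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy one batch of rows to the new table and delete them from wb_terms.

    Both steps run in a single transaction, so a row is never lost or duplicated.
    Returns the number of rows moved (0 when wb_terms is empty).
    """
    with conn:  # commits on success, rolls back if an exception is raised
        rows = conn.execute(
            "SELECT term_row_id, term_text FROM wb_terms "
            "ORDER BY term_row_id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return 0
        # Hypothetical target table; the real design splits data across several.
        conn.executemany(
            "INSERT INTO new_terms (old_row_id, text) VALUES (?, ?)", rows
        )
        conn.executemany(
            "DELETE FROM wb_terms WHERE term_row_id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)
```

Calling `migrate_batch` in a loop until it returns 0 empties the old table in small steps. Deleting rows does not by itself return the space to the filesystem; on MariaDB that would be a separate `OPTIMIZE TABLE` step, which is not shown here.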
Cc: Lucas_Werkmeister_WMDE, alaa_wmde, JeroenDeDauw, Ladsgroup, Marostegui, Aklapper, Addshore, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
