thiemowmde added a comment.

Story time questions

Questions we collected on 2018-02-06, copied from the PM/Engineering time document for reference:

  • Q: When listing multiple lemmas, how does listing work?
    • A: PM says “unordered” is fine for now, until otherwise demanded. “Unordered” currently means the order in which the lemmas were created.
  • Q: In which language is the “/” between multiple lemmas?
    • A: PM suggests using the user's language.
  • Q: Do we ever need a derived label to contain links? Or is plain text always enough?
    • A: PM thinks links inside a derived label are almost always more confusing than helpful. Possible exception: Summaries. But this is outside of the scope of the current story.
  • Q: How to apply language fallbacks on the individual parts?
    • A: PM wants this to be consistent with how fallback chains work everywhere else: each part falls back to English, or to the Item ID if English is missing.
  • Q: Store “Ladder (English, Noun)” as one string, or individually?
    • A: Must be stored individually, for various reasons. One is that the individual elements must be marked with <span lang="…">…</span>.
  • Q: Store “English, Noun” as one string, or individually?
    • A: PM does not care that much. Probably needs to be stored individually for the same reason as above.
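
A minimal sketch of how the answers above could fit together, not the Wikibase implementation. It assumes the parts are stored individually, reduces the fallback chain to "user language, then English, then Item ID", and uses hypothetical names throughout (resolve_label, span, derived_label, and the labels dictionary). The IDs and labels in the usage example are sample data.

```python
from html import escape


def resolve_label(item_id, labels, user_lang):
    """Return (text, lang) for an Item: user language, then English, then the bare ID."""
    by_lang = labels.get(item_id, {})
    for lang in (user_lang, "en"):
        if lang in by_lang:
            return by_lang[lang], lang
    # No usable label: show the Item ID itself, without a lang attribute.
    return item_id, None


def span(text, lang):
    """Wrap one part in <span lang="..."> so each part keeps its own language."""
    if lang is None:
        return escape(text)
    return '<span lang="{}">{}</span>'.format(escape(lang), escape(text))


def derived_label(lemmas, language_id, category_id, labels, user_lang, separator="/"):
    """Build 'lemma/lemma (Language, Category)' from individually stored parts.

    lemmas is a list of (lang, text) pairs and is rendered in the order given.
    """
    lemma_part = escape(separator).join(span(text, lang) for lang, text in lemmas)
    language_part = span(*resolve_label(language_id, labels, user_lang))
    category_part = span(*resolve_label(category_id, labels, user_lang))
    return "{} ({}, {})".format(lemma_part, language_part, category_part)


# Sample data: the category has no German label, so it falls back to English.
labels = {"Q1860": {"en": "English", "de": "Englisch"}, "Q1084": {"en": "noun"}}
html = derived_label([("en", "ladder")], "Q1860", "Q1084", labels, "de")
```

Because each part carries its own language after fallback, a single label can legitimately mix lang="de" and lang="en" spans, which is why storing the whole label as one string would not work.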

Other questions not relevant for PM:

  • Q: Store derived labels for all languages we support in advance?
  • Q: We are going to have stuff like “English, Noun” repeated a lot. Is it worth optimizing the storage layer for duplications?
  • Q: Can the same solution we investigate here work for MediaInfo?
  • Q: Can the solution we investigate here replace Label/DescriptionLookups in Wikibase? See T163538.

Proof of concept

My review of and related:

  • Two new secondary tables are introduced:
    • One stores the individual lemmas from a Lexeme, as strings. These can be used directly, similar to how wb_terms is used.
    • One stores the lexical categories and languages, as item IDs. These references are used to query wb_terms, where the Item labels are stored.
  • The current implementation does not do any prefetching for multiple Lexeme references.
  • It also does not do any prefetching for multiple Item references.
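
To make the table layout described above concrete, here is a rough sketch using an in-memory SQLite database. All table and column names (lexeme_lemmas, lexeme_refs, terms) are assumptions made for illustration. In particular, terms is only a simplified stand-in for wb_terms and does not reproduce its real schema. The rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table 1: the individual lemmas of a Lexeme, as plain strings.
    CREATE TABLE lexeme_lemmas (lexeme_id TEXT, lemma_lang TEXT, lemma_text TEXT);
    -- Hypothetical table 2: language and lexical category of a Lexeme, as Item IDs.
    CREATE TABLE lexeme_refs (lexeme_id TEXT PRIMARY KEY, language_item TEXT, category_item TEXT);
    -- Simplified stand-in for wb_terms: Item labels per language.
    CREATE TABLE terms (entity_id TEXT, term_lang TEXT, term_text TEXT);

    INSERT INTO lexeme_lemmas VALUES ('L1', 'en', 'ladder');
    INSERT INTO lexeme_refs VALUES ('L1', 'Q1860', 'Q1084');
    INSERT INTO terms VALUES ('Q1860', 'en', 'English'), ('Q1084', 'en', 'noun');
""")


def lemmas_of(lexeme_id):
    """Lemmas can be used directly; rowid order approximates creation order."""
    return conn.execute(
        "SELECT lemma_lang, lemma_text FROM lexeme_lemmas WHERE lexeme_id = ? ORDER BY rowid",
        (lexeme_id,),
    ).fetchall()


def item_label(item_id, lang):
    """Item references need a second lookup in the terms table."""
    row = conn.execute(
        "SELECT term_text FROM terms WHERE entity_id = ? AND term_lang = ?",
        (item_id, lang),
    ).fetchone()
    return row[0] if row else item_id


def describe(lexeme_id, lang):
    """Resolve one Lexeme reference without any prefetching."""
    language_item, category_item = conn.execute(
        "SELECT language_item, category_item FROM lexeme_refs WHERE lexeme_id = ?",
        (lexeme_id,),
    ).fetchone()
    return lemmas_of(lexeme_id), item_label(language_item, lang), item_label(category_item, lang)


result = describe("L1", "en")
```

In this sketch, describe() issues four queries for a single Lexeme reference (one for the references, one for the lemmas, two for the Item labels), which illustrates why the missing prefetching matters once many references appear on one page.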

My impression is that this approach is the one we should follow, and build up as we need to. Things to consider:

  • Can we estimate how big the two new secondary tables might grow?
  • We must think about prefetching or something else to avoid querying the database once (or even multiple times) for each Lexeme reference individually. Can we already write down a story and actionable tasks for this?
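
One possible shape for such prefetching, sketched under the same assumptions as the proof of concept's layout: collect all Lexeme IDs first, then resolve their Item references in batches. The table names (lexeme_refs, terms), the prefetch function, and the sample rows are hypothetical. A real implementation would also need the language fallback chain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, simplified for illustration only.
    CREATE TABLE lexeme_refs (lexeme_id TEXT PRIMARY KEY, language_item TEXT, category_item TEXT);
    CREATE TABLE terms (entity_id TEXT, term_lang TEXT, term_text TEXT);
    INSERT INTO lexeme_refs VALUES ('L1', 'Q1860', 'Q1084'), ('L2', 'Q1860', 'Q24905');
    INSERT INTO terms VALUES ('Q1860', 'en', 'English'), ('Q1084', 'en', 'noun'), ('Q24905', 'en', 'verb');
""")


def placeholders(values):
    """Build '?,?,?' for a parameterized IN (...) clause."""
    return ",".join("?" for _ in values)


def prefetch(lexeme_ids, lang):
    """Resolve language and category labels for many Lexemes with two queries in total."""
    lexeme_ids = list(lexeme_ids)
    if not lexeme_ids:
        return {}
    # Query 1: all Item references for the whole batch of Lexemes.
    refs = conn.execute(
        "SELECT lexeme_id, language_item, category_item FROM lexeme_refs "
        "WHERE lexeme_id IN (%s)" % placeholders(lexeme_ids),
        lexeme_ids,
    ).fetchall()
    # Deduplicate: values like "English" are shared by many Lexemes.
    item_ids = sorted({item for _, language, category in refs for item in (language, category)})
    labels = {}
    if item_ids:
        # Query 2: all needed Item labels at once.
        labels = dict(conn.execute(
            "SELECT entity_id, term_text FROM terms "
            "WHERE term_lang = ? AND entity_id IN (%s)" % placeholders(item_ids),
            [lang] + item_ids,
        ).fetchall())
    # Fall back to the bare Item ID when no label was found.
    return {
        lexeme_id: (labels.get(language, language), labels.get(category, category))
        for lexeme_id, language, category in refs
    }


batch = prefetch(["L1", "L2", "L3"], "en")
```

The number of queries stays at two regardless of how many Lexeme references are passed in, and unknown IDs such as L3 are simply absent from the result.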


