LucasWerkmeister added a comment.
It’s still not clear to me which problem the `-x-Q123-1` patch is trying to solve. Several languages have been mentioned in this task, but which of them would benefit from this system? For several of them, I feel like we’ve already concluded that separate forms are in fact the way to go.

I’d like to extract a general rule from @Fnielsen’s comment above (T236593#5610903 <https://phabricator.wikimedia.org/T236593#5610903>): if you need separate statements, then you need separate forms or lexemes. (I think this is a sufficient condition, though it might not be a necessary one.) Pronunciation (whether pronunciation audio <https://www.wikidata.org/wiki/Property:P443> or IPA transcription <https://www.wikidata.org/wiki/Property:P898>) is probably the most significant kind of statement here: if a speaker would pronounce the spellings differently, then they should be different forms – regardless of whether the difference is a completely different ending, as in octopuses/octopi, or just an extra schwa, as in aft(e)nen. (I don’t find the hyphenation example as convincing… don’t you need a different hyphenation for every spelling variant, even for cases that really should just be multiple representations of one form? E.g. co‧lor/co‧lour could just be multiple statements on the same form, with different monolingual text language codes.)

I suspect this rule covers the Norwegian example that originally motivated this task: I feel like “parametere” and “parametre” are probably pronounced differently, much like “aftnen” and “aftenen” are pronounced differently in Danish according to Finn. For Vietnamese chữ Nôm, I feel like @mxn’s comment at T236593#8024999 <https://phabricator.wikimedia.org/T236593#8024999> goes in a similar direction, though I admit I find the whole Chinese-characters part of this discussion hard to follow.
For the cases where you really only want to have one form with multiple representations, I still agree with @daniel’s comment (T236593#5610378 <https://phabricator.wikimedia.org/T236593#5610378>): “you make up a code for each of the spellings”. In practice, the only way to “make up a code” that we currently support is to append `-x-` plus an item ID (e.g. `-x-Q12345`) to an existing, established language code. As far as I understand, this solution works well for Hebrew: e.g. ספר/סֵפֶר (L67105) <https://www.wikidata.org/wiki/Lexeme:L67105> (the “book” word) uses the language codes `he` and `he-x-Q21283070`, where Q21283070 <https://www.wikidata.org/wiki/Q21283070> represents Tiberian vocalization, the orthography with diacritics. At some point, an editorial decision was made that the spelling without diacritics “deserves” the unsuffixed `he` language code (instead of both spellings getting an item-ID suffix), which I think is reasonable: data reusers who don’t care about the different spellings can use the most standard language code (`he`) and its single representation per form.

Allowing people to append an integer to the item ID adds a second way to make up a code, and one that seems less useful to me: without knowing what the number means, how do I know which form representation to use? To me this runs counter to the goal of “allow[ing] the consumer to choose which variant they prefer”.

For the languages that appear to need multiple representations for the same language code per form (e.g. the Indian languages @Mahir256 mentioned in T236593#5608530 <https://phabricator.wikimedia.org/T236593#5608530>?), is it not possible to make the item ID approach work by creating more special-purpose items? Wikidata editors would then decide which of the possible spellings “deserves” the standard language code, and which additional items need to be created (“spelling with character X”, “spelling with sequence Y”?).
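To make the consumer side of this concrete, here is a minimal Python sketch of how a data reuser might pick a representation. It assumes a simplified mapping from language code to text (the actual Wikibase Lexeme JSON wraps each representation in an object with `language` and `value` keys), and the helper names `parse_variant_code` and `pick_representation` are hypothetical, not part of any Wikibase API. The sample data mirrors the two spellings of L67105 described above.

```python
import re
from typing import Dict, Optional, Tuple

# Item IDs as used in the private-use suffix, e.g. "Q21283070".
# Assumption: the lowercase "-x-" separator, as used on Wikidata.
ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def parse_variant_code(code: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split "he-x-Q21283070" into ("he", "Q21283070").

    A plain code such as "he" yields (code, None). Any other private-use
    suffix (e.g. an extra integer, as in "he-x-Q21283070-1") is rejected,
    since this consumer has no way to interpret the number.
    """
    base, sep, private = code.partition("-x-")
    if not sep:
        return code, None
    if ITEM_ID.fullmatch(private):
        return base, private
    raise ValueError(f"unrecognized private-use suffix in {code!r}")


def pick_representation(
    representations: Dict[str, str],
    base: str,
    preferred_item: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: return the representation for the preferred
    spelling variant, falling back to the unsuffixed base code."""
    if preferred_item is not None:
        for code, text in representations.items():
            try:
                parsed = parse_variant_code(code)
            except ValueError:
                continue  # skip codes this consumer cannot interpret
            if parsed == (base, preferred_item):
                return text
    return representations.get(base)


# Simplified representations of L67105, as described in the comment.
l67105 = {
    "he": "ספר",                # spelling without diacritics
    "he-x-Q21283070": "סֵפֶר",  # Tiberian vocalization (Q21283070)
}

print(pick_representation(l67105, "he"))
print(pick_representation(l67105, "he", "Q21283070"))
```

Falling back to the unsuffixed code is what makes the Hebrew convention convenient for reusers who don’t care about spelling variants. Skipping or rejecting `-x-Q21283070-1`-style codes is only one possible consumer policy, assumed for this sketch rather than taken from Wikibase, but it illustrates the concern: a bare integer gives the consumer nothing to choose by.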
I understand that not all languages have standardized spellings where you can use a single item ID to refer to the spelling variants of a wide range of lexemes (like in Hebrew), but I think it should still be possible to describe different spellings using items that carry more meaning than just a number.

TASK DETAIL https://phabricator.wikimedia.org/T236593
_______________________________________________ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org