mxn added a comment.
In T236593#8025472 <https://phabricator.wikimedia.org/T236593#8025472>, @AGutman-WMF wrote:

> @mxn If these are purely orthographic variants (i.e. the pronunciation is the same) I would list them under a single lexeme. And in that case, the most natural way would be to list them as spelling variants rather than distinct forms.

This assumption is only valid in an environment with purely phonetic/alphabetic writing systems. But in Chinese, two characters that are “spelled” distinctly but carry the same semantics and pronunciation would still have distinct lexemes. This also makes it possible to indicate that the two characters are pronounced similarly in one dialect but differently in another.

//Chữ Nôm// is a Chinese-based writing system that adds a phonosemantic aspect. If not for its relationship to the //quốc ngữ// alphabet, every character would clearly get its own lexeme, just like in Chinese. Any similarity in pronunciation would be irrelevant, because this writing system makes finer semantic distinctions than any alphabet would. For example, the difference between 𬖾 and 頗 (both interchangeable written forms of //phở//) is that 𬖾 combines 頗 with the component 米 as a disambiguator, clarifying that it has to do with rice (because phở noodles are made of rice), as opposed to whatever 頗 originally meant in Chinese. This is only one of many possible ways in which characters may be used interchangeably but carry different nuances. Yet all this is secondary to the fact that the two characters are equivalent to //phở//, which makes no such distinctions.

To further illustrate the difficulty: if you look at a //quốc ngữ//–to–//chữ Nôm// dictionary and a //chữ Nôm//–to–//quốc ngữ// dictionary by the same author, the entries will not line up, just as there isn’t a one-to-one correspondence between the English-to-German and German-to-English halves of an English–German dictionary.
If you look up “bỏ” in this dictionary <http://www.nomfoundation.org/nom-tools/Nom-Lookup-Tool/Nom-Lookup-Tool?uiLang=en>, you’ll get three characters from the source “vhn” corresponding to two different senses of //bỏ//. Any Vietnamese dictionary would have just one entry for these two senses of //bỏ//, because Vietnamese speakers no longer illustrate semantics in writing.

If it is so important that forms not be used for orthographic variants of a non-alphabetic writing system, then the alternative approach would be to store the //quốc ngữ// and //chữ Nôm// representations in separate lexemes, as though they were different languages. We could link individual //quốc ngữ// and //chữ Nôm// senses together as translations. This would be broadly consistent with the approach taken on every Wiktionary and would render this ticket moot for Vietnamese, but it bends the definition of a language quite a bit.

> To attach statements to specific variants, I believe that you can qualify statements using the "subject form <https://www.wikidata.org/wiki/Property:P5830>" property

This is for statements on senses. If we somehow combine all the //Nôm// characters into a single form, then it would make sense to qualify sources and P5425 statements by an “applies to representation” property, but even this would get messy with compounds.

> (although, aside, I must admit I don't understand the need for the "Han character in this lexeme" property; what novel information does it bring on top of the orthography itself?)

Translingual data about a Han character is stored in an item. There’s a need to connect this translingual data to individual senses via language-specific forms.
TASK DETAIL https://phabricator.wikimedia.org/T236593
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org