[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

mxn Fri, 24 Jun 2022 12:09:05 -0700

mxn added a comment.

  In T236593#8025472 <https://phabricator.wikimedia.org/T236593#8025472>, 
@AGutman-WMF wrote:

  > @mxn If these are purely orthographic variants (i.e. the pronunciation is 
the same) I would list them under a single lexeme. And in that case, the most 
natural way would be to list them as spelling variants rather than distinct 
forms.

  This assumption is only valid in an environment with purely 
phonetic/alphabetic writing systems. But in Chinese, two characters that are 
“spelled” distinctly but carry the same semantics and pronunciation would still 
have distinct lexemes. Keeping them separate also makes it possible to indicate 
that the two characters are pronounced similarly in one dialect but differently 
in another.

  //Chữ Nôm// is a Chinese-based writing system that adds a phonosemantic 
aspect. If not for its relationship to the //quốc ngữ// alphabet, every 
character would clearly get its own lexeme, just like in Chinese. Any 
similarity in pronunciation would be irrelevant, because this writing system 
makes finer semantic distinctions than any alphabet would. For example, the 
difference between 𬖾 and 頗 (both interchangeable written forms of //phở//) is 
that 𬖾 combines 頗 with the component 米 as a disambiguator, clarifying that it 
has to do with rice (because phở noodles are made of rice), as opposed to 
whatever 頗 originally meant in Chinese. This is only one of many ways in which 
characters may be used interchangeably while carrying different nuances. Yet 
all this is secondary to the fact that the two characters are 
equivalent to //phở//, which makes no such distinctions.

  To further illustrate the difficulty, if you look at a //quốc ngữ//–to–//chữ 
Nôm// dictionary and a //chữ Nôm//–to–//quốc ngữ// dictionary by the same 
author, the entries will not line up, just as there isn’t a one-to-one 
correspondence between the English-to-German and German-to-English halves of an 
English–German dictionary. If you look up “bỏ” in this dictionary 
<http://www.nomfoundation.org/nom-tools/Nom-Lookup-Tool/Nom-Lookup-Tool?uiLang=en>,
 you’ll get three characters from the source “vhn” corresponding to two 
different senses of //bỏ//. Any Vietnamese dictionary would have just one entry 
for these two senses of //bỏ//, because Vietnamese speakers no longer 
illustrate semantics in writing.

  If it is so important that forms not be used for orthographic variants of a 
non-alphabetic writing system, then the alternative approach would be to store 
the //quốc ngữ// and //chữ Nôm// representations in separate lexemes, as though 
they’re different languages. We could link individual //quốc ngữ// and //chữ 
Nôm// senses together as translations. This would be broadly consistent with 
the approach taken on every Wiktionary and render this ticket moot for 
Vietnamese, but it bends the definition of a language quite a bit.
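
  A rough sketch of that alternative in Python, purely to illustrate the shape 
of the data. The class names, identifiers, and glosses are made up for this 
example; they are not Wikibase's actual data model or real Wikidata IDs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    sense_id: str  # made-up identifier, not a real Wikidata ID
    gloss: str     # illustrative gloss only
    translations: List[str] = field(default_factory=list)


@dataclass
class Lexeme:
    lexeme_id: str
    script: str
    lemma: str
    senses: List[Sense] = field(default_factory=list)


def link_translation(a: Sense, b: Sense) -> None:
    """Record a and b as translations of each other, without duplicates."""
    for src, dst in ((a, b), (b, a)):
        if dst.sense_id not in src.translations:
            src.translations.append(dst.sense_id)


# One lexeme per written representation, as though they were different languages.
quoc_ngu = Lexeme("L-qn", "quốc ngữ", "phở", [Sense("L-qn-S1", "rice noodle dish")])
nom_a = Lexeme("L-nom-a", "chữ Nôm", "𬖾", [Sense("L-nom-a-S1", "rice noodle dish")])
nom_b = Lexeme("L-nom-b", "chữ Nôm", "頗", [Sense("L-nom-b-S1", "rice noodle dish")])

# The quốc ngữ sense points at every chữ Nôm sense, and vice versa.
for nom in (nom_a, nom_b):
    link_translation(quoc_ngu.senses[0], nom.senses[0])

print(quoc_ngu.senses[0].translations)
```

  Under this layout no lexeme needs more than one form per spelling variant, 
at the cost of treating the two scripts as if they were separate languages.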

  > To attach statements to specific variants,  I believe that you can qualify 
statements using the "subject form 
<https://www.wikidata.org/wiki/Property:P5830>" property

  This is for statements on senses. If we somehow combine all the //Nôm// 
characters into a single form, then it would make sense to qualify sources and 
P5425 statements by an “applies to representation” property, but even this would 
get messy with compounds.
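
  A minimal sketch of what such a qualifier might look like, assuming a 
hypothetical `applies_to` field standing in for the “applies to representation” 
property (which does not exist today). The `Statement` class and the placeholder 
values are illustrative, not Wikibase's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statement:
    prop: str
    value: str
    # Hypothetical qualifier: the one representation this statement is about.
    # None means the statement holds for every representation of the form.
    applies_to: Optional[str] = None


def statements_for(statements: List[Statement], representation: str) -> List[Statement]:
    """Return the statements that apply to the given representation."""
    return [s for s in statements if s.applies_to in (None, representation)]


# A single merged form whose representations are 𬖾 and 頗; the values are
# placeholders for the items describing each Han character.
merged_form = [
    Statement("P5425", "item for 𬖾", applies_to="𬖾"),
    Statement("P5425", "item for 頗", applies_to="頗"),
]

print([s.value for s in statements_for(merged_form, "頗")])
```

  For a compound, presumably every per-character statement would need a 
qualifier naming the full compound spelling it belongs to, which is where this 
gets unwieldy.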

  > (although, aside, I must admit I don't understand the need for the "Han 
character in this lexeme" property; what novel information does it bring on top 
of the orthography itself?)

  Translingual data about a Han character is stored in an item. There’s a need 
to connect this translingual data to individual senses via language-specific 
forms.

TASK DETAIL
  https://phabricator.wikimedia.org/T236593


To: mxn
Cc: AGutman-WMF, mxn, So9q, Ijon, daniel, Asaf, Mahir256, Danmichaelo, 
Fnielsen, Lucas_Werkmeister_WMDE, Denny, Lydia_Pintscher, jeblad, jhsoby, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Bodhisattwa, Scott_WUaS, Wikidata-bugs, aude, Mbch331

