[Wikidata-bugs] [Maniphest] T285156: Add termbox language code mul

Mahir256 Mon, 27 Sep 2021 12:40:59 -0700

Mahir256 added a comment.


  **//The following is a comment made in another forum regarding this ticket by 
User:Nikki,//** who has allowed me to repost it here after some copy-editing 
in good faith:
  
  Regarding the fallback chain: English should fall back to mul too (as Mahir 
originally wrote). Otherwise we would have to duplicate everything from mul 
under English as well, and having everything **except** English fall back to 
mul has an icky "English is special" vibe.
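  As a rough illustration of the duplication argument, here is a minimal sketch assuming a hypothetical label lookup that tries each code in a language's fallback chain in order. The chains, `chain_for` and `resolve_label` are made up for this example and are not Wikibase's actual fallback configuration or API.

```python
from typing import Dict, List, Optional

# Hypothetical fallback chains: each code is tried in order, ending in "mul".
# Illustrative only; this is not Wikibase's real fallback configuration.
FALLBACKS: Dict[str, List[str]] = {
    "en": ["en", "mul"],          # English falls back to mul as well
    "de": ["de", "en", "mul"],
    "fr": ["fr", "en", "mul"],
}


def chain_for(lang: str) -> List[str]:
    """Return lang's chain; unlisted languages are assumed to go lang -> en -> mul."""
    return FALLBACKS.get(lang, [lang, "en", "mul"])


def resolve_label(labels: Dict[str, str], lang: str) -> Optional[str]:
    """Return the first label found along lang's fallback chain, or None."""
    for code in chain_for(lang):
        if code in labels:
            return labels[code]
    return None


# A language-independent label is stored once, under "mul".
pi_labels = {"mul": "π"}
print(resolve_label(pi_labels, "en"))  # π
print(resolve_label(pi_labels, "ru"))  # π
```

  If the "en" chain were just ["en"], `resolve_label(pi_labels, "en")` would return None even though German and French readers would still reach "π" through mul, so the same string would have to be copied under en.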
  
  Regarding which script subtags to add: I think it would make sense to start 
with **only** mul and revisit later whether (and which) script-specific codes 
would be useful. I think there is a clear use case for a script-independent 
code which applies to any language: all the examples I provided are strings 
which are, by definition, the same regardless of language. Some use Latin 
characters, but they're still valid for all languages; e.g. the ISO country 
code for Switzerland is still "CH" in Arabic or Russian, in the same way that 
the symbol for pi is π even in English. It's less clear how useful individual 
script-specific codes would be.
  
  I did a bit of analysis: there are 521 language codes usable for labels. 
343 are for Latin, 55 for Cyrillic, 34 for Arabic, 16 for Devanagari, 10 for 
Traditional Chinese and 7 for Simplified Chinese. The other 36 scripts are 
each associated with fewer than 5 language codes. Roughly two thirds of all 
the language codes (343 of 521) are thus for Latin.
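  The per-script figures above can be checked with a few lines of arithmetic. This sketch uses only the counts stated in the analysis; the variable names are illustrative.

```python
# Label-language counts per script, as stated in the analysis above.
counts = {
    "Latin": 343,
    "Cyrillic": 55,
    "Arabic": 34,
    "Devanagari": 16,
    "Traditional Chinese": 10,
    "Simplified Chinese": 7,
}
TOTAL = 521

listed = sum(counts.values())          # codes covered by the six named scripts
other = TOTAL - listed                 # codes spread over the other 36 scripts
latin_share = counts["Latin"] / TOTAL  # share of all label codes that are Latin

for script, n in counts.items():
    print(f"{script:20} {n:4} {n / TOTAL:6.1%}")
print(f"{'Other (36 scripts)':20} {other:4} {other / TOTAL:6.1%}")
```

  The six named scripts account for 465 codes, leaving 56 for the remaining 36 scripts, which is consistent with each of them having fewer than 5 codes. Latin's share works out to about 65.8%.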
  
  Regarding mul-latn: I'm not sure what the point of having both mul and 
mul-latn would be. Doesn't that imply that there's a situation where mul would 
be different from mul-latn **and** that people would be specifically requesting 
mul and nothing else (since anything else would fall back to English and 
mul-latn first)?
  
  Regarding mul-cyrl and mul-arab: in theory there are enough language codes 
using those scripts that a script-specific mul might make sense, but I looked 
up a bunch of leaders of countries which use Cyrillic or Arabic script (as 
items I thought would be likely to have pages in those languages) and most had 
only 3-5 identical Cyrillic or Arabic labels. So I haven't yet found evidence 
that, in practice, it's common for many languages to share the same Cyrillic 
or Arabic label.
  
  Regarding mul-deva: I didn't look into Devanagari, but I would be very 
surprised if the situation there is any different from mul-cyrl and mul-arab.
  
  Regarding mul-hans and mul-hant: I don't think there's any obvious benefit to 
having those. All of the Simplified/Traditional Chinese language codes are for 
languages in the zh macrolanguage. zh-classical and zh-yue shouldn't be used; a 
bot already replaces them with lzh and yue. zh, zh-cn, zh-my, zh-sg, gan-hans 
and wuu already fall back to zh-hans, while gan, gan-hant, lzh, zh-hk, zh-mo 
and zh-tw already fall back to zh-hant. Only nan-hani and yue don't have any 
fallbacks defined. I don't know why yue doesn't, but nan-hani is a code added 
for Wikidata and **should** really fall back to nan or zh-hant. Either way, 
zh-hans/zh-hant are approximately equal to mul-hans/mul-hant, and even if we 
define a distinction between them, I'm very sceptical that we could 
**maintain** that distinction.
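  To make the redundancy argument concrete, the fallbacks listed above can be written as a small table. This is a sketch that encodes only what the comment states (the deprecated zh-classical and zh-yue are left out); `ZH_FALLBACKS` and `without_fallback` are hypothetical names, not part of any MediaWiki or Wikibase API.

```python
from typing import Dict, List, Optional

SIMPLIFIED = ["zh", "zh-cn", "zh-my", "zh-sg", "gan-hans", "wuu"]
TRADITIONAL = ["gan", "gan-hant", "lzh", "zh-hk", "zh-mo", "zh-tw"]

# Direct fallback for each code, as described in the comment above.
# None means no fallback is defined.
ZH_FALLBACKS: Dict[str, Optional[str]] = {
    **{code: "zh-hans" for code in SIMPLIFIED},
    **{code: "zh-hant" for code in TRADITIONAL},
    "nan-hani": None,  # arguably should fall back to nan or zh-hant
    "yue": None,
}


def without_fallback() -> List[str]:
    """Codes in the table that do not reach zh-hans or zh-hant."""
    return sorted(code for code, target in ZH_FALLBACKS.items() if target is None)


print(without_fallback())  # ['nan-hani', 'yue']
```

  Every other code in the table already ends at zh-hans or zh-hant, which is why a separate mul-hans/mul-hant layer would add little beyond what those two codes already provide.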

TASK DETAIL
  https://phabricator.wikimedia.org/T285156


To: Mahir256
Cc: Ash_Crow, Moebeus, Lucas_Werkmeister_WMDE, So9q, Ainali, Epidosis, 
Shushugah, Manuel, Nikki, Mbch331, jhsoby, Amire80, Lydia_Pintscher, 
ChristianKl, Mahir256, Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, 
Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude

_______________________________________________
Wikidata-bugs mailing list -- [email protected]

