Mahir256 added a comment.
**//The following is a comment made in another forum regarding this ticket by User:Nikki,//** who has allowed me to repost it here after some copy-editing in good faith:

Regarding the fallback chain: English should fall back to mul too (as Mahir originally wrote), or otherwise we would have to duplicate everything from mul under English as well (and having everything **except** English fall back to mul has an icky "English is special" vibe).

Regarding which script subtags to add: I think it would make sense to start with **only** mul and revisit whether (and which) script-specific codes would be useful later. I think there is a clear use case for a script-independent code which applies to any language (e.g. all the examples I provided are things which are by definition that string regardless of language; some use Latin characters but they're still valid for all languages, e.g. the ISO country code for Switzerland is still "CH" in Arabic or Russian, in the same way that the symbol for pi is π even in English), but it's less clear how useful individual script-specific codes would be.

I did a bit of analysis and there are 521 language codes usable for labels. 343 are for Latin, 55 for Cyrillic, 34 for Arabic, 16 for Devanagari, 10 for Traditional Chinese and 7 for Simplified Chinese. The other 36 scripts are each associated with fewer than 5 language codes. About two thirds of all the language codes are thus for Latin.

Regarding mul-latn: I'm not sure what the point of having both mul and mul-latn would be. Doesn't that imply that there's a situation where mul would be different from mul-latn **and** that people would be specifically requesting mul and nothing else (since anything else would fall back to English and mul-latn first)?
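To make the fallback argument above concrete, here is a minimal sketch of label resolution with a mul entry. The `FALLBACKS` table and the `resolve_label` helper are hypothetical names invented for illustration, and the chains shown (including the relative order of mul and en) are assumptions, not the actual configuration.

```python
from typing import Dict, List, Optional

# Hypothetical fallback chains, for illustration only. The real chains are
# defined elsewhere and are not reproduced here; the order of "mul" and "en"
# in the non-English chains is an assumption.
FALLBACKS: Dict[str, List[str]] = {
    "en": ["mul"],        # English falls back to mul, as argued above
    "ar": ["mul", "en"],
    "ru": ["mul", "en"],
}


def resolve_label(labels: Dict[str, str], lang: str) -> Optional[str]:
    """Return the first label found for lang or along its fallback chain."""
    for code in [lang, *FALLBACKS.get(lang, [])]:
        if code in labels:
            return labels[code]
    return None


# The ISO country code for Switzerland is stored once, under mul.
switzerland_code = {"mul": "CH"}

for lang in ("en", "ar", "ru"):
    print(lang, resolve_label(switzerland_code, lang))
```

If `"en"` had no fallback to `"mul"` in this sketch, the same string would also have to be stored under `"en"` for English users to see it, which is the duplication the comment wants to avoid.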
Regarding mul-cyrl and mul-arab: in theory there are enough language codes using those scripts that mul might make sense, but I looked up a bunch of leaders of countries which use Cyrillic or Arabic script (as items I thought would be likely to have pages in those languages) and the majority only had 3-5 identical Cyrillic or Arabic labels, so I haven't yet found evidence that it's common for a lot of languages to share the same Cyrillic or Arabic label in practice.

Regarding mul-deva: I didn't look into Devanagari, but I would be very surprised if the situation there is any different from mul-cyrl and mul-arab.

Regarding mul-hans and mul-hant: I don't think there's any obvious benefit to having those. All of the Simplified/Traditional Chinese language codes are for languages in the zh macrolanguage. zh-classical and zh-yue shouldn't be used; a bot already replaces them with lzh and yue. zh, zh-cn, zh-my, zh-sg, gan-hans and wuu already fall back to zh-hans, while gan, gan-hant, lzh, zh-hk, zh-mo and zh-tw already fall back to zh-hant. Only nan-hani and yue don't have any fallbacks defined. I don't know why yue doesn't, but nan-hani is a code added for Wikidata and **should** really fall back to nan or zh-hant. Either way, zh-hans/zh-hant are approximately equal to mul-hans/mul-hant, and even if we define a distinction between them, I'm very sceptical that we could **maintain** a distinction.

TASK DETAIL https://phabricator.wikimedia.org/T285156
