mrephabricator added a comment.
This should not be done. The sound written ک in Urdu is written ڪ in Sindhi; Sindhi also has ک, but uses it for a different sound. Sindhi is exceptional in this regard, so it would not be surprising for the "mul" label to be read as using ک for the sound it more commonly represents. A label in Sindhi could therefore be identical to an Urdu one while representing a word that is meant to be pronounced differently from the Urdu one.

This likely extends to most scripts. In Latin script, for example, "w" and "v" represent the same sound to many users. Consider this item (https://www.wikidata.org/wiki/Q113450202): I have labeled it in English as "Waddi Punjabi Lughat", as this is how many South Asian English speakers and users of Latin script would be inclined to spell it. However, I have used "Vaddi Punjabi Lughat" as the label for Canadian, American, and British English, because to speakers of these dialects the sound they associate with "V" is a closer match to the correct pronunciation. If I were to duplicate the label across dialects, that would indicate that the "W" spelling would be understood as typical in all of them, meaning that it would be reasonable for an American to pronounce "Waddi" like "water" even if this is not the "original" pronunciation. Duplicating a label is therefore itself an indicator of useful information that would not be clear otherwise.

I think it is quite likely that people will substitute homoglyphs to get around this, whether deliberately or unintentionally. For example, ڻ and ٹ are different letters associated with different sounds, yet they look identical in initial and medial positions. So if we have ڻڻڻ and ٹٹٹ, you would have a hard time telling what the first two letters are. There are lots of things we can fudge like this in various scripts and have it go unnoticed. Hawaii in the native language Hawaiian, which uses the Latin script, is spelled Hawaiʻi.
If we write this as Hawai'i, using an apostrophe rather than the ʻokina character used for Polynesian languages written in Latin script, we have "duplicated" the string without using the same characters. Many would do this entirely unintentionally, not knowing that the ʻokina is a different character; if someone then tried to correct the character in the termbox it is in, they would get an error.

TASK DETAIL
https://phabricator.wikimedia.org/T306918