[Bug 42396] duplicate/invalid language codes

bugzilla-daemon Sun, 11 Aug 2013 22:07:50 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=42396


Nemo <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
           See Also|                            |https://bugzilla.wikimedia.org/show_bug.cgi?id=37459,
                   |                            |https://bugzilla.wikimedia.org/show_bug.cgi?id=37754

--- Comment #1 from Nemo <[email protected]> ---
This bug is probably too general to be useful (perhaps it should be turned
into a tracking bug?), but since we have another equally general report, let
me copy it here:

----

Small update: I went through the language list at

https://github.com/mkroetzsch/wda/blob/master/includes/epTurtleFileWriter.py#L472

and added a number of TODOs to the most obvious problematic cases. Typical
problems are:

* Malformed language codes ('tokipona')
* Correctly formed language codes without any official meaning (e.g.,
'cbk-zam')
* Correctly formed codes with the wrong meaning (e.g., 'sr-ec': Serbian from
Ecuador?!)
* Language codes with redundant information (e.g., 'kk-cyrl' should be the same
as 'kk' according to IANA, but we have both)
* Use of macrolanguages instead of languages (e.g., "zh" is not "Mandarin" but
just "Chinese"; I guess we mean Mandarin; less sure about Kurdish ...)
* Language codes with incomplete information (e.g., "sr" should be "sr-Cyrl" or
"sr-Latn", both of which already exist; same for "zh" and "zh-Hans"/"zh-Hant",
but also for "zh-HK" [is this simplified or traditional?]). 

----
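
The categories in the list above can be told apart partly by shape alone. The
following is a minimal sketch, not the checker used by wda or MediaWiki: it
splits a code into subtags by position and length, loosely following BCP 47.
The names TAG_SHAPE and split_tag are hypothetical, and the sketch assumes a
2- or 3-letter primary subtag, which is stricter than the full grammar. It
does not consult the IANA registry, so it cannot tell whether a well-formed
subtag such as 'zam' has any official meaning.

```python
import re

# Simplified shape check, loosely modeled on BCP 47 (RFC 5646).
# Assumptions: the primary subtag is limited to 2-3 letters, and variants,
# extensions and private-use subtags are not handled at all.
TAG_SHAPE = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:-(?P<extlang>[a-z]{3}))?"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?",
    re.IGNORECASE,
)


def split_tag(tag):
    """Split `tag` into subtags by shape; return None if the shape does not match."""
    match = TAG_SHAPE.fullmatch(tag)
    if match is None:
        return None
    parts = {name: value for name, value in match.groupdict().items() if value}
    # Conventional casing: language lower, script Title, region UPPER.
    parts["language"] = parts["language"].lower()
    if "extlang" in parts:
        parts["extlang"] = parts["extlang"].lower()
    if "script" in parts:
        parts["script"] = parts["script"].title()
    if "region" in parts:
        parts["region"] = parts["region"].upper()
    return parts


if __name__ == "__main__":
    # Example codes taken from the list above.
    for code in ["tokipona", "cbk-zam", "sr-ec", "kk-cyrl", "zh-HK", "zh-Hans"]:
        print(code, "->", split_tag(code))
```

Under these assumptions 'tokipona' is rejected outright, while 'sr-ec' is
accepted with 'EC' read as a region, which is exactly the "Serbian from
Ecuador" reading. Detecting the redundant and incomplete cases ('kk-cyrl',
'sr', 'zh-HK') would additionally need registry data, which this sketch does
not include.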


-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
