Re: [Wikitech-l] Language codes vs site codes
Hi Denny,

On Wed, May 9, 2012 at 3:36 AM, Denny Vrandečić denny.vrande...@wikimedia.de wrote:
> (Siebrand, I am unsure if this will arrive at mediawiki-i18n, feel free to forward it if you consider it interesting to them).

It does after the list admin approves it, but you may just want to subscribe [1].

> OK, I've written a few lines of Python [1] which actually helped me answer my questions. Sorry to bother. And the answers are yes, yes, no, but close, and I hope so.

No problem. We like people answering their own questions. More time for us to do other things :).

> There are a small number of wikis whose language code differs from their site code, namely:
>
> crh - crh-latn
> als - gsw
> be-x-old - be-tarask
> roa-rup - rup
> simple - en

There are a few more, actually. See includes/DefaultSettings.php, $wgDummyLanguageCodes.

> But, at the same time, the given *site* codes exist as *language* codes as well, i.e. the languages/messages files exist for them, but they just fall back to the given language code (e.g. MessagesAls.php just names gsw as a fallback).

That's an issue with the Wikimedia setup, I guess. If the language code could be different from the subdomain name, that is what should be done. It's probably not as simple as it looks. See next item.

> I would not be surprised if each of these five examples would have an anecdote to explain why they are the way they are :)

That, and a past in which there was less attention for trying to stick to a particular standard. Doesn't really matter, we're stuck with it for now, and should try to not make it worse, and fix it in the long run. For the 5+ years that I've been involved in MediaWiki development, there have been requests to rename wikis to more appropriate subdomain names, but for some reason, no progress has been made on it yet. These 10+ requests are tracked in https://bugzilla.wikimedia.org/19986.

> P.S.: There is one thing I do not understand though. According to https://simple.wikipedia.org/w/api.php?action=query&meta=siteinfo the language of simple.wp is en, but MessagesSimple.php seems to be taken into account (instead of "edit" it has "change" in the UI, one of only two changes in MessagesSimple to MessagesEn).

Thanks for mentioning. That shouldn't have been there. Fixed in https://gerrit.wikimedia.org/r/#/c/7035/.

> So it seems that the language is simple -- why does it say en in the siteinfo?

Because it *is* English. It just should be English with a reduced vocabulary. There have been many debates in the past over its usefulness, and over whether other languages should also get a simple-vocabulary Wikimedia project (and subdomain). This is just for reference; please do not comment on this in this thread, but start a new one if you wish to discuss simple language versions.

Not sure if this made your insight clearer, but at least I hope I was able to add some details :).

[1] https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n

Cheers!

-- 
Siebrand Mazeland
Product Manager Localisation
Wikimedia Foundation
M: +31 6 50 69 1239
Skype: siebrand
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
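The siteinfo lookup discussed in this message can be sketched in Python as follows. This is a minimal illustration, not the script from the thread: the helper names (siteinfo_url, content_language, fetch_content_language) and the User-Agent string are made up for the example, and it assumes the JSON response carries the content language under query.general.lang.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def siteinfo_url(site_code, project="wikipedia"):
    """Build the siteinfo query URL for a site code such as 'simple'."""
    params = urlencode({
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",
        "format": "json",
    })
    return f"https://{site_code}.{project}.org/w/api.php?{params}"


def content_language(siteinfo):
    """Extract the language code from a decoded siteinfo response.

    Assumes the default JSON layout: {"query": {"general": {"lang": ...}}}.
    """
    return siteinfo["query"]["general"]["lang"]


def fetch_content_language(site_code):
    """Fetch siteinfo over the network and return the reported language code."""
    # The User-Agent value is a placeholder chosen for this example.
    request = Request(siteinfo_url(site_code),
                      headers={"User-Agent": "siteinfo-example/0.1"})
    with urlopen(request, timeout=10) as response:
        return content_language(json.load(response))
```

Calling fetch_content_language("simple") would perform the same request as the URL quoted above; the two pure helpers can be exercised without network access.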
Re: [Wikitech-l] Language codes vs site codes
Thanks, the conversation indeed helped me!

Cheers,
Denny

-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 2 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
[Wikitech-l] Language codes vs site codes
Hi,

Wikimedia projects like Wikipedia exist for a large number of languages (280, wow). Let us call them the sites. They are identified by the site code used in the subdomain and in language and interwiki links, e.g. en for en.wikipedia.org, i.e. the English Wikipedia.

MediaWiki has a number of interface languages. These are selected through the preferences. They also have short language codes that identify them and which are used, e.g. in the localization of the code. en is used for English.

My four questions:

* are the language codes a proper superset of the site codes?
* if the code is the same in both cases, does it always refer to the same language?
* does the Wikimedia project identified by a specific site code also always use the same language code as its default interface language?
* are the answers to the previous three questions accidental or by design, i.e. can we expect that this will stay like this in the future?

I hope that the answers are yes, yes, yes, yes :)

Sorry for the nitpicking questions, I hope someone knows the answers.

Cheers,
Denny
Re: [Wikitech-l] Language codes vs site codes
(Siebrand, I am unsure if this will arrive at mediawiki-i18n, feel free to forward it if you consider it interesting to them).

OK, I've written a few lines of Python [1] which actually helped me answer my questions. Sorry to bother. And the answers are yes, yes, no, but close, and I hope so.

There are a small number of wikis whose language code differs from their site code, namely:

crh - crh-latn
als - gsw
be-x-old - be-tarask
roa-rup - rup
simple - en

But, at the same time, the given *site* codes exist as *language* codes as well, i.e. the languages/messages files exist for them, but they just fall back to the given language code (e.g. MessagesAls.php just names gsw as a fallback).

I would not be surprised if each of these five examples would have an anecdote to explain why they are the way they are :)

Thanks,
Denny

P.S.: There is one thing I do not understand though. According to https://simple.wikipedia.org/w/api.php?action=query&meta=siteinfo the language of simple.wp is en, but MessagesSimple.php seems to be taken into account (instead of "edit" it has "change" in the UI, one of only two changes in MessagesSimple to MessagesEn). So it seems that the language is simple -- why does it say en in the siteinfo?

[1] available here: http://pastebin.com/JpApSmNX

2012/5/9 Siebrand Mazeland s.mazel...@xs4all.nl
> Forwarded. Please cc Denny, as he is not on this list.
>
> Begin forwarded message:
> From: Denny Vrandečić denny.vrande...@wikimedia.de
> Date: 9 May 2012 01:36:01 GMT+02:00
> To: MediaWiki Tech list wikitech-l@lists.wikimedia.org
> Subject: [Wikitech-l] Language codes vs site codes
> Reply-To: Wikimedia developers wikitech-l@lists.wikimedia.org
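The three checks behind these answers can be sketched in Python as below. This is an illustration only, not the pastebin script referenced in the message: the function names are hypothetical, the mapping contains just the five pairs listed in this thread (the thread notes there are more in $wgDummyLanguageCodes), and the code sets passed in are expected to come from the caller.

```python
# Site code -> default language code, limited to the five pairs named in the thread.
SITE_TO_LANGUAGE = {
    "crh": "crh-latn",
    "als": "gsw",
    "be-x-old": "be-tarask",
    "roa-rup": "rup",
    "simple": "en",
}


def default_language(site_code):
    """Return the language code a site uses, falling back to the site code itself."""
    return SITE_TO_LANGUAGE.get(site_code, site_code)


def check_codes(site_codes, language_codes):
    """Compare a collection of site codes against a collection of language codes."""
    sites = set(site_codes)
    languages = set(language_codes)
    return {
        # Question 1: is every site code a language code, with extra language codes left over?
        "proper_superset": sites < languages,
        # Site codes that have no language code of the same name.
        "missing": sorted(sites - languages),
        # Question 3: sites whose default language code differs from the site code.
        "remapped": sorted(s for s in sites if default_language(s) != s),
    }
```

With made-up sample input such as check_codes({"en", "als", "simple"}, {"en", "de", "als", "gsw", "simple"}), the result reports a proper superset, no missing codes, and als and simple as remapped.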
Re: [Wikitech-l] Language codes vs site codes
On Tue, May 8, 2012 at 6:36 PM, Denny Vrandečić denny.vrande...@wikimedia.de wrote:
> P.S.: There is one thing I do not understand though. According to https://simple.wikipedia.org/w/api.php?action=query&meta=siteinfo the language of simple.wp is en, but MessagesSimple.php seems to be taken into account (instead of "edit" it has "change" in the UI, one of only two changes in MessagesSimple to MessagesEn). So it seems that the language is simple -- why does it say en in the siteinfo?

Are you sure this isn't just because someone edited [[MediaWiki:whatever]] on simplewiki?

Roan
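One way to test the local-override hypothesis raised here is to ask the API whether the corresponding page in the MediaWiki namespace exists on the wiki. The sketch below is a hedged illustration: the helper names are hypothetical, the message key is a parameter because the thread does not name one, and it assumes the default JSON layout of action=query, where a page that does not exist carries a "missing" key.

```python
from urllib.parse import urlencode


def override_query_url(site_code, message_key, project="wikipedia"):
    """Build a query URL for the MediaWiki:<Message> page on the given site."""
    # Page titles start with an uppercase letter, so "edit" becomes "MediaWiki:Edit".
    title = "MediaWiki:" + message_key[:1].upper() + message_key[1:]
    params = urlencode({"action": "query", "titles": title, "format": "json"})
    return f"https://{site_code}.{project}.org/w/api.php?{params}"


def has_local_override(response):
    """Return True if any queried page exists, i.e. the message was edited on-wiki.

    Assumes the decoded response looks like {"query": {"pages": {id: {...}}}},
    with a "missing" key present on pages that do not exist.
    """
    pages = response["query"]["pages"]
    return any("missing" not in page for page in pages.values())
```

If has_local_override returns False for the message in question, the wording would have to come from the message files rather than from an on-wiki edit.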