Re: [Wikitech-l] Language codes vs site codes

2012-05-09 Thread Siebrand Mazeland (WMF)
Hi Denny,

On Wed, May 9, 2012 at 3:36 AM, Denny Vrandečić
denny.vrande...@wikimedia.de wrote:
 (Siebrand, I am unsure if this will arrive at mediawiki-i18n, feel free to
 forward it if you consider it interesting to them).

It does after the list admin approves it, but you may just want to subscribe[1].

 OK, I've written a few lines of Python [1] which actually helped me answer
 my questions. Sorry to bother.

 And the answers are yes, yes, no (but close), and I hope so.

No problem. We like people answering their own questions. More time for
us to do other things :).

 There are a small number of wikis whose language code differs from
 their site code, namely:

 crh - crh-latn
 als - gsw
 be-x-old - be-tarask
 roa-rup - rup
 simple - en

There are a few more, actually. See includes/DefaultSettings.php,
$wgDummyLanguageCodes.
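
To make the site code vs. language code split concrete, here is a minimal Python sketch. It holds only the five pairs listed in this thread; the actual contents of $wgDummyLanguageCodes are not reproduced here, and the names SITE_TO_LANGUAGE and language_code_for_site are hypothetical, chosen for illustration.

```python
# Site code -> language code, limited to the five pairs named in this thread.
# Assumption: a site code that is not listed here is also its language code.
SITE_TO_LANGUAGE = {
    "crh": "crh-latn",
    "als": "gsw",
    "be-x-old": "be-tarask",
    "roa-rup": "rup",
    "simple": "en",
}


def language_code_for_site(site_code: str) -> str:
    """Return the language code for a site code (hypothetical helper)."""
    return SITE_TO_LANGUAGE.get(site_code, site_code)


if __name__ == "__main__":
    for site in ("en", "als", "simple"):
        print(site, "->", language_code_for_site(site))
```

Keeping the exceptions in one table means the common case (site code equals language code) needs no entry at all.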

 But, at the same time, the given *site* codes exist as *language* codes as
 well, i.e. the languages/messages files exist for them, but they just
 fall back to the given language code (e.g. MessagesAls.php just names gsw
 as a fallback).

That's an issue with the Wikimedia setup, I guess. If the language
code could be different from the subdomain name, that is what should
be done. It's probably not as simple as it looks. See next item.

 I would not be surprised if each of these five examples would have an
 anecdote to explain why they are the way they are :)

That, and a past in which less attention was paid to sticking to a
particular standard. It doesn't really matter: we're stuck with it
for now, should try not to make it worse, and should fix it in the long
run. For the 5+ years that I've been involved in MediaWiki development,
there have been requests to rename wikis to more appropriate subdomain
names, but for some reason, no progress has been made on it yet. These
10+ requests are tracked in https://bugzilla.wikimedia.org/19986.

 P.S.: There is one thing I do not understand though. According
 to https://simple.wikipedia.org/w/api.php?action=query&meta=siteinfo the
 language of simple.wp is en, but MessagesSimple.php seems to be taken into
 account (instead of "edit" it has "change" in the UI, one of only two
 differences between MessagesSimple and MessagesEn).

Thanks for mentioning it. That shouldn't have been there. Fixed in
https://gerrit.wikimedia.org/r/#/c/7035/.

 So it seems that the language is
 simple -- why does it say en in the siteinfo?

Because it *is* English. It should just be English with a reduced
vocabulary. There have been many debates in the past over its
usefulness, and over whether other languages should also get a
simple-vocabulary Wikimedia project (and subdomain). This is just for
reference; please do not comment on it in this thread, but start a
new one if you wish to discuss simple language versions.

Not sure if this made things any clearer for you, but at least I hope I
was able to add some details :).

[1] https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n

Cheers!

-- 
Siebrand Mazeland
Product Manager Localisation
Wikimedia Foundation

M: +31 6 50 69 1239
Skype: siebrand

Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Language codes vs site codes

2012-05-09 Thread Denny Vrandečić
Thanks, the conversation indeed helped me!

Cheers,
Denny

-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 2 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
(Society for the Promotion of Free Knowledge). Registered in the register
of associations of the Amtsgericht Berlin-Charlottenburg under number
23855 B. Recognized as a charitable organization by the Finanzamt für
Körperschaften I Berlin, tax number 27/681/51985.

[Wikitech-l] Language codes vs site codes

2012-05-08 Thread Denny Vrandečić
Hi,

Wikimedia projects like Wikipedia exist for a large number of languages
(280, wow). Let us call them the sites. They are identified by the site
code used in the subdomain and in language and interwiki links, e.g. en
for en.wikipedia.org, i.e. the English Wikipedia.

MediaWiki has a number of interface languages. These are selected through
the preferences. They also have short language codes that identify them and
which are used, e.g. in the localization of the code. en is used for
English.

My four questions:
* are the language codes a proper superset of the site codes?
* if the code is the same in both cases, does it always refer to the same
language?
* does the Wikimedia project identified by a specific site code also always
use the same language code as its default interface language?
* are the answers to the previous three questions accidental or by design,
i.e. can we expect that this will stay like this in the future?

I hope that the answers are yes, yes, yes, yes :)
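
The first and third questions can be phrased as set operations. The following is only a sketch under stated assumptions: the sample codes are a tiny illustrative subset (with the als/gsw and simple/en pairs taken from elsewhere in this thread), and check_codes and its parameter names are hypothetical. The second question, whether an identical code means the same language, cannot be checked mechanically this way.

```python
def check_codes(site_codes, language_codes, default_language):
    """Compare site codes with interface language codes (hypothetical helper).

    default_language maps a site code to its default interface language;
    sites missing from it are assumed to use their own code.
    """
    sites = set(site_codes)
    languages = set(language_codes)
    return {
        # Question 1: every site code is a language code, and more exist.
        "proper_superset": languages > sites,
        "sites_without_language": sorted(sites - languages),
        # Question 3: sites whose default language differs from their code.
        "sites_with_other_default": sorted(
            s for s in sites if default_language.get(s, s) != s
        ),
    }


if __name__ == "__main__":
    # Illustrative sample only, not the full lists.
    result = check_codes(
        site_codes={"en", "de", "als", "simple"},
        language_codes={"en", "de", "als", "gsw", "simple"},
        default_language={"als": "gsw", "simple": "en"},
    )
    print(result)
```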

Sorry for the nitpicking questions, I hope someone knows the answers.

Cheers,
Denny


Re: [Wikitech-l] Language codes vs site codes

2012-05-08 Thread Denny Vrandečić
(Siebrand, I am unsure if this will arrive at mediawiki-i18n, feel free to
forward it if you consider it interesting to them).

OK, I've written a few lines of Python [1] which actually helped me answer
my questions. Sorry to bother.

And the answers are yes, yes, no (but close), and I hope so.

There are a small number of wikis whose language code differs from
their site code, namely:

crh - crh-latn
als - gsw
be-x-old - be-tarask
roa-rup - rup
simple - en

But, at the same time, the given *site* codes exist as *language* codes as
well, i.e. the languages/messages files exist for them, but they just
fall back to the given language code (e.g. MessagesAls.php just names gsw
as a fallback).
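
The fallback behaviour described above can be sketched in Python as follows. This is an illustration, not MediaWiki's code: only the als -> gsw link comes from this thread, ending every chain at "en" is an assumption, and the function names and placeholder message strings are hypothetical.

```python
ULTIMATE_FALLBACK = "en"  # assumption: every chain ends at English


def fallback_chain(code, fallbacks):
    """Return the list of language codes to try, in order."""
    chain = [code]
    seen = {code}
    current = code
    while current in fallbacks:
        current = fallbacks[current]
        if current in seen:  # guard against accidental cycles
            break
        chain.append(current)
        seen.add(current)
    if ULTIMATE_FALLBACK not in seen:
        chain.append(ULTIMATE_FALLBACK)
    return chain


def lookup(key, code, messages, fallbacks):
    """Return the first message found along the fallback chain, else None."""
    for lang in fallback_chain(code, fallbacks):
        text = messages.get(lang, {}).get(key)
        if text is not None:
            return text
    return None


if __name__ == "__main__":
    fallbacks = {"als": "gsw"}  # the one link named in this thread
    # Placeholder strings, not real translations.
    messages = {"gsw": {"edit": "gsw-text"}, "en": {"edit": "en-text"}}
    print(fallback_chain("als", fallbacks))
    print(lookup("edit", "als", messages, fallbacks))
```

A messages file that contains nothing but a fallback, as described for MessagesAls.php, then behaves like an alias for the language it names.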

I would not be surprised if each of these five examples would have an
anecdote to explain why they are the way they are :)

Thanks,
Denny

P.S.: There is one thing I do not understand though. According to
https://simple.wikipedia.org/w/api.php?action=query&meta=siteinfo the
language of simple.wp is en, but MessagesSimple.php seems to be taken
into account (instead of "edit" it has "change" in the UI, one of only two
differences between MessagesSimple and MessagesEn). So it seems that the
language is simple -- why does it say en in the siteinfo?
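
For anyone wanting to repeat the siteinfo check, a minimal Python sketch follows. It only builds the request URL and reads the language out of an already parsed response, so it runs without network access. The siprop=general and format=json parameters and the query/general/lang layout of the JSON are assumptions about the API response, and both function names are hypothetical.

```python
from urllib.parse import urlencode


def siteinfo_url(host):
    """Build a siteinfo query URL for a wiki host (hypothetical helper)."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",  # assumption: general info carries the language
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def content_language(response):
    """Extract the language code from a parsed siteinfo response.

    Assumes the JSON has the shape {"query": {"general": {"lang": ...}}}.
    """
    return response["query"]["general"]["lang"]


if __name__ == "__main__":
    print(siteinfo_url("simple.wikipedia.org"))
    # Hand-written sample mirroring what this thread reports for simple.wp.
    sample = {"query": {"general": {"lang": "en"}}}
    print(content_language(sample))
```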


[1] available here: http://pastebin.com/JpApSmNX

2012/5/9 Siebrand Mazeland s.mazel...@xs4all.nl

 Forwarded. Please cc Denny, as he is not on this list.

 Begin forwarded message:

 *From:* Denny Vrandečić denny.vrande...@wikimedia.de
 *Date:* 9 May 2012 01:36:01 GMT+02:00
 *To:* MediaWiki Tech list wikitech-l@lists.wikimedia.org
 *Subject:* *[Wikitech-l] Language codes vs site codes*
 *Reply-To:* Wikimedia developers wikitech-l@lists.wikimedia.org


Re: [Wikitech-l] Language codes vs site codes

2012-05-08 Thread Roan Kattouw
On Tue, May 8, 2012 at 6:36 PM, Denny Vrandečić
denny.vrande...@wikimedia.de wrote:
 P.S.: There is one thing I do not understand though. According to
 https://simple.wikipedia.org/w/api.php?action=query&meta=siteinfo the
 language of simple.wp is en, but MessagesSimple.php seems to be taken
 into account (instead of "edit" it has "change" in the UI, one of only two
 differences between MessagesSimple and MessagesEn). So it seems that the
 language is simple -- why does it say en in the siteinfo?

Are you sure this isn't just because someone edited
[[MediaWiki:whatever]] on simplewiki?
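
The possibility raised here, that an on-wiki page in the MediaWiki: namespace overrides the text shipped in the messages files, can be sketched in Python. This is an illustration of the lookup order only: the function names are hypothetical, the title rule (capitalized first letter) is an assumption, and the sample strings are just the "edit"/"change" pair mentioned in this thread.

```python
def message_page_title(key):
    """Title of the on-wiki override page for a message key.

    Assumption: "MediaWiki:" plus the key with its first letter capitalized.
    """
    return "MediaWiki:" + key[:1].upper() + key[1:]


def resolve_message(key, local_overrides, file_messages):
    """Return (text, source), preferring an on-wiki override if present."""
    title = message_page_title(key)
    if title in local_overrides:
        return local_overrides[title], "on-wiki override"
    if key in file_messages:
        return file_messages[key], "messages file"
    return None, "missing"


if __name__ == "__main__":
    file_messages = {"edit": "Edit"}
    local_overrides = {"MediaWiki:Edit": "Change"}  # sample data
    print(resolve_message("edit", local_overrides, file_messages))
    print(resolve_message("edit", {}, file_messages))
```

With this lookup order, a changed label in the UI does not by itself tell you which of the two sources it came from, which is why the question is worth asking.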

Roan
