Re: Two Languages: One ISO-639-# code

2015-05-20 Thread toki
On 20/05/2015 20:01, Rob Weir wrote:

> 1. I have no idea what anyone in this thread is talking about, but it does 
> sound important.

It is about adding AOo support for minority languages that are
threatened, extinct, or dead.

Rephrased: Implementing full and complete support in AOo for languages
that have fewer than 1,000 (L1 + L2) users.

Unfortunately, a UI won't be available for most of those languages.  :(

jonathon



signature.asc
Description: OpenPGP digital signature


Re: Two Languages: One ISO-639-# code

2015-05-20 Thread Rob Weir
On Tue, May 19, 2015 at 7:14 AM, toki  wrote:

>
>
> On 19/05/2015 10:20, Dirk-Willem van Gulik wrote:
>
> > So in ISO 639-X the most accurate you can pinpoint it is xh and then xho.
> > And in Glottolog, you have mpon1252 as its most precise denominator.
> >
> > Now as it *happens* - this language is spoken in an area fully covered
> > by a single country - so you can use a 3166 as a country (-1, ZA) or
> > (-2, ZA-EC, ZA-NL) region specifier; and then refine it.
> > As it happens, the region more or less maps to the language spoken
> > there (and let's argue that in that region or country no other languages
> > are spoken).
>
> However, Xhosa is currently included in AOo, and is spoken in the same
> country as mPondo. I _think_ that AOo currently uses ISO 3166-1 code
> (IE: ZA).
>
> >
> >> For a slightly different example, I give you Koine Greek and Attic Greek.
> >> Linguist-List codes them as grc-koi & grc-att, respectively.
> >> ISO 639-2 code is GRC. ISO 639-3 is GRC. No ISO 639-1 code.
> >>
> >> I wish all dialects/languages were as accommodating as:
> >> Glottolog lush1251
> >> ISO 639-1 none;
> >> ISO 639-2 none;
> >> ISO 639-3 LUT;
> >> ISO 639-3 SKA;
> >> ISO 639-3 SNO;
> >> ISO 639-3 SLH;
> >> (Note: AFAIK, there are no spell checkers or grammar checkers for those
> >> dialects, for any office suite.)
> >
> > So also good examples - and I think the same applies
> >
> > - you get broad specifiers on -1, -2 level.
> > - you may get granular specifiers in -3 and -5 for the rarer/older
> >   languages.
> > - for dialects and more refined pinpointing you hit the limits of
> >   639(-5) and have two options: petition SIL/Library of Congress to add
> >   one (above examples are all in scope); or rely on Glottolog.
> >
> > and
> >
> > - using regional coding; 3166; is not really helping you - as they
> >   do not define language.
>
> ISO 3166-2 & 3166-1 codes are useful for locales, which is the
> difference between Xhosa and mPondo. At least, if one accepts the legal
> fiction that the enclaves are part of KwaZulu, and not Eastern Cape, and
> also the debatable point that mPondo is either a distinct language or a
> dialect of Xhosa.
>
> I will grant that for the First Nation languages of Australia, ISO
> 3166-2 codes are not helpful, because the language changes at intervals
> of between five and twenty-five miles. (One farm in either the Northern
> Territory or Western Australia can be the home of up to a dozen
> different First Nation languages.)
>
> > Pragmatically that means using an exact -3 if you have it (i.e. the
> > exact language match); relying on the nearest ‘above’ -5 language
> > family identifier when there is no -3 match to be had; and ONLY in the
> > -5 case add whatever you can, e.g. the Glottolog identifier, to refine it.
>
> That helps with most minority languages. There are some that glottolog
> won't define a code for, on the grounds that they are, for all practical
> purposes, extinct.
>
> > or something along those lines. And discourage -1 and 3166 use; though
> > permit it in :other if there is no Glottolog entry
>
> That makes things easy.
>


Two things:

1. I have no idea what anyone in this thread is talking about, but it does
sound important.

2.  I am tremendously proud that we have such knowledge and talent in our
community helping us take care of i18n issues like this.

Thanks!

-Rob



>
> Now to delve into a couple of spelling and grammar checkers, and change
> them to those criteria.
>
> And then submit the RFEs for those language/locales.
>
> jonathon
>
>


Re: Two Languages: One ISO-639-# code

2015-05-19 Thread toki


On 19/05/2015 10:20, Dirk-Willem van Gulik wrote:

> So in ISO 639-X the most accurate you can pinpoint it is xh and then xho.
> And in Glottolog, you have mpon1252 as its most precise denominator.
> 
> Now as it *happens* - this language is spoken in an area fully covered by a
> single country - so you can use a 3166 as a country (-1, ZA) or (-2, ZA-EC,
> ZA-NL) region specifier; and then refine it.
> As it happens, the region more or less maps to the language spoken
> there (and let's argue that in that region or country no other languages
> are spoken).

However, Xhosa is currently included in AOo, and is spoken in the same
country as mPondo. I _think_ that AOo currently uses ISO 3166-1 code
(IE: ZA).

> 
>> For a slightly different example, I give you Koine Greek and Attic Greek.
>> Linguist-List codes them as grc-koi & grc-att, respectively.
>> ISO 639-2 code is GRC. ISO 639-3 is GRC. No ISO 639-1 code.
>>
>> I wish all dialects/languages were as accommodating as:
>> Glottolog lush1251
>> ISO 639-1 none;
>> ISO 639-2 none;
>> ISO 639-3 LUT;
>> ISO 639-3 SKA;
>> ISO 639-3 SNO;
>> ISO 639-3 SLH;
>> (Note: AFAIK, there are no spell checkers or grammar checkers for those
>> dialects, for any office suite.)
> 
> So also good examples - and I think the same applies
> 
> - you get broad specifiers on -1, -2 level.
> - you may get granular specifiers in -3 and -5 for the rarer/older
>   languages.
> - for dialects and more refined pinpointing you hit the limits of 639(-5)
>   and have two options: petition SIL/Library of Congress to add one (above
>   examples are all in scope); or rely on Glottolog.
> 
> and
> 
> - using regional coding; 3166; is not really helping you - as they do not 
> define language.

ISO 3166-2 & 3166-1 codes are useful for locales, which is the
difference between Xhosa and mPondo. At least, if one accepts the legal
fiction that the enclaves are part of KwaZulu, and not Eastern Cape, and
also the debatable point that mPondo is either a distinct language or a
dialect of Xhosa.

I will grant that for the First Nation languages of Australia, ISO
3166-2 codes are not helpful, because the language changes at intervals
of between five and twenty-five miles. (One farm in either the Northern
Territory or Western Australia can be the home of up to a dozen
different First Nation languages.)

> Pragmatically that means using an exact -3 if you have it (i.e. the exact
> language match); relying on the nearest ‘above’ -5 language family
> identifier when there is no -3 match to be had; and ONLY in the -5 case
> add whatever you can, e.g. the Glottolog identifier, to refine it.

That helps with most minority languages. There are some that glottolog
won't define a code for, on the grounds that they are, for all practical
purposes, extinct.

> or something along those lines. And discourage -1 and 3166 use; though permit 
> it in :other if there is no glottolog entry

That makes things easy.

Now to delve into a couple of spelling and grammar checkers, and change
them to those criteria.

And then submit the RFEs for those language/locales.

jonathon





Re: Two Languages: One ISO-639-# code

2015-05-19 Thread Dirk-Willem van Gulik

> On 19 May 2015, at 11:52, toki  wrote:
> 
> 
> 
> On 19/05/2015 08:05, Dirk-Willem van Gulik wrote:
> 
>>> In testing out various grammar and spell checkers, I've come across a
>>> couple of instances, where different languages/dialects share the same
>>> ISO-639-# code.
>> 
>> Can you give an example
> 
> ISO 639-1 is xh
> ISO 639-2 is xho
> ISO 639-3 is xho
> Glottolog is xhos1239
> ISO 3166-1 ZA / ZAF / 710
> ISO 3166-2 ZA-EC
> 
> and
> 
> ISO 639-1 is xh
> ISO 639-2 is xho
> ISO 639-3 is xho
> Glottolog is mpon1252
> ISO 3166-1 ZA / ZAF / 710
> ISO 3166-2 ZA-NL
> (Please skip the debate about whether or not the enclaves are KwaZulu,
> the Eastern Cape, or Lesotho.)

OK - good examples. So the 639s all map to

http://www.ethnologue.com/language/xho

2 map to the actual language in current use; 1 maps to the language families 
and group that xho and its dialects, like mpondo, belong to.

And:

-1 (xh equivalent)
-2 and -3: (xho)

to
http://www-01.sil.org/iso639-3/documentation.asp?id=xho

so the -1, -2 and -3 are equivalent. And 1:1 on xhos1239 in Glottolog? And -5 
is a red herring - it maps to the language families and group of xho 
languages.

Now as far as I can see - mpon1252 is a dialect (Mpondo) within xhos1239.

It has no entry of its own in -3 or within -5; so its closest is xh/xho/xho in 
-1, -2, -3; and it for sure belongs in -5 xho.

Or in other words: SIL.org (or the US Library of Congress for -5) has not 
assigned it (yet).

So in ISO 639-X the most accurate you can pinpoint it is xh and then xho.

And in Glottolog, you have mpon1252 as its most precise denominator.

Now as it *happens* - this language is spoken in an area fully covered by a 
single country - so you can use a 3166 as a country (-1, ZA) or (-2, ZA-EC, 
ZA-NL) region specifier; and then refine it. As it happens, the region more 
or less maps to the language spoken there (and let's argue that in that region 
or country no other languages are spoken).

> For a slightly different example, I give you Koine Greek and Attic Greek.
> Linguist-List codes them as grc-koi & grc-att, respectively.
> ISO 639-2 code is GRC. ISO 639-3 is GRC. No ISO 639-1 code.
> 
> I wish all dialects/languages were as accommodating as:
> Glottolog lush1251
> ISO 639-1 none;
> ISO 639-2 none;
> ISO 639-3 LUT;
> ISO 639-3 SKA;
> ISO 639-3 SNO;
> ISO 639-3 SLH;
> (Note: AFAIK, there are no spell checkers or grammar checkers for those
> dialects, for any office suite.)

So also good examples - and I think the same applies

-   you get broad specifiers on -1, -2 level.
-   you may get granular specifiers in -3 and -5 for the rarer/older
    languages.
-   for dialects and more refined pinpointing you hit the limits of 639(-5)
    and have two options: petition SIL/Library of Congress to add one (above
    examples are all in scope); or rely on Glottolog.

and

-   using regional coding; 3166; is not really helping you - as they do not 
define language.

Pragmatically that means using an exact -3 if you have it (i.e. the exact 
language match); relying on the nearest ‘above’ -5 language family identifier 
when there is no -3 match to be had; and ONLY in the -5 case add whatever you 
can, e.g. the glottolog identifier, to refine it.

And because -3 and -5 use similar identifiers for languages actually spoken 
(xho) and the language group (xho) to which Mpondo belongs, the identifier you 
expose should probably be something like


    iso-639-3:lang
        lang = alpha-3 language identifier
or
    iso-639-5:langgroup[:other]
        langgroup = alpha-3 language families and groups identifier
        other = optional identifier; taken from Glottolog when available.

or something along those lines. And discourage -1 and 3166 use; though permit 
it in :other if there is no glottolog entry
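
As a rough illustration of the scheme above, here is a minimal Python sketch. The function name `make_identifier` and its parameter names are made up for this example and are not part of AOo or any standard; the code values (`xho`, `mpon1252`, `ZA-NL`) are simply the ones quoted in this thread and have not been checked against the registries.

```python
from typing import Optional


def make_identifier(
    iso639_3: Optional[str] = None,
    iso639_5: Optional[str] = None,
    glottocode: Optional[str] = None,
    fallback: Optional[str] = None,
) -> str:
    """Build an identifier string following the scheme proposed in this thread.

    All names here are hypothetical; this is a sketch, not an existing API.
    """
    if iso639_3:
        # Exact language match: use the -3 code on its own.
        return f"iso-639-3:{iso639_3.lower()}"
    if not iso639_5:
        raise ValueError("need either an ISO 639-3 or an ISO 639-5 code")
    ident = f"iso-639-5:{iso639_5.lower()}"
    # Prefer the Glottolog identifier; only fall back to something else
    # (e.g. a 3166 code) when there is no Glottolog entry.
    other = glottocode or fallback
    if other:
        ident += f":{other}"
    return ident


print(make_identifier(iso639_3="xho"))                         # iso-639-3:xho
print(make_identifier(iso639_5="xho", glottocode="mpon1252"))  # iso-639-5:xho:mpon1252
print(make_identifier(iso639_5="xho", fallback="ZA-NL"))       # iso-639-5:xho:ZA-NL
```

The optional third field is kept as an opaque string so that a consumer that only understands the first two fields can still fall back to the language group.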

Dw.




signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Two Languages: One ISO-639-# code

2015-05-19 Thread toki


On 19/05/2015 08:05, Dirk-Willem van Gulik wrote:

>> In testing out various grammar and spell checkers, I've come across a
>> couple of instances, where different languages/dialects share the same
>> ISO-639-# code.
> 
> Can you give an example

ISO 639-1 is xh
ISO 639-2 is xho
ISO 639-3 is xho
Glottolog is xhos1239
ISO 3166-1 ZA / ZAF / 710
ISO 3166-2 ZA-EC

and

ISO 639-1 is xh
ISO 639-2 is xho
ISO 639-3 is xho
Glottolog is mpon1252
ISO 3166-1 ZA / ZAF / 710
ISO 3166-2 ZA-NL
(Please skip the debate about whether or not the enclaves are KwaZulu,
the Eastern Cape, or Lesotho.)
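
To make the collision concrete, here is a small Python sketch using only the values listed in the two examples above. The record layout and the helper name `group_by` are invented for illustration; no office suite is claimed to store its data this way.

```python
from collections import defaultdict

# Records transcribed from the two examples above; the field names are
# hypothetical and chosen only for this sketch.
RECORDS = [
    {"name": "Xhosa", "iso639_3": "xho", "glottocode": "xhos1239", "iso3166_2": "ZA-EC"},
    {"name": "Mpondo", "iso639_3": "xho", "glottocode": "mpon1252", "iso3166_2": "ZA-NL"},
]


def group_by(records, field):
    """Group record names by the value of one field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record["name"])
    return dict(groups)


by_iso = group_by(RECORDS, "iso639_3")
by_glotto = group_by(RECORDS, "glottocode")

print(by_iso)     # {'xho': ['Xhosa', 'Mpondo']}  -> both land on one key
print(by_glotto)  # {'xhos1239': ['Xhosa'], 'mpon1252': ['Mpondo']}
```

Any lookup keyed on the ISO 639 code alone cannot tell the two apart, while the Glottolog code (or, in this pair, the ISO 3166-2 code) does.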

For a slightly different example, I give you Koine Greek and Attic Greek.
Linguist-List codes them as grc-koi & grc-att, respectively.
ISO 639-2 code is GRC. ISO 639-3 is GRC. No ISO 639-1 code.

I wish all dialects/languages were as accommodating as:
Glottolog lush1251
ISO 639-1 none;
ISO 639-2 none;
ISO 639-3 LUT;
ISO 639-3 SKA;
ISO 639-3 SNO;
ISO 639-3 SLH;
(Note: AFAIK, there are no spell checkers or grammar checkers for those
dialects, for any office suite.)

jonathon





Re: Two Languages: One ISO-639-# code

2015-05-19 Thread Dirk-Willem van Gulik

> In testing out various grammar and spell checkers, I've come across a
> couple of instances, where different languages/dialects share the same
> ISO-639-# code.

Can you give an example - to understand this better? Or do you mean collective 
-1 (e.g. zh) or -2 codes (e.g. chi or zho) vs. -3 macro/individual codes that 
are in effect subcodes (zho, cmn, yue, nan) [ignoring the same language, 
different script stuff in some odd English islands, Turkey, etc.]?

Dw.




Two Languages: One ISO-639-# code

2015-05-18 Thread toki
All:

In testing out various grammar and spell checkers, I've come across a
couple of instances, where different languages/dialects share the same
ISO code.

IOW:
The _current_ ISO 639-1, ISO 639-2, ISO 639-3, ISO 639-4, ISO 639-5, and
ISO 639-6 codes are the same. They do have different Glottolog Codes.

The only solutions I found from Google searches were:
* Use "User-1" for one language, "User-2" for the other language;
* Use a completely different language and locale for one language;

The issue with "User-#" is that it is no longer found in standard
builds.

The issue with "use a completely different language", is that that
results in a language collision, when a user has to use both languages.

Question:
* What is the recommended practice for this type of situation?

###

Currently, this is an unusual case, but as the project extends into more
languages that are threatened, endangered, extinct, or dead, it will
become much more common.

###

I do have complete locale data for one or two of these conflicting
languages. However, since they share the same ISO 639-#, ISO 15924, and
ISO 3166-1 Codes, I don't see how any program could correctly
differentiate between them. As a general rule, they do have
different ISO 3166-2 Codes.
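
A minimal Python sketch of that situation, assuming a hypothetical `LocaleKey` type (not an AOo structure). The language and region values are the Xhosa/Mpondo ones given elsewhere in this thread; the script code `Latn` is an assumption added for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LocaleKey:
    """Hypothetical locale key; field names are illustrative only."""
    language: str                      # ISO 639 code
    script: str                        # ISO 15924 code (assumed here)
    country: str                       # ISO 3166-1 alpha-2 code
    subdivision: Optional[str] = None  # ISO 3166-2 code

    def coarse(self) -> Tuple[str, str, str]:
        """The key a program sees if it ignores the subdivision."""
        return (self.language, self.script, self.country)


first = LocaleKey("xho", "Latn", "ZA", "ZA-EC")
second = LocaleKey("xho", "Latn", "ZA", "ZA-NL")

# Identical on language + script + country ...
print(first.coarse() == second.coarse())  # True
# ... and only distinguishable once the ISO 3166-2 code is part of the key.
print(first == second)                    # False
```

This only helps where the subdivisions really do differ; it does nothing for cases where both languages are used in the same subdivision.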

ISO 3166-3 Codes are not of much use here, because they aren't old
enough for the languages that need them. (Chinese, Greek, and Hebrew,
amongst others.)

jonathon


