Re: switch to ISO 639-2 codes for languages?

Benson Margulies Tue, 17 May 2011 13:32:50 -0700

There are important distinctions missing in the two-letter codes:
Farsi / Dari, among others.


On May 17, 2011, at 4:25 PM, "Jörn Kottmann" <[email protected]> wrote:

> Is there support for -3 in Java? Currently all we do is check that the
> language is a valid two-letter code. The idea when we added it was that we
> would be able to have language-dependent feature generation one day, but so
> far we only do something special in the sentence detector for Thai.
>
> Jörn
>
> On 5/17/11 8:50 PM, Benson Margulies wrote:
>> -2 is pretty useless. Use -3 if you want to switch.
>>
>> On Tue, May 17, 2011 at 2:40 PM, Oleg Tikhonov <[email protected]> wrote:
>>> My two cents: tesseract-ocr also uses ISO 639-3, and it would be great for
>>> those who build solutions such as openNLP + tesseract.
>>>
>>> -Oleg
>>>
>>> On Tue, May 17, 2011 at 9:33 PM, Jason Baldridge
>>> <[email protected]> wrote:
>>>
>>>> I think we should change to the three character convention for language
>>>> specific materials, e.g. "eng" rather than "en" for English.
>>>>
>>>> http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
>>>>
>>>> Do others agree?
>>>>
>>>> --
>>>> Jason Baldridge
>>>> Assistant Professor, Department of Linguistics
>>>> The University of Texas at Austin
>>>> http://www.jasonbaldridge.com
>>>> http://twitter.com/jasonbaldridge
>>>>
>
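
On the question of what Java offers for three-letter codes, here is a minimal sketch using only java.util.Locale. The class and method names (LanguageCodes, isValidTwoLetterCode, toThreeLetterCode) are hypothetical and are not OpenNLP API. As far as I know, Locale.getISO3Language() maps a two-letter code to its ISO 639-2/T equivalent, so this would not by itself recover ISO 639-3 distinctions such as Farsi vs. Dari.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class LanguageCodes {

    private LanguageCodes() {}

    // True if the code is among the two-letter ISO 639-1 codes known to the JDK.
    static boolean isValidTwoLetterCode(String code) {
        return code != null
                && Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    // Maps a valid two-letter code to the JDK's three-letter code, if any.
    static Optional<String> toThreeLetterCode(String twoLetterCode) {
        if (!isValidTwoLetterCode(twoLetterCode)) {
            return Optional.empty();
        }
        try {
            String iso3 = Locale.forLanguageTag(twoLetterCode).getISO3Language();
            return iso3.isEmpty() ? Optional.empty() : Optional.of(iso3);
        } catch (MissingResourceException e) {
            // The JDK has no three-letter abbreviation for this language.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String code : new String[] {"en", "th"}) {
            System.out.println(code + " -> " + toThreeLetterCode(code).orElse("(none)"));
        }
    }
}
```

A full ISO 639-3 check would presumably need its own code table, since the JDK lookup above starts from the two-letter list.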
