Re: switch to ISO 639-2 codes for languages?

Jörn Kottmann Tue, 17 May 2011 13:39:47 -0700

I can see that, so I think switching the language codes is something we
should do when we make bigger changes anyway. Maybe for 1.6, together with
a switch to opennlp-ml and possibly bigger changes in our feature generation
code.

Jörn

On 5/17/11 10:32 PM, Benson Margulies wrote:

There are important distinctions missing in the two-letter codes: Farsi/Dari,
for example, and others.

On May 17, 2011, at 4:25 PM, "Jörn Kottmann" wrote:

Is there support for ISO 639-3 in Java? Currently all we do is check that the
language is a valid two-letter code. When we added it, the idea was that one
day we would be able to have language-dependent feature generation, but to
date we only do something special in the sentence detector for Thai.
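On the question of what Java itself supports: the following is a minimal sketch, using only java.util.Locale and a hypothetical helper class LanguageCodes (not part of OpenNLP), of the kind of two-letter check described above and of the JDK's built-in mapping to three-letter codes. The JDK only ships a list of two-letter ISO 639-1 codes (Locale.getISOLanguages()); Locale.getISO3Language() maps those to their ISO 639-2/T form and passes three-letter codes through unchanged without checking them against ISO 639-3.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.Set;

/**
 * Hypothetical helper, not part of OpenNLP: shows what the JDK alone
 * offers for ISO 639 language codes.
 */
public class LanguageCodes {

    // The two-letter ISO 639-1 codes known to the JDK.
    private static final Set<String> ISO_639_1 =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    /** A check like the one described above: is this a known two-letter code? */
    public static boolean isTwoLetterCode(String code) {
        return code != null && ISO_639_1.contains(code);
    }

    /**
     * Maps a two-letter code to its three-letter ISO 639-2/T form,
     * e.g. "en" -> "eng". Returns null when the JDK has no mapping.
     * A three-letter input is returned unchanged and is NOT validated,
     * so the JDK cannot tell a real ISO 639-3 code such as "prs" (Dari)
     * from a typo.
     */
    public static String toThreeLetterCode(String code) {
        if (code == null || code.isEmpty()) {
            return null;
        }
        try {
            return Locale.forLanguageTag(code).getISO3Language();
        } catch (MissingResourceException e) {
            // Unknown two-letter code, e.g. "xx".
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(isTwoLetterCode("en"));    // true
        System.out.println(toThreeLetterCode("en"));  // eng
        System.out.println(toThreeLetterCode("th"));  // tha
    }
}
```

Since Dari has no ISO 639-1 code of its own (it falls under "fa"), a two-letter scheme cannot distinguish it; a three-letter code like "prs" can be carried through, but validating it against the full ISO 639-3 list would need a separate code table outside the JDK.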

Jörn

On 5/17/11 8:50 PM, Benson Margulies wrote:

ISO 639-2 is pretty useless. Use ISO 639-3 if you want to switch.

On Tue, May 17, 2011 at 2:40 PM, Oleg Tikhonov wrote:

My two cents: tesseract-ocr also uses ISO 639-3, and it would be great for
those who build solutions such as OpenNLP + Tesseract.

-Oleg

On Tue, May 17, 2011 at 9:33 PM, Jason Baldridge wrote:

I think we should change to the three-character convention for
language-specific materials, e.g. "eng" rather than "en" for English.

http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

Do others agree?

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
