There are important distinctions missing in the two-letter codes, e.g. Farsi vs. Dari, among others.
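
On Jörn's question below about Java support: as far as I know, java.util.Locale can list the two-letter codes (Locale.getISOLanguages()) and map one to a three-letter ISO 639-2 code (getISO3Language()). It does not ship the full ISO 639-3 table, so it would not help with the finer distinctions. A minimal sketch of what a check could look like; the class and method names here are made up for illustration and are not OpenNLP API:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.MissingResourceException;

// Hypothetical helper, not part of OpenNLP.
class LanguageCodes {

    // True if the code is one of the two-letter ISO 639 codes known to the JDK.
    static boolean isValidTwoLetterCode(String code) {
        return code != null
                && Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    // Maps a two-letter code to the JDK's three-letter code.
    // Returns null if the input is not a known two-letter code
    // or the JDK has no three-letter abbreviation for it.
    static String toThreeLetterCode(String twoLetterCode) {
        if (!isValidTwoLetterCode(twoLetterCode)) {
            return null;
        }
        try {
            return Locale.forLanguageTag(twoLetterCode).getISO3Language();
        } catch (MissingResourceException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("en -> " + toThreeLetterCode("en"));
        System.out.println("fa -> " + toThreeLetterCode("fa"));
    }
}
```

Anything that only has a three-letter code would still need its own lookup table, since this approach starts from the two-letter list.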

On May 17, 2011, at 4:25 PM, "Jörn Kottmann" <[email protected]> wrote:

> Is there support for -3 in java? Currently all we do is a check that the
> language is a valid 2 letter code. The idea was when we added it that we
> will be able to have language dependent feature generation one day, but up
> to today we only do something special in the sentence detector for thai.
>
> Jörn
>
> On 5/17/11 8:50 PM, Benson Margulies wrote:
>> -2 is pretty useless. Use -3 if you want to switch.
>>
>> On Tue, May 17, 2011 at 2:40 PM, Oleg Tikhonov <[email protected]> wrote:
>>> My two cents, tesseract-ocr also uses ISO 639-3 and it would be great for
>>> those who builds the solutions such as openNLP + tesseract.
>>>
>>> -Oleg
>>>
>>> On Tue, May 17, 2011 at 9:33 PM, Jason Baldridge
>>> <[email protected]> wrote:
>>>
>>>> I think we should change to the three character convention for language
>>>> specific materials, e.g. "eng" rather than "en" for English.
>>>>
>>>> http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
>>>>
>>>> Do others agree?
>>>>
>>>> --
>>>> Jason Baldridge
>>>> Assistant Professor, Department of Linguistics
>>>> The University of Texas at Austin
>>>> http://www.jasonbaldridge.com
>>>> http://twitter.com/jasonbaldridge
