+1

On Tue, May 17, 2011 at 3:39 PM, Jörn Kottmann <[email protected]> wrote:
> I can see that, so switching the language codes I think should be something
> that should be done when we do bigger changes anyway. Maybe for 1.6 together
> with a switch to opennlp-ml and maybe bigger changes in our feature
> generation code.
>
> Jörn
>
> On 5/17/11 10:32 PM, Benson Margulies wrote:
>
>> there are important distinctions missing in the twos. Farsi / Dari/
>> etc and others.
>>
>> On May 17, 2011, at 4:25 PM, "Jörn Kottmann"<[email protected]> wrote:
>>
>>> Is there support for -3 in java? Currently all we do is a check that the
>>> language is a valid 2 letter code. The idea was when we added it that we
>>> will be able to have language dependent feature generation one day, but
>>> up to today we only do something special in the sentence detector for thai.
>>>
>>> Jörn
>>>
>>> On 5/17/11 8:50 PM, Benson Margulies wrote:
>>>
>>>> -2 is pretty useless. Use -3 if you want to switch.
>>>>
>>>> On Tue, May 17, 2011 at 2:40 PM, Oleg Tikhonov<[email protected]> wrote:
>>>>
>>>>> My two cents, tesseract-ocr also uses ISO 639-3 and it would be great
>>>>> for those who builds the solutions such as openNLP + tesseract.
>>>>>
>>>>> -Oleg
>>>>>
>>>>> On Tue, May 17, 2011 at 9:33 PM, Jason Baldridge <[email protected]> wrote:
>>>>>
>>>>>> I think we should change to the three character convention for
>>>>>> language specific materials, e.g. "eng" rather than "en" for English.
>>>>>>
>>>>>> http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
>>>>>>
>>>>>> Do others agree?
>>>>>>
>>>>>> --
>>>>>> Jason Baldridge
>>>>>> Assistant Professor, Department of Linguistics
>>>>>> The University of Texas at Austin
>>>>>> http://www.jasonbaldridge.com
>>>>>> http://twitter.com/jasonbaldridge

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
