+1
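
On the question further down of whether Java supports the three-letter codes: a minimal sketch, assuming only the standard java.util.Locale API. The class and method names (LanguageCodes, isValidTwoLetter, toThreeLetter) are made up for illustration and are not OpenNLP code. As far as I know the JDK maps a two-letter code to its ISO 639-2/T equivalent (for example "en" to "eng"), so it only covers languages that already have a two-letter code; the finer ISO 639-3 distinctions Benson mentions would need an external code table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.Set;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class LanguageCodes {

    // Two-letter ISO 639-1 codes known to the running JDK.
    private static final Set<String> TWO_LETTER =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    // Roughly the kind of check described in the thread:
    // is this a valid two-letter language code?
    static boolean isValidTwoLetter(String code) {
        return code != null && TWO_LETTER.contains(code);
    }

    // Maps a two-letter code to the JDK's three-letter code,
    // or returns null when the code is unknown.
    static String toThreeLetter(String code) {
        if (!isValidTwoLetter(code)) {
            return null;
        }
        try {
            String iso3 = Locale.forLanguageTag(code).getISO3Language();
            return iso3.isEmpty() ? null : iso3;
        } catch (MissingResourceException e) {
            // The JDK has no three-letter abbreviation for this language.
            return null;
        }
    }

    public static void main(String[] args) {
        for (String code : new String[] {"en", "th", "fa", "xx"}) {
            System.out.println(code + " -> " + toThreeLetter(code));
        }
    }
}
```

Returning null for unknown codes keeps the sketch short; real code might prefer an exception or an Optional.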

On Tue, May 17, 2011 at 3:39 PM, Jörn Kottmann <[email protected]> wrote:

> I can see that, so I think switching the language codes is something we
> should do when we make bigger changes anyway. Maybe for 1.6, together with
> a switch to opennlp-ml and maybe bigger changes in our feature generation
> code.
>
> Jörn
>
>
> On 5/17/11 10:32 PM, Benson Margulies wrote:
>
>> There are important distinctions missing in the two-letter codes: Farsi /
>> Dari, and others.
>>
>> On May 17, 2011, at 4:25 PM, "Jörn Kottmann" <[email protected]> wrote:
>>
>>> Is there support for -3 in Java? Currently all we do is check that the
>>> language is a valid two-letter code. The idea when we added it was that we
>>> would be able to have language-dependent feature generation one day, but up
>>> to today we only do something special in the sentence detector for Thai.
>>>
>>> Jörn
>>>
>>> On 5/17/11 8:50 PM, Benson Margulies wrote:
>>>
>>>> -2 is pretty useless. Use -3 if you want to switch.
>>>>
>>>> On Tue, May 17, 2011 at 2:40 PM, Oleg Tikhonov <[email protected]>
>>>> wrote:
>>>>
>>>>> My two cents: tesseract-ocr also uses ISO 639-3, and it would be great
>>>>> for those who build solutions such as OpenNLP + tesseract.
>>>>>
>>>>> -Oleg
>>>>>
>>>>> On Tue, May 17, 2011 at 9:33 PM, Jason Baldridge
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> I think we should change to the three character convention for language
>>>>>> specific materials, e.g. "eng" rather than "en" for English.
>>>>>>
>>>>>> http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
>>>>>>
>>>>>> Do others agree?
>>>>>>
>>>>>> --
>>>>>> Jason Baldridge
>>>>>> Assistant Professor, Department of Linguistics
>>>>>> The University of Texas at Austin
>>>>>> http://www.jasonbaldridge.com
>>>>>> http://twitter.com/jasonbaldridge
>>>>>>
>>>>>>
>


-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
