Re: Linux: UTF8 text file fed to opennlp comes back as ANSI

György Chityil Wed, 12 Oct 2011 01:38:14 -0700

Thanks Jörn. I checked, and the file is UTF-8.

I am attaching a simple one-line UTF-8 sample. Does the sentence detection tool
work for you with UTF-8 encoded files that contain non-ASCII characters? German
umlauts, for example, are non-ASCII characters that UTF-8 encodes as multi-byte
sequences. I have been using this with the English sentence detector, but I
doubt that should have any effect on the output encoding.
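One way to double-check the input encoding is to attempt a strict UTF-8 decode of the file's raw bytes. The following is a minimal Python sketch; the function names are hypothetical and not part of OpenNLP. A failed decode proves the file is not UTF-8. A successful decode only shows the bytes are valid UTF-8, since plain ASCII also passes.

```python
import sys


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def check_file(path: str) -> bool:
    # Read raw bytes so that no default/locale encoding is involved.
    with open(path, "rb") as f:
        return is_valid_utf8(f.read())


if __name__ == "__main__":
    # Usage: python check_utf8.py file1.txt file2.txt ...
    for p in sys.argv[1:]:
        print(p, "valid UTF-8" if check_file(p) else "NOT valid UTF-8")
```

For example, "Szóval" written as UTF-8 passes, while the same word saved as ISO-8859-2 (a lone 0xF3 byte for "ó") fails.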


On Wed, Oct 12, 2011 at 10:32 AM, Jörn Kottmann <kottm...@gmail.com> wrote:

> I am not sure what this output means. I get something which looks
> more or less the same on my MacBook.
>
> Maybe your input file is not encoded in UTF-8?
>
> Jörn
>
>
> On 10/10/11 8:42 PM, György Chityil wrote:
>
>> Hello Jörn,
>>
>> I was unable to find the default encoding; I read somewhere that on Linux
>> it is UTF-8.
>>
>> This is what I get when I run locale in bash over SSH:
>>
>>>
>>> -bash-3.2$ locale
>>> LANG=C
>>> LC_CTYPE="C"
>>> LC_NUMERIC="C"
>>> LC_TIME="C"
>>> LC_COLLATE="C"
>>> LC_MONETARY="C"
>>> LC_MESSAGES="C"
>>> LC_PAPER="C"
>>> LC_NAME="C"
>>> LC_ADDRESS="C"
>>> LC_TELEPHONE="C"
>>> LC_MEASUREMENT="C"
>>> LC_IDENTIFICATION="C"
>>> LC_ALL=
>>> -bash-3.2$
>>>
>>>
>


-- 
Gyuri
274 44 98
06 30 5888 744

Sz??val... ??n itt ??ltem.. ez meg egyszercsak felrobbant!

Szóval... én itt ültem.. ez meg egyszercsak felrobbant!
(Hungarian; in English: "So... I was just sitting here.. and then this suddenly blew up!")
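The garbled line shows two "?" for every accented letter ("Szóval" becomes "Sz??val"). That pattern is consistent with UTF-8 bytes being read with a single-byte ASCII charset: each accented letter is two bytes in UTF-8, each byte becomes one replacement character on decode, and each replacement character becomes "?" on output. With LANG=C, a JVM's default charset is typically ASCII, which would fit the locale output above. The Python sketch below reproduces the effect; it simulates the round trip and is not OpenNLP code (the function name is hypothetical). Java's US-ASCII decoder and encoder are assumed to substitute in an analogous way.

```python
def simulate_ascii_roundtrip(text: str) -> str:
    """Mimic a program whose default charset is ASCII (e.g. under LANG=C)."""
    raw = text.encode("utf-8")                       # the file on disk is UTF-8
    decoded = raw.decode("ascii", errors="replace")  # each non-ASCII byte -> U+FFFD
    out = decoded.encode("ascii", errors="replace")  # U+FFFD is not ASCII -> b"?"
    return out.decode("ascii")


sample = "Szóval... én itt ültem.. ez meg egyszercsak felrobbant!"
print(simulate_ascii_roundtrip(sample))
# prints: Sz??val... ??n itt ??ltem.. ez meg egyszercsak felrobbant!
```

If this diagnosis is right, commonly used workarounds are running under a UTF-8 locale (for example export LANG=en_US.UTF-8, if that locale is installed) or passing -Dfile.encoding=UTF-8 to the java command that launches OpenNLP. Neither has been verified against the OpenNLP version used here.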
