Jörn,

I found this article, along with several other references, on how to fix
these encoding problems. To fix the output on the terminal, we may need to
refactor the general output so that it uses the same encoding as the
input files.

http://hints.macworld.com/article.php?story=20050208053951714
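
As a rough sketch of what "same encoding as the input" could look like on
the Java side, the snippet below prints the JVM default charset and then
writes through a writer that is pinned to UTF-8 instead of relying on that
default. This assumes the input files are UTF-8, as György reports. The
names Utf8OutputSketch, utf8Writer and encodeLine are hypothetical and made
up for illustration; they are not part of the existing tools.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Illustrative sketch only; class and method names are hypothetical.
class Utf8OutputSketch {

    // Wraps a stream in a writer that always encodes as UTF-8,
    // independent of the JVM default charset (which follows the locale).
    static PrintWriter utf8Writer(OutputStream target) {
        return new PrintWriter(
                new OutputStreamWriter(target, Charset.forName("UTF-8")), true);
    }

    // Returns the bytes that would be written for the given text.
    static byte[] encodeLine(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter out = utf8Writer(buffer);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Shows which charset the JVM picked up from the environment.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Unicode escapes keep this source file itself ASCII-only.
        PrintWriter out = utf8Writer(System.out);
        out.println("J\u00f6rn, Gy\u00f6rgy");
    }
}
```

Even with this in place, the terminal itself still has to be set to UTF-8
for the characters to display correctly.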

James

On 10/12/2011 4:37 AM, György Chityil wrote:
> Thanks Jörn, I checked, and the file is UTF-8.
>
> I am attaching a simple one-line UTF-8 sample. Does the sentence
> detection tool work for you with UTF-8 encoded files that contain
> special characters? German, for example, has umlauts, which are UTF-8
> characters as I understand it.
> I have been using the English sentence detection tool, but I
> doubt that has any effect on the output encoding.
>
> On Wed, Oct 12, 2011 at 10:32 AM, Jörn Kottmann <kottm...@gmail.com
> <mailto:kottm...@gmail.com>> wrote:
>
>     I am not sure what this output means. I get something which looks
>     more or less the same on my MacBook.
>
>     Maybe your input file is not encoded in UTF-8?
>
>     Jörn
>
>
>     On 10/10/11 8:42 PM, György Chityil wrote:
>
>             Hello Jörn,
>
>         I was unable to find the default encoding; I read somewhere
>         that on Linux it is UTF-8.
>
>         This is what I get when I run locale in bash over SSH:
>
>
>             -bash-3.2$ locale
>             LANG=C
>             LC_CTYPE="C"
>             LC_NUMERIC="C"
>             LC_TIME="C"
>             LC_COLLATE="C"
>             LC_MONETARY="C"
>             LC_MESSAGES="C"
>             LC_PAPER="C"
>             LC_NAME="C"
>             LC_ADDRESS="C"
>             LC_TELEPHONE="C"
>             LC_MEASUREMENT="C"
>             LC_IDENTIFICATION="C"
>             LC_ALL=
>             -bash-3.2$
>
>
>
>
>
> -- 
> Gyuri
> 274 44 98
> 06 30 5888 744
>
