Thanks Jörn, I checked, but it is UTF-8. I am attaching a simple one-line UTF-8 sample. Are you succeeding with UTF-8 encoded files that contain non-ASCII characters when you run the sentence detection tool? German, for example, has umlauts, which are non-ASCII as I understand. I have been using this with the English sentence detection tool, but I doubt that should have any effect on the output encoding.
On Wed, Oct 12, 2011 at 10:32 AM, Jörn Kottmann <kottm...@gmail.com> wrote:

> I am not sure what this output means. I get something which looks
> more or less the same on my MacBook.
>
> Maybe your input file is not encoded in UTF-8?
>
> Jörn
>
>
> On 10/10/11 8:42 PM, György Chityil wrote:
>
>> Hello Jörn,
>>
>> I was unable to find default encoding, somewere I read for linux it is
>> utf8
>>
>> This is what I get when I type locale in ssh bash:
>>
>> -bash-3.2$ locale
>> LANG=C
>> LC_CTYPE="C"
>> LC_NUMERIC="C"
>> LC_TIME="C"
>> LC_COLLATE="C"
>> LC_MONETARY="C"
>> LC_MESSAGES="C"
>> LC_PAPER="C"
>> LC_NAME="C"
>> LC_ADDRESS="C"
>> LC_TELEPHONE="C"
>> LC_MEASUREMENT="C"
>> LC_IDENTIFICATION="C"
>> LC_ALL=
>> -bash-3.2$
>>
>

--
Gyuri
274 44 98 06 30 5888 744

Sz??val... ??n itt ??ltem.. ez meg egyszercsak felrobbant!
Szóval... én itt ültem.. ez meg egyszercsak felrobbant!
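
One possible explanation for the two question marks per accented letter, offered as an assumption rather than a confirmed diagnosis: if the tool reads and writes with the platform default charset, and the C locale shown above makes that default ASCII, then each byte of a two-byte UTF-8 character is decoded as an invalid character and written back out as "?". The sketch below only simulates that mechanism in Python; the helper name through_ascii_default is hypothetical and nothing here calls the sentence detection tool itself.

```python
# Hypothetical helper: simulates a program that reads and writes text with an
# ASCII default charset (assumed here to be what the C locale implies), while
# the file on disk is really UTF-8.
def through_ascii_default(text: str) -> str:
    # Bytes as they would be stored in the UTF-8 input file.
    raw = text.encode("utf-8")
    # Reading as ASCII: every byte >= 0x80 is invalid and becomes U+FFFD.
    decoded = raw.decode("ascii", errors="replace")
    # Writing as ASCII: U+FFFD cannot be encoded and becomes "?".
    written = decoded.encode("ascii", errors="replace")
    return written.decode("ascii")


sample = "Szóval... én itt ültem.. ez meg egyszercsak felrobbant!"
print(through_ascii_default(sample))
```

If this is indeed the cause, running under a UTF-8 locale (for example LANG=en_US.UTF-8, provided that locale is installed) or starting the JVM with -Dfile.encoding=UTF-8 may be worth trying.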