On 10/13/2011 4:23 AM, Jörn Kottmann wrote:
> On 10/13/11 4:54 AM, James Kosin wrote:
>> I found this article and several other references on how to fix these.
>> We may need to refactor the general output to the same encoding as the
>> input files to fix this on the terminal.
>>
>> http://hints.macworld.com/article.php?story=20050208053951714
>
> Doesn't the console need to know in which encoding characters are
> printed?
> I wonder if it works to use UTF-8 on Windows.
>
> And the Linux system might already have a UTF-8 default encoding.
> I will try it with the data we got on my Ubuntu test system.
>
> Jörn
>
Just verified some more.  The problem on Windows is that the console is
unable to display the UTF-8 characters properly.  If I redirect the
output to a file like this "opennlp TokenizerME models\en-token.zip <
utf-input.txt > utf-output.txt" and then open the resulting file in
Notepad++, everything is fine.
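
For reference, here is a rough sketch of what "output in a fixed encoding"
could look like on the Java side.  This is not taken from the OpenNLP
sources; the class name Utf8Output and the helper utf8Stream are made up
for illustration.  It only uses the standard PrintStream constructor that
takes an encoding name.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Illustrative only: Utf8Output and utf8Stream are hypothetical names,
// not part of OpenNLP.
public class Utf8Output {

    // Wrap a stream so that text is always encoded as UTF-8,
    // regardless of the platform default encoding.
    static PrintStream utf8Stream(OutputStream out)
            throws UnsupportedEncodingException {
        // Second argument enables auto-flush on println.
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        PrintStream out = utf8Stream(System.out);
        // \u00f6 is "o with umlaut"; the escape keeps this source file
        // independent of the compiler's own encoding setting.
        out.println("J\u00f6rn");
    }
}
```

Redirected to a file, this should produce UTF-8 bytes on any platform.
Whether the characters show up correctly in a terminal still depends on
the terminal itself expecting UTF-8, which would match what I am seeing
with the Windows console.
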
James
