On 2/28/2013 12:29 PM, Stefan Matheis wrote:

On Thursday, February 28, 2013 at 5:26 PM, Jörn Kottmann wrote:

Hmm, pretty sure there is an encoding mismatch. Do you know which encoding
your JVM uses? I would guess it is not UTF-8. You can probably get around
the issue by re-encoding the input file to the encoding the JVM is using.

Have a look here:
http://stackoverflow.com/questions/1749064/how-to-find-default-charset-encoding-in-java

It would be nice if you could run the println statements there.

Jörn
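
For readers without the link at hand, the kind of check being asked for can be sketched roughly as below. This is a minimal sketch, not necessarily the exact program from the linked page: the class name CharsetTest matches the command used in this thread, but the helper charsetInUse() and the step that overwrites file.encoding at runtime are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class CharsetTest {
    // Hypothetical helper: the encoding name reported by a writer that was
    // created without an explicit charset, i.e. the one the JVM falls back to.
    static String charsetInUse() {
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        return writer.getEncoding();
    }

    public static void main(String[] args) {
        // Default charset as determined at JVM startup.
        System.out.println("Default Charset=" + Charset.defaultCharset());

        // Assumption: the property is overwritten here, after startup,
        // to see whether that has any effect on the default charset.
        System.setProperty("file.encoding", "Latin-1");
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));

        // Printed again after the property change.
        System.out.println("Default Charset=" + Charset.defaultCharset());
        System.out.println("Default Charset in Use=" + charsetInUse());
    }
}
```

In this sketch the file.encoding line always prints Latin-1, because the program sets that property itself. Only the "Default Charset" lines reflect what the JVM picked up at startup.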
Wherever this comes from ...

$ java CharsetTest
Default Charset=US-ASCII
file.encoding=Latin-1
Default Charset=US-ASCII
Default Charset in Use=ASCII

$ echo $JAVA_TOOL_OPTIONS
(empty)

$ export JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF8'

$ java CharsetTest
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
Default Charset=UTF-8
file.encoding=Latin-1
Default Charset=UTF-8
Default Charset in Use=UTF8



But this change by itself didn't help: the output remains unchanged. So I took
the road down to dirty-hack-land and applied the following change to
bin/opennlp. It is surely not how it should be done, but it works, at least
for the moment:

-$JAVACMD -Xmx1024m -jar $OPENNLP_HOME/lib/opennlp-tools-*.jar $@
+$JAVACMD -Xmx1024m -Dfile.encoding=UTF8 -jar $OPENNLP_HOME/lib/opennlp-tools-*.jar $@
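
As a rough illustration of why the default charset matters here: the flag in the change above sets the charset the JVM uses whenever code decodes bytes without naming one, which is presumably what happens to the input file. The sketch below is not OpenNLP code. The class EncodingMismatchDemo and its decode() helper are hypothetical, and it only shows the same UTF-8 bytes decoded once as UTF-8 and once as US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Hypothetical helper: decode raw bytes with an explicitly named charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Jörn" written with a Unicode escape so the source file's own
        // encoding does not influence the demo.
        byte[] utf8Bytes = "J\u00f6rn".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));

        // Decoding with US-ASCII cannot represent the bytes of the umlaut,
        // so they are replaced with the Unicode replacement character.
        System.out.println(decode(utf8Bytes, StandardCharsets.US_ASCII));
    }
}
```

Code that reads text with an explicitly named charset does not depend on file.encoding at all, which is why forcing the property on the command line is a workaround and not a fix.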




Stefan,

What is the output of:
    set | grep LANG

James
