On 2/28/2013 12:29 PM, Stefan Matheis wrote:
On Thursday, February 28, 2013 at 5:26 PM, Jörn Kottmann wrote:
Hmm, I'm pretty sure there is an encoding mismatch. Do you know which encoding
your JVM uses? I would guess it is not UTF-8. You can probably work around
the issue by re-encoding the input file to the encoding the JVM is using.
Have a look here:
http://stackoverflow.com/questions/1749064/how-to-find-default-charset-encoding-in-java
It would be nice if you could run the println statements from there.
Jörn
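For reference, a minimal sketch of such a test program, assuming it follows the linked answer: it prints the default charset, overrides `file.encoding` at runtime, and prints again. The runtime `System.setProperty` call is an assumption, inferred from `file.encoding=Latin-1` showing up in the output below even when `-Dfile.encoding=UTF8` is passed. Because the JVM caches the default charset once it has been determined, changing the property afterwards does not affect it.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class CharsetTest {

    // Encoding an OutputStreamWriter picks when no charset is given, i.e. what
    // code relying on the platform default actually uses. getEncoding() returns
    // the historical name (e.g. "UTF8", "ASCII") rather than the canonical one.
    static String getDefaultCharSet() {
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        return writer.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("Default Charset=" + Charset.defaultCharset());
        // Assumed step: overriding file.encoding at runtime. This only changes
        // the property value; the cached default charset stays the same.
        System.setProperty("file.encoding", "Latin-1");
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        System.out.println("Default Charset=" + Charset.defaultCharset());
        System.out.println("Default Charset in Use=" + getDefaultCharSet());
    }
}
```

The historical names returned by `getEncoding()` explain why the last line reads `ASCII` or `UTF8` while the other lines show `US-ASCII` or `UTF-8`.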
Wherever this comes from...
$ java CharsetTest
Default Charset=US-ASCII
file.encoding=Latin-1
Default Charset=US-ASCII
Default Charset in Use=ASCII
$ echo $JAVA_TOOL_OPTIONS
(empty)
$ export JAVA_TOOL_OPTIONS='-Dfile.encoding=UTF8'
$ java CharsetTest
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
Default Charset=UTF-8
file.encoding=Latin-1
Default Charset=UTF-8
Default Charset in Use=UTF8
But this change by itself didn't help; the output remained unchanged. So I took
the road down to dirty-hack-land and applied the following change to bin/opennlp.
It's certainly not how it should be done, but it works for the moment:
-$JAVACMD -Xmx1024m -jar $OPENNLP_HOME/lib/opennlp-tools-*.jar $@
+$JAVACMD -Xmx1024m -Dfile.encoding=UTF8 -jar $OPENNLP_HOME/lib/opennlp-tools-*.jar $@
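Passing `-Dfile.encoding=UTF8` only changes which default the JVM picks. Where one controls the Java code that reads the input, a sketch of the code-level alternative is to decode and encode with an explicit charset, so that neither `file.encoding` nor the locale matters. The class and method names below are hypothetical and not part of OpenNLP.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8Copy {

    // Copies text line by line, decoding and encoding explicitly as UTF-8,
    // so the result does not depend on file.encoding or the locale.
    // Every line is terminated with '\n' regardless of the platform.
    static void copyUtf8(InputStream in, OutputStream out) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            BufferedWriter writer = new BufferedWriter(
                    new OutputStreamWriter(out, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write('\n');
            }
            writer.flush(); // flush but do not close: the caller owns the streams
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        copyUtf8(System.in, System.out);
    }
}
```

With explicit charsets, `java Utf8Copy < input.txt` behaves the same whether the JVM's default is US-ASCII or UTF-8.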
Stefan,
What is the output of:
set | grep LANG
James
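The question matters because, on Unix-like systems and JDKs before 18 (where JEP 400 made UTF-8 the default), the JVM typically derives its default charset from the locale environment (`LC_ALL`, then `LC_CTYPE`, then `LANG`) when `file.encoding` is not set on the command line. An unset, `C`, or `POSIX` locale usually maps to US-ASCII, which would match the output above. A small sketch to print both side by side; the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class LocaleCharsetCheck {

    // Returns the value of an environment variable, or "(unset)" if it is missing.
    static String env(String name) {
        String value = System.getenv(name);
        return value == null ? "(unset)" : value;
    }

    public static void main(String[] args) {
        // Locale variables consulted (via the C library) when the JVM determines
        // its default charset, highest precedence first.
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            System.out.println(name + "=" + env(name));
        }
        System.out.println("Locale.getDefault()=" + Locale.getDefault());
        System.out.println("Default Charset=" + Charset.defaultCharset());
    }
}
```

Running it once in the shell that starts bin/opennlp, and once after exporting a UTF-8 locale such as `LANG=en_US.UTF-8` (if that locale is installed), should show whether the locale is what pins the JVM to US-ASCII.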