Hello,

Chinese and Japanese use different punctuation characters, but passing them to
the trainer tool (using -eosChars '。!?.!?') does not seem to do anything: the
trained models have abysmal scores when evaluated with the
SentenceDetectorEvaluator tool.

When I replace 。 with . in the training data using sed and then train, the
models have acceptable scores.

I did notice that the eosChars do not seem to end up intact in the
manifest.properties file; the entry becomes:
eosCharacters=\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD.\!?
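
My guess, and it is only a guess, is that the argument gets decoded with a
non-UTF-8 charset somewhere between my shell and the manifest. The nine \uFFFD
would fit three characters of three UTF-8 bytes each (for example 。 and the
fullwidth ! and ?) being read one byte at a time. The sketch below reproduces
that pattern in plain Java; the class and method names are made up for
illustration, and it does not call any OpenNLP code.

```java
import java.nio.charset.StandardCharsets;

public class EosCharsEncodingDemo {

    // Encodes the text as UTF-8 and decodes the bytes as US-ASCII.
    // Every byte >= 0x80 is malformed in US-ASCII and becomes U+FFFD.
    public static String decodeUtf8AsAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    // Renders non-ASCII characters as backslash-u escapes for printing.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed input: ideographic full stop, fullwidth ! and ?, then ASCII .!?
        String eos = "\u3002\uFF01\uFF1F.!?";
        System.out.println(escape(decodeUtf8AsAscii(eos)));
    }
}
```

If that is what happens, the three non-ASCII characters are already lost
before training starts, which would match the scores I see.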

When I manually update the file to list 。!?.!?, nothing changes.

What am I doing wrong?

Many thanks,
Markus
