Our command line tools do not work correctly when the data
being processed cannot be represented in the platform's
default encoding. I don't know whether we could support an encoding
flag here, as we do in the trainer and evaluators.
What are your default encoding and platform locale?
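One way to check the default encoding, and to see why characters turn into question marks, is a small Java program. This is only a sketch: the class name EncodingCheck and the helper roundTrip are made up for illustration and are not part of OpenNLP. It prints the JVM's default charset, then encodes a sample string to bytes and decodes it again, which is roughly what happens when text is written out in a charset that cannot represent it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of OpenNLP.
class EncodingCheck {

    // Encodes text to bytes in the given charset and decodes it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for US-ASCII and ISO-8859-1.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // The charset the JVM uses when no encoding is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // e-acute, a-acute, u-double-acute, written as escapes so the
        // result does not depend on the encoding of this source file.
        String sample = "\u00e9\u00e1\u0171";

        System.out.println(roundTrip(sample, StandardCharsets.US_ASCII));    // ???
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // true
    }
}
```

If the first line does not report UTF-8, starting the JVM with -Dfile.encoding=UTF-8 is a commonly used workaround. Whether the opennlp launch script passes extra JVM options through would need to be checked.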
Jörn
On 10/8/11 8:46 AM, György Chityil wrote:
Sure, nothing fancy. In PuTTY I just run:
"opennlp SentenceDetector en-sent.bin < texts/71122.txt"
or
"opennlp SentenceDetector en-sent.bin < texts/71122.txt > sentences.txt"
I verified that the input text file is valid UTF-8.
I wonder whether the command line works with UTF-8 for anyone; if it does,
maybe the problem is my Java installation.
Are you using the command line tools? Can you send an example of how you are
invoking the tool?
Thanks
William
On Fri, Oct 7, 2011 at 11:45 AM, György Chityil
<gyorgy.chit...@gmail.com> wrote:
Sorry, one typo:
" it comes back in ANSI format with UTF8 chars stripped out."
So characters like éáűúűóü come back as ??? ????
On Fri, Oct 7, 2011 at 3:59 PM, György Chityil <gyorgy.chit...@gmail.com>
wrote:
Hello,
I am not sure if this is an opennlp issue, but what I notice is that if I
feed a utf8 file to opennlp, it comes back in ANSI format with UTF8 files
stripped out. Could this be an issue with opennlp?
--
Gyuri