[
https://issues.apache.org/jira/browse/OPENNLP-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Kosin updated OPENNLP-367:
--------------------------------
Attachment: encoding.patch
I've attached a small patch that touches only a few files, to get everyone's
opinion on the problem. I say small because some of the converters are either
relying on the default system encoding...
Anyway, two things about the 3-4 files patched here: (1) they set a new
System.out printer with an explicit encoding ... I've specified the same one
as the input encoding described for the class. (2) You will notice that one of
the files, ConllXPOS..., is using the default system-level encoding, because it
uses PlainTextByLine(in) instead of the other PlainTextByLine(in, "encoding").
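To make the two points concrete, here is a minimal sketch using only standard
JDK classes. It is not the attached patch: the class name EncodingSketch and
its two helper methods are made up for illustration, and a plain
InputStreamReader stands in for PlainTextByLine(in, "encoding").

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of OpenNLP.
public class EncodingSketch {

    // Reads the first line of the stream, decoding bytes with the named
    // encoding instead of the platform default.
    public static String readFirstLine(InputStream in, String encoding)
            throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, encoding));
        return reader.readLine();
    }

    // Wraps an output stream in an auto-flushing PrintStream that encodes
    // characters with the named encoding.
    public static PrintStream encodedPrinter(OutputStream out, String encoding)
            throws UnsupportedEncodingException {
        return new PrintStream(out, true, encoding);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in a real tool this would come from the -encoding
        // parameter rather than being hard-coded.
        String encoding = "UTF-8";

        // (1) Replace System.out with a printer that uses the same encoding
        // as the input.
        System.setOut(encodedPrinter(System.out, encoding));

        // (2) Read the input with an explicit encoding.
        byte[] data = "Se\u00f1or L\u00f3pez\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readFirstLine(new ByteArrayInputStream(data), encoding));
    }
}
```

With both ends pinned to the same encoding, the result should no longer
depend on what the JVM picks as its default.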
Basically, I need to review all the encoding usages and try to determine if
they are all proper. Some may be and some may need to be adjusted.
Just trying to give everyone a heads up on the issue.
> File Encoding Issues
> --------------------
>
> Key: OPENNLP-367
> URL: https://issues.apache.org/jira/browse/OPENNLP-367
> Project: OpenNLP
> Issue Type: Bug
> Components: Command Line Interface
> Affects Versions: tools-1.5.2-incubating
> Environment: All
> Reporter: James Kosin
> Assignee: James Kosin
> Labels: encoding, rework, training
> Attachments: encoding.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> The input and output encodings are not working correctly or are not properly
> handled. A good example is the CoNLL 2002 data: even when correctly encoded
> in UTF-8, it does not work correctly for training without specifying
> -Dfile.encoding=UTF-8 for the Java command.
> We already specify the input and expected output encoding on the cmdline
> interface with the -encoding parameter. For some reason this isn't being
> followed.
> I'll work on fixing this for the next major release... :-)
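
For illustration, the kind of failure described in the report can be shown
with a short sketch that decodes the same UTF-8 bytes under two different
charsets. The class name DefaultCharsetSketch and its decode helper are
hypothetical and only standard JDK calls are used; this shows the general
mechanism, not a confirmed diagnosis of the OpenNLP code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of OpenNLP.
public class DefaultCharsetSketch {

    // Decodes the bytes with an explicitly chosen charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // One accented character, encoded as UTF-8 (two bytes).
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // The charset used by readers and writers that do not name one;
        // -Dfile.encoding influences this value.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Decoded as UTF-8 the bytes give back one character; decoded as
        // ISO-8859-1 each byte becomes its own character.
        System.out.println("chars as UTF-8: "
                + decode(utf8, StandardCharsets.UTF_8).length());
        System.out.println("chars as ISO-8859-1: "
                + decode(utf8, StandardCharsets.ISO_8859_1).length());
    }
}
```

Any code path that falls back to the default charset behaves like the second
decode whenever the default is not UTF-8, which would be consistent with
training only working when -Dfile.encoding=UTF-8 is given.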