[ https://issues.apache.org/jira/browse/OPENNLP-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152547#comment-13152547 ]
James Kosin commented on OPENNLP-367:
-------------------------------------
I think I have all the converters. If anyone sees any that still use the default
system encoding for input or output, let me know or submit a patch to this issue.
I'm going to ask on the dev list now whether we need an encoding option on the
input / output streams for the tools that expect to pipe their output to a file
or to another model, as in the examples. It might have been nice to have a class
set up for this. But for now we just have the System.setOut() and System.setIn()
functions, which let us replace the standard streams with ones that use the
right encoding.
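
For anyone following along, here is a rough sketch of what the System.setOut()
approach could look like. It is not taken from the attached patch; the class
name StdoutEncodingSketch and the helper withEncoding are made up for
illustration, and UTF-8 is only an example value standing in for whatever
-encoding was given.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class StdoutEncodingSketch {

    // Hypothetical helper: wraps a raw byte stream in a PrintStream that
    // encodes characters with the named charset rather than the platform
    // default. The 'true' argument enables auto-flush on println.
    static PrintStream withEncoding(OutputStream out, String encoding)
            throws UnsupportedEncodingException {
        return new PrintStream(out, true, encoding);
    }

    public static void main(String[] args) throws Exception {
        // Replace the standard output stream once, before any tool writes to it.
        System.setOut(withEncoding(System.out, "UTF-8"));
        System.out.println("Espa\u00f1a");
    }
}
```

The same idea would apply to input, except that System.setIn() takes an
InputStream, so the decoding has to happen in an InputStreamReader created
with the charset wherever the tool reads from System.in.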
> File Encoding Issues
> --------------------
>
> Key: OPENNLP-367
> URL: https://issues.apache.org/jira/browse/OPENNLP-367
> Project: OpenNLP
> Issue Type: Bug
> Components: Command Line Interface
> Affects Versions: tools-1.5.2-incubating
> Environment: All
> Reporter: James Kosin
> Assignee: James Kosin
> Labels: encoding, rework, training
> Attachments: encoding.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> The input and output encodings are not working correctly or are not properly
> handled. A good example is the CoNLL 2002 data: even when it is correctly
> encoded in UTF-8, training does not work without specifying
> -Dfile.encoding=UTF-8 on the java command line.
> We already specify the input and expected output encoding on the command-line
> interface with the -encoding parameter. For some reason this isn't being
> followed.
> I'll work on fixing this for the next major release... :-)
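
The dependence on -Dfile.encoding described above is what you typically see
when a reader is created without an explicit charset, so the JVM falls back to
the platform default. A minimal sketch of the alternative is below. The class
ExplicitEncodingRead and the method readFirstLine are hypothetical and are not
OpenNLP code; the encoding argument stands in for the value passed via
-encoding.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.Charset;

public class ExplicitEncodingRead {

    // Reads the first line of a file, decoding its bytes with the named
    // charset instead of the platform default (file.encoding).
    static String readFirstLine(File file, String encoding) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file),
                                      Charset.forName(encoding)))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 sample so the sketch does not need real data.
        File sample = File.createTempFile("sample", ".txt");
        sample.deleteOnExit();
        try (OutputStream out = new FileOutputStream(sample)) {
            out.write("Espa\u00f1a B-LOC\n".getBytes("UTF-8"));
        }
        // Decoding with the charset the file was written in recovers the text
        // regardless of what file.encoding is set to.
        System.out.println(readFirstLine(sample, "UTF-8"));
    }
}
```

Reading the same file with a different charset name, for example ISO-8859-1,
would return mangled characters, which is the kind of failure the issue
describes when the default encoding does not match the data.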