[jira] [Commented] (OPENNLP-367) File Encoding Issues

James Kosin (Commented) (JIRA) Wed, 16 Nov 2011 19:39:19 -0800

    [ 
https://issues.apache.org/jira/browse/OPENNLP-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151773#comment-13151773
 ]


James Kosin commented on OPENNLP-367:
-------------------------------------

I tested with the CoNLL 2002 data, and the encoding now works without 
-Dfile.encoding=UTF-8. We can document that flag as a possible workaround 
until the issue is fixed.

I still need to look into the places where we accept input piped or redirected 
to the parsers and tokenizers on the CLI.
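
For the piped and redirected case, a minimal sketch of the general idea, 
not the actual OpenNLP code or the attached patch: wrap the incoming byte 
stream in a reader with an explicitly chosen charset, so the result does not 
depend on the JVM default set by -Dfile.encoding. The class and method names 
(ExplicitEncodingSketch, openReader, readFirstLine) are hypothetical, and 
taking the encoding from the first program argument is an assumption made 
only for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingSketch {

    // Hypothetical helper: decode bytes with the caller's charset instead of
    // the platform default that -Dfile.encoding would otherwise control.
    static BufferedReader openReader(InputStream in, Charset encoding) {
        return new BufferedReader(new InputStreamReader(in, encoding));
    }

    // Reads the first line of the stream using the given charset.
    static String readFirstLine(InputStream in, Charset encoding) throws IOException {
        try (BufferedReader reader = openReader(in, encoding)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption for this sketch: the encoding name is the first argument,
        // falling back to UTF-8 when none is given.
        Charset encoding = args.length > 0
                ? Charset.forName(args[0])
                : StandardCharsets.UTF_8;

        String line = readFirstLine(System.in, encoding);

        // Encode the output explicitly as well, for the same reason.
        PrintStream out = new PrintStream(System.out, true, encoding.name());
        out.println(line);
    }
}
```

Run as, for example, "java ExplicitEncodingSketch UTF-8 < input.txt" (the 
file name is a placeholder). The same wrapping applies whether the bytes 
come from a file, a pipe, or a shell redirect.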

                
> File Encoding Issues
> --------------------
>
>                 Key: OPENNLP-367
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-367
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Command Line Interface
>    Affects Versions: tools-1.5.2-incubating
>         Environment: All
>            Reporter: James Kosin
>            Assignee: James Kosin
>              Labels: encoding, rework, training
>         Attachments: encoding.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The input and output encodings are not handled correctly.  A good example is 
> the CoNLL 2002 data: even when it is correctly encoded in UTF-8, training 
> does not work correctly unless -Dfile.encoding=UTF-8 is specified for the 
> java command.
> We already specify the input and expected output encoding on the cmdline 
> interface with the -encoding parameter.  For some reason this isn't being 
> followed.
> I'll work on fixing this for the next major release...  :-)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
