[ https://issues.apache.org/jira/browse/OPENNLP-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983415#action_12983415 ]

Jörn Kottmann commented on OPENNLP-53:
--------------------------------------

The Parse object has a text field of type String. Its span field holds a Span
object with the start and end character offsets of the parse within that text.

Would it be possible to replace this text String with a String array that
contains the individual tokens?
The text String could then be created from the String array to maintain
backward compatibility for the next few releases.
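
Below is a minimal sketch of the proposal. It assumes a simplified stand-alone class: `TokenizedParse` is a hypothetical name, not the actual OpenNLP `Parse` class. It only shows how the tokens could be stored as a String array while the legacy text String is derived from them on demand.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the OpenNLP Parse class): stores the tokens
 * as a String array and rebuilds the old text String from them.
 */
public class TokenizedParse {

    // The individual tokens, replacing the single text String.
    private final String[] tokens;

    public TokenizedParse(String[] tokens) {
        // Defensive copy so callers cannot mutate the stored tokens.
        this.tokens = Arrays.copyOf(tokens, tokens.length);
    }

    /** Returns a copy of the tokens. */
    public String[] getTokens() {
        return Arrays.copyOf(tokens, tokens.length);
    }

    /**
     * Backward-compatible view: joins the tokens with single spaces,
     * producing the whitespace-separated text the parser consumes today.
     */
    public String getText() {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        TokenizedParse p = new TokenizedParse(new String[] {"The", "dog", "barks", "."});
        System.out.println(p.getText()); // prints "The dog barks ."
    }
}
```

Joining with exactly one space reproduces the input format the parser already expects. As a result, character spans computed against the derived text stay consistent with the token array.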

> Parser should have simple interface to process a tokenized input sentence
> -------------------------------------------------------------------------
>
>                 Key: OPENNLP-53
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-53
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Parser
>            Reporter: Jörn Kottmann
>
> The parser expects a tokenized sentence as input, but currently it must be
> converted to a string in which each token is separated by a white space.
> This interface turned out to be inconvenient if the input sentence is
> provided as a list of strings or as a string with a list of token spans.
> In both cases a new string must be created, and the offsets of the
> individual tokens in this new string must be remembered in order to
> retrieve the parse tree from the Parse objects.
> Create a more convenient way of interacting with an already tokenized
> sentence that is not in a whitespace-separated format.
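
The following sketches the workaround the issue describes: a pre-tokenized sentence is joined with single spaces, and each token's character offsets are recorded so that a character span from the parse output can be mapped back to token indices. `TokenOffsets` and `tokenRange` are hypothetical names used for illustration, not OpenNLP API. The sketch assumes spans are half-open, with an inclusive start and an exclusive end.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: builds the whitespace-separated sentence and
 * remembers where each token starts and ends in it.
 */
public class TokenOffsets {

    final String text;   // sentence handed to the parser
    final int[] starts;  // start offset of each token in text (inclusive)
    final int[] ends;    // end offset of each token in text (exclusive)

    TokenOffsets(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        starts = new int[tokens.size()];
        ends = new int[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(' '); // single space between tokens
            }
            starts[i] = sb.length();
            sb.append(tokens.get(i));
            ends[i] = sb.length();
        }
        text = sb.toString();
    }

    /**
     * Maps a character span [charStart, charEnd) in text to the token
     * index range {first, last}, or null if it does not fall on token
     * boundaries.
     */
    int[] tokenRange(int charStart, int charEnd) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] == charStart) {
                first = i;
            }
            if (ends[i] == charEnd) {
                last = i;
            }
        }
        if (first < 0 || last < first) {
            return null;
        }
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        TokenOffsets t = new TokenOffsets(Arrays.asList("New", "York", "is", "big", "."));
        System.out.println(t.text);                                 // New York is big .
        System.out.println(Arrays.toString(t.tokenRange(0, 8)));    // [0, 1] ("New York")
    }
}
```

This bookkeeping is what every caller currently has to write. An interface that accepts the tokens directly would make it unnecessary.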

-- 
This message is automatically generated by JIRA.