[ 
https://issues.apache.org/jira/browse/OPENNLP-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974456#action_12974456
 ] 

James Kosin commented on OPENNLP-15:
------------------------------------

Jorn,

I noticed one of the comments you put into the Conll03 parser:

    while ((line = lineStream.read()) != null && !StringUtil.isEmpty(line)) {

      if (LANGUAGE.EN.equals(lang) &&
          line.startsWith(Conll02NameSampleStream.DOCSTART)) {
        isClearAdaptiveData = true;
        // english data has a blank line after DOCSTART tag
        lineStream.read(); // TODO: Why isn't that caught by isEmpty ?!
        continue;
      }

The TODO has a simple explanation.

The while loop runs only while the line is not empty, and the continue in
the if() statement branches straight back to that test.  The line after
DOCSTART is blank, so without the extra lineStream.read() the test would read
that blank line and end the loop with isClearAdaptiveData set to true but no
tokens collected.

The problem is that the statements below the loop check for the blank-line
condition, perform a secondary read(), and return its value.

    else if (line != null) {
      // Just filter out empty events, if two lines in a row are empty
      return read();
    }

This secondary read() call resets isClearAdaptiveData to false and returns
the sentence correctly.  Unfortunately, I believe the flag is cleared
unintentionally as a side effect of this second call.
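
To make the flag loss concrete, here is a minimal, self-contained sketch.
It is not the OpenNLP source: the class FlagLossSketch, the Sample holder and
the consumeBlankAfterDocstart switch are hypothetical names for illustration,
and it assumes the flag is a local variable that starts out false on every
read() call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for the parser's read(); not OpenNLP code.
class FlagLossSketch {

  static final String DOCSTART = "-DOCSTART-";

  /** One sentence plus the flag the caller relies on. */
  static final class Sample {
    final List<String> tokens;
    final boolean clearAdaptiveData;

    Sample(List<String> tokens, boolean clearAdaptiveData) {
      this.tokens = tokens;
      this.clearAdaptiveData = clearAdaptiveData;
    }
  }

  private final Iterator<String> lines;
  private final boolean consumeBlankAfterDocstart;

  FlagLossSketch(List<String> input, boolean consumeBlankAfterDocstart) {
    this.lines = input.iterator();
    this.consumeBlankAfterDocstart = consumeBlankAfterDocstart;
  }

  /** Returns the next line, or null at end of input. */
  private String nextLine() {
    return lines.hasNext() ? lines.next() : null;
  }

  Sample read() {
    List<String> tokens = new ArrayList<>();
    // Assumption: the flag is local, so every call (recursive ones too)
    // starts with it set to false.
    boolean isClearAdaptiveData = false;

    String line;
    while ((line = nextLine()) != null && !line.isEmpty()) {
      if (line.startsWith(DOCSTART)) {
        isClearAdaptiveData = true;
        if (consumeBlankAfterDocstart) {
          nextLine(); // skip the blank line that follows DOCSTART
        }
        continue;
      }
      tokens.add(line);
    }

    if (!tokens.isEmpty()) {
      return new Sample(tokens, isClearAdaptiveData);
    } else if (line != null) {
      // Empty event: the recursive call starts over with the flag at false,
      // so a DOCSTART seen in this call is forgotten.
      return read();
    }
    return null; // end of input
  }
}
```

With the input lines "-DOCSTART- -X- O O", "", "EU", "rejects", "", the
first read() returns the sentence with the flag set when the blank line is
consumed explicitly, and the same sentence with the flag cleared when it is
left to the recursive call.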

Fixing this would mean refactoring the entire function.
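
One possible shape for such a refactoring, purely as a sketch and not a
patch: skip empty events inside a single loop instead of recursing, so the
flag survives the blank line after DOCSTART.  LoopingReadSketch and Sample
are hypothetical names, and the real method would of course also have to
parse the token columns.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a non-recursive read(); not OpenNLP code.
class LoopingReadSketch {

  static final String DOCSTART = "-DOCSTART-";

  /** One sentence plus the flag the caller relies on. */
  static final class Sample {
    final List<String> tokens;
    final boolean clearAdaptiveData;

    Sample(List<String> tokens, boolean clearAdaptiveData) {
      this.tokens = tokens;
      this.clearAdaptiveData = clearAdaptiveData;
    }
  }

  private final Iterator<String> lines;

  LoopingReadSketch(List<String> input) {
    this.lines = input.iterator();
  }

  Sample read() {
    List<String> tokens = new ArrayList<>();
    boolean isClearAdaptiveData = false;

    while (lines.hasNext()) {
      String line = lines.next();
      if (line.startsWith(DOCSTART)) {
        isClearAdaptiveData = true;
        continue;
      }
      if (line.isEmpty()) {
        if (tokens.isEmpty()) {
          continue; // empty event: keep the flag and keep reading
        }
        break; // a blank line ends a non-empty sentence
      }
      tokens.add(line);
    }

    // null signals end of input, as in the recursive version
    return tokens.isEmpty() ? null : new Sample(tokens, isClearAdaptiveData);
  }
}
```

Because the blank line is handled in the same call that saw DOCSTART, no
special-case lineStream.read() is needed in this variant.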

-James

> Add support for the CoNLL 03 data format
> ----------------------------------------
>
>                 Key: OPENNLP-15
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-15
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Name Finder
>    Affects Versions: tools-1.5.0-sourceforge
>            Reporter: Jörn Kottmann
>            Assignee: Jörn Kottmann
>             Fix For: tools-1.5.1-incubating
>
>
> Adding support to convert CoNLL 03 Reuters Support to NameFinder.
> Work on this issue began over at sourceforge:
> http://sourceforge.net/tracker/?func=detail&aid=3081785&group_id=3368&atid=353368

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
