[ 
https://issues.apache.org/jira/browse/OPENNLP-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060438#comment-13060438
 ] 

Jörn Kottmann commented on OPENNLP-78:
--------------------------------------

Should we defer the issue, or come up with a few new features in the InSpan 
generator? What do you think, James?

Changing the feature generation in a way that maintains backward 
compatibility might be difficult. On the other hand, this generator is rarely 
used and doesn't really work right now.

I think we should add features that take the context and other things (such 
as the token class and the previous outcome) into account, like the following:

        // The token itself is in the dictionary
        features.add(prefix + ":w=dic=" + tokens[index]);

        // Dictionary hit, alone and combined with the token class
        features.add(prefix + ":w=dic");
        features.add(prefix + ":w=dic" + "+wc:" + FeatureGeneratorUtil.tokenFeature(tokens[index]));

        // Previous outcome, previous word, and previous word class
        if (index > 0) {
          features.add(prefix + ":w=dic" + "+po:" + preds[index - 1]);
          features.add(prefix + ":w=dic" + "+pw:" + tokens[index - 1]);
          features.add(prefix + ":w=dic" + "+pwc:" + FeatureGeneratorUtil.tokenFeature(tokens[index - 1]));
        }

        // Next word and next word class
        if (index + 1 < tokens.length) {
          features.add(prefix + ":w=dic" + "+nw:" + tokens[index + 1]);
          features.add(prefix + ":w=dic" + "+nwc:" + FeatureGeneratorUtil.tokenFeature(tokens[index + 1]));
        }
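
To make the proposal easier to try out, here is a rough standalone sketch of 
the same feature set that runs without OpenNLP on the classpath. Several 
things in it are assumptions and not part of the actual code base: the class 
name DictionaryContextFeatureSketch, the createFeatures signature, the 
single-token Set lookup (the real generator works on the name spans found by 
the dictionary), and tokenClass, which is only a simplified stand-in for 
FeatureGeneratorUtil.tokenFeature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical standalone sketch; not the actual InSpan generator.
final class DictionaryContextFeatureSketch {

  // Simplified stand-in for FeatureGeneratorUtil.tokenFeature (assumption):
  // it only distinguishes a handful of coarse token classes.
  static String tokenClass(String token) {
    if (token.isEmpty()) {
      return "other";
    }
    if (token.chars().allMatch(Character::isDigit)) {
      return "num";
    }
    if (token.chars().allMatch(Character::isLetter)) {
      if (token.equals(token.toLowerCase())) {
        return "lc"; // all lowercase
      }
      if (token.equals(token.toUpperCase())) {
        return "ac"; // all capitals
      }
      if (Character.isUpperCase(token.charAt(0))) {
        return "ic"; // initial capital
      }
    }
    return "other";
  }

  // Emits the proposed features for tokens[index] if the token is in the
  // dictionary. The single-token lookup is an assumption made for brevity.
  static List<String> createFeatures(String prefix, Set<String> dictionary,
      String[] tokens, int index, String[] preds) {
    List<String> features = new ArrayList<>();
    if (!dictionary.contains(tokens[index])) {
      return features;
    }

    String base = prefix + ":w=dic";
    features.add(base + "=" + tokens[index]);
    features.add(base);
    features.add(base + "+wc:" + tokenClass(tokens[index]));

    if (index > 0) {
      features.add(base + "+po:" + preds[index - 1]);
      features.add(base + "+pw:" + tokens[index - 1]);
      features.add(base + "+pwc:" + tokenClass(tokens[index - 1]));
    }

    if (index + 1 < tokens.length) {
      features.add(base + "+nw:" + tokens[index + 1]);
      features.add(base + "+nwc:" + tokenClass(tokens[index + 1]));
    }
    return features;
  }

  public static void main(String[] args) {
    String[] tokens = {"Mr", "James", "said"};
    String[] preds = {"other", "other", "other"};
    // Print the features generated for the dictionary token "James".
    createFeatures("mydict", Set.of("James"), tokens, 1, preds)
        .forEach(System.out::println);
  }
}
```

The point of combining the dictionary hit with the neighbouring words and 
outcomes is that the model can learn when a dictionary match is trustworthy 
in context, instead of treating every match the same way.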

> NameFinder and Dictionary Integration
> -------------------------------------
>
>                 Key: OPENNLP-78
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-78
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>         Environment: Windows 7
>            Reporter: James Kosin
>            Assignee: James Kosin
>             Fix For: tools-1.5.2-incubating
>
>
> Now that we have a NameFinder Dictionary and improved NameFinder tools, it 
> would be nice to be able to integrate the dictionary and model to help 
> improve the finding of names.
> This way, the name finder could be trained more on the surrounding text 
> instead of attempting to memorize common names in the news that occur 
> frequently.
> I've already got the name finder corpus and created the dictionaries with 
> the data from the US Census.
> I just need to implement some method to help train the model, or be able to 
> use the dictionaries after model creation to help with the finding of names.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

