[ 
https://issues.apache.org/jira/browse/OPENNLP-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027642#comment-13027642
 ] 

Jörn Kottmann commented on OPENNLP-17:
--------------------------------------

Solution 2 is already implemented but was never released. With the current
implementation, feature generation can be defined in an XML file like this one:

<generators>
   <charngram min="2" max="5"/>
   <definition/>
   <cache>
     <window prevLength="3" nextLength="3">
       <generators>
         <prevmap/>
         <sentence/>
         <tokenclass/>
         <tokenpattern/>
       </generators>
     </window>
   </cache> 
</generators>

Each XML element adds one feature generator. A feature generator can also
embed other generators, as the window generator does in the sample above.
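
To make the nesting concrete, here is a minimal sketch that walks such a
descriptor with the JDK DOM parser and prints one line per element, indented
by depth. It only illustrates the tree structure; the class name
DescriptorOutline and its methods are made up for this example and are not
part of OpenNLP.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not an OpenNLP class.
public class DescriptorOutline {

    // Parses the descriptor and returns one line per element,
    // indented by two spaces per nesting level.
    public static String outline(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    private static void walk(Element element, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(element.getTagName()).append('\n');

        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes; only elements stand for generators.
            if (child instanceof Element) {
                walk((Element) child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<generators>"
              + "<charngram min=\"2\" max=\"5\"/>"
              + "<cache><window prevLength=\"3\" nextLength=\"3\">"
              + "<generators><prevmap/><tokenclass/></generators>"
              + "</window></cache>"
              + "</generators>";
        System.out.print(outline(xml));
    }
}
```

A real factory would map each element name to a generator instance and pass
the generators built from the child elements to the enclosing one, but the
traversal would follow the same recursive shape.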

A custom feature generator could also be configured with an element that uses
reflection to load a user-implemented class.
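
As a rough sketch of the reflection idea, assuming the element carries the
fully qualified class name of the user implementation: the FeatureGenerator
interface below is a simplified stand-in defined only for this example, not
the actual OpenNLP interface, and ReflectiveLoading and LowercaseFeature are
hypothetical names as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example; none of these types are OpenNLP classes.
public class ReflectiveLoading {

    // Simplified stand-in for a feature generator interface.
    public interface FeatureGenerator {
        void createFeatures(List<String> features, String[] tokens, int index);
    }

    // Example user implementation that would be named in the descriptor.
    public static class LowercaseFeature implements FeatureGenerator {
        @Override
        public void createFeatures(List<String> features, String[] tokens, int index) {
            features.add("lc=" + tokens[index].toLowerCase(Locale.ROOT));
        }
    }

    // Loads the named class and instantiates it through its no-argument
    // constructor. asSubclass throws ClassCastException if the class does
    // not implement FeatureGenerator.
    public static FeatureGenerator load(String className)
            throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return type.asSubclass(FeatureGenerator.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In practice the name would be read from the XML element.
        FeatureGenerator generator = load(LowercaseFeature.class.getName());
        List<String> features = new ArrayList<>();
        generator.createFeatures(features, new String[] {"Berlin", "is"}, 0);
        System.out.println(features);
    }
}
```

The user class needs a public no-argument constructor and must be on the
classpath at both training and tagging time for this approach to work.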

> Add support for custom feature generator configuration embedded in the model 
> package
> ------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-17
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-17
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Chunker, Name Finder, POS Tagger
>    Affects Versions: tools-1.5.0-sourceforge
>            Reporter: Jörn Kottmann
>
> Add support for custom feature generator configuration embedded in the model 
> package.
> The configuration of the feature generators for the name finder component can
> be quite complex, and it must always be done twice: once for training and once
> for tagging. Doing it at two different points in time makes feature generation
> very error-prone. Small mistakes lead to a drop in detection performance that
> might be difficult to notice.
> To solve this issue, add the configuration to the model; then it must only be
> specified during training and can be loaded from the model during tagging.
> Another advantage is that custom feature generation is difficult to use
> otherwise, because the integration code itself must set up the feature
> generators. In some cases the user does not even have control over the code,
> or does not want to change it, e.g. in the UIMA wrappers.
> The same logic should be used for the POS Tagger and Chunker.
> This issue was migrated from SourceForge:
> https://sourceforge.net/tracker/?func=detail&aid=1941380&group_id=3368&atid=353368

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
