[ https://issues.apache.org/jira/browse/OPENNLP-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027642#comment-13027642 ]
Jörn Kottmann commented on OPENNLP-17:
--------------------------------------
Solution 2 is already implemented but was never released. With the current
implementation it is possible to define the feature generation with an XML file
like this one:
<generators>
  <charngram min="2" max="5"/>
  <definition/>
  <cache>
    <window prevLength="3" nextLength="3">
      <generators>
        <prevmap/>
        <sentence/>
        <tokenclass/>
        <tokenpattern/>
      </generators>
    </window>
  </cache>
</generators>
Each XML element adds one feature generator, and a feature generator can also
embed multiple generators, as the window generator does in the sample above.
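To illustrate the "one element, one generator" idea, here is a minimal, self-contained sketch that builds a generator tree from a reduced descriptor with the JDK's DOM parser. The `FeatureGen` interface, the class names, the `<token/>` element and the feature string formats are all hypothetical, chosen for illustration; this is not OpenNLP's actual API or feature format. It also assumes a window generator applies its embedded generators to neighbouring tokens and prefixes their features with the relative offset.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class GeneratorDescriptorSketch {

    // Hypothetical minimal generator interface (illustration only, not OpenNLP's API):
    // append features for the token at `index` to `features`.
    interface FeatureGen {
        void createFeatures(List<String> features, String[] tokens, int index);
    }

    // Leaf: the lower-cased token itself (hypothetical <token/> element).
    static final class TokenGen implements FeatureGen {
        public void createFeatures(List<String> f, String[] t, int i) {
            f.add("w=" + t[i].toLowerCase());
        }
    }

    // Leaf: all character n-grams of length min..max (<charngram min=".." max=".."/>).
    static final class CharNgramGen implements FeatureGen {
        private final int min, max;
        CharNgramGen(int min, int max) { this.min = min; this.max = max; }
        public void createFeatures(List<String> f, String[] t, int i) {
            String tok = t[i];
            for (int n = min; n <= max; n++)
                for (int s = 0; s + n <= tok.length(); s++)
                    f.add("ng=" + tok.substring(s, s + n));
        }
    }

    // Container: runs every child generator in document order (<generators>).
    static final class AggregatedGen implements FeatureGen {
        private final List<FeatureGen> children;
        AggregatedGen(List<FeatureGen> children) { this.children = children; }
        public void createFeatures(List<String> f, String[] t, int i) {
            for (FeatureGen g : children) g.createFeatures(f, t, i);
        }
    }

    // Container: applies the embedded generator to the current token and to up to
    // prevLength/nextLength neighbours, prefixing neighbour features with the offset.
    static final class WindowGen implements FeatureGen {
        private final FeatureGen inner;
        private final int prev, next;
        WindowGen(FeatureGen inner, int prev, int next) {
            this.inner = inner; this.prev = prev; this.next = next;
        }
        public void createFeatures(List<String> f, String[] t, int i) {
            inner.createFeatures(f, t, i);
            for (int off = -prev; off <= next; off++) {
                int j = i + off;
                if (off == 0 || j < 0 || j >= t.length) continue;
                List<String> tmp = new ArrayList<>();
                inner.createFeatures(tmp, t, j);
                for (String s : tmp) f.add(off + ":" + s);
            }
        }
    }

    // One XML element -> one generator; container elements recurse into their children.
    static FeatureGen build(Element e) {
        switch (e.getTagName()) {
            case "generators": return new AggregatedGen(children(e));
            case "window": return new WindowGen(children(e).get(0),
                    Integer.parseInt(e.getAttribute("prevLength")),
                    Integer.parseInt(e.getAttribute("nextLength")));
            case "charngram": return new CharNgramGen(
                    Integer.parseInt(e.getAttribute("min")), Integer.parseInt(e.getAttribute("max")));
            case "token": return new TokenGen();
            default: throw new IllegalArgumentException("unsupported element: " + e.getTagName());
        }
    }

    // Builds a generator for each child element, skipping whitespace text nodes.
    static List<FeatureGen> children(Element e) {
        List<FeatureGen> out = new ArrayList<>();
        NodeList kids = e.getChildNodes();
        for (int k = 0; k < kids.getLength(); k++)
            if (kids.item(k).getNodeType() == Node.ELEMENT_NODE) out.add(build((Element) kids.item(k)));
        return out;
    }

    // Parses the descriptor and runs the resulting generator tree on one token.
    static List<String> features(String xml, String[] tokens, int index) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        List<String> f = new ArrayList<>();
        build(root).createFeatures(f, tokens, index);
        return f;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<generators><charngram min=\"2\" max=\"3\"/>"
                + "<window prevLength=\"1\" nextLength=\"1\"><generators><token/></generators></window>"
                + "</generators>";
        // prints [ng=Ne, ng=ew, ng=New, w=new, 1:w=york]
        System.out.println(features(xml, new String[] {"New", "York"}, 0));
    }
}
```

Because nesting in the XML maps directly onto nesting of container generators, adding a new generator type only means adding one `case` to the builder.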
Configuring a custom feature generator would also be possible with an element
that uses reflection to load a user-implemented generator.
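A rough sketch of the reflection step, assuming the custom element carries the fully qualified class name of a generator that has a public no-argument constructor (for example a hypothetical `<custom class="..."/>` element). The `FeatureGen` interface, `SuffixGen` and `load` are illustrative names, not OpenNLP classes; only `Class.forName`, `asSubclass` and `getDeclaredConstructor().newInstance()` are standard JDK calls.

```java
import java.util.ArrayList;
import java.util.List;

public class ReflectiveGeneratorSketch {

    // Hypothetical minimal generator interface, for illustration only.
    public interface FeatureGen {
        void createFeatures(List<String> features, String[] tokens, int index);
    }

    // Example "user implemented" generator: emits the last three characters of the token.
    public static final class SuffixGen implements FeatureGen {
        public SuffixGen() {}

        @Override
        public void createFeatures(List<String> f, String[] t, int i) {
            String tok = t[i];
            f.add("suf=" + tok.substring(Math.max(0, tok.length() - 3)));
        }
    }

    // Loads the class named in the (hypothetical) descriptor element, verifies that it
    // implements FeatureGen (asSubclass throws ClassCastException otherwise) and
    // instantiates it through its public no-argument constructor.
    static FeatureGen load(String className) throws ReflectiveOperationException {
        Class<? extends FeatureGen> cls = Class.forName(className).asSubclass(FeatureGen.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In a real descriptor the name would come from an XML attribute.
        FeatureGen g = load(SuffixGen.class.getName());
        List<String> f = new ArrayList<>();
        g.createFeatures(f, new String[] {"Washington"}, 0);
        System.out.println(f); // prints [suf=ton]
    }
}
```

The no-argument constructor is the simplest contract for user classes; a real loader would also need a way to pass element attributes to the custom generator.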
> Add support for custom feature generator configuration embedded in the model package
> ------------------------------------------------------------------------------------
>
> Key: OPENNLP-17
> URL: https://issues.apache.org/jira/browse/OPENNLP-17
> Project: OpenNLP
> Issue Type: Improvement
> Components: Chunker, Name Finder, POS Tagger
> Affects Versions: tools-1.5.0-sourceforge
> Reporter: Jörn Kottmann
>
> Add support for custom feature generator configuration embedded in the model package.
> The configuration of the feature generators for the name finder component can
> be quite complex, and it must always be done twice: once for training and once
> for tagging. Doing it twice, at two different points in time, makes the feature
> generation very error-prone. Small mistakes lead to a drop in detection
> performance which might be difficult to notice.
> To solve this issue, add the configuration to the model; then it only needs to
> be specified during training and can be loaded from the model during tagging.
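A minimal sketch of the round trip, assuming the model package is a zip archive in which the descriptor is stored as one more entry next to the trained model. The entry names `generator.xml` and `model.bin`, and the class and method names, are hypothetical and do not describe the actual OpenNLP model layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ModelPackageSketch {

    // Hypothetical entry name for the embedded feature generator descriptor.
    static final String DESCRIPTOR_ENTRY = "generator.xml";

    // Training side: write every artifact (model, descriptor, ...) into one zip package.
    static byte[] writePackage(Map<String, byte[]> artifacts) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : artifacts.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } // closing the stream writes the zip's central directory
        return bytes.toByteArray();
    }

    // Tagging side: read all entries back; readAllBytes stops at the end of each entry.
    static Map<String, byte[]> readPackage(byte[] pkg) throws IOException {
        Map<String, byte[]> artifacts = new HashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                artifacts.put(entry.getName(), zip.readAllBytes());
            }
        }
        return artifacts;
    }

    public static void main(String[] args) throws IOException {
        String descriptor = "<generators><charngram min=\"2\" max=\"5\"/></generators>";
        Map<String, byte[]> artifacts = new HashMap<>();
        artifacts.put("model.bin", new byte[] {1, 2, 3}); // placeholder for the trained model
        artifacts.put(DESCRIPTOR_ENTRY, descriptor.getBytes(StandardCharsets.UTF_8));
        byte[] pkg = writePackage(artifacts);

        // Later, at tagging time, the descriptor comes from the package itself.
        String loaded = new String(readPackage(pkg).get(DESCRIPTOR_ENTRY), StandardCharsets.UTF_8);
        System.out.println(loaded.equals(descriptor)); // prints true
    }
}
```

Since the tagger reads exactly the bytes written at training time, the two configurations cannot drift apart.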
> Another advantage is that custom feature generation is otherwise difficult to
> use, because the integration code must itself set up the feature generators.
> In some cases the user does not even have control over the code, or does not
> want to change it, e.g. in the UIMA wrappers.
> The same logic should be used for the POS Tagger and Chunker.
> This issue was migrated from SourceForge:
> https://sourceforge.net/tracker/?func=detail&aid=1941380&group_id=3368&atid=353368