[ 
https://issues.apache.org/jira/browse/OPENNLP-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989176#comment-12989176
 ] 

Jörn Kottmann commented on OPENNLP-17:
--------------------------------------

Previously we came up with a few ways to solve this issue, but could never 
agree on which way to go.

1. Dependency injection. The model contains a file which describes how the 
feature generators are constructed, and the feature generators are instantiated 
and put together by a standard dependency injection framework. Back then it was 
proposed to use Spring. The advantage is that the format is well known by our 
users. The disadvantage is the addition of an external dependency.

2. Embed a custom XML descriptor which describes how to put the feature 
generators together. The advantage is that we do not need to depend on an 
external dependency injection framework. The disadvantage is that we need to 
define, document and maintain a custom XML format.

3. Place a JavaScript file in the model which is capable of constructing the 
feature generators. The disadvantages are that security might be a problem and 
that it depends on Java 6. The advantage is that users can implement simple 
feature generators for research purposes in JavaScript.
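
To make option 2 a bit more concrete, here is a minimal sketch of how a custom 
XML descriptor could be turned into feature generators. Everything in it is an 
assumption made for illustration: the FeatureGenerator interface is a 
simplified stand-in and not the actual OpenNLP interface, and the element and 
attribute names (generators, window, token, prevLength, nextLength) are 
hypothetical, not an agreed format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical, simplified stand-in for a feature generator interface.
interface FeatureGenerator {
    void createFeatures(List<String> features, String[] tokens, int index);
}

// Hypothetical factory which builds feature generators from an XML descriptor.
class DescriptorSketch {

    // Emits the token at the current position as a feature.
    static FeatureGenerator token() {
        return (features, tokens, index) -> features.add("w=" + tokens[index]);
    }

    // Applies the nested generator to every position inside the window
    // and prefixes each feature with its offset from the current token.
    static FeatureGenerator window(FeatureGenerator inner, int prev, int next) {
        return (features, tokens, index) -> {
            for (int offset = -prev; offset <= next; offset++) {
                int position = index + offset;
                if (position < 0 || position >= tokens.length) {
                    continue; // the window reaches past the sentence boundary
                }
                List<String> nested = new ArrayList<>();
                inner.createFeatures(nested, tokens, position);
                for (String feature : nested) {
                    features.add(offset + ":" + feature);
                }
            }
        };
    }

    // Runs all child generators in document order.
    static FeatureGenerator aggregate(List<FeatureGenerator> children) {
        return (features, tokens, index) -> {
            for (FeatureGenerator child : children) {
                child.createFeatures(features, tokens, index);
            }
        };
    }

    // Recursively maps one descriptor element to a feature generator.
    static FeatureGenerator build(Element element) {
        List<FeatureGenerator> children = new ArrayList<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                children.add(build((Element) nodes.item(i)));
            }
        }
        switch (element.getTagName()) {
            case "generators":
                return aggregate(children);
            case "token":
                return token();
            case "window":
                return window(aggregate(children),
                        Integer.parseInt(element.getAttribute("prevLength")),
                        Integer.parseInt(element.getAttribute("nextLength")));
            default:
                throw new IllegalArgumentException(
                        "Unknown element: " + element.getTagName());
        }
    }

    // Parses the descriptor text and builds the root feature generator.
    static FeatureGenerator load(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return build(root);
        } catch (Exception e) {
            throw new IllegalStateException("Invalid descriptor", e);
        }
    }
}
```

With a descriptor such as 
<generators><window prevLength="1" nextLength="1"><token/></window></generators>, 
the same text could be stored in the model package at training time and passed 
to DescriptorSketch.load again at tagging time, so both phases would end up 
with identical feature generators. The cost mentioned above is visible in the 
switch statement: every new generator type needs a new element that has to be 
documented and maintained.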

> Add support for custom feature generator configuration embedded in the model 
> package
> ------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-17
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-17
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Chunker, Name Finder, POS Tagger
>    Affects Versions: tools-1.5.0-sourceforge
>            Reporter: Jörn Kottmann
>
> Add support for custom feature generator configuration embedded in the model 
> package.
> The configuration of the feature generators for the name finder component can 
> be quite complex, and the configuration must
> always be done twice: once for training and once for tagging. Doing it twice 
> at two different points in time makes
> the feature generation very error-prone. Small mistakes lead to a drop in 
> detection performance which might
> be difficult to notice. 
> To solve this issue, add the configuration to the model. It then only has to 
> be specified during training and
> can be loaded from the model during tagging.
> Another advantage is that custom feature generation is difficult to use 
> otherwise, because the integration
> code must itself deal with setting up the feature generators. In some cases 
> the user does not even have control
> over the code, or does not want to change it, e.g. in the UIMA wrappers.
> The same logic should be used for the POS Tagger and Chunker.
> This issue was migrated from SourceForge:
> https://sourceforge.net/tracker/?func=detail&aid=1941380&group_id=3368&atid=353368


