Dear all,
 
I am trying to train the NameFinderME using a custom set of feature generators. 
However, I am not able to add the feature generators to the name finder.
 
Here is what I do:
As described in the documentation 
(https://opennlp.apache.org/documentation/1.7.0/manual/opennlp.html#tools.namefind.training.featuregen),
 I used the following code to set up the list of feature generators:
 
   AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
           new AdaptiveFeatureGenerator[]{
           new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
           new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
           new OutcomePriorFeatureGenerator(),
           new PreviousMapFeatureGenerator(),
           new BigramNameFeatureGenerator(),
           new SentenceFeatureGenerator(true, false),
           new BrownTokenFeatureGenerator(dictResource) // dictResource is a BrownCluster
           });
 
After that, the documentation explains that "the 
TokenNameFinderFactory allows to specify a custom feature generator".
However, I don't know how to do this, since there is no add method and no 
constructor parameter of type AdaptiveFeatureGenerator.
 
   TokenNameFinderFactory factory = new TokenNameFinderFactory();
   ... // how to add the feature generator?
   model = NameFinderME.train("en", "default", sampleStream,
           TrainingParameters.defaultParams(), factory);
 
In an older release of OpenNLP, it was possible to pass the feature generators 
to the train method like this:
 
   train(String languageCode, String type, ObjectStream<NameSample> samples,
       TrainingParameters trainParams, AdaptiveFeatureGenerator generator,
       final Map<String, Object> resources)
 
But this is no longer possible. Can anybody describe the new way to do 
this? An example would be great!
 
I only found this:

   public TokenNameFinderFactory(byte[] featureGeneratorBytes,
                              Map<String,Object> resources,
                              SequenceCodec<String> seqCodec)
 
But I don’t know what parameters to pass (why a byte array? SequenceCodec?)...
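
My best guess is that the byte array is meant to hold the XML feature generator descriptor shown in the same section of the documentation, rather than Java objects. Below is a minimal sketch under that assumption. The class FeatureGeneratorDescriptor and its toBytes() method are hypothetical helpers of mine, not OpenNLP API. The descriptor mirrors the generators from my Java code except the Brown cluster one, and the factory call in the trailing comment (including BioCodec as the SequenceCodec) is something I have not verified.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of OpenNLP): holds an XML feature generator
// descriptor and exposes it as UTF-8 bytes.
class FeatureGeneratorDescriptor {

    // Descriptor in the XML format from the feature generation section of the
    // OpenNLP manual. The Brown cluster generator is left out because it
    // needs an external resource.
    static final String XML =
          "<generators>\n"
        + "  <cache>\n"
        + "    <generators>\n"
        + "      <window prevLength=\"2\" nextLength=\"2\">\n"
        + "        <token/>\n"
        + "      </window>\n"
        + "      <window prevLength=\"2\" nextLength=\"2\">\n"
        + "        <tokenclass/>\n"
        + "      </window>\n"
        + "      <definition/>\n"
        + "      <prevmap/>\n"
        + "      <bigram/>\n"
        + "      <sentence begin=\"true\" end=\"false\"/>\n"
        + "    </generators>\n"
        + "  </cache>\n"
        + "</generators>\n";

    // Encodes the descriptor as the byte array the constructor asks for.
    static byte[] toBytes() {
        return XML.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] featureGeneratorBytes = toBytes();
        System.out.println(featureGeneratorBytes.length + " descriptor bytes");

        // Assumption (untested): these bytes would then go to the constructor
        // quoted above, with an empty resource map and a BIO sequence codec:
        //
        //   TokenNameFinderFactory factory = new TokenNameFinderFactory(
        //           featureGeneratorBytes,
        //           java.util.Collections.<String, Object>emptyMap(),
        //           new BioCodec());
    }
}
```

If this is right, the factory would then be passed to NameFinderME.train as in my snippet above, but I would appreciate confirmation.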
 
Any help is appreciated,
Thanks in advance!
 
Best,
Markus
