On 07/03/2012 02:17 PM, Daniel wrote:
I understand, but I still have trouble understanding the utility of
creating a custom feature generator. Could you give me an example or a
link with more information about it?
In my opinion you should always start with the default feature
generation. This gives you a good baseline against which you can compare
your modifications. To do the actual measurement, we added evaluation
support directly to the name finder; you can either evaluate on a test
set or do cross validation.
See here for more details:
http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.eval
If you use the API where you pass in the feature generator, you need to
ensure it is put together the same way at runtime when you process your
data; otherwise the features generated at runtime will not match the
ones the model was trained on. To make this easier, we added an
XML-based DSL you can use to describe your feature generation.
A sample of that can be found in our documentation:
http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.training.featuregen
To use it, you need to call this train method:

    NameFinderME.train(String languageCode, String type,
        ObjectStream<NameSample> samples, TrainingParameters trainParams,
        byte[] featureGeneratorBytes, final Map<String, Object> resources)

The XML descriptor is passed in as a byte array.
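
As a rough sketch of how the byte array could be prepared, the snippet
below builds a descriptor as a string, converts it to UTF-8 bytes, and
checks that it is well-formed XML before it would be handed to train.
The class name FeatureGenDescriptor and its helper methods are made up
for illustration. The descriptor elements are modeled on the sample in
the manual linked above and should be checked against that page. The
train call is left as a comment because it needs OpenNLP on the
classpath, and the "en" and "person" arguments are placeholder values.

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    public class FeatureGenDescriptor {

        // Descriptor modeled on the sample in the OpenNLP manual
        // (assumption: verify the element names against the linked page).
        static final String DESCRIPTOR_XML =
              "<generators>\n"
            + "  <cache>\n"
            + "    <generators>\n"
            + "      <window prevLength=\"2\" nextLength=\"2\">\n"
            + "        <tokenclass/>\n"
            + "      </window>\n"
            + "      <window prevLength=\"2\" nextLength=\"2\">\n"
            + "        <token/>\n"
            + "      </window>\n"
            + "      <definition/>\n"
            + "      <prevmap/>\n"
            + "      <bigram/>\n"
            + "      <sentence begin=\"true\" end=\"false\"/>\n"
            + "    </generators>\n"
            + "  </cache>\n"
            + "</generators>\n";

        // Hypothetical helper: the bytes that would be passed as
        // featureGeneratorBytes.
        static byte[] descriptorBytes() {
            return DESCRIPTOR_XML.getBytes(StandardCharsets.UTF_8);
        }

        // Hypothetical helper: parses the bytes with the JDK XML parser and
        // returns the root element name; fails if the XML is not well-formed.
        static String rootElementName(byte[] xml) {
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
                return doc.getDocumentElement().getNodeName();
            } catch (Exception e) {
                throw new IllegalStateException("descriptor is not well-formed XML", e);
            }
        }

        public static void main(String[] args) {
            byte[] featureGeneratorBytes = descriptorBytes();
            System.out.println("root element: " + rootElementName(featureGeneratorBytes));

            // With OpenNLP on the classpath, and samples, trainParams and
            // resources prepared elsewhere, the bytes would be passed like this:
            //
            // TokenNameFinderModel model = NameFinderME.train("en", "person",
            //     samples, trainParams, featureGeneratorBytes, resources);
        }
    }

Keeping the descriptor in one place (a file or a constant like the one
above) makes it easier to compare a modified feature set against the
default baseline, since only the descriptor changes between runs.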
Hope that helps,
Jörn