Yes, the OpenNLP Name Finder defines a default feature generation. The code that
builds it can be found in the NameFinderME.createFeatureGenerator method.

There we instantiate the following feature generators:

TokenFeatureGenerator:
+ lower cased token, with a window of 2

A window of two means that the feature is generated not only for the
current token, but also for the two previous and the two next tokens.
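
To make the window concrete, here is a minimal sketch in plain Java. It does
not use the OpenNLP classes. The class name WindowFeatureSketch, the method
tokenWindowFeatures, and the feature strings ("w=", "p1w=", "n1w=", ...) are
assumptions made for illustration; the strings OpenNLP actually emits may
look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the OpenNLP implementation.
final class WindowFeatureSketch {

    // Emits the lower cased token at `index`, plus the lower cased tokens
    // up to `window` positions before and after it. Positions that fall
    // outside the sentence are skipped.
    static List<String> tokenWindowFeatures(String[] tokens, int index, int window) {
        List<String> features = new ArrayList<>();
        features.add("w=" + tokens[index].toLowerCase());
        for (int i = 1; i <= window; i++) {
            if (index - i >= 0) {
                features.add("p" + i + "w=" + tokens[index - i].toLowerCase());
            }
            if (index + i < tokens.length) {
                features.add("n" + i + "w=" + tokens[index + i].toLowerCase());
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Yesterday", "Peter", "Smith", "visited", "Berlin", "."};
        // Features for "Smith" with a window of 2.
        System.out.println(tokenWindowFeatures(tokens, 2, 2));
    }
}
```

For "Smith" in the example sentence this yields one feature for the token
itself and one for each of its four neighbours.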

TokenClassFeatureGenerator:
+ token class (that contains things like first letter is capital)
+ token class combined with the lower cased token

Both features are generated with a window length of 2.
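
As a rough sketch of what such a token class could look like, the plain Java
below maps a token to a coarse label and builds the two features for the
current token. The labels ("ic", "ac", "lc", "num", "other"), the feature
prefixes and all names are hypothetical and simplified; the token classes in
OpenNLP are likely more fine-grained. The window is left out here to keep
the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the OpenNLP implementation.
final class TokenClassSketch {

    // Maps a token to a coarse class label (simplified, made-up labels).
    static String tokenClass(String token) {
        if (token.isEmpty()) {
            return "other";
        }
        boolean allDigits = true;
        boolean allUpper = true;
        boolean allLower = true;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isDigit(c)) allDigits = false;
            if (!Character.isUpperCase(c)) allUpper = false;
            if (!Character.isLowerCase(c)) allLower = false;
        }
        if (allDigits) return "num";   // e.g. "2011"
        if (allLower) return "lc";     // e.g. "visited"
        if (allUpper) return "ac";     // e.g. "IBM"
        if (Character.isUpperCase(token.charAt(0))) return "ic"; // e.g. "Peter"
        return "other";                // punctuation, mixed case, ...
    }

    // Builds the two features: the class alone, and the class combined
    // with the lower cased token.
    static List<String> tokenClassFeatures(String token) {
        String cls = tokenClass(token);
        List<String> features = new ArrayList<>();
        features.add("wc=" + cls);
        features.add("w&c=" + token.toLowerCase() + "," + cls);
        return features;
    }

    public static void main(String[] args) {
        System.out.println(tokenClassFeatures("Peter"));
    }
}
```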

PreviousMapFeatureGenerator:
+ previous decision features, if the word has been seen before
    in the document
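
The idea can be sketched in plain Java as a map from tokens to the outcome
they received earlier in the same document. Everything in this sketch (the
class PreviousDecisionSketch, its methods, the "pd=" prefix and the outcome
labels in the usage example) is an assumption for illustration and not
taken from the OpenNLP source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the OpenNLP implementation.
final class PreviousDecisionSketch {

    // Last outcome seen for each token in the current document.
    private final Map<String, String> previousOutcomes = new HashMap<>();

    // Remembers the outcomes assigned to the tokens of a finished sentence.
    void update(String[] tokens, String[] outcomes) {
        for (int i = 0; i < tokens.length; i++) {
            previousOutcomes.put(tokens[i], outcomes[i]);
        }
    }

    // Emits a feature only if the token was seen earlier in the document.
    List<String> features(String token) {
        List<String> features = new ArrayList<>();
        String previous = previousOutcomes.get(token);
        if (previous != null) {
            features.add("pd=" + previous);
        }
        return features;
    }

    // Forgets everything; meant to be called at a document boundary.
    void clear() {
        previousOutcomes.clear();
    }

    public static void main(String[] args) {
        PreviousDecisionSketch sketch = new PreviousDecisionSketch();
        sketch.update(new String[] {"Peter", "slept"}, new String[] {"start", "other"});
        System.out.println(sketch.features("Peter"));
    }
}
```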

BigramNameFeatureGenerator:
+ token bigram feature, with previous word
+ token bigram feature with previous token class
+ token bigram feature with next word
+ token bigram feature with next token class
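
A small plain Java sketch of these four bigram features follows. The token
classes are passed in as a ready-made array to keep the example short. The
class name, the method signature and the feature prefixes are assumptions
for illustration, as is the choice to leave the tokens in their original
case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the OpenNLP implementation.
final class BigramFeatureSketch {

    // Pairs the token (and its class) at `index` with the previous and the
    // next token (and class). Missing neighbours produce no feature.
    static List<String> bigramFeatures(String[] tokens, String[] classes, int index) {
        List<String> features = new ArrayList<>();
        if (index > 0) {
            features.add("pw,w=" + tokens[index - 1] + "," + tokens[index]);
            features.add("pwc,wc=" + classes[index - 1] + "," + classes[index]);
        }
        if (index + 1 < tokens.length) {
            features.add("w,nw=" + tokens[index] + "," + tokens[index + 1]);
            features.add("wc,nwc=" + classes[index] + "," + classes[index + 1]);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Peter", "Smith", "visited"};
        String[] classes = {"ic", "ic", "lc"}; // made-up class labels
        System.out.println(bigramFeatures(tokens, classes, 1));
    }
}
```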

SentenceFeatureGenerator:
+ sentence begin feature

OutcomePriorFeatureGenerator:
+ always generates a default feature

Hope this helps,
Jörn

On 8/1/11 7:22 AM, Amal Elmah wrote:
Hi there,
I am currently using the OpenNLP tool to train a new model for detecting names
using a domain-specific corpus. I managed to get relatively good performance.
After reading the documentation, I know that OpenNLP defines a default feature
generation, but what are these features? Do they include initial capitalization
and lower case, or what are they exactly? And how does the OpenNLP tool use them
with maximum entropy to detect the names? I really want to participate in the
OpenNLP project, but I am currently busy. Once I finish the work at hand,
I will contribute to OpenNLP, since I have spent approximately 3 months reading
about it and using its tools.
Thanks in advance,
Amal
