2011/6/20 Amal Elmah <amalalthougha...@hotmail.com>:
>
> Hi OpenNLP team,
>
> I used the command line training tool for NameFinder, with the
> following command:
> $bin/opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data
> en-ner-person.train -model en-ner-person.bin
>
> I do not know where I can get en-ner-person.train, so I made a
> training file (training.txt) and added training data as follows:
>
> <START:person> Pierre Vinken <END> , 61 years old , will join the board as a
> nonexecutive director Nov. 29 .
> Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch
> publishing group .
>
> My questions are:
> 1- How can I add features if I want to use the command line training tool,
> not the API? Can you please give me an example if this is possible?
AFAIK, in its current state feature extraction is only customizable
through the API.

> 2- Can we add features to the training data, I mean with an annotation like
> <START: person feature=value>

No. What would be the use case? Can you give a concrete example of such a
manual feature annotation? What goal do you want to achieve with such
annotations?

> 3- Does the OpenNLP tool have a way to generate these features automatically
> from the training data?

OpenNLP already generates its features automatically by combining several
feature extractors, as in:

https://svn.apache.org/repos/asf/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/DefaultNameContextGenerator.java

None of those feature extractors expects any kind of manual annotation.
This is expected, since in general the text you want to analyze with a
NameFinder instance will not have any annotations.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
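
P.S. To make "combining several feature extractors" a bit more concrete, here
is a minimal, self-contained sketch of the general idea. It is only an
illustration: FeatureSketch, FeatureExtractor, TokenExtractor, ShapeExtractor
and WindowExtractor are hypothetical names I made up for this mail, not
OpenNLP classes, and the feature strings are not the ones OpenNLP actually
emits. The point is that every feature is computed from the raw tokens, so no
manual feature annotation is needed in the training data.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: none of these types are part of OpenNLP.
public class FeatureSketch {

    /** A feature extractor appends string features for the token at `index`. */
    interface FeatureExtractor {
        void addFeatures(List<String> features, String[] tokens, int index);
    }

    /** Emits the lower-cased token itself. */
    static class TokenExtractor implements FeatureExtractor {
        public void addFeatures(List<String> features, String[] tokens, int index) {
            features.add("w=" + tokens[index].toLowerCase());
        }
    }

    /** Emits a coarse word-shape feature (capitalized, all digits, or other). */
    static class ShapeExtractor implements FeatureExtractor {
        public void addFeatures(List<String> features, String[] tokens, int index) {
            String t = tokens[index];
            if (!t.isEmpty() && Character.isUpperCase(t.charAt(0))) {
                features.add("shape=initcap");
            } else if (isAllDigits(t)) {
                features.add("shape=digits");
            } else {
                features.add("shape=other");
            }
        }

        private static boolean isAllDigits(String t) {
            if (t.isEmpty()) {
                return false;
            }
            for (int i = 0; i < t.length(); i++) {
                if (!Character.isDigit(t.charAt(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Emits the previous and next tokens, when they exist. */
    static class WindowExtractor implements FeatureExtractor {
        public void addFeatures(List<String> features, String[] tokens, int index) {
            if (index > 0) {
                features.add("prev=" + tokens[index - 1].toLowerCase());
            }
            if (index < tokens.length - 1) {
                features.add("next=" + tokens[index + 1].toLowerCase());
            }
        }
    }

    /** Runs every extractor in turn and collects all features for one token. */
    public static List<String> extract(String[] tokens, int index) {
        FeatureExtractor[] extractors = {
            new TokenExtractor(), new ShapeExtractor(), new WindowExtractor()
        };
        List<String> features = new ArrayList<String>();
        for (FeatureExtractor extractor : extractors) {
            extractor.addFeatures(features, tokens, index);
        }
        return features;
    }

    public static void main(String[] args) {
        // Tokens from the second training sentence, without the <START>/<END> markup.
        String[] tokens = {"Mr", ".", "Vinken", "is", "chairman"};
        System.out.println(extract(tokens, 2));
    }
}
```

Adding a "feature" in this picture means adding one more extractor to the
list, which is why, as far as I know, it has to go through the API rather
than through the training file or the command line.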