Ok thanks
On Wed, Oct 12, 2011 at 2:46 PM, Jörn Kottmann <kottm...@gmail.com> wrote:
> On 10/12/11 2:36 PM, Nicolas Hernandez wrote:
>>
>> Looking at the Name Finder and the chunker tool, I wonder why they
>> do not use the same training format?
>>
>> For example, this
>>
>> Mr.<START:person> Pierre Vinken<END> is chairman
>>
>> may also be represented like this
>>
>> Mr. NNP O
>> Pierre NNP B-person
>> Vinken NNP I-person
>> is VBZ O
>> chairman NN O
>>
>> I have noted that the Name Finder API offers the possibility to
>> customize the feature generation used for training, but both the Name
>> Finder and the chunker use the same implementation of the learning
>> algorithm, don't they?
>
> That has historical reasons: the name finder development was inspired by
> the MUC shared tasks, and the chunker development was inspired by the
> CoNLL 2000 shared task.
>
> The implementations are actually different, and the biggest difference is
> the way features are generated. The chunker can use POS tags, and the
> name finder cannot.
>
> We have plans to use the feature generation framework which was created
> for the name finder also in the POS tagger and chunker.
>
> Anyway, the reason why we have different components for sequence tagging
> is that it makes it easier to integrate them if there is one component
> per task.
>
> Everything in OpenNLP uses maxent or perceptron, yes.
>
> Jörn
>
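
For reference, the equivalence between the two training formats discussed in the quoted message can be sketched in a few lines of Python. This is only an illustration, not OpenNLP code: the function name `to_bio` and the tokenizing regex are assumptions made for this example. It emits only the token and the B-/I-/O tag, since the Name Finder format carries no POS column.

```python
import re

# Hypothetical tokenizer for this sketch: matches "<START:type>", "<END>",
# or a run of characters containing neither whitespace nor "<".
# Limitation: a literal "<" inside ordinary text is silently dropped.
TOKEN_RE = re.compile(r"<START:[^>\s]+>|<END>|[^\s<]+")


def to_bio(sentence):
    """Convert a Name Finder style sentence into (token, tag) pairs."""
    pairs = []
    entity = None   # type of the currently open span, if any
    first = False   # True until the first token of the span is emitted
    for tok in TOKEN_RE.findall(sentence):
        if tok.startswith("<START:"):
            entity = tok[len("<START:"):-1]
            first = True
        elif tok == "<END>":
            entity = None
        elif entity is None:
            pairs.append((tok, "O"))
        else:
            pairs.append((tok, ("B-" if first else "I-") + entity))
            first = False
    return pairs


if __name__ == "__main__":
    for token, tag in to_bio("Mr.<START:person> Pierre Vinken<END> is chairman"):
        print(token, tag)
```

Going the other way (from B-/I-/O tags back to `<START:...>` spans) is equally mechanical, which is consistent with the point made in the reply: the two formats differ for historical reasons, and the practical difference between the components lies in feature generation.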