Dear Jörn,

Thank you very much. I still have not really understood what these
feature generators are, but I will dive into them :-)

I'm aware of the CoNLL data, but because the (German) data is not freely
available, I thought it wouldn't be a bad idea to have at least a small
but freely available trained NE model.

Best,

Tom


On 10.10.2013 14:39, Jörn Kottmann wrote:
> On 10/10/2013 11:58 AM, Thomas Zastrow wrote:
>> Hello,
>>
>> There seems to be no free German NE model available, so I started to
>> think about creating one - just using free resources like Wikipedia etc.
>>
>> I still have some questions:
>>
>> Somewhere in the documentation, I read about a dictionary-driven NE
>> recognizer in OpenNLP, but I didn't find any further information
>> about it. Anyway, would it be possible to combine the statistical
>> approach with dictionaries? For example, having a list of country
>> names would be useful.
>>
> 
> Yes, that is possible. We have a DictionaryFeatureGenerator which can
> look up names in a dictionary and produce features for them.
> There is an XML file you can create to describe how the feature
> generation should be set up for training. The file is then stored in
> the model, so that the exact same feature generation can be reproduced
> when the model is loaded later.
> 
> See our documentation:
> http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training.featuregen
> 
> 
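To make the idea of a dictionary-backed feature generator more concrete, here is a minimal sketch of the general technique. It is not OpenNLP's DictionaryFeatureGenerator; the function name `dictionary_features`, the feature strings, and the `country` prefix are all made up for illustration. The sketch assumes whitespace-separated dictionary entries and case-insensitive, longest-match lookup.

```python
from typing import Iterable, List


def dictionary_features(tokens: List[str],
                        entries: Iterable[str],
                        prefix: str = "country") -> List[List[str]]:
    """Return one feature list per token (hypothetical feature names).

    Every token gets a plain word feature; tokens covered by a
    dictionary entry additionally get '<prefix>=in_dict'.
    """
    # Store each entry as a tuple of lower-cased tokens so that
    # multi-token names such as "Sri Lanka" can be matched.
    phrases = {tuple(entry.lower().split()) for entry in entries}
    max_len = max((len(p) for p in phrases), default=0)

    lowered = [t.lower() for t in tokens]
    in_dict = [False] * len(tokens)

    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest possible match first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in phrases:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                in_dict[j] = True
            i += matched
        else:
            i += 1

    return [
        [f"w={tok}"] + ([f"{prefix}=in_dict"] if hit else [])
        for tok, hit in zip(lowered, in_dict)
    ]


if __name__ == "__main__":
    sentence = ["Er", "flog", "von", "Frankreich", "nach", "Sri", "Lanka", "."]
    for token, feats in zip(sentence,
                            dictionary_features(sentence, ["Frankreich", "Sri Lanka"])):
        print(token, feats)
```

In a statistical name finder, features like these are only additional evidence: the trained model still decides whether a token is part of a name, so an incomplete country list does not prevent unseen names from being found.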
> What features would you like to generate via the dictionary?
> 
> The Name Finder can be extended with custom feature generators, in case
> you have some ideas or just want to experiment a bit.
> 
>> As far as I understood, the name finder is at the moment only stable
>> for one property, like person names. I would like to have the
>> traditional division into persons, locations, organizations and misc.
>> When manually creating the training data, would it be OK to add all
>> four kinds to the text right away and then maybe later create four
>> models for the different properties?
> 
> By default, the name finder trainer trains a model for all name types
> occurring in the training data. The -nameTypes option can restrict the
> types used to one or more of them. I often use this, it works great.
> 
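As an illustration of what restricting the name types means for the training data, the sketch below takes one sentence in the OpenNLP name finder training format (`<START:type> ... <END>`) and keeps only the requested types, turning all other annotated names back into plain tokens. This only mimics the effect on the data; it is not how OpenNLP implements -nameTypes, and the helper names `name_types` and `keep_name_types` are hypothetical.

```python
import re
from typing import Iterable, List

# One annotated span: "<START:type> some tokens <END>"
_SPAN = re.compile(r"<START:(?P<type>[^>\s]+)>\s+(?P<text>.*?)\s+<END>")


def name_types(line: str) -> List[str]:
    """Return the sorted set of name types annotated in the line."""
    return sorted({m.group("type") for m in _SPAN.finditer(line)})


def keep_name_types(line: str, keep: Iterable[str]) -> str:
    """Keep annotations whose type is in `keep`; unwrap all others."""
    wanted = set(keep)

    def unwrap(match: "re.Match[str]") -> str:
        if match.group("type") in wanted:
            return match.group(0)      # leave the annotation untouched
        return match.group("text")     # drop the markup, keep the tokens

    return _SPAN.sub(unwrap, line)


if __name__ == "__main__":
    sample = ("<START:person> Angela Merkel <END> besuchte "
              "<START:location> Paris <END> .")
    print(name_types(sample))
    print(keep_name_types(sample, ["person"]))
```

This is why annotating all four kinds in a single corpus is a reasonable starting point: the same file can serve for one combined model or for several type-specific ones.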
>> The name finder takes sentences and tokens as input. Would it be OK to
>> also have POS tags assigned to the training data? That would make it
>> much easier to manually annotate the data when e.g. NEs are already
>> marked by the POS tagger.
>>
> 
> Passing in POS tags is currently not supported by our API. The easiest
> way to get around that limitation is probably
> to run the POS tagger as part of the name finder feature generation.
> 
> There is German CONLL training data you could use to train a name finder
> model:
> http://www.cnts.ua.ac.be/conll2003/ner/
> 
> The OpenNLP Name Finder can be directly trained on the CONLL2003 data.
> 
> HTH,
> Jörn
