Re: Technical information about adding new Entity to Stanbol

Olivier Grisel Thu, 27 Oct 2011 07:03:31 -0700

> From what I understand in *NEREngineCore*, I have to build a new OpenNLP
> model that will allow the extraction of the new type of entities from
> the content. Is that right?


You could, but this is rather complicated at the moment. The NER
enhancement engine needs to be refactored to make it more easily
configurable (both to work in a multi-lingual setup and to handle a
configurable list of entity types and models).

My advice is the following:

If:

1- your entities have unambiguous names (e.g. protein names that
cannot be confused with people or place names)
2- you have a comprehensive list of all the possible instances (with
possibly multi-valued, multi-lingual names) of a given entity type
that you can load in the EntityHub

Then:

  You should not bother building an OpenNLP NER model and should use
the KeywordLinkingEngine directly instead. This engine uses generic
Sentence Segmentation, Part-of-Speech tagging or Chunking OpenNLP
models that do not depend on the type of entity you are
looking for.

 
https://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html
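
To illustrate why a comprehensive list of unambiguous names makes a trained
model unnecessary, here is a minimal sketch of dictionary-based linking in
Python. It is not the KeywordLinkingEngine implementation or its API: the
vocabulary, the `urn:example:` identifiers and the `link_keywords` function
are all hypothetical, and the real engine additionally relies on the OpenNLP
sentence, POS and chunking models mentioned above.

```python
import re

# Hypothetical mini-vocabulary: lowercased label -> entity identifier.
# In Stanbol this role would be played by entities loaded in the EntityHub.
VOCABULARY = {
    "hemoglobin": "urn:example:protein/hemoglobin",
    "green fluorescent protein": "urn:example:protein/gfp",
    "insulin": "urn:example:protein/insulin",
}

# Longest label, in tokens, so the matcher knows how far to look ahead.
MAX_LABEL_TOKENS = max(len(label.split()) for label in VOCABULARY)


def link_keywords(text):
    """Return (surface form, entity id) pairs using greedy longest match."""
    tokens = re.findall(r"\w+", text)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink the window.
        for n in range(min(MAX_LABEL_TOKENS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            entity = VOCABULARY.get(candidate.lower())
            if entity is not None:
                matches.append((candidate, entity))
                i += n
                break
        else:
            # No label starts at this token: move on.
            i += 1
    return matches
```

Calling `link_keywords("Insulin and green fluorescent protein were measured.")`
links both names without any entity-type-specific training, which only works
because conditions 1 and 2 hold for this toy vocabulary.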

If conditions 1 and 2 do not hold, then you should go and read about
how to build custom OpenNLP NER models. However, do not underestimate
the cost of annotating a large corpus of representative sentences.
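
To give an idea of what that annotation work produces, the OpenNLP name
finder is trained on plain text with one whitespace-tokenized sentence per
line and entities wrapped in `<START:type> ... <END>` markers. The sketch
below renders one such line; the `to_opennlp_line` helper, the `protein`
type and the example sentence are illustrative assumptions, not part of
Stanbol or OpenNLP.

```python
def to_opennlp_line(tokens, spans):
    """Render one training sentence in the OpenNLP name finder format.

    tokens: list of token strings for a single sentence.
    spans:  list of (start, end, entity_type) token indices, end exclusive.
    """
    starts = {start: entity_type for start, _end, entity_type in spans}
    ends = {end for _start, end, _entity_type in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            # Close a span that finished just before this token.
            out.append("<END>")
        if i in starts:
            out.append("<START:%s>" % starts[i])
        out.append(token)
    if len(tokens) in ends:
        # Close a span that runs to the end of the sentence.
        out.append("<END>")
    return " ".join(out)


# Hypothetical annotated sentence with a single "protein" entity.
line = to_opennlp_line(
    ["Mutations", "in", "hemoglobin", "cause", "anemia", "."],
    [(2, 3, "protein")],
)
```

Every sentence of the training corpus has to be annotated this way, which is
where the cost comes from.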

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
