> From what I understand in *NEREngineCore*, I have to build a new OpenNLP
> model that will allow the extraction of the new type of entities from
> the content. Is that right?

You could, but this is rather complicated at the moment. The NER enhancement engine needs to be refactored to make it more easily configurable (both to work in a multi-lingual setup and to handle a configurable list of entity types and models).

My advice is the following:

If:

1- your entities have unambiguous names (e.g. protein names that cannot be confused with people or place names)

2- you have a comprehensive list of all the possible instances (with possibly multi-valued, multi-lingual names) of a given entity type that you can load in the EntityHub

Then:

You should not bother building an OpenNLP NER model and should directly use the KeywordLinkingEngine instead. This engine uses generic Sentence Segmentation, Part-of-Speech tagging or Chunking OpenNLP models that do not depend on the type of entity you are looking for.

https://incubator.apache.org/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.html

If conditions 1 and 2 do not hold, then you should go and read about how to build custom OpenNLP NER models. However, do not underestimate the cost of annotating a large corpus of representative sentences.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
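
To illustrate why conditions 1 and 2 matter, here is a minimal Python sketch of dictionary-based linking. It is not the KeywordLinkingEngine implementation (which relies on OpenNLP tokenization, POS tagging and chunking, plus an EntityHub lookup); it is a toy lookup under stated assumptions. The `ENTITY_LABELS` table, the `urn:example:` identifiers and the function names are all hypothetical.

```python
import re

# Hypothetical entity list: label -> entity identifier.
# Stands in for the multi-lingual labels you would load in the EntityHub.
ENTITY_LABELS = {
    "hemoglobin": "urn:example:protein/hemoglobin",
    "hémoglobine": "urn:example:protein/hemoglobin",  # French label, same entity
    "insulin": "urn:example:protein/insulin",
    "tumor necrosis factor": "urn:example:protein/tnf",
}


def build_pattern(labels):
    """Compile one case-insensitive regex matching any known label."""
    # Longest labels first so multi-word names win over shorter overlaps.
    ordered = sorted(labels, key=len, reverse=True)
    alternation = "|".join(re.escape(label) for label in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def link_keywords(text, labels=ENTITY_LABELS):
    """Return (start, end, matched_text, entity_id) for each label found."""
    pattern = build_pattern(labels)
    return [
        (m.start(), m.end(), m.group(0), labels[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in link_keywords("Insulin and tumor necrosis factor were measured."):
        print(hit)
```

A lookup like this only works when the names are unambiguous (condition 1) and the list is complete (condition 2). If a label could also be a person or place name, or if entities are missing from the list, a statistical NER model trained on annotated sentences becomes necessary.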
