2011/8/16 <[email protected]>:
> Hi Stanbol devs,
>
> I've been working with Stanbol since two weeks ago. I need some information
> about how can I define a new ontology about products, users... in Apache
> Stanbol and how can i extract entities from reasoners to tag automatically a
> PDF document (i need to avoid tags from another sources, only the entities
> in my own OWL file).
>
> The targets i've reached are the following:
>
> 1-. I've put a OWL file containing the ontology in ontonet module with its
> curl call.
>
> 2-. I created scopes and recipes.
>
> I don't know how configure the environment to analyze a plain text and use
> this customised ontology to extract tags. I need some global vision about
> the problem and if it's possible some examples.
>
> If anyone can help me don't hesitate to answer me.

Hi,

You most probably don't need the reasoners to process *unstructured data*
such as natural language text. Text analysis is done by Enhancement engines,
which can rely on the EntityHub as a domain-specific knowledge base.

If the names of your entities are very specific to your domain (not
ambiguous), then the TaxonomyLinkingEngine coupled with a dedicated
referenced site in the EntityHub that indexes your knowledge base sounds
like the right approach.

To index your knowledge base within the EntityHub, you can follow these
examples (for DBpedia and DBLP respectively):

https://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/dbpedia/README.md
https://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/dblp/README.md

However, I don't have any usage example for configuring the
TaxonomyLinkingEngine, and Rupert, who is the original developer of this
module, is off for a couple of weeks AFAIK.

Note: reasoners are useful to process *structured* data (a.k.a. knowledge):
converting assertions already expressed in one RDF vocabulary (e.g.
dbpedia.org) into another (e.g. schema.org), checking integrity constraints,
reifying transitive and reflexive relationships prior to indexing...

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
