Hi all,

I have been playing around with the DBPedia Spotlight engines these days. With Rupert's help (thanks again) I was able to install it and configure it as the default engine. My next step was to create a custom index in ContentHub to extract some data about the detected entities and store it in Solr. Specifically, I want to store in Solr the labels of each entity and its types (rdf:type). For example, for the entity President Obama I would get:

Labels:

Presidency of Barack Obama
Présidence de Barack Obama
Barack Obama

Types:
foaf:Person
dbpedia-owl:Person
dbpedia-owl:OfficeHolder
dbpedia-owl:Agent

To achieve this, I tried extending the default ContentHub LDPath program with this line:

concepts = fn:concat(rdfs:label[@en]," ", rdf:type) :: xsd:string;

I know that it might not give me exactly what I want, but it was just a first test. Anyway, I found some issues when I submitted a document to store it in my new index:

1. The recognized entities weren't exactly the same as those you get from the DBPedia Spotlight demo (http://dbpedia-spotlight.github.com/demo/index.html), whose results are more accurate. I think that's because of the 'No common words' feature in the demo. I have been trying to configure it in the engine, but I wasn't able to.

2. The LDPath program is also executed for entities that are not recognized by the engine. For example, using the following text:

"Orange is a tropical to semitropical, evergreen, small flowering tree growing to about 5 to 8 m tall and bears seasonal fruits that measure about 3 inches in diameter and weighs about 100-150 g. Oranges are classified into two general categories, sweet and bitter, with the former being the type most commonly consumed. Popular varieties of the sweet orange include Valencia, Navel, Persian variety, and blood orange."

The enhancer only recognized Orange (fruit) but, when I submit the text to the ContentHub, I also get results for Orange, Texas (Place). I need to store only the information for the disambiguated entity.
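
For context on what I am ultimately aiming for: instead of the single concatenated concepts field from my first test, I imagine something closer to the sketch below, with labels and types kept in separate fields. This is untested. The field names labels and types are just names I made up, and I am assuming that the rdfs and rdf prefixes are already known to the default program (as in my line above) and that multi-valued fields are fine for the index. I left out the [@en] filter because I want the labels in all languages.

```
labels = rdfs:label :: xsd:string;
types = rdf:type :: xsd:string;
```

This would not solve issue 2 by itself, since the program would presumably still run for every suggested entity.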

Thanks. Regards
