Hi all,
I have been playing around with DBPedia Spotlight engines these days.
With Rupert's help (thanks again) I was able to successfully install and
configure it as the default engine. My next step was to create a custom
index in ContentHub to extract some data about the detected entities and
store it in Solr. Specifically, I want to store in Solr the labels of
each entity and its types (rdf:type). For example, for the entity
President Obama I would get:
Labels:
Presidency of Barack Obama
Présidence de Barack Obama
Barack Obama
Types:
foaf:Person
dbpedia-owl:Person
dbpedia-owl:OfficeHolder
dbpedia-owl:Agent
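For comparison, the target structure can be sketched in Python as a Solr-style document in which each label and each type is a separate value of a multi-valued field, rather than part of one concatenated string. This is a minimal illustration only: the field names `labels` and `types` and the `id` value are hypothetical, and it does not use any ContentHub or Solr API.

```python
from typing import Dict, List


def build_entity_doc(entity_id: str, labels: List[str], types: List[str]) -> Dict[str, object]:
    """Build a Solr-style document that keeps labels and types as separate
    multi-valued fields instead of one concatenated string.

    The field names "labels" and "types" are hypothetical, not ContentHub defaults.
    """
    return {
        "id": entity_id,
        # dict.fromkeys drops duplicate values while preserving their order
        "labels": list(dict.fromkeys(labels)),
        "types": list(dict.fromkeys(types)),
    }


# Values from the President Obama example; the id is a hypothetical placeholder.
doc = build_entity_doc(
    "example-entity-1",
    labels=["Presidency of Barack Obama", "Présidence de Barack Obama", "Barack Obama"],
    types=["foaf:Person", "dbpedia-owl:Person", "dbpedia-owl:OfficeHolder", "dbpedia-owl:Agent"],
)
print(doc["types"])
```

Keeping one value per entry means a Solr query can match a single type such as dbpedia-owl:Person exactly, without parsing a combined string.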
To achieve this, I tried extending the default ContentHub
LDPath program with this line:
concepts = fn:concat(rdfs:label[@en]," ", rdf:type) :: xsd:string;
I know that it might not give me exactly what I want, but it was just a
first test. Anyway, I ran into some issues when I submitted a document to
be stored in my new index:
1. The recognized entities weren't exactly the same as the ones you get
from the DBPedia Spotlight demo
(http://dbpedia-spotlight.github.com/demo/index.html), whose results are
more accurate. I think that's because of the demo's 'No common words'
option. I have been trying to configure the same behaviour in the
engine, but I wasn't able to.
2. The LDPath program is also executed for entities that are not
recognized by the engine. For example, using the following text:
"Orange is a tropical to semitropical, evergreen, small flowering tree
growing to about 5 to 8 m tall and bears seasonal fruits that measure
about 3 inches in diameter and weighs about 100-150 g. Oranges are
classified into two general categories, sweet and bitter, with the
former being the type most commonly consumed. Popular varieties of the
sweet orange include Valencia, Navel, Persian variety, and blood orange."
The enhancer only recognized Orange (fruit), but when I submit the text
to ContentHub I also get results for Orange, Texas (Place). I need to
store only the information about the disambiguated entity.
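One way to express the filtering I am after, as a minimal Python sketch: it assumes the enhancement results follow Stanbol's fise:EntityAnnotation structure (each suggestion carries a fise:entity-reference, a fise:confidence and a dc:relation to the fise:TextAnnotation it belongs to), keeps only the highest-confidence suggestion per text annotation, and indexes just those entities. The `EntitySuggestion` class, the annotation ids and the confidence values are hypothetical; this is not ContentHub code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class EntitySuggestion:
    """Hypothetical flattened view of one fise:EntityAnnotation."""
    text_annotation: str  # id of the related fise:TextAnnotation (dc:relation)
    entity: str           # suggested entity URI (fise:entity-reference)
    confidence: float     # fise:confidence


def entities_to_index(suggestions: Iterable[EntitySuggestion]) -> Set[str]:
    """Return only the top-confidence entity for each text annotation."""
    best: Dict[str, EntitySuggestion] = {}
    for s in suggestions:
        current = best.get(s.text_annotation)
        # Strictly greater: on a tie, the suggestion seen first is kept.
        if current is None or s.confidence > current.confidence:
            best[s.text_annotation] = s
    return {s.entity for s in best.values()}


# Hypothetical confidences for the two candidates of the word "Orange".
suggestions = [
    EntitySuggestion("ta-orange", "http://dbpedia.org/resource/Orange_(fruit)", 0.92),
    EntitySuggestion("ta-orange", "http://dbpedia.org/resource/Orange,_Texas", 0.05),
]
print(entities_to_index(suggestions))
```

Whether the highest-confidence suggestion is always the entity Spotlight itself selected is an assumption worth checking against the engine's actual output.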
Thanks. Regards
Zaizi Ltd is registered in England and Wales with the registration number
6440931. The Registered Office is 222 Westbourne Studios, 242 Acklam Road,
London W10 5JJ, UK.