Hi Pablo,
Thanks for your help. I just configured the engine to use
CoOccurrenceBasedSelector as the spotter in the OSGi console, and now I'm
getting the same results as your demo. I'm still having the same issue
with ContentHub, and now there are also some extracted concepts that
do not seem to be passed to the LDPath program.
Regards
On 14/09/12 16:29, Pablo N. Mendes wrote:
Hi Rafa,
Recognized entities weren't exactly the same ... I think that's because of
the 'No common words' feature in the demo. I have been trying to configure
it in the engine, but I wasn't able to.
Thanks for your interest in DBpedia Spotlight. In order to enable "No
common words", you need to use &spotter=CoOccurrenceBasedSelector. We need
better names for these spotters. We are expanding the documentation [1], and
we hope to organize these things before the end of the year.
For the engine in Stanbol, you need to change the Spotter configuration [2].
[1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Spotting
[2]
http://blog.iks-project.eu/dbpedia-spotlight-integration-in-apache-stanbol-2/
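As a rough illustration of the &spotter parameter mentioned above, the sketch below builds a request URL that carries it. The endpoint address and the `text` parameter name are assumptions made for the example, not something confirmed in this thread; only `spotter=CoOccurrenceBasedSelector` comes from the message itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class SpotlightRequest {
    // Assumed endpoint, for illustration only; point this at your own Spotlight server.
    static final String ENDPOINT = "http://spotlight.dbpedia.org/rest/annotate";

    // Builds the request URL with the spotter passed as a query parameter.
    static String buildUrl(String text, String spotter) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("text", text);       // assumed parameter name
        params.put("spotter", spotter); // parameter named in the thread
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return ENDPOINT + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("Orange is a tropical tree",
                "CoOccurrenceBasedSelector"));
    }
}
```

The helper only assembles the URL; sending the request and parsing the response are left out on purpose.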
Cheers,
Pablo
On Fri, Sep 14, 2012 at 4:03 PM, Suat Gönül <suatgo...@gmail.com> wrote:
Hi Rafa,
Could you please open an issue and attach the file there? In any case, you
can also send it to my email. I will look into it next week.
Best,
Suat
On Fri, Sep 14, 2012 at 4:55 PM, Rafa Haro <rh...@zaizi.com> wrote:
Hi Suat,
I'm pretty sure. I can send you the enhancement graph in RDF if you want
to check it yourself. I was going to post it here, but it is pretty large.
Regards
On 14/09/12 15:27, Suat Gonul wrote:
Hi Rafa,
Are you sure the enhancements of this text do not contain other
entities? The contexts (URIs) on which the LDPath program is executed are
obtained as follows:
Iterator<Triple> it = sci.getMetadata().filter(null,
        Properties.ENHANCER_ENTITY_REFERENCE, null);
In other words, the source of the URIs is the metadata of the
ContentItem. Could you please check the enhancement graph of your
ContentItem to see whether it contains any other Orange-related entities?
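To make the effect of that filter pattern concrete, here is a minimal, self-contained sketch. It does not use the real Clerezza or Stanbol classes: `Triple` is a simplified stand-in, the sample triples are invented for the example, and the predicate URI is assumed to be the one behind Properties.ENHANCER_ENTITY_REFERENCE. The point is only that a pattern with a wildcard subject and object selects the object of every entity-reference triple in the metadata, whichever enhancement it belongs to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EntityReferenceFilter {
    // Assumed URI of the predicate behind Properties.ENHANCER_ENTITY_REFERENCE.
    static final String ENTITY_REFERENCE =
            "http://fise.iks-project.eu/ontology/entity-reference";

    // Simplified stand-in for an RDF triple (not the Clerezza Triple type).
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Mimics filter(null, ENTITY_REFERENCE, null): any subject, any object.
    static List<String> contexts(List<Triple> metadata) {
        List<String> uris = new ArrayList<>();
        for (Triple t : metadata) {
            if (ENTITY_REFERENCE.equals(t.predicate)) {
                uris.add(t.object);
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        // Hypothetical enhancement graph with two entity annotations.
        List<Triple> metadata = Arrays.asList(
                new Triple("urn:enhancement-1", ENTITY_REFERENCE,
                        "http://dbpedia.org/resource/Orange_(fruit)"),
                new Triple("urn:enhancement-1",
                        "http://fise.iks-project.eu/ontology/confidence", "0.9"),
                new Triple("urn:enhancement-2", ENTITY_REFERENCE,
                        "http://dbpedia.org/resource/Orange,_Texas"));
        // Both referenced entities are returned as contexts.
        System.out.println(contexts(metadata));
    }
}
```

If the enhancement graph holds an entity annotation for each candidate, every candidate becomes a context for the LDPath program, which is why inspecting the graph is the first thing to check.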
Best,
Suat
On 09/14/2012 04:15 PM, Rafa Haro wrote:
Hi all,
I have been playing around with the DBpedia Spotlight engines these days.
With Rupert's help (thanks again) I was able to successfully install
and configure it as the default engine. My next step was to create a
custom index in ContentHub to extract some data about the detected
entities and store it in Solr. Specifically, I want to store in Solr
the labels of each entity and its types (rdf:type values). For example,
for the entity President Obama I would get:
Labels:
Presidency of Barack Obama
Présidence de Barack Obama
Barack Obama
Types:
foaf:Person
dbpedia-owl:Person
dbpedia-owl:OfficeHolder
dbpedia-owl:Agent
In order to achieve this, I have tried to extend the default ContentHub
LDPath program with this line:
concepts = fn:concat(rdfs:label[@en]," ", rdf:type) :: xsd:string;
I know that it might not give me exactly what I want, but it was just a
first test. Anyway, I found some issues when I submitted a document to
store it in my new index:
1. Recognized entities weren't exactly the same as those you get using
the DBpedia Spotlight demo
(http://dbpedia-spotlight.github.com/demo/index.html),
whose results
are more accurate. I think that's because of the 'No common words'
feature in the demo. I have been trying to configure it in the engine,
but I wasn't able to.
2. The LDPath program is executed also for entities that are not
recognized by the engine. For example, using the following text:
" /Orange is a tropical to semitropical, evergreen, small flowering
tree growing to about 5 to 8 m tall and bears seasonal fruits that
measure about 3 inches in diameter and weighs about 100-150 g. Oranges
are classified into two general categories, sweet and bitter, with the
former being the type most commonly consumed. Popular varieties of the
sweet orange include Valencia, Navel, Persian variety, and blood
orange./"
The enhancer only recognized Orange (fruit) but, when I submit the
text to the content hub I also get results for Orange, Texas (Place).
I would need to store only the information of the disambiguated entity.
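One way to picture "only the disambiguated entity" is a post-filter that keeps the highest-confidence suggestion for each mention. The sketch below is an illustration of that idea only, not how ContentHub or the Spotlight engine actually behaves: the `Suggestion` class, the grouping by mention text, and the confidence values are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopSuggestion {
    // Hypothetical view of one entity suggestion for a mention in the text.
    static final class Suggestion {
        final String mention;
        final String entityUri;
        final double confidence;
        Suggestion(String mention, String entityUri, double confidence) {
            this.mention = mention;
            this.entityUri = entityUri;
            this.confidence = confidence;
        }
    }

    // Keeps, for each mention, the URI of the suggestion with the highest confidence.
    static Map<String, String> bestPerMention(List<Suggestion> suggestions) {
        Map<String, Suggestion> best = new LinkedHashMap<>();
        for (Suggestion s : suggestions) {
            Suggestion current = best.get(s.mention);
            if (current == null || s.confidence > current.confidence) {
                best.put(s.mention, s);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Suggestion> e : best.entrySet()) {
            result.put(e.getKey(), e.getValue().entityUri);
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented confidence values, for illustration only.
        List<Suggestion> suggestions = Arrays.asList(
                new Suggestion("Orange", "http://dbpedia.org/resource/Orange,_Texas", 0.2),
                new Suggestion("Orange", "http://dbpedia.org/resource/Orange_(fruit)", 0.9));
        System.out.println(bestPerMention(suggestions));
    }
}
```

With a filter of this kind in front of the LDPath execution, only one URI per mention would reach the index; whether such a hook exists in ContentHub is not established in this thread.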
Thanks. Regards
This message should be regarded as confidential. If you have received
this email in error please notify the sender and destroy it
immediately. Statements of intent shall only become binding when
confirmed in hard copy by an authorised signatory.
Zaizi Ltd is registered in England and Wales with the registration
number 6440931. The Registered Office is 222 Westbourne Studios, 242
Acklam Road, London W10 5JJ, UK.