Re: ContentHub Custom index for DBPedia-spotlight enhancer

Pablo N. Mendes Fri, 14 Sep 2012 07:30:05 -0700

Hi Rafa,


> Recognized entities weren't exactly the same ... I think that's because of
> the 'No common words' feature in the demo. I have been trying to configure
> it in the engine, but I wasn't able to.


Thanks for your interest in DBpedia Spotlight. To enable "No common
words", you need to add &spotter=CoOccurrenceBasedSelector to your request.
We need better names for these spotters; we are expanding the
documentation [1] and hope to organize these things before the end of the
year.
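When calling the Spotlight REST service directly, the spotter is chosen per request. Below is a minimal Java sketch that builds such a request URL. It assumes a local annotate endpoint at http://localhost:2222/rest/annotate (hypothetical) that takes the input as a "text" query parameter. Only the spotter=CoOccurrenceBasedSelector parameter comes from this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds a GET URL for a DBpedia Spotlight annotate call.
// The endpoint and the "text" parameter name are assumptions; the spotter
// value is the one suggested in this thread for "No common words".
final class SpotlightRequest {
    static final String SPOTTER = "CoOccurrenceBasedSelector";

    private SpotlightRequest() {}

    static String annotateUrl(String endpoint, String text) {
        // URL-encode the input text (spaces become '+')
        return endpoint
                + "?text=" + URLEncoder.encode(text, StandardCharsets.UTF_8)
                + "&spotter=" + SPOTTER;
    }

    public static void main(String[] args) {
        // Hypothetical local endpoint
        System.out.println(annotateUrl("http://localhost:2222/rest/annotate",
                "Oranges are classified into two general categories, sweet and bitter."));
    }
}
```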

For the engine in Stanbol, you need to change the Spotter configuration [2].

[1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Spotting
[2] http://blog.iks-project.eu/dbpedia-spotlight-integration-in-apache-stanbol-2/

Cheers,
Pablo

On Fri, Sep 14, 2012 at 4:03 PM, Suat Gönül <suatgo...@gmail.com> wrote:

> Hi Rafa,
>
> Could you please open an issue and attach the file there? Alternatively,
> you can also send it to my email. I will look into it next week.
>
> Best,
> Suat
>
>
> On Fri, Sep 14, 2012 at 4:55 PM, Rafa Haro <rh...@zaizi.com> wrote:
>
> > Hi Suat,
> >
> > I'm pretty sure. I can send you the enhancement graph in RDF if you want
> > to check it yourself. I was going to post it here, but it is pretty large.
> >
> > Regards
> >
> > On 14/09/12 15:27, Suat Gonul wrote:
> >
> >> Hi Rafa,
> >>
> >> Are you sure the enhancements of this text do not contain other
> >> entities? The contexts (URIs) on which the LDPath program is executed
> >> are obtained as follows:
> >>
> >> Iterator<Triple> it = sci.getMetadata().filter(null,
> >>         Properties.ENHANCER_ENTITY_REFERENCE, null);
> >>
> >> In other words, the URIs come from the metadata of the ContentItem.
> >> Could you please check whether the enhancement graph of your
> >> ContentItem contains any other Orange-related entities?
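Because both subject and object are wildcards in that filter, every fise:entity-reference in the enhancement graph becomes an LDPath context, whichever annotation it belongs to. Below is a simplified in-memory model in plain Java. The Triple record and the sample URIs are hypothetical stand-ins rather than the Clerezza API, and the predicate URI assumes the usual fise namespace.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an RDF triple (not the Clerezza Triple type)
record Triple(String subject, String predicate, String object) {}

final class EntityReferenceContexts {
    // Assumed full URI of fise:entity-reference
    static final String ENTITY_REFERENCE =
            "http://fise.iks-project.eu/ontology/entity-reference";

    private EntityReferenceContexts() {}

    // Same selection as filter(null, ENHANCER_ENTITY_REFERENCE, null):
    // any subject, fixed predicate, any object. Returns the objects.
    static List<String> contexts(List<Triple> metadata) {
        return metadata.stream()
                .filter(t -> t.predicate().equals(ENTITY_REFERENCE))
                .map(Triple::object)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample graph with two EntityAnnotations
        List<Triple> metadata = List.of(
                new Triple("urn:ea-1", ENTITY_REFERENCE, "http://dbpedia.org/resource/Orange_(fruit)"),
                new Triple("urn:ea-2", ENTITY_REFERENCE, "http://dbpedia.org/resource/Orange,_Texas"),
                new Triple("urn:ea-1", "http://purl.org/dc/terms/relation", "urn:ta-1"));
        // Both references are returned, so LDPath would run on both entities
        System.out.println(contexts(metadata));
    }
}
```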
> >>
> >> Best,
> >> Suat
> >>
> >>
> >> On 09/14/2012 04:15 PM, Rafa Haro wrote:
> >>
> >>> Hi all,
> >>>
> >>> I have been playing around with the DBPedia Spotlight engines these
> >>> days. With Rupert's help (thanks again) I was able to successfully
> >>> install and configure it as the default engine. My next step was to
> >>> create a custom index in ContentHub to extract some data about the
> >>> detected entities and store it in Solr. Specifically, I want to store
> >>> in Solr the labels of each entity and its types (rdf:type). For
> >>> example, for the entity President Obama I would get:
> >>>
> >>> Labels:
> >>>
> >>> Presidency of Barack Obama
> >>> Présidence de Barack Obama
> >>> Barack Obama
> >>>
> >>> Types:
> >>> foaf:Person
> >>> dbpedia-owl:Person
> >>> dbpedia-owl:OfficeHolder
> >>> dbpedia-owl:Agent
> >>>
> >>> To achieve this, I tried to extend the default ContentHub LDPath
> >>> program with this line:
> >>>
> >>> concepts = fn:concat(rdfs:label[@en]," ", rdf:type) :: xsd:string;
> >>>
> >>> I know that it might not give me exactly what I want, but it was just
> >>> a first test. Anyway, I found some issues when I submitted a document
> >>> to store it in my new index:
> >>>
> >>> 1. Recognized entities weren't exactly the same as those you get from
> >>> the DBPedia Spotlight demo
> >>> (http://dbpedia-spotlight.github.com/demo/index.html), whose results
> >>> are more accurate. I think that's because of the 'No common words'
> >>> feature in the demo. I have been trying to configure it in the engine,
> >>> but I wasn't able to.
> >>>
> >>> 2. The LDPath program is also executed for entities that are not
> >>> recognized by the engine. For example, using the following text:
> >>>
> >>> "Orange is a tropical to semitropical, evergreen, small flowering
> >>> tree growing to about 5 to 8 m tall and bears seasonal fruits that
> >>> measure about 3 inches in diameter and weighs about 100-150 g. Oranges
> >>> are classified into two general categories, sweet and bitter, with the
> >>> former being the type most commonly consumed. Popular varieties of the
> >>> sweet orange include Valencia, Navel, Persian variety, and blood
> >>> orange."
> >>>
> >>> The enhancer only recognized Orange (fruit), but when I submit the
> >>> text to the ContentHub I also get results for Orange, Texas (Place).
> >>> I only need to store the information of the disambiguated entity.
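One possible workaround, not proposed in this thread and only relevant if the graph really contains several EntityAnnotations for the same mention: keep only the suggestion with the highest fise:confidence per TextAnnotation (EntityAnnotations point to their TextAnnotation via dc:relation) before collecting the contexts. Below is a sketch on a simplified, hypothetical model; EntitySuggestion is an illustrative type, not a Stanbol class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical flattened view of one EntityAnnotation: the TextAnnotation it
// relates to (dc:relation), the suggested entity and its fise:confidence.
record EntitySuggestion(String textAnnotation, String entity, double confidence) {}

final class DisambiguatedEntities {
    private DisambiguatedEntities() {}

    // Keeps, for each TextAnnotation, only the suggestion with the highest
    // confidence (ties keep the first one seen); returns the entity URIs
    // in the order the TextAnnotations were first encountered.
    static List<String> best(List<EntitySuggestion> suggestions) {
        Map<String, EntitySuggestion> top = new LinkedHashMap<>();
        for (EntitySuggestion s : suggestions) {
            top.merge(s.textAnnotation(), s,
                    (current, candidate) -> candidate.confidence() > current.confidence() ? candidate : current);
        }
        return top.values().stream()
                .map(EntitySuggestion::entity)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration
        List<EntitySuggestion> suggestions = List.of(
                new EntitySuggestion("urn:ta-1", "http://dbpedia.org/resource/Orange_(fruit)", 0.9),
                new EntitySuggestion("urn:ta-1", "http://dbpedia.org/resource/Orange,_Texas", 0.3));
        System.out.println(best(suggestions));
    }
}
```

The filtered list would then replace the raw entity references as the set of contexts passed to the LDPath program.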
> >>>
> >>> Thanks. Regards
> >>>
> >>> This message should be regarded as confidential. If you have received
> >>> this email in error please notify the sender and destroy it
> >>> immediately. Statements of intent shall only become binding when
> >>> confirmed in hard copy by an authorised signatory.
> >>>
> >>> Zaizi Ltd is registered in England and Wales with the registration
> >>> number 6440931. The Registered Office is 222 Westbourne Studios, 242
> >>> Acklam Road, London W10 5JJ, UK.
> >>>
> >>>
> >
> >
>



-- 
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr
