Hi Rafa,

> Recognized entities weren't exactly the same ... I think that's because
> of the 'No common words' feature in the demo. I have been trying to
> configure it in the engine, but I wasn't able to.

Thanks for your interest in DBpedia Spotlight.
In order to enable "No common words", you need to use
&spotter=CoOccurrenceBasedSelector.
We need better names for these spotters; we are expanding the
documentation [1] and hope to organize these things before the end of
the year.

For the engine in Stanbol, you need to change the Spotter configuration [2].

[1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Spotting
[2] http://blog.iks-project.eu/dbpedia-spotlight-integration-in-apache-stanbol-2/

Cheers,
Pablo

On Fri, Sep 14, 2012 at 4:03 PM, Suat Gönül <suatgo...@gmail.com> wrote:

> Hi Rafa,
>
> Could you please open an issue and attach the file there? In any case you
> can also send it to my email. I will look into it next week.
>
> Best,
> Suat
>
>
> On Fri, Sep 14, 2012 at 4:55 PM, Rafa Haro <rh...@zaizi.com> wrote:
>
> > Hi Suat,
> >
> > I'm pretty sure. I can send you the enhancement graph in RDF if you want
> > to check on your own. I was going to post it here, but it is pretty large.
> >
> > Regards
> >
> > On 14/09/12 15:27, Suat Gonul wrote:
> >
> >> Hi Rafa,
> >>
> >> Are you sure the enhancements of this text do not contain other
> >> entities? The contexts (URIs) on which the LDPath program is executed
> >> are obtained as follows:
> >>
> >> Iterator<Triple> it = sci.getMetadata().filter(null,
> >>         Properties.ENHANCER_ENTITY_REFERENCE, null);
> >>
> >> In other words, the source of the URIs is the metadata of the
> >> ContentItem. Could you please check the enhancement graph of your
> >> ContentItem to see whether it contains any other Orange-related entities?
> >>
> >> Best,
> >> Suat
> >>
> >>
> >> On 09/14/2012 04:15 PM, Rafa Haro wrote:
> >>
> >>> Hi all,
> >>>
> >>> I have been playing around with the DBpedia Spotlight engines these days.
> >>> With Rupert's help (thanks again) I was able to successfully install
> >>> and configure it as the default engine. My next step was to create a
> >>> custom index in ContentHub to extract some data about the detected
> >>> entities and store it in Solr. Specifically, I want to store in Solr
> >>> the labels of each entity and its types (rdf:type). For example,
> >>> for the entity President Obama I would get:
> >>>
> >>> Labels:
> >>>
> >>> Presidency of Barack Obama
> >>> Présidence de Barack Obama
> >>> Barack Obama
> >>>
> >>> Types:
> >>> foaf:Person
> >>> dbpedia-owl:Person
> >>> dbpedia-owl:OfficeHolder
> >>> dbpedia-owl:Agent
> >>>
> >>> In order to achieve this, I have tried to extend the default ContentHub
> >>> LDPath program with this line:
> >>>
> >>> concepts = fn:concat(rdfs:label[@en]," ", rdf:type) :: xsd:string;
> >>>
> >>> I know that it might not give me exactly what I want, but it was just a
> >>> first test. Anyway, I found some issues when I submitted a document to
> >>> store it in my new index:
> >>>
> >>> 1. The recognized entities weren't exactly the same as those you get
> >>> from the DBpedia Spotlight demo
> >>> (http://dbpedia-spotlight.github.com/demo/index.html), whose results
> >>> are more accurate. I think that's because of the 'No common words'
> >>> feature in the demo. I have been trying to configure it in the engine,
> >>> but I wasn't able to.
> >>>
> >>> 2. The LDPath program is also executed for entities that are not
> >>> recognized by the engine. For example, using the following text:
> >>>
> >>> "Orange is a tropical to semitropical, evergreen, small flowering
> >>> tree growing to about 5 to 8 m tall and bears seasonal fruits that
> >>> measure about 3 inches in diameter and weighs about 100-150 g. Oranges
> >>> are classified into two general categories, sweet and bitter, with the
> >>> former being the type most commonly consumed. Popular varieties of the
> >>> sweet orange include Valencia, Navel, Persian variety, and blood
> >>> orange."
> >>>
> >>> The enhancer only recognized Orange (fruit) but, when I submit the
> >>> text to the ContentHub, I also get results for Orange, Texas (Place).
> >>> I would need to store only the information of the disambiguated entity.
> >>>
> >>> Thanks. Regards
> >>>
> >>> This message should be regarded as confidential. If you have received
> >>> this email in error please notify the sender and destroy it
> >>> immediately. Statements of intent shall only become binding when
> >>> confirmed in hard copy by an authorised signatory.
> >>>
> >>> Zaizi Ltd is registered in England and Wales with the registration
> >>> number 6440931. The Registered Office is 222 Westbourne Studios, 242
> >>> Acklam Road, London W10 5JJ, UK.

--
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr
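
For reference, the `&spotter=CoOccurrenceBasedSelector` setting mentioned in the reply at the top of this thread could be passed to a Spotlight-style REST service roughly as sketched below. Only the `spotter` parameter and its value come from this thread. The endpoint URL, the `text` parameter name, and the helper `build_annotate_url` are assumptions made for illustration, so check them against your own Spotlight installation before relying on them. The sketch only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: replace with the address of your own Spotlight server.
ANNOTATE_ENDPOINT = "http://localhost:2222/rest/annotate"


def build_annotate_url(text, spotter="CoOccurrenceBasedSelector",
                       endpoint=ANNOTATE_ENDPOINT):
    """Build an annotate URL that carries the spotter choice as a query parameter.

    The `spotter` value is the one named in this thread; the `text`
    parameter name is an assumption about the service's API.
    """
    params = {"text": text, "spotter": spotter}
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_annotate_url("Orange is a tropical to semitropical tree.")
    print(url)
    # Decode the query string again to confirm which spotter was requested.
    print(parse_qs(urlsplit(url).query)["spotter"])
```

In the Stanbol engine the same choice is made through the engine's Spotter configuration rather than a query string, as described in [2].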