Suat, Thanks. I will get back to you later after understanding this. -harish
On Tue, Aug 21, 2012 at 1:29 PM, Suat Gönül <suatgo...@gmail.com> wrote:

> Hello Harish,
>
> In the current implementation of Contenthub, the process of gathering
> additional information that you mention is realized as follows:
>
> 1) If you submit documents to the default index of Contenthub, i.e.
> contenthub, values for a few hard-coded properties of the recognized named
> entities are queried with SPARQL from the enhancement graph of the
> document. The hard-coded properties can be found in [1] and how they are
> queried can be found in [2]. So, if one of the entities has a
> "http://dbpedia.org/property/knownFor" property with the value "swimming",
> the swimming keyword should also be indexed along with the actual content.
>
> 2) If you define an LDPath [3] program, create an index using the program
> and submit documents to that index, additional information for named
> entities is gathered from the LDPath backend of Entityhub. Each named
> entity is queried with the LDPath program which was used to create the
> index. So, if the "swimming" value is among the information retrieved from
> Entityhub, it is indexed along with the actual content.
>
> These are valid for the current implementation. In the scope of
> STANBOL-471, this structure has changed a bit. We are working on a new
> 2-layered structure of Contenthub in the contenthub-two-layered-structure
> branch. In this structure, you submit a ContentItem to the Store part and
> the SemanticIndex instances index that document according to their own
> configurations. Again, it is possible to create new instances of
> SemanticIndex based on an LDPath program.
> There is a default index of Contenthub, which also has its own LDPath
> program:
>
> @prefix dbp-ont: <http://dbpedia.org/ontology/>;
> @prefix dbp-prop: <http://dbpedia.org/property/>;
> @prefix foaf : <http://xmlns.com/foaf/0.1/>;
>
> persons = .[rdf:type is dbp-ont:Person] :: xsd:anyURI (termVectors="true");
> persons_known_fors = dbp-prop:knownFor | dbp-prop:knownFor/rdfs:label :: xsd:string;
> persons_birth_places = dbp-ont:birthPlace/rdfs:label | dbp-prop:placeOfBirth/rdfs:label :: xsd:string;
> persons_work_instutions = dbp-prop:workInstitutions/rdfs:label :: xsd:string;
>
> organizations = .[rdf:type is dbp-ont:Organization] :: xsd:anyURI (termVectors="true");
>
> places = .[rdf:type is dbp-ont:Place] :: xsd:anyURI (termVectors="true");
> place_countries = dbp-ont:country/rdfs:label :: xsd:string;
> place_regions = dbp-ont:region/rdfs:label | ^dbp-ont:region/rdfs:label | dbp-prop:region/rdfs:label | ^dbp-prop:region/rdfs:label :: xsd:string;
> place_capitals = dbp-ont:capital/rdfs:label :: xsd:string;
> place_governors = dbp-prop:governer/rdfs:label :: xsd:string;
> place_largest_cities = dbp-ont:largestCity/rdfs:label :: xsd:string;
> place_leaders = dbp-prop:leaderName/rdfs:label | dbp-ont:leader/rdfs:label :: xsd:string;
>
> entity_given_names = foaf:givenName :: xsd:string;
> entity_captions = dbp-prop:caption :: xsd:string;
>
> So, each named entity in a ContentItem is queried with that LDPath program
> via the Entityhub, and the gathered values are indexed along with the
> content.
>
> If you have any further questions on this, I would be glad to answer them.
> But I am currently on holiday and won't be able to answer them before
> August 27.
>
> Best,
> Suat
>
> [1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/vocabulary/SolrVocabulary.java
> [2] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/store/solr/src/main/java/org/apache/stanbol/contenthub/store/solr/util/QueryGenerator.java
> [3] https://code.google.com/p/ldpath/
>
> On Tue, Aug 21, 2012 at 3:03 AM, harish suvarna <hsuva...@gmail.com> wrote:
>
> > Using English Stanbol, I played with contenthub. I took a small text
> > as follows.
> > ==============
> > United States produced an Olympic-record time to win gold in the women's
> > 200m freestyle relay final. A brilliant final leg from Allison Schmitt
> > led the Americans home, ahead of Australia, in a time of seven minutes
> > 42.92 seconds. Missy Franklin gave them a great start, while Dana Vollmer
> > and Shannon Vreeland also produced fast times.
> > ==============
> >
> > The above text is properly processed and I get the dbpedia links for all
> > persons and countries in it. However, the above piece is related to
> > 'swimming', and this word does not appear at all in the text. In the
> > dbpedia link of Allison Schmitt, the dbpedia categories do tell us that
> > she is in a swimming category. Did anyone try to process the categories
> > inside the link and add them as metadata for this content? If we add
> > this, then we add more value than a simple Solr-based search in the
> > content store. Someone at the IKS conference demoed this as semantic
> > search. Any hints/clues on this work?
> >
> > --
> > Thanks
> > Harish

--
Thanks
Harish
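
For readers following the thread, the enrichment step Suat describes (look up a few properties of each recognized entity, then index the resulting values together with the content) can be sketched roughly as below. This is only an illustration in plain Python, not Stanbol code: the function names, the document layout and the sample entity data are all assumptions made up for the example. The "swimming" value simply mirrors the hypothetical case in Suat's mail; it is not taken from a real DBpedia lookup.

```python
from typing import Dict, Iterable, List

# Maps entity URI -> property URI -> list of literal values.
# Hypothetical stand-in for what an entity lookup (e.g. via Entityhub) returns.
EntityData = Dict[str, Dict[str, List[str]]]

KNOWN_FOR = "http://dbpedia.org/property/knownFor"

# Made-up sample data for illustration only (assumption, not a real lookup).
SAMPLE_ENTITY_DATA: EntityData = {
    "http://dbpedia.org/resource/Allison_Schmitt": {KNOWN_FOR: ["swimming"]},
}


def gather_extra_terms(
    entity_uris: Iterable[str],
    properties: Iterable[str],
    entity_data: EntityData,
) -> List[str]:
    """Collect the values of the given properties for every recognized entity.

    Values are returned in first-seen order without duplicates.
    Entities or properties with no data contribute nothing.
    """
    wanted = list(properties)
    terms: List[str] = []
    for uri in entity_uris:
        entity_props = entity_data.get(uri, {})
        for prop in wanted:
            for value in entity_props.get(prop, []):
                if value not in terms:
                    terms.append(value)
    return terms


def build_index_document(
    content: str,
    entity_uris: Iterable[str],
    properties: Iterable[str],
    entity_data: EntityData,
) -> Dict[str, object]:
    """Build a dict holding the content plus the gathered extra terms.

    The field names ("content", "entities", "extra_terms") are hypothetical;
    a real index would use whatever fields its schema defines.
    """
    uris = list(entity_uris)
    return {
        "content": content,
        "entities": uris,
        "extra_terms": gather_extra_terms(uris, properties, entity_data),
    }
```

With the sample data above, a document that mentions Allison Schmitt but never uses the word "swimming" would still carry "swimming" in its extra_terms field, so a keyword search for it could match.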