Hello Harish,

In the current implementation of Contenthub, the process of gathering
additional information that you mention works as follows:

1) If you submit documents to the default index of Contenthub, i.e.
contenthub, values for a few hard-coded properties of the recognized named
entities are queried with SPARQL from the enhancement graph of the
document. The hard-coded properties can be found in [1], and how they are
queried can be found in [2]. So, if one of the entities has a
"http://dbpedia.org/property/knownFor" property with a value of "swimming",
the swimming keyword should also be indexed along with the actual content.

2) If you define an LDPath [3] program, create an index using that program,
and submit documents to that index, additional information for named
entities is gathered from the LDPath backend of Entityhub. Each named
entity is queried with the LDPath program that was used to create the
index. So, if the "swimming" value is among the information retrieved from
Entityhub, it is indexed along with the actual content.
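
To make the two cases above concrete, here is a minimal Python sketch of
the general idea. It is not the Contenthub code: the function name
enrich_document, the sample entity URI and its "swimming" value, and the
in-memory entity_properties dictionary (standing in for the enhancement
graph in case 1 or for Entityhub in case 2) are all assumptions made for
illustration.

```python
# Illustrative sketch only: not the Contenthub implementation.
# The dictionary below is a hypothetical stand-in for the values that would
# be looked up in the enhancement graph (case 1) or in Entityhub (case 2).
KNOWN_FOR = "http://dbpedia.org/property/knownFor"

entity_properties = {
    # Sample data invented for this example.
    "http://dbpedia.org/resource/Allison_Schmitt": {KNOWN_FOR: ["swimming"]},
}


def enrich_document(content, entity_uris, properties, lookup):
    """Build an index document holding the content plus the values of the
    selected properties for each recognized entity."""
    doc = {"content": content}
    for uri in entity_uris:
        for prop in properties:
            for value in lookup.get(uri, {}).get(prop, []):
                # Collect values per property, skipping duplicates.
                values = doc.setdefault(prop, [])
                if value not in values:
                    values.append(value)
    return doc


doc = enrich_document(
    "A brilliant final leg from Allison Schmitt led the Americans home.",
    ["http://dbpedia.org/resource/Allison_Schmitt"],
    [KNOWN_FOR],
    entity_properties,
)
```

With this sample data, the resulting document carries "swimming" under the
knownFor property even though the word never occurs in the content, which
is what makes the document findable by that keyword.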

These hold for the current implementation. In the scope of STANBOL-471,
this structure has changed a bit. We are working on a new two-layered
structure for Contenthub in the contenthub-two-layered-structure branch. In
this structure, you submit a ContentItem to the Store part, and the
SemanticIndex instances then index that document according to their own
configurations. Again, it is possible to create new SemanticIndex instances
based on an LDPath program. Contenthub has a default index, which also has
its own LDPath program:

@prefix dbp-ont: <http://dbpedia.org/ontology/>;
@prefix dbp-prop: <http://dbpedia.org/property/>;
@prefix foaf : <http://xmlns.com/foaf/0.1/>;

persons = .[rdf:type is dbp-ont:Person] :: xsd:anyURI (termVectors="true");
persons_known_fors = dbp-prop:knownFor | dbp-prop:knownFor/rdfs:label ::
xsd:string;
persons_birth_places = dbp-ont:birthPlace/rdfs:label |
dbp-prop:placeOfBirth/rdfs:label :: xsd:string;
persons_work_instutions = dbp-prop:workInstitutions/rdfs:label ::
xsd:string;

organizations = .[rdf:type is dbp-ont:Organization] :: xsd:anyURI
(termVectors="true");

places = .[rdf:type is dbp-ont:Place] :: xsd:anyURI (termVectors="true");
place_countries = dbp-ont:country/rdfs:label :: xsd:string;
place_regions = dbp-ont:region/rdfs:label | ^dbp-ont:region/rdfs:label |
dbp-prop:region/rdfs:label | ^dbp-prop:region/rdfs:label :: xsd:string;
place_capitals = dbp-ont:capital/rdfs:label :: xsd:string;
place_governors = dbp-prop:governer/rdfs:label :: xsd:string;
place_largest_cities = dbp-ont:largestCity/rdfs:label :: xsd:string;
place_leaders = dbp-prop:leaderName/rdfs:label | dbp-ont:leader/rdfs:label
:: xsd:string;

entity_given_names = foaf:givenName :: xsd:string;
entity_captions = dbp-prop:caption :: xsd:string;

So, each named entity in a ContentItem is queried with that LDPath program
via the Entityhub, and the gathered values are indexed along with the
content.
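
As a rough illustration of the two-layered idea, the following Python
sketch separates storing an item from indexing it. The class and method
names (Store, SemanticIndex, register, submit, index) are hypothetical and
do not mirror the Stanbol API, the field-to-property mapping is a much
simplified stand-in for an LDPath program, and the way indexes learn about
new items is reduced here to a direct call.

```python
# Illustrative sketch only: names are hypothetical, not the Stanbol API.
class SemanticIndex:
    def __init__(self, name, fields):
        self.name = name
        # Maps an index field name to the entity property it reads; a
        # simplified stand-in for the index's own LDPath program.
        self.fields = fields
        self.documents = {}

    def index(self, item_id, content, entities):
        doc = {"content": content}
        for field, prop in self.fields.items():
            values = [v for e in entities for v in e.get(prop, [])]
            if values:
                doc[field] = values
        self.documents[item_id] = doc


class Store:
    def __init__(self):
        self.items = {}
        self.indexes = []

    def register(self, index):
        self.indexes.append(index)

    def submit(self, item_id, content, entities):
        # Keep the item once, then let every index process it in its own way.
        self.items[item_id] = (content, entities)
        for index in self.indexes:
            index.index(item_id, content, entities)


store = Store()
default_index = SemanticIndex("contenthub", {"persons_known_fors": "knownFor"})
places_index = SemanticIndex("places", {"place_countries": "country"})
store.register(default_index)
store.register(places_index)
store.submit(
    "item-1",
    "Allison Schmitt led the Americans home.",
    [{"knownFor": ["swimming"]}],  # sample entity data invented for the example
)
```

The same submitted item ends up with different fields in each index, since
each index only extracts what its own configuration asks for.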

If you have any further questions on this, I would be glad to answer them.
However, I am currently on holiday and won't be able to answer before
August 27.

Best,
Suat


[1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/vocabulary/SolrVocabulary.java

[2] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/store/solr/src/main/java/org/apache/stanbol/contenthub/store/solr/util/QueryGenerator.java

[3] https://code.google.com/p/ldpath/

On Tue, Aug 21, 2012 at 3:03 AM, harish suvarna <hsuva...@gmail.com> wrote:

> Using English Stanbol, I played with contenthub. I took a small text
> as follows.
> ==============
> United States produced an Olympic-record time to win gold in the women's
> 200m freestyle relay final. A brilliant final leg from Allison Schmitt
> led the Americans home, ahead of Australia, in a time of seven minutes
> 42.92 seconds. Missy Franklin gave them a great start, while Dana Vollmer
> and Shannon Vreeland also produced fast times.
> ==============
>
> The above text is properly processed and I get the dbpedia links for all
> persons and countries in it. However, the piece is about 'swimming', and
> this word does not appear in the text at all. In the dbpedia link of
> Allison Schmitt, the dbpedia categories do tell us that she is in the
> swimming category. Did anyone try to process the categories inside the
> link and add them as metadata for this content? If we add this, then we
> add more value than a simple Solr-based search in the content store.
> Someone at the IKS conference demoed this as a semantic search. Any
> hints/clues on this work?
>
>
> --
> Thanks
> Harish
>
