Re: Stanbol contenthub

harish suvarna Thu, 23 Aug 2012 06:01:15 -0700

Suat, Thanks. I will get back to you later after understanding this.
-harish


On Tue, Aug 21, 2012 at 1:29 PM, Suat Gönül <suatgo...@gmail.com> wrote:

> Hello Harish,
>
> In the current implementation of Contenthub, the process of gathering
> additional information that you mention is realized as follows:
>
> 1) If you submit documents to the default index of Contenthub, i.e. contenthub,
> values for a few hard-coded properties of the recognized named entities are
> queried with SPARQL from the enhancement graph of the document.
> The hard-coded properties can be found in [1] and how they are queried can
> be found in [2]. So, if one of the entities has a
> "http://dbpedia.org/property/knownFor" property with the value "swimming",
> the "swimming" keyword should also be indexed along with the actual content.
>
> 2) If you define an LDPath[3] program, create an index using the program
> and submit documents to that index, additional information for named
> entities is gathered from the LDPath backend of Entityhub. Each named
> entity is queried with the LDPath program which was used to create the
> index. So, if the "swimming" value is among the information retrieved from
> Entityhub, it is indexed along with the actual content.
>
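To make case 1 concrete for myself, here is a minimal sketch of what a per-property lookup against the enhancement graph could look like. It is not the actual code in QueryGenerator [2]; the helper name build_value_query and the Allison_Schmitt resource URI are my own assumptions for illustration.

```python
# Hypothetical helper, NOT the Contenthub QueryGenerator: it only shows the
# shape of a SPARQL lookup for one hard-coded property of one named entity.
KNOWN_FOR = "http://dbpedia.org/property/knownFor"


def build_value_query(entity_uri, property_uri):
    """Return a SPARQL SELECT asking for all values of one property of one entity."""
    return f"SELECT ?value WHERE {{ <{entity_uri}> <{property_uri}> ?value . }}"


# Assumed entity URI, chosen to match the sample text in this thread.
query = build_value_query("http://dbpedia.org/resource/Allison_Schmitt", KNOWN_FOR)
print(query)
```

Every value bound to ?value (for example "swimming") would then be added to the index entry of the document.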
> These are valid for the current implementation. In the scope of STANBOL-471,
> this structure has changed a bit. We are working on a new 2-layered
> structure of Contenthub in the contenthub-two-layered-structure branch. In this
> structure, you submit a ContentItem to the Store part and the
> SemanticIndex instances index that document according to their own
> configurations. Again, it is possible to create new instances of
> SemanticIndex based on an LDPath program. There is a default index of
> Contenthub which also has its own LDPath program:
>
> @prefix dbp-ont: <http://dbpedia.org/ontology/>;
> @prefix dbp-prop: <http://dbpedia.org/property/>;
> @prefix foaf : <http://xmlns.com/foaf/0.1/>;
>
> persons = .[rdf:type is dbp-ont:Person] :: xsd:anyURI (termVectors="true");
> persons_known_fors = dbp-prop:knownFor | dbp-prop:knownFor/rdfs:label :: xsd:string;
> persons_birth_places = dbp-ont:birthPlace/rdfs:label | dbp-prop:placeOfBirth/rdfs:label :: xsd:string;
> persons_work_instutions = dbp-prop:workInstitutions/rdfs:label :: xsd:string;
>
> organizations = .[rdf:type is dbp-ont:Organization] :: xsd:anyURI (termVectors="true");
>
> places = .[rdf:type is dbp-ont:Place] :: xsd:anyURI (termVectors="true");
> place_countries = dbp-ont:country/rdfs:label :: xsd:string;
> place_regions = dbp-ont:region/rdfs:label | ^dbp-ont:region/rdfs:label | dbp-prop:region/rdfs:label | ^dbp-prop:region/rdfs:label :: xsd:string;
> place_capitals = dbp-ont:capital/rdfs:label :: xsd:string;
> place_governors = dbp-prop:governer/rdfs:label :: xsd:string;
> place_largest_cities = dbp-ont:largestCity/rdfs:label :: xsd:string;
> place_leaders = dbp-prop:leaderName/rdfs:label | dbp-ont:leader/rdfs:label :: xsd:string;
>
> entity_given_names = foaf:givenName :: xsd:string;
> entity_captions = dbp-prop:caption :: xsd:string;
>
> So, each named entity in a ContentItem is queried with that LDPath program
> via the Entityhub, and the gathered values are indexed along with the content.
>
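As I understand it, the effect of indexing the gathered values along with the content can be sketched as follows. This is only a toy model under my own assumptions: the functions build_index_document and matches are hypothetical, the "swimming" value is made up for the example, and the real SemanticIndex writes to Solr rather than to a Python dict.

```python
def build_index_document(content, entity_values):
    """Merge the raw content with the field values gathered per named entity."""
    doc = {"content": content}
    for fields in entity_values.values():
        for field, values in fields.items():
            merged = doc.setdefault(field, [])
            for value in values:
                if value not in merged:  # avoid duplicate values per field
                    merged.append(value)
    return doc


def matches(doc, keyword):
    """Naive keyword search over every field of the toy document."""
    keyword = keyword.lower()
    for value in doc.values():
        items = [value] if isinstance(value, str) else value
        if any(keyword in item.lower() for item in items):
            return True
    return False


text = "A brilliant final leg from Allison Schmitt led the Americans home."
# Hypothetical LDPath result for one entity; the field name comes from the
# default program quoted above, the value is an assumption.
entity_values = {
    "http://dbpedia.org/resource/Allison_Schmitt": {
        "persons_known_fors": ["swimming"],
    }
}
doc = build_index_document(text, entity_values)
print(matches({"content": text}, "swimming"), matches(doc, "swimming"))
```

The point of the sketch is that a search for "swimming" misses the plain content but hits the enriched document, which is the behaviour I was asking about.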
> If you have any further questions on this, I would be glad to answer
> them. But I am currently on holiday and won't be able to answer them
> before August 27.
>
> Best,
> Suat
>
>
> [1]
>
> http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/vocabulary/SolrVocabulary.java
>
>
> [2]
>
> http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/store/solr/src/main/java/org/apache/stanbol/contenthub/store/solr/util/QueryGenerator.java
>
>
> [3]  https://code.google.com/p/ldpath/
>
> On Tue, Aug 21, 2012 at 3:03 AM, harish suvarna <hsuva...@gmail.com>
> wrote:
>
> > Using English Stanbol, I played with Contenthub. I took a small text
> > as follows.
> > ==============
> > United States produced an Olympic-record time to win gold in the women's
> > 200m freestyle relay final. A brilliant final leg from Allison Schmitt led
> > the Americans home, ahead of Australia, in a time of seven minutes 42.92
> > seconds. Missy Franklin gave them a great start, while Dana Vollmer and
> > Shannon Vreeland also produced fast times.
> > ==============
> >
> > The above text is properly processed and I get the dbpedia links for all
> > persons and countries in it. However, the above piece is related to
> > 'swimming' and this word does not appear at all in the text. In the dbpedia
> > link of Allison Schmitt, the dbpedia categories do tell us that she is in the
> > swimming category. Did anyone try to process the categories inside the link
> > and add them as metadata for this content? If we add this, then we add more
> > value than a simple Solr-based search in the content store. Someone at the IKS
> > conference demoed this as a semantic search. Any hints/clues on this work?
> >
> >
> > --
> > Thanks
> > Harish
> >
>



-- 
Thanks
Harish
