The page
http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSPARQLReasoningTutorial#Step6.:
SPARQL Inference Queries
demonstrates how to query for transitive relationships using property paths
and transitivity inferencing.  As I am interested in transitive relationships
in a taxonomic tree, I attempted to apply these techniques to my dataset,
where relationships between taxons are defined using the rdfs:subClassOf
predicate.  In particular, I am interested in getting the set of taxons
(organisms) that fall within a taxonomic clade (branch).

I have already posted about the mixed success I had attempting to do this
using property paths.

Here I will show how to do this successfully using the rdfs:subClassOf
property, without relying on the built-in rdf:type inferencing described at
http://docs.openlinksw.com/virtuoso/rdfsparqlrule.html.  That built-in
rdf:type inferencing does not, in general, give you rdfs:subClassOf
transitivity.  To make this more concrete, if you have a dataset like

<a> rdfs:subClassOf <b> .
<b> rdfs:subClassOf <c> .
<foo> rdf:type <a> .

then you can match an instance to a superclass type:

ASK { <foo> rdf:type <c> . }

but you cannot match a class to an indirect superclass:

ASK { <a> rdfs:subClassOf <c> . }
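
To illustrate the difference between the two ASK queries, here is a minimal Python sketch that does not involve Virtuoso at all.  It models the toy dataset as plain tuples and compares a lookup of asserted triples with a lookup over the transitive closure of rdfs:subClassOf.  The function names are hypothetical and exist only for this illustration.

```python
# Toy dataset from the example above, as (subject, predicate, object) triples.
TRIPLES = {
    ("a", "rdfs:subClassOf", "b"),
    ("b", "rdfs:subClassOf", "c"),
    ("foo", "rdf:type", "a"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls via one or more subClassOf steps."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found


def ask_direct(s, p, o, triples):
    """Match only triples that are literally asserted in the data."""
    return (s, p, o) in triples


def ask_subclass_transitive(sub, sup, triples):
    """Match a class to any direct or indirect superclass."""
    return sup in superclasses(sub, triples)


def ask_type_inferred(instance, cls, triples):
    """Match an instance to its asserted type or any superclass of that type."""
    for s, p, o in triples:
        if s == instance and p == "rdf:type":
            if o == cls or cls in superclasses(o, triples):
                return True
    return False
```

In this sketch, ask_direct("a", "rdfs:subClassOf", "c", TRIPLES) fails because that triple is never asserted, while ask_subclass_transitive("a", "c", TRIPLES) succeeds.  The second behaviour is what I want from the inference engine.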


My first attempt to get transitivity to work for rdfs:subClassOf was to
create a rule set out of the taxonomy data that defined all the
rdfs:subClassOf relationships between the taxons.  Then I tried this query:

    time curl \
    --data-urlencode "query=
        DEFINE input:inference 'http://example.com/rule_set/uniprot_taxonomy'
        PREFIX qfo:<http://purl.questfororthologs.org/>
        PREFIX dcterms:<http://purl.org/dc/terms/>
        PREFIX obo:<http://purl.org/obo/owl/obo#>
        PREFIX up:<http://purl.uniprot.org/core/>
        PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
        PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
        PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
        SELECT (count(?taxon) as ?n)
        WHERE {
        # taxons that are a subclass of Primates
        ?taxon rdfs:subClassOf taxon:9443 .
        }
        LIMIT 1000
    " http://localhost:8890/sparql

    ...
    <binding name="n"><literal datatype="http://www.w3.org/2001/XMLSchema#integer">3</literal></binding>
    ...

Primates has exactly 3 direct subclasses, so a count of 3 means only the
asserted triples matched and inferencing is not working with this approach.
After thinking about it for a while, I figured that I might need to
explicitly load the RDF schema, where subClassOf is defined, into Virtuoso.
So I created a rule set:

rdfs_rule_set('http://example.com/rule_set/rdfs', 'http://www.w3.org/2000/01/rdf-schema')

And then I ran my query using that ruleset:

    time curl \
    --data-urlencode "query=
        DEFINE input:inference 'http://example.com/rule_set/rdfs'
        PREFIX qfo:<http://purl.questfororthologs.org/>
        PREFIX dcterms:<http://purl.org/dc/terms/>
        PREFIX obo:<http://purl.org/obo/owl/obo#>
        PREFIX up:<http://purl.uniprot.org/core/>
        PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
        PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
        PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
        SELECT (count(?taxon) as ?n)
        WHERE {
        # taxons that are a subclass of Primates
        ?taxon rdfs:subClassOf taxon:9443 .
        }
        LIMIT 1000
    " http://localhost:8890/sparql

    ...
    <binding name="n"><literal datatype="http://www.w3.org/2001/XMLSchema#integer">3</literal></binding>
    ...
    real    0m0.015s

Again, only 3 matches were found, indicating that transitivity is not
working.  It should work because the page
http://www.w3.org/TR/rdf-schema/#ch_subclassof says that subClassOf is
transitive:

> The rdfs:subClassOf property is transitive.

However, the definition of subClassOf in
http://www.w3.org/2000/01/rdf-schema does not mention that it is a
transitive property:

> <rdf:Property rdf:about="http://www.w3.org/2000/01/rdf-schema#subClassOf">
>   <rdfs:isDefinedBy rdf:resource="http://www.w3.org/2000/01/rdf-schema#"/>
>   <rdfs:label>subClassOf</rdfs:label>
>   <rdfs:comment>The subject is a subclass of a class.</rdfs:comment>
>   <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
>   <rdfs:domain rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
> </rdf:Property>

Googling around, I found this page,
http://answers.semanticweb.com/questions/9997/rdfssubclassof-not-asserted-to-be-of-type-transitiveproperty-why,
where someone asks the same question I am asking: "rdfs:subClassOf not
asserted to be of type TransitiveProperty. why?"  Unfortunately the answers
are mostly in the airy realm of semantics that is inaccessible to the lay
programmer.  For example, "from a purely mathematical point of view, it is
not necessary to assert the transitivity because it can be derived. The
'semantic conditions' of subclass say that a model for RDFS is supposed to
map it on subsets."  Read that page for some lengthy and fascinating
explanations of the topic.

After more googling I found this page,
http://boards.openlinksw.com/phpBB3/viewtopic.php?f=21&t=1201, from January
2011, where someone has the exact problem I am having and comes up with a
simple, practical solution.  Ingeval says:

> If I add
>
>     rdfs:subClassOf rdf:type owl:TransitiveProperty .
>
> to the data set, i.e. inform Virtuoso to treat rdfs:subClassOf as transitive
> property, then everything worked as expected.

This is great!  It adds the explicit transitive property annotation missing
from the definition in http://www.w3.org/2000/01/rdf-schema.  To test this
solution, I created a file with the transitive property annotation:

    <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#TransitiveProperty> .

Then I loaded it into graph 'http://example.com/w3/rdfs/tweaks' and created
the rule set 'http://example.com/rule_set/rdfs_tweaks'.  Here is a test
with the new ruleset:

    time curl \
    --data-urlencode "query=
        DEFINE input:inference 'http://example.com/rule_set/rdfs_tweaks'
        PREFIX qfo:<http://purl.questfororthologs.org/>
        PREFIX dcterms:<http://purl.org/dc/terms/>
        PREFIX obo:<http://purl.org/obo/owl/obo#>
        PREFIX up:<http://purl.uniprot.org/core/>
        PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
        PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
        PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
        SELECT (count(?taxon) as ?n)
        WHERE {
        # taxons that are a subclass of Primates
        ?taxon rdfs:subClassOf taxon:9443 .
        }
        LIMIT 1000
    " http://localhost:8890/sparql

    <binding name="n"><literal datatype="http://www.w3.org/2001/XMLSchema#integer">798</literal></binding>
    real    0m0.020s

It works!
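
The three count queries above differ only in the rule set named on the DEFINE line, so a small helper makes the comparison easier to repeat.  This is a hedged Python sketch, not part of my original test: build_query and run_query are hypothetical names, the query is trimmed to the two prefixes it actually uses, and the endpoint is the same local one used in the curl commands.

```python
import urllib.parse
import urllib.request

# Same count query as above, parameterized on the rule set and the taxon id.
# Doubled braces are literal braces in str.format templates.
QUERY_TEMPLATE = """DEFINE input:inference '{rule_set}'
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
SELECT (count(?taxon) as ?n)
WHERE {{
  # taxons that are a subclass of the given taxon
  ?taxon rdfs:subClassOf taxon:{taxon_id} .
}}
"""


def build_query(rule_set, taxon_id):
    """Fill in the rule set IRI and the numeric taxon id."""
    return QUERY_TEMPLATE.format(rule_set=rule_set, taxon_id=taxon_id)


def run_query(query, endpoint="http://localhost:8890/sparql"):
    """POST the query as a form-encoded 'query' field, as curl --data-urlencode does."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    with urllib.request.urlopen(endpoint, data=data) as response:
        return response.read().decode("utf-8")


# Example usage (needs a running Virtuoso instance with the rule sets defined):
#   for name in ("uniprot_taxonomy", "rdfs", "rdfs_tweaks"):
#       print(run_query(build_query("http://example.com/rule_set/" + name, 9443)))
```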

And here I am using transitivity inference to get the 8 QfO Taxons in the
Primate clade, something I could not get to work with property paths:

    time curl \
    --data-urlencode "query=
        DEFINE input:inference 'http://example.com/rule_set/rdfs_tweaks'
        PREFIX qfo:<http://purl.questfororthologs.org/>
        PREFIX dcterms:<http://purl.org/dc/terms/>
        PREFIX obo:<http://purl.org/obo/owl/obo#>
        PREFIX up:<http://purl.uniprot.org/core/>
        PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
        PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
        PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
        SELECT (count(?taxon) as ?n)
        WHERE {
          {
            SELECT DISTINCT ?taxon
            WHERE {
            # taxons are in QfO reference proteomes
            ?taxon dcterms:isPartOf qfo:reference_proteomes-2013_04 .
            # taxons that are a subclass of Primates
            ?taxon rdfs:subClassOf taxon:9443 .
            }
          }
        }
        LIMIT 1000
    " http://localhost:8890/sparql

    <binding name="n"><literal datatype="http://www.w3.org/2001/XMLSchema#integer">8</literal></binding>
    real    0m0.081s

I hope this helps save someone else some time smashing their heads into
their desks wondering why transitive inferencing does not work when so many
pages suggest that it should work.

Best regards,
Todd

-- 
Todd DeLuca
Scientific Programmer
Wall Lab, CBMI, Harvard Medical School
http://todddeluca.com
http://wall.hms.harvard.edu/