On 28/03/14 23:26, Adeeb Noor wrote:
Thanks, Dave, for the very useful answers. I have to check my KB and then
decide which way to go.
Another silly question: how can I remove the duplicate rows from the result of the query below?
SELECT DISTINCT *
WHERE {
?s ddids:x-kegg.pathway ?o.
?s2 ddids:x-kegg.pathway ?o.
FILTER (?s != ?s2 ) }
Each pair shows up twice, once in each order. You need to impose an
arbitrary order on ?s and ?s2 so that one ordering of each pair is kept
and the other is dropped:
FILTER ( str(?s) < str(?s2) )
Or, though this is less general once you compose patterns, project only
the columns you need and apply DISTINCT:
SELECT DISTINCT ?s ?o
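
The effect of the ordering filter can be checked outside SPARQL. The following is only a minimal Python sketch, not Jena code: the `triples` list and the `shared_pathway_pairs` helper are hypothetical stand-ins for the self-join in the query, and plain string comparison plays the role of `str(?s) < str(?s2)`.

```python
from itertools import product

# Hypothetical (subject, pathway) pairs mirroring the result rows in the thread.
triples = [
    ("ddidd:C1514505", "kegg.pathway/hsa00590"),
    ("ddidd:C1879725", "kegg.pathway/hsa00590"),
]

def shared_pathway_pairs(triples, keep):
    """Self-join on the pathway; keep rows for which keep(s, s2) is true."""
    return [
        (s, o, s2)
        for (s, o), (s2, o2) in product(triples, repeat=2)
        if o == o2 and keep(s, s2)
    ]

# Analogue of FILTER (?s != ?s2): each pair appears in both orders.
both_orders = shared_pathway_pairs(triples, lambda s, s2: s != s2)

# Analogue of FILTER (str(?s) < str(?s2)): one row per unordered pair.
one_order = shared_pathway_pairs(triples, lambda s, s2: s < s2)

print(len(both_orders))  # 2
print(one_order)
```

Any total order on the two values would do; string comparison is just the one that is easy to express in a FILTER.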
Andy
------------------------------------------------------------------------------------
| s              | o                                              | s2             |
====================================================================================
| ddidd:C1514505 | <http://identifiers.org/kegg.pathway/hsa00590> | ddidd:C1879725 |
| ddidd:C1879725 | <http://identifiers.org/kegg.pathway/hsa00590> | ddidd:C1514505 |
------------------------------------------------------------------------------------
On Fri, Mar 28, 2014 at 9:15 AM, Dave Reynolds <[email protected]> wrote:
On 27/03/14 17:52, Adeeb Noor wrote:
Hi Dave:
Thank you so much for the very helpful comments; it is now clearer to me
than before.
I totally agree that I need to figure out why I would use one approach
over another.
In my case, for example, I have a huge 16GB TDB store that holds lots of
biomedical data. Suppose I would like to find genes that are associated
with at least 3 different phenotypes. I can do this in the following
ways:
1- OWL (pellet)
<owl:Class rdf:about="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#multiDiseases">
  <owl:equivalentClass>
    <owl:Restriction>
      <owl:onProperty rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#gene_associated_with_disease"/>
      <owl:onClass rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#Disease"/>
      <owl:minQualifiedCardinality rdf:datatype="&xsd;nonNegativeInteger">3</owl:minQualifiedCardinality>
    </owl:Restriction>
  </owl:equivalentClass>
  <rdfs:subClassOf rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.rdf#DDID"/>
</owl:Class>
2- Construct:
CONSTRUCT {
?s ddids:gene_associated_with_disease ?o .
?s ddids:gene_associated_with_disease ?o1 .
?s ddids:gene_associated_with_disease ?o2 .}
WHERE {
?s ddids:gene_associated_with_disease ?o .
?s ddids:gene_associated_with_disease ?o1 .
?s ddids:gene_associated_with_disease ?o2 .
FILTER (?o != ?o1 )
FILTER (?o != ?o2 )
FILTER (?o1 != ?o2 )
}
and then store the result of the CONSTRUCT in a new TDB store and work on that.
3- SPARQL Update
INSERT {
?s ddids:gene_has_multiple_association ?o . }
WHERE {
?s ddids:gene_associated_with_disease ?o .
?s ddids:gene_associated_with_disease ?o1 .
?s ddids:gene_associated_with_disease ?o2 .
FILTER (?o != ?o1 )
FILTER (?o != ?o2 )
FILTER (?o1 != ?o2 )
}
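
What the three pairwise FILTERs test is that a gene has at least three associated diseases with different names. As a rough illustration only, here is a Python sketch over made-up data; the `associations` list and the `genes_with_min_diseases` helper are hypothetical and not part of the KB or of any Jena API.

```python
from collections import defaultdict

# Hypothetical (gene, disease) associations; the names are illustrative only.
associations = [
    ("gene:A", "disease:d1"),
    ("gene:A", "disease:d2"),
    ("gene:A", "disease:d3"),
    ("gene:B", "disease:d1"),
    ("gene:B", "disease:d2"),
]

def genes_with_min_diseases(associations, minimum=3):
    """Return genes linked to at least `minimum` differently named diseases."""
    diseases_by_gene = defaultdict(set)
    for gene, disease in associations:
        diseases_by_gene[gene].add(disease)
    return sorted(g for g, ds in diseases_by_gene.items() if len(ds) >= minimum)

print(genes_with_min_diseases(associations))  # ['gene:A']
```

Note that the sketch, like the queries, counts names rather than individuals: two different URIs are treated as two different diseases.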
In the end the three methods give me the same answer, but their
performance differs.
Not necessarily the same answer.
Your test makes a unique name assumption: it treats the three disease
values as different diseases just because they have different URIs.
I seem to recall that Pellet can be asked to make a default UNA (i.e. go
outside the specs), so you could arrange for Pellet to generate similar
results, but it shouldn't do so by default.
Imagine what would happen if you now add:
:d1 owl:sameAs :d2 .
:d2 owl:sameAs :d3 .
where :d1, :d2 and :d3 are all associated with the same gene.
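
To make the sameAs point concrete, here is a small Python sketch, under the assumption that merging names linked by owl:sameAs is modelled with a simple union-find. The function names are hypothetical, and this is not how Pellet or Jena work internally; it only shows that the count of distinct individuals can drop from three to one.

```python
def find(parent, x):
    """Return the representative of x's sameAs group (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def count_distinct_individuals(diseases, same_as):
    """Count diseases after merging every pair linked by owl:sameAs."""
    parent = {}
    for a, b in same_as:
        parent[find(parent, a)] = find(parent, b)
    return len({find(parent, d) for d in diseases})

diseases = [":d1", ":d2", ":d3"]
same_as = [(":d1", ":d2"), (":d2", ":d3")]

print(count_distinct_individuals(diseases, []))       # 3: name-based count
print(count_distinct_individuals(diseases, same_as))  # 1: one individual
```

The SPARQL patterns #2 and #3 would still report three diseases here, whereas a reasoner honouring owl:sameAs would see only one.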
If I run this test in OWL, it takes around 14 hours to complete; with
CONSTRUCT it takes 2 minutes, and with SPARQL Update less than a minute.
What do you think, Dave?
Like I say, it depends what you want.
If that's the only question you want to answer and you can justify making
a strong unique name assumption then you can use #3 and it's certainly more
scalable.
If you need better guarantees of correctness but only need to operate over
parts of the data then you could use things like #2 to pull relevant data
into a small in-memory store and then do OWL reasoning over that. Though
you would have to pull in all relevant statements involving the resources
in your core query (cf. the sameAs example), and even then there can be
indirect consequences that you miss.
Dave