Thanks Dave for the very useful answers. I have to check my KB and then
decide which way to go.

Another silly question: how can I remove the duplicates from the result of the query below?

SELECT DISTINCT *
WHERE {
  ?s  ddids:x-kegg.pathway ?o .
  ?s2 ddids:x-kegg.pathway ?o .
  FILTER (?s != ?s2)
}

------------------------------------------------------------------------------------
| s              | o                                              | s2             |
====================================================================================
| ddidd:C1514505 | <http://identifiers.org/kegg.pathway/hsa00590> | ddidd:C1879725 |
| ddidd:C1879725 | <http://identifiers.org/kegg.pathway/hsa00590> | ddidd:C1514505 |
------------------------------------------------------------------------------------
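The "duplicate" is not an exact repeat: the pattern is symmetric in ?s and ?s2, so every matching pair comes back once as (a, b) and once as (b, a), and DISTINCT cannot merge rows whose columns differ. The small Python sketch below is an illustration from outside this thread that uses only the two rows shown above. It emulates the join and shows one common way to keep a single row per pair: impose an order on the two subjects instead of only requiring them to differ. In SPARQL the analogous change would be something like FILTER (STR(?s) < STR(?s2)) in place of FILTER (?s != ?s2); treat that as a suggestion to try, not something confirmed here. The helper name and the shortened predicate are hypothetical.

```python
from itertools import product

# Triples copied from the result table above (subject, predicate, object).
# The predicate is shortened to its local name for illustration only.
PRED = "x-kegg.pathway"
PATHWAY = "http://identifiers.org/kegg.pathway/hsa00590"
TRIPLES = [
    ("ddidd:C1514505", PRED, PATHWAY),
    ("ddidd:C1879725", PRED, PATHWAY),
]


def shared_pathway_pairs(triples, pred, ordered=False):
    """Emulate `?s pred ?o . ?s2 pred ?o . FILTER(...)` as a nested-loop join.

    Returns sorted (s, o, s2) rows; a set stands in for SELECT DISTINCT.
    """
    matches = [(s, o) for s, p, o in triples if p == pred]
    rows = set()
    for (s, o), (s2, o2) in product(matches, repeat=2):
        if o != o2:
            continue  # both patterns must bind the same ?o
        # FILTER (?s != ?s2) keeps both orders; requiring s < s2 keeps one.
        keep = s < s2 if ordered else s != s2
        if keep:
            rows.add((s, o, s2))
    return sorted(rows)


print(shared_pathway_pairs(TRIPLES, PRED))                # both orders
print(shared_pathway_pairs(TRIPLES, PRED, ordered=True))  # one row per pair
```

Ordering by string value is arbitrary but stable, which is all that is needed to pick one representative of each unordered pair.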




On Fri, Mar 28, 2014 at 9:15 AM, Dave Reynolds <[email protected]> wrote:

> On 27/03/14 17:52, Adeeb Noor wrote:
>
>> Hi Dave:
>>
>> Thank you so much for the very helpful comments; it is now clearer to me
>> than before.
>>
>> I totally agree that I need to figure out why I would use one approach
>> over the others.
>>
>> In my case, for example, I have a huge (16 GB) TDB store holding lots of
>> biomedical data. I would like, for example, to find genes that are
>> associated with at least 3 different phenotypes. I can do this in any of
>> the following ways:
>>
>> 1- OWL (Pellet):
>>
>>     <owl:Class rdf:about="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#multiDiseases">
>>       <owl:equivalentClass>
>>         <owl:Restriction>
>>           <owl:onProperty rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#gene_associated_with_disease"/>
>>           <owl:onClass rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#Disease"/>
>>           <owl:minQualifiedCardinality rdf:datatype="&xsd;nonNegativeInteger">3</owl:minQualifiedCardinality>
>>         </owl:Restriction>
>>       </owl:equivalentClass>
>>       <rdfs:subClassOf rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.rdf#DDID"/>
>>     </owl:Class>
>>
>> 2- CONSTRUCT:
>>
>> CONSTRUCT {
>>   ?s ddids:gene_associated_with_disease ?o .
>>   ?s ddids:gene_associated_with_disease ?o1 .
>>   ?s ddids:gene_associated_with_disease ?o2 .
>> }
>> WHERE {
>>   ?s ddids:gene_associated_with_disease ?o .
>>   ?s ddids:gene_associated_with_disease ?o1 .
>>   ?s ddids:gene_associated_with_disease ?o2 .
>>   FILTER (?o != ?o1)
>>   FILTER (?o != ?o2)
>>   FILTER (?o1 != ?o2)
>> }
>> and then store the CONSTRUCT result in a new TDB and work on that.
>>
>> 3- SPARQL Update:
>>
>> INSERT {
>>   ?s ddids:gene_has_multiple_association ?o .
>> }
>> WHERE {
>>   ?s ddids:gene_associated_with_disease ?o .
>>   ?s ddids:gene_associated_with_disease ?o1 .
>>   ?s ddids:gene_associated_with_disease ?o2 .
>>   FILTER (?o != ?o1)
>>   FILTER (?o != ?o2)
>>   FILTER (?o1 != ?o2)
>> }
>> All three methods will give me the same answer in the end, but their
>> performance differs.
>>
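To make the semantics of #2 and #3 concrete, here is a small Python sketch over hypothetical in-memory triples; it does not use Jena or TDB, and the gene and disease names are made up. Comparing URIs as plain strings, the three-way self-join with pairwise != filters selects exactly the genes with at least three distinct disease URIs, but it returns one solution for every ordered choice of three diseases, i.e. n(n-1)(n-2) rows for a gene with n diseases. A SPARQL 1.1 GROUP BY ?s with HAVING (COUNT(DISTINCT ?o) >= 3) would be the count-based counterpart, and it makes the same unique name assumption.

```python
from collections import defaultdict
from itertools import permutations

PRED = "gene_associated_with_disease"
# Hypothetical data: gene:A has four disease links, gene:B only two.
TRIPLES = [
    ("gene:A", PRED, "dis:1"),
    ("gene:A", PRED, "dis:2"),
    ("gene:A", PRED, "dis:3"),
    ("gene:A", PRED, "dis:4"),
    ("gene:B", PRED, "dis:1"),
    ("gene:B", PRED, "dis:2"),
]


def diseases_by_gene(triples):
    """Group the objects of PRED by subject (an RDF graph is a set of triples)."""
    grouped = defaultdict(set)
    for s, p, o in triples:
        if p == PRED:
            grouped[s].add(o)
    return grouped


def join_solutions(triples):
    """Solutions of the three-pattern WHERE clause with pairwise != filters.

    Distinct URIs are treated as distinct diseases (unique name assumption).
    Each gene with n diseases yields n * (n - 1) * (n - 2) ordered solutions.
    """
    solutions = []
    for s, objs in diseases_by_gene(triples).items():
        for o, o1, o2 in permutations(sorted(objs), 3):
            solutions.append((s, o, o1, o2))
    return solutions


def genes_with_at_least(triples, k=3):
    """Count-based check: genes with at least k distinct disease URIs."""
    return {s for s, objs in diseases_by_gene(triples).items() if len(objs) >= k}


sols = join_solutions(TRIPLES)
print(len(sols))  # 24 solutions, all for gene:A
print({s for s, *_ in sols} == genes_with_at_least(TRIPLES))  # True
```

The row multiplication is one reason the self-join gets expensive for genes with many associations, even before any reasoning is involved.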
>
> Not necessarily the same answer.
>
> Your test makes a unique name assumption: just because the three disease
> values have different URIs, it treats them as different diseases.
>
> I seem to recall that Pellet can be asked to make a UNA (i.e. go outside
> the specs), so you could arrange for Pellet to generate similar results,
> but it shouldn't do so by default.
>
> Imagine what would happen if you now add:
>
>   :d1   owl:sameAs   :d2 .
>   :d2   owl:sameAs   :d3 .
>
> where :d1, :d2 and :d3 are all associated with the same gene.
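The effect of those two statements can be illustrated with a small Python sketch; the data and helper names are hypothetical, and a union-find stands in for sameAs reasoning. Comparing URIs, the gene still has three different diseases, so a query like #3 would keep it. Once :d1, :d2 and :d3 are merged through owl:sameAs they name a single individual, and the gene should no longer count as having at least three diseases. Without the unique name assumption, a reasoner would also need to know the URIs denote different individuals (for example via owl:differentFrom) before concluding there are at least three; the sketch does not model that part.

```python
# Hypothetical data: one gene linked to three disease URIs, plus the two
# owl:sameAs statements from the example above.
DISEASES = {":d1", ":d2", ":d3"}
SAME_AS = [(":d1", ":d2"), (":d2", ":d3")]


def same_as_classes(pairs):
    """Union-find over owl:sameAs pairs; returns a function mapping each URI
    to a representative of its equivalence class.

    This models only the symmetric, transitive closure of owl:sameAs,
    not OWL reasoning in general.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return find


def count_individuals(uris, find=None):
    """Number of distinct individuals named by `uris`.

    With find=None every URI is its own individual (unique name assumption).
    """
    if find is None:
        return len(set(uris))
    return len({find(u) for u in uris})


print(count_individuals(DISEASES))                            # 3: passes the >= 3 test
print(count_individuals(DISEASES, same_as_classes(SAME_AS)))  # 1: fails it
```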
>
>
>> If I run this test in OWL, it takes around 14 hours to complete; with
>> CONSTRUCT it takes about 2 minutes, and with SPARQL Update less than a
>> minute.
>>
>> What do you think, Dave?
>>
>
> Like I say, it depends what you want.
>
> If that's the only question you want to answer and you can justify making
> a strong unique name assumption, then you can use #3, and it's certainly
> more scalable.
>
> If you need better guarantees of correctness but only need to operate over
> parts of the data, then you could use things like #2 to pull relevant data
> into a small in-memory store and then do OWL reasoning over that. However,
> you would have to pull in all relevant statements involving the resources
> in your core query (cf. the sameAs example), and even then there can be
> indirect consequences that you miss.
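One way to read the #2 suggestion is as a neighbourhood extraction around the core query results. The following Python sketch is a generic k-hop extraction over hypothetical triples, not a procedure given in this thread. Starting from the gene, one hop only picks up the association triples, and a second hop is needed to reach the owl:sameAs statements between the diseases. The hop count is an assumption to tune, and as noted above, even a generous extraction can miss indirect consequences.

```python
# Hypothetical triples: gene:g and its diseases, the sameAs links between
# them, and an unrelated gene that should not be pulled in.
TRIPLES = [
    ("gene:g", "assoc", ":d1"),
    ("gene:g", "assoc", ":d2"),
    ("gene:g", "assoc", ":d3"),
    (":d1", "owl:sameAs", ":d2"),
    (":d2", "owl:sameAs", ":d3"),
    ("gene:h", "assoc", ":d9"),
]


def extract_subgraph(triples, seeds, hops=2):
    """Collect triples within `hops` expansions of the seed resources.

    Each round adds every triple whose subject or object is already in the
    resource set, then adds both ends of those triples to the set.
    (Literals are not distinguished from resources in this sketch.)
    """
    resources = set(seeds)
    selected = set()
    for _ in range(hops):
        new = {t for t in triples if t[0] in resources or t[2] in resources}
        if new <= selected:
            break  # nothing new is reachable; stop early
        selected |= new
        for s, _pred, o in new:
            resources.update((s, o))
    return selected


print(len(extract_subgraph(TRIPLES, {"gene:g"}, hops=1)))  # 3: no sameAs yet
print(len(extract_subgraph(TRIPLES, {"gene:g"}, hops=2)))  # 5: sameAs included
```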
>
> Dave
>
>


-- 
Adeeb Noor
Ph.D. Candidate
Dept of Computer Science
University of Colorado at Boulder
Cell: 571-484-3303
Email: [email protected]
