Thanks, Andy, it works.
Another question, please: I have this very long query with many filters,
and my TDB is 15 GB. The problem is the time it takes to get the results back.
For instance, when I add a LIMIT it returns fast, but with no LIMIT it takes
forever. I was wondering if there is a way to do caching in the query.
Below is the example:
SELECT DISTINCT *
WHERE {
?d ddids:has_pharmgkb_variantLocation_association ?snp1 .
FILTER (replace(str(?snp1),"([^_]*_){3}","") =
replace(str(?snp2),"([^_]*_){3}","") )
FILTER (replace(str(?snp1),"([^_]*_){3}","") =
replace(str(?snp3),"([^_]*_){3}","") )
ddidd:C0123931 ddids:has_pharmgkb_variantLocation_association ?snp2 .
?drug2 ddids:has_pharmgkb_variantLocation_association ?snp3 .
FILTER (replace(str(?snp2),"([^_]*_){3}","") =
replace(str(?snp3),"([^_]*_){3}","") )
ddidd:C0123931 ddids:drugBank_enzyme ?enzyme .
?drug2 ddids:drugBank_enzyme ?enzyme .
ddidd:C0123931 ddids:has_pharmgkb_gene_association ?gene1 .
?drug2 ddids:has_pharmgkb_gene_association ?gene2 .
FILTER (replace(str(?gene1),"([^_]*_){3}","") =
replace(str(?gene2),"([^_]*_){3}","") )
?drug2 ddids:drugBank_category "Approved"^^xsd:string .
?enzyme ddids:label ?lenzyme.
?drug2 rdfs:label ?ldrug2.
ddidd:C0123931 ddids:label ?ldrug .
?d a ddids:Disease .
?d ddids:label ?disease.
FILTER ( str (ddidd:C0123931) < str (?drug2) )
FILTER ( str (?gene1) < str (?gene2) )
FILTER ( str (?snp2) < str (?snp3) )
}
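A rough way to see what the repeated replace(...) filters are doing, written in Python rather than SPARQL purely for illustration: the pattern strips leading groups of three underscore-delimited segments, and the query then compares the stripped keys pair by pair. The sketch below computes each key once and buckets values by key, so only values that share a key are ever paired. The sample identifiers and the helper names (strip_prefix, pairs_sharing_key) are made up for the example and are not taken from the dataset; the sketch says nothing about how TDB itself would plan or cache the real query.

```python
import re
from collections import defaultdict
from itertools import combinations

# Same pattern as in the query's replace(): three "segment_" groups.
PREFIX = re.compile(r"([^_]*_){3}")


def strip_prefix(value: str) -> str:
    """Mimic replace(str(?x), "([^_]*_){3}", "") by removing every match."""
    return PREFIX.sub("", value)


def pairs_sharing_key(values):
    """Return pairs (a, b) with a < b whose stripped keys are equal.

    Each key is computed once and values are bucketed by key, so only
    values inside the same bucket are ever paired up.
    """
    buckets = defaultdict(list)
    for v in values:
        buckets[strip_prefix(v)].append(v)
    pairs = []
    for group in buckets.values():
        # sorted() + combinations() yields each unordered pair once, a < b,
        # which plays the role of FILTER ( str(?a) < str(?b) ).
        pairs.extend(combinations(sorted(set(group)), 2))
    return sorted(pairs)


# Hypothetical identifiers, invented for this example only.
sample = [
    "ns_drugA_snp_rs100",
    "ns_drugB_snp_rs100",
    "ns_drugC_snp_rs200",
]
print(strip_prefix(sample[0]))
print(pairs_sharing_key(sample))
```

If the stripped key were stored as its own property when the data is loaded (an assumption about how the data could be prepared, not something the current TDB contains), the string filters could become plain joins on that property.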
On Sat, Mar 29, 2014 at 4:02 AM, Andy Seaborne <[email protected]> wrote:
> On 28/03/14 23:26, Adeeb Noor wrote:
>
>> thanks Dave for the very useful answers. I have to check my KB and then
>> decide which way to go.
>>
>> Another silly question: how can I remove the duplicates in my result below?
>>
>> SELECT DISTINCT *
>>
>> WHERE {
>>
>> ?s ddids:x-kegg.pathway ?o.
>>
>> ?s2 ddids:x-kegg.pathway ?o.
>>
>> FILTER (?s != ?s2 ) }
>>
>
> There needs to be a way to impose an arbitrary order on ?s and ?s2, so that
> when ?s differs from ?s2 you can choose one ordering of the pair over the other:
>
> FILTER ( str(?s) < str(?s2) )
>
> Or, and this is less general as you compose patterns, project the columns
> you need and use DISTINCT:
>
> SELECT DISTINCT ?s ?o
>
> Andy
>
>
>
>> ------------------------------------------------------------------------------------
>> | s              | o                                              | s2             |
>> ====================================================================================
>> | ddidd:C1514505 | <http://identifiers.org/kegg.pathway/hsa00590> | ddidd:C1879725 |
>> | ddidd:C1879725 | <http://identifiers.org/kegg.pathway/hsa00590> | ddidd:C1514505 |
>> ------------------------------------------------------------------------------------
>>
>>
>>
>>
>> On Fri, Mar 28, 2014 at 9:15 AM, Dave Reynolds <[email protected]> wrote:
>>
>>> On 27/03/14 17:52, Adeeb Noor wrote:
>>>>
>>>> Hi Dave:
>>>>
>>>> Thank you so much for the very helpful comments; it is now clearer to me
>>>> than before.
>>>>
>>>> I totally agree that I need to figure out why I would use one approach
>>>> over another.
>>>>
>>>> In my case, for example, I have this huge 16 GB TDB with lots of
>>>> biomedical data. I would like, for example, to find a gene that is
>>>> associated with at least 3 different phenotypes. I can do this in the
>>>> following ways:
>>>>
>>>> 1- OWL (Pellet)
>>>> <owl:Class rdf:about="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#multiDiseases">
>>>>   <owl:equivalentClass>
>>>>     <owl:Restriction>
>>>>       <owl:onProperty rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#gene_associated_with_disease"/>
>>>>       <owl:onClass rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.owl#Disease"/>
>>>>       <owl:minQualifiedCardinality rdf:datatype="&xsd;nonNegativeInteger">3</owl:minQualifiedCardinality>
>>>>     </owl:Restriction>
>>>>   </owl:equivalentClass>
>>>>   <rdfs:subClassOf rdf:resource="https://csel.cs.colorado.edu/~noor/Drug_Disease_ontology/DDID.rdf#DDID"/>
>>>> </owl:Class>
>>>>
>>>> 2- Construct:
>>>>
>>>> CONSTRUCT {
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o .
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o1 .
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o2 .}
>>>>
>>>> WHERE {
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o .
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o1 .
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o2 .
>>>>
>>>> FILTER (?o != ?o1 )
>>>>
>>>> FILTER (?o != ?o2 )
>>>>
>>>> FILTER (?o1 != ?o2 )
>>>>
>>>> }
>>>> and store the result of the CONSTRUCT in a new TDB and work on that.
>>>>
>>>> 3- SPARQL Update
>>>>
>>>> INSERT {
>>>> ?s ddids:gene_has_multiple_association ?o
>>>> }
>>>>
>>>> WHERE {
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o .
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o1 .
>>>>
>>>> ?s ddids:gene_associated_with_disease ?o2 .
>>>>
>>>> FILTER (?o != ?o1 )
>>>>
>>>> FILTER (?o != ?o2 )
>>>>
>>>> FILTER (?o1 != ?o2 )
>>>>
>>>> }
>>>> In the end, the three methods will give me the same answer, but the
>>>> performance is different.
>>>>
>>>>
>>> Not necessarily the same answer.
>>>
>>> Your test makes a unique name assumption: it treats the three disease
>>> values as different diseases just because they have different URIs.
>>>
>>> I seem to recall that Pellet can be asked to make a default UNA (i.e. go
>>> outside the specs), so you could arrange for Pellet to generate similar
>>> results, but it shouldn't do so by default.
>>>
>>> Imagine what would happen if you now add:
>>>
>>> :d1 owl:sameAs :d2 .
>>> :d2 owl:sameAs :d3 .
>>>
>>> where :d1-3 are all associated with the same gene.
>>>
>>>
>>>> If I want to do this test in OWL, it takes around 14 hours to complete;
>>>> with CONSTRUCT, 2 minutes; and with SPARQL Update, less than a minute.
>>>>
>>>> What do you think, Dave?
>>>>
>>>>
>>> Like I say, it depends what you want.
>>>
>>> If that's the only question you want to answer and you can justify making
>>> a strong unique name assumption then you can use #3 and it's certainly
>>> more scalable.
>>>
>>> If you need better guarantees of correctness but only need to operate
>>> over parts of the data then you could use things like #2 to pull relevant
>>> data into a small in-memory store and then do OWL reasoning over that.
>>> Though you would have to pull in all relevant statements involving the
>>> resources in your core query (cf. the sameAs example) and even then there
>>> can be indirect consequences that you miss.
>>>
>>> Dave
>>>
>>>
>>>
>>
>>
>
--
Adeeb Noor
Ph.D. Candidate
Dept of Computer Science
University of Colorado at Boulder
Cell: 571-484-3303
Email: [email protected]