Dear Virtuosans,
   I noticed that queries can be much slower when a variable is used in
the predicate position (even if that variable can only be bound to one
value), e.g.:

DATA (extract):
:display :startRelation :posStart.
:gene0 rdf:type :Gene.
:gene0 :posStart "53416"^^xsd:numeric.
:gene1 rdf:type :Gene.
:gene1 :posStart "29513"^^xsd:numeric.
...

SIMPLE QUERY:
?gene :posStart ?start.

SLOWER QUERY:
:display :startRelation ?startRel.
?gene ?startRel ?start.

   I assume that the query engine first tries to match the "?gene
?startRel ?start" pattern, which can match every triple in the dataset,
whereas beginning with ":display :startRelation ?startRel." would bind
?startRel to a single value that could then be used to look up the start
positions of the genes.
   I can live with the simple query, but the second one would make the
development of our application easier. Is there anything we could do to
improve the performance of the second query?
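
   To make the assumption above concrete, here is a toy sketch in plain
Python. It is not how Virtuoso actually plans or executes queries; it only
counts how many triples each evaluation order has to look at on a small
made-up dataset (1000 genes, simplified names without prefixes). The
function names scan_first and bind_first are invented for this illustration.

```python
# Toy model of the two evaluation orders. This is NOT Virtuoso's planner,
# only a count of the triples each order has to examine.
triples = [("display", "startRelation", "posStart")]
for i in range(1000):
    gene = "gene%d" % i
    triples.append((gene, "type", "Gene"))
    triples.append((gene, "posStart", str(i)))
    triples.append((gene, "posStop", str(i + 10)))


def scan_first(triples):
    """Match '?gene ?startRel ?start' first, then check ?startRel."""
    allowed = {o for (s, p, o) in triples
               if s == "display" and p == "startRelation"}
    examined = 0
    matches = 0
    for (s, p, o) in triples:  # every triple is a candidate
        examined += 1
        if p in allowed:
            matches += 1
    return examined, matches


def bind_first(triples):
    """Bind ?startRel first, then read only triples with that predicate."""
    by_predicate = {}  # stands in for an index keyed on the predicate
    for t in triples:
        by_predicate.setdefault(t[1], []).append(t)
    examined = 0
    matches = 0
    for (s, _, rel) in by_predicate.get("startRelation", []):
        examined += 1
        if s != "display":
            continue
        for _ in by_predicate.get(rel, []):
            examined += 1
            matches += 1
    return examined, matches


print(scan_first(triples))  # (triples examined, matches found)
print(bind_first(triples))
```

   Both orders find the same matches, but the first one has to examine
every triple in the dataset, whereas the second one only touches the
:display triple and the :posStart triples.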

Thank you!
Kind regards,
Olivier Dameron

NB: for the record, below are the script I used to generate the
dataset and the two queries.

===== generateDataSet.py
#! /usr/bin/env python

import random

nbGenes = 50000
geneLengthMax = 100
chromosomeLength = 100000

with open("debugDataset.ttl", "w") as dataFile:
        dataFile.write("""
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

:display :startRelation :posStart.
:display :stopRelation :posStop.

""")
        for i in range(nbGenes):
                geneIdent = ":gene" + str(i)
                dataFile.write("\n" + geneIdent + " rdf:type :Gene.\n")
                posStart = random.randint(1, chromosomeLength-geneLengthMax)
                posStop = posStart + random.randint(1, geneLengthMax)
                dataFile.write(geneIdent + " :posStart \"" + str(posStart) + "\"^^xsd:numeric.\n")
                dataFile.write(geneIdent + " :posStop \"" + str(posStop) + "\"^^xsd:numeric.\n")


===== getOverlap.sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>

SELECT (count(*) as ?nbOverlap)

WHERE {
  ?gene1 a :Gene;
    :posStart ?start1;
    :posStop ?stop1.

  ?gene2 a :Gene;
    :posStart ?start2;
    :posStop ?stop2.
  FILTER (?start1 < ?start2 && ?start1 < ?stop2 && ?start2 < ?stop1)
}


===== getOverlapSLOW.sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>

SELECT (count(*) as ?nbOverlap)

WHERE {
  :display :startRelation ?startRel.
  :display :stopRelation ?stopRel.

  ?gene1 a :Gene;
    ?startRel ?start1;
    ?stopRel ?stop1.

  ?gene2 a :Gene;
    ?startRel ?start2;
    ?stopRel ?stop2.
  FILTER (?start1 < ?start2 && ?start1 < ?stop2 && ?start2 < ?stop1)
}
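
A possible application-side workaround, sketched below and not benchmarked here, would be to resolve the two relations first with a small query such as "SELECT ?startRel ?stopRel WHERE { :display :startRelation ?startRel; :stopRelation ?stopRel }" and then substitute the resulting IRIs into the text of the fast query. The helper names check_iri and build_overlap_query are hypothetical; the sketch only builds the query string and leaves out the call to the SPARQL endpoint.

```python
# Hypothetical helper: build the fast overlap query from relation IRIs that
# were resolved beforehand, so no variable is left in predicate position.
OVERLAP_QUERY_TEMPLATE = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>

SELECT (count(*) as ?nbOverlap)

WHERE {
  ?gene1 a :Gene;
    <%(startRel)s> ?start1;
    <%(stopRel)s> ?stop1.

  ?gene2 a :Gene;
    <%(startRel)s> ?start2;
    <%(stopRel)s> ?stop2.
  FILTER (?start1 < ?start2 && ?start1 < ?stop2 && ?start2 < ?stop1)
}
"""


def check_iri(iri):
    """Reject strings that cannot be written safely between < and >."""
    forbidden = '<>"{}|\\^`'
    if (not iri or any(c in forbidden for c in iri)
            or any(c.isspace() for c in iri)):
        raise ValueError("not a usable IRI: %r" % (iri,))
    return iri


def build_overlap_query(start_rel_iri, stop_rel_iri):
    """Return the overlap query with both relations hard-coded as IRIs."""
    return OVERLAP_QUERY_TEMPLATE % {
        "startRel": check_iri(start_rel_iri),
        "stopRel": check_iri(stop_rel_iri),
    }


if __name__ == "__main__":
    base = "http://www.univ-rennes1.fr/odameron/debugVirtuoso/"
    print(build_overlap_query(base + "posStart", base + "posStop"))
```

The generated query is equivalent to getOverlap.sparql, so the application can still read the relation names from the :display triples while sending the engine a query with constant predicates.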



_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users