Dear Virtuosans,

I noticed that queries can be much slower when a variable is used in the property position (even if the variable can only have one value), e.g.:
DATA (extract):

    :display :startRelation :posStart.
    :gene0 rdf:type :Gene.
    :gene0 :posStart "53416"^^xsd:numeric.
    :gene1 rdf:type :Gene.
    :gene1 :posStart "29513"^^xsd:numeric.
    ...

SIMPLE QUERY:

    ?gene :posStart ?start.

SLOWER QUERY:

    :display :startRelation ?startRel.
    ?gene ?startRel ?start.

I assume that the query engine first tries to match the "?gene ?startRel ?start" constraint, whereas beginning with ":display :startRelation ?startRel." would fix the value of ?startRel, which could then be used to find the start positions of the genes.

I can live with the simple query, but the second one would make the development of our application easier. Is there anything we could do to improve the performance of the second query?

Thank you!

Kind regards,
Olivier Dameron

NB: for the record, below are the script I used for generating the dataset and the two queries.

===== generateDataSet.py
#! /usr/bin/env python

import random

nbGenes = 50000
geneLengthMax = 100
chromosomeLength = 100000

with open("debugDataset.ttl", "w") as dataFile:
    # Prefix declarations and the two triples naming the start/stop relations
    dataFile.write("""
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>.

:display :startRelation :posStart.
:display :stopRelation :posStop.
""")
    # One gene per iteration, with a random start and a stop at most
    # geneLengthMax positions further along the chromosome
    for i in range(nbGenes):
        geneIdent = ":gene" + str(i)
        dataFile.write("\n" + geneIdent + " rdf:type :Gene.\n")
        posStart = random.randint(1, chromosomeLength - geneLengthMax)
        posStop = posStart + random.randint(1, geneLengthMax)
        dataFile.write(geneIdent + " :posStart \"" + str(posStart) + "\"^^xsd:numeric.\n")
        dataFile.write(geneIdent + " :posStop \"" + str(posStop) + "\"^^xsd:numeric.\n")

===== getOverlap.sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>

SELECT (count(*) as ?nbOverlap)
WHERE {
  ?gene1 a :Gene; :posStart ?start1; :posStop ?stop1.
  ?gene2 a :Gene; :posStart ?start2; :posStop ?stop2.
  FILTER (?start1 < ?start2 && ?start1 < ?stop2 && ?start2 < ?stop1)
}

===== getOverlapSLOW.sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>

SELECT (count(*) as ?nbOverlap)
WHERE {
  :display :startRelation ?startRel.
  :display :stopRelation ?stopRel.
  ?gene1 a :Gene; ?startRel ?start1; ?stopRel ?stop1.
  ?gene2 a :Gene; ?startRel ?start2; ?stopRel ?stop2.
  FILTER (?start1 < ?start2 && ?start1 < ?stop2 && ?start2 < ?stop1)
}

_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
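One possible client-side workaround for the slow query, sketched here under assumptions and not as a Virtuoso feature: run a small lookup query for the two relation IRIs first, then inline them as constants so that the main query has the same shape as getOverlap.sparql. The names RELATIONS_QUERY, check_iri and build_overlap_query are hypothetical, and actually sending the queries to the endpoint is left out because it depends on the client library in use. This does not change how the engine plans the original slow query.

```python
#! /usr/bin/env python
# Sketch only: builds query strings, does not contact any SPARQL endpoint.

PREFIXES = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://www.univ-rennes1.fr/odameron/debugVirtuoso/>
"""

# Step 1 (hypothetical): a cheap query that returns the two relation IRIs.
RELATIONS_QUERY = PREFIXES + """SELECT ?startRel ?stopRel
WHERE {
  :display :startRelation ?startRel.
  :display :stopRelation ?stopRel.
}"""


def check_iri(iri):
    """Reject strings that cannot be written safely between < and >."""
    forbidden = '<>"{}|^`\\'
    if any(c in forbidden or c.isspace() for c in iri):
        raise ValueError("not a plain IRI: %r" % (iri,))
    return iri


def build_overlap_query(start_rel, stop_rel):
    """Step 2: the overlap query with both property IRIs inlined as constants."""
    template = """SELECT (count(*) as ?nbOverlap)
WHERE {
  ?gene1 a :Gene; <%(start)s> ?start1; <%(stop)s> ?stop1.
  ?gene2 a :Gene; <%(start)s> ?start2; <%(stop)s> ?stop2.
  FILTER (?start1 < ?start2 && ?start1 < ?stop2 && ?start2 < ?stop1)
}"""
    return PREFIXES + template % {
        "start": check_iri(start_rel),
        "stop": check_iri(stop_rel),
    }
```

The application would run RELATIONS_QUERY once, pass the two returned IRIs to build_overlap_query, and submit the resulting string. The cost is one extra round trip, but the main query no longer contains a variable in the property position.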