Here is an actual query, partially obfuscated. It returns about 18K nodes in 40 seconds, from a dataset of about 17M triples. (The nodes are not necessarily distinct.)
The predominant graph structure is like:

    ?node <- ?lsu -> ?detail -> LSUPROPERTYVALUE

Thanks for your attention and any suggestions for improvement.

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix lsu: <http://rules.example.org/ns/lsu#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT (count(?node) as ?cnt)
WHERE {
  ?detail lsu:source "XYZ".
  ?detail lsu:length-type "Ltype".
  ?detail lsu:max-length-exclusive ?maxe_len;
          lsu:max-length-inclusive ?maxi_len;
          lsu:min-length-inclusive ?mine_len;
          lsu:min-length-exclusive ?mini_len.
  FILTER (
    (?maxe_len = rdf:nil || ?maxe_len > "95"^^xsd:decimal) &&
    (?maxi_len = rdf:nil || ?maxi_len >= "95"^^xsd:decimal) &&
    (?mine_len = rdf:nil || ?mine_len < "95"^^xsd:decimal) &&
    (?mini_len = rdf:nil || ?mini_len <= "95"^^xsd:decimal)
  )
  ?detail lsu:date-type "Date type 1".
  {
    { ?detail lsu:retroactive true;
              lsu:end-date rdf:nil . }
    UNION
    { ?detail lsu:retroactive false;
              lsu:start-date ?start ;
              lsu:end-date ?end .
      FILTER (?start <= "2006-08-11"^^xsd:date &&
              (?end = rdf:nil || ?end >= "2006-08-11"^^xsd:date)) }
  }
  ?detail lsu:minimum-age ?min_age;
          lsu:maximum-age ?max_age.
  FILTER ((?max_age = rdf:nil || ?max_age >= 8) && (?min_age = 0 || ?min_age < 8))
  ?detail lsu:applicable-for "adfsda" .
  ?detail lsu:v-type ?v_type.
  FILTER (?v_type in (rdf:nil, <http://www.example.org/2015/7/abc>))
  ?detail lsu:s-type ?s_type.
  FILTER (?s_type in (rdf:nil, <http://www.example.org/2015/7/dsfgdsa>))
  ?detail lsu:max-gg-exclusive ?maxe_gg;
          lsu:max-gg-inclusive ?maxi_gg;
          lsu:min-gg-inclusive ?mine_gg;
          lsu:min-gg-exclusive ?mini_gg.
  FILTER (
    (?maxe_gg = rdf:nil || ?maxe_gg > "50"^^xsd:decimal) &&
    (?maxi_gg = rdf:nil || ?maxi_gg >= "50"^^xsd:decimal) &&
    (?mine_gg = rdf:nil || ?mine_gg < "50"^^xsd:decimal) &&
    (?mini_gg = rdf:nil || ?mini_gg <= "50"^^xsd:decimal)
  )
  ?detail lsu:h-m ?h_m.
  FILTER (?h_m in (rdf:nil, <http://www.example.org/2015/7/hm1>))
  {
    { ?detail lsu:v-func ?v_func.
      FILTER (?v_func in (<http://www.example.org/2015/7/vf1>,<http://www.example.org/2015/7/vf2>)) }
    UNION
    { ?detail lsu:c-n ?c_n.
      FILTER (?c_n in (<http://www.example.org/2015/7/cn1>,<http://www.example.org/2015/7/cn2>,
                       <http://www.example.org/2015/7/cn3>,<http://www.example.org/2015/7/cn4>)) }
  }
  ?lsu lsu:lsu-d ?detail.
  ?lsu lsu:aF ?node.
}

On Thu, 2016-01-07 at 12:36 +0000, Andy Seaborne wrote:
> It looks like it is the query cost and not the
>
> > So I conclude we are seeing the best performance possible unless there
> > is something terribly wrong with my queries. They are essentially of the
> > form:
> >
>
> Details matter here - can you show a real query?
>
> > select ?s
> > where {
> > ?nd :prop1 <uri1>;
> > :prop2 "lit1";
> > :prop3 ?var1;
> > :prop4 ?var2;
> > # more properties of ?s
>
> ?s doesn't appear until later.
>
> There is a chance there are cross products in the real query.
>
> > filter (?var1 > N1 && ?var1 < N2)
> > filter (?var2 in (<uriA>,<uriB>,...))
>
> This usually gets optimized - maybe something else in your query is
> blocking that.
>
> Filter order can matter as well.
>
> > #more filters on ?nd properties
> > ?s :p1/:p2 ?nd.
> > }
> >
> > Some of the filters get a little more complicated. And there is at least
> > one, possibly 2, UNION clauses. No OPTIONAL clauses. I've dissected the
> > queries and run each individual piece (triple + filter), and it seems to
> > be the more complicated filters that start to slow things down, as might
> > be expected.
> >
> > Thanks for your comments and interest. The performance we're seeing is
> > unacceptable for our application requirements, so I wanted to see if
> > there were any other performance factors I had missed.
>
> Andy
>
> On 07/01/16 08:48, Håvard Mikkelsen Ottestad wrote:
> > Hi,
> >
> > Reordering the filters might help.
> >
> > Also, maybe a stats file would reorder your query to be faster. I dunno how
> > often (or if) fuseki generates a stats file.
> > You can try to generate one by hand when fuseki is shut down:
> > https://jena.apache.org/documentation/tdb/optimizer.html
> >
> > Also I’m wondering what the performance is like if you take this line away:
> > ?s :p1/:p2 ?nd.
> >
> > One major performance drain I have seen in the past is filters on string
> > literals, especially if you are doing anything like CONTAINS or LCASE.
> > Do you have any of that?
> >
> > Håvard
> >
> > On 07/01/16 03:51, "Paul Tyson" <[email protected]> wrote:
> >
> >> On Wed, 2016-01-06 at 18:52 +0000, Andy Seaborne wrote:
> >>> Hi Paul,
> >>>
> >>> > My question is: is total query time limited by search execution speed,
> >>> > or by marshaling and serialization of search results?
> >>>
> >>> Costs are a bit of both but normally mainly query. It also depends on
> >>> the client processing.
> >>>
> >>> Some context please:
> >>> 1/ What's the storage layer?
> >> TDB behind fuseki 2.3.1
> >>
> >>> 2/ What result set format are you getting?
> >> text/csv
> >>
> >>> 3/ How are you handling the results on receipt in the client?
> >> Just writing them to file for testing.
> >>
> >>>
> >>> (Håvard's point about seeing data and query also applies)
> >> Sorry, not easy to share the data.
> >>
> >>>
> >>> The important point is that output is streamed.
> >>>
> >>> Results are sent while the query is executing; it is not the case that
> >>> the query executes, all the results are calculated, and then the results
> >>> are produced.
> >>>
> >>> To investigate, modify the query to do something like this
> >>>
> >>> SELECT (count(*) AS ?C) { ... }
> >>>
> >>> because then the result set cost is low and all the query is executed
> >>> before a result can be produced.
> >>>
> >> Yes, I did that, and the time is very nearly the same.
> >>
> >> So I conclude we are seeing the best performance possible unless there
> >> is something terribly wrong with my queries.
> >> They are essentially of the
> >> form:
> >>
> >> select ?s
> >> where {
> >> ?nd :prop1 <uri1>;
> >> :prop2 "lit1";
> >> :prop3 ?var1;
> >> :prop4 ?var2;
> >> # more properties of ?s
> >> filter (?var1 > N1 && ?var1 < N2)
> >> filter (?var2 in (<uriA>,<uriB>,...))
> >> #more filters on ?nd properties
> >> ?s :p1/:p2 ?nd.
> >> }
> >>
> >> Some of the filters get a little more complicated. And there is at least
> >> one, possibly 2, UNION clauses. No OPTIONAL clauses. I've dissected the
> >> queries and run each individual piece (triple + filter), and it seems to
> >> be the more complicated filters that start to slow things down, as might
> >> be expected.
> >>
> >> Thanks for your comments and interest. The performance we're seeing is
> >> unacceptable for our application requirements, so I wanted to see if
> >> there were any other performance factors I had missed.
> >>
> >> Regards,
> >> --Paul
> >>
> >>> Andy
> >>>
> >>>
> >>> On 06/01/16 16:17, Paul Tyson wrote:
> >>>> I have a modest (17M triple) dataset, fairly flat graph. I run some
> >>>> queries selecting nodes with anywhere from 12-20 different property
> >>>> values.
> >>>>
> >>>> Result set counts are anywhere from 10,000 to 30,000 nodes. Total
> >>>> execution times measured at the client are in the 30-40 second range.
> >>>>
> >>>> The web request begins streaming results immediately, but seems to take
> >>>> longer than it should (based on the number of results and size of data
> >>>> transfer). I also notice that the time is roughly linear with the size
> >>>> of the dataset: halving the dataset size halves the result set and the
> >>>> execution time. I wouldn't have expected this behavior if all the time
> >>>> was due to an indexed search.
> >>>>
> >>>> My question is: is total query time limited by search execution speed,
> >>>> or by marshaling and serialization of search results?
> >>>>
> >>>> I have tried different query patterns, and believe I have the best
> >>>> queries possible for the use case.
> >>>>
> >>>> I'm looking for other suggestions to reduce overall execution time. The
> >>>> performance does not improve drastically going from 4GB to 8 or 16GB
> >>>> RAM. My test platforms are 64-bit Windows, ranging from a small server
> >>>> (16GB RAM, 4 CPUs) to laptops with 4GB RAM.
> >>>>
> >>>> Thanks,
> >>>> --Paul
> >>>>
> >>>
> >>
> >>
>
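Andy's diagnostic from earlier in the thread (wrap the query as SELECT (count(*) AS ?C) { ... } so the whole query still runs but almost nothing is serialized, then compare the two timings) can be scripted. This is a minimal sketch using only the Python standard library. The helper names count_wrapped and time_query are hypothetical, and it assumes the server accepts the query as a form-encoded POST parameter named query, as the SPARQL 1.1 Protocol describes.

```python
import re
import time
import urllib.parse
import urllib.request

# Leading PREFIX/BASE declarations must stay outside the wrapping query.
_PROLOGUE = re.compile(
    r"\s*(?:(?:PREFIX\s+[\w.-]*:\s*<[^>]*>|BASE\s+<[^>]*>)\s*)*",
    re.IGNORECASE,
)


def count_wrapped(query: str) -> str:
    """Wrap a SELECT query as a subquery of SELECT (COUNT(*) AS ?C).

    The server still evaluates the whole original query, but only one row
    is serialized and sent back, which isolates query cost from result-set
    cost. Assumes the query has no FROM clauses, which a subquery cannot carry.
    """
    m = _PROLOGUE.match(query)  # always matches, possibly the empty string
    prologue = query[: m.end()].strip()
    body = query[m.end():].strip()
    wrapped = f"SELECT (COUNT(*) AS ?C) WHERE {{ {body} }}"
    return f"{prologue}\n{wrapped}" if prologue else wrapped


def time_query(endpoint: str, query: str, accept: str = "text/csv") -> tuple[float, int]:
    """POST a query to a (hypothetical) SPARQL endpoint and time the full response.

    Returns (elapsed seconds, response size in bytes).
    """
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        size = len(resp.read())  # read everything, since results are streamed
    return time.perf_counter() - start, size
```

For example, comparing time_query(url, q) with time_query(url, count_wrapped(q)), where url is something like http://localhost:3030/ds/query (a hypothetical dataset name), separates evaluation cost from serialization and transfer cost. If the two times are close, as Paul found, the time is going into evaluating the query.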

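Andy's remark about cross products and Håvard's suggestion of a statistics file are both about the order in which triple patterns get joined. The sketch below is a deliberately simplified model, not TDB's optimizer: disconnected_patterns flags patterns that share no variable with any earlier pattern (evaluated in written order, each of those starts a cross product), and greedy_order picks, at each step, the cheapest pattern still connected to the variables bound so far. The estimate function and the per-predicate counts in the example are hypothetical stand-ins for the kind of information a stats file supplies. In the real query as posted, every pattern shares ?detail or ?lsu with an earlier one, so the first check flags nothing there.

```python
from typing import Callable, Sequence

# A triple pattern as written in SPARQL; terms starting with "?" are variables.
Triple = tuple[str, str, str]


def variables(pattern: Triple) -> set[str]:
    return {term for term in pattern if term.startswith("?")}


def disconnected_patterns(patterns: Sequence[Triple]) -> list[int]:
    """Indices of patterns (after the first) sharing no variable with any
    earlier pattern; evaluated in this order, each one starts a cross product."""
    bound: set[str] = set()
    result: list[int] = []
    for i, pattern in enumerate(patterns):
        vs = variables(pattern)
        if i > 0 and not (vs & bound):
            result.append(i)
        bound |= vs
    return result


def greedy_order(patterns: Sequence[Triple],
                 estimate: Callable[[Triple], float]) -> list[int]:
    """Simplified greedy join order: prefer patterns connected to variables
    already bound (any pattern at the start, or when none is connected) and,
    among those, the one with the smallest estimated cardinality."""
    remaining = list(range(len(patterns)))
    order: list[int] = []
    bound: set[str] = set()
    while remaining:
        connected = [i for i in remaining if variables(patterns[i]) & bound]
        candidates = connected or remaining
        best = min(candidates, key=lambda i: estimate(patterns[i]))
        order.append(best)
        remaining.remove(best)
        bound |= variables(patterns[best])
    return order


if __name__ == "__main__":
    # Hypothetical per-predicate counts standing in for a stats file.
    counts = {"lsu:aF": 1000, "lsu:source": 10, "lsu:lsu-d": 500}
    bgp = [("?lsu", "lsu:aF", "?node"),
           ("?detail", "lsu:source", '"XYZ"'),
           ("?lsu", "lsu:lsu-d", "?detail")]
    print(disconnected_patterns(bgp))                  # [1]
    print(greedy_order(bgp, lambda p: counts[p[1]]))   # [1, 2, 0]
```

Real optimizers also weigh bound subjects and objects, filter placement and index choice; the point here is only that a pattern with no shared variable multiplies the intermediate result instead of narrowing it.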