Re: puzzling performance issue
Because it isn't a valid semantics preserving optimization. ARQ only applies optimizations that preserve the semantics of the query, the fact that this is ultimately a CONSTRUCT query doesn't change the semantics of the query evaluation itself, merely the final RDF produced. Making the rewrite you suggest changes the semantics of the query significantly so ARQ is never going to automatically apply that for you. ARQ allows you to write and use custom optimizers/query rewriters/query engines if you so wish. If you know your data and the types of queries that should/shouldn't be being written then you can provide your own custom components to handle this "optimisation" for your use case. Thanks, Rob On 19/10/2021, 10:09, "Élie Roux" wrote: > As others pointed out semantically the evaluation of the two query forms yields very different intermediate results. It's only the presence of the post-processing CONSTRUCT stage that happens to strip out the duplicates/unusable results. Any optimizations MUST preserve the overall semantics of the query to ensure they yield the same results, so no you couldn't apply an optimization in this case. In this specific case CONSTRUCT happens as a post-processing step as part of producing the final query results, it's not actually within the portion of query evaluation that is subject to ARQs optimizer. Thanks! I still don't understand though... is there a reason why the ARQs optimizer can't have a boolean flag telling it it can optimize things for a CONSTRUCT? Best, -- Elie
Re: puzzling performance issue
> As others pointed out semantically the evaluation of the two query forms > yields very different intermediate results. It's only the presence of the > post-processing CONSTRUCT stage that happens to strip out the > duplicates/unusable results. Any optimizations MUST preserve the overall > semantics of the query to ensure they yield the same results, so no you > couldn't apply an optimization in this case. In this specific case CONSTRUCT > happens as a post-processing step as part of producing the final query > results, it's not actually within the portion of query evaluation that is > subject to ARQs optimizer. Thanks! I still don't understand though... is there a reason why the ARQs optimizer can't have a boolean flag telling it it can optimize things for a CONSTRUCT? Best, -- Elie
Re: puzzling performance issue
Not really This pattern of unconnected BGPs has legitimate use cases. A common one is doing similarity calculations where you use unconnected BGPs to create every possible combination of results and then use BIND and/or FILTER to compute some metric and use that to filter/rank the combinations. As others pointed out semantically the evaluation of the two query forms yields very different intermediate results. It's only the presence of the post-processing CONSTRUCT stage that happens to strip out the duplicates/unusable results. Any optimizations MUST preserve the overall semantics of the query to ensure they yield the same results, so no you couldn't apply an optimization in this case. In this specific case CONSTRUCT happens as a post-processing step as part of producing the final query results, it's not actually within the portion of query evaluation that is subject to ARQs optimizer. You could write a query analyzer that would highlight these kinds of potential issues but ultimately it comes down to the query author understanding what they are trying to achieve and making the appropriate choice for their use cases. As I said at the start this pattern has legitimate use cases even if it has performance implications. Rob On 07/10/2021, 12:57, "Élie Roux" wrote: > Overall, it whether the WHERE answer is 16*26*2636 rows (all one BGP) or > 16+26+2636 rows (union). Yes, I understand better now, thanks! Do you think there might be some optimization at some point for that case? I suspect this is very common in SPARQL queries out there... Best, -- Elie
Re: puzzling performance issue
> Overall, it whether the WHERE answer is 16*26*2636 rows (all one BGP) or > 16+26+2636 rows (union). Yes, I understand better now, thanks! Do you think there might be some optimization at some point for that case? I suspect this is very common in SPARQL queries out there... Best, -- Elie
Re: puzzling performance issue
On 07/10/2021 12:30, Élie Roux wrote: if you take this expression WHERE { { bdr:MW23703_1183 ?instp ?insto . # 200ms alone } union { bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone } union { bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart ?ancestorPart . # 200ms alone } } separate it into three distinct queries and aggregate the results with respective (count(*) as ?count) select bindings, what numbers result? first bgp: 16 second: 26 third: 2636 (same with count unique) The patterns are not connected. Ovaerll, it whether the WHERE answer is 16*26*2636 rows (all one BGP) or 16+26+2636 rows (union). Andy Best,
Re: puzzling performance issue
When there are different parts of pattern going to make up different parts of the CONSTRUCT template, splitting it up into UNION makes sense. It is using the fact that in a CONSTRUCT template, if variables are unbound, the triple pattern isn't substantiated but the rest of the triples from the template are still generated. Following on from Richard, usin UNION turns a expontial growth into linear growth. A better solution is to support the idiom better. That is have: a single request that is CONSTRUCT { template1 } WHERE { pattern1 } ; CONSTRUCT { template2 } WHERE { pattern2 } ; CONSTRUCT { template3 } WHERE { pattern3 } ... (c.f SPARQL Update). to form one single result graph. Andy On 07/10/2021 11:57, Élie Roux wrote: Thanks a lot for your very informative answer Richard, it's really helpful to know when writing queries! It seems this is a case where some optimizations might be implemented? (I'm afraid this isn't something I could contribute though, sorry) Best,
Re: puzzling performance issue
> if you take this expression > > WHERE > { > { > bdr:MW23703_1183 ?instp ?insto . # 200ms alone > } union { > bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone > } union { > bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart > ?ancestorPart . # 200ms alone > } > } > > separate it into three distinct queries and aggregate the results with > respective (count(*) as ?count) select bindings, what numbers result? first bgp: 16 second: 26 third: 2636 (same with count unique) Best, -- Elie
Re: puzzling performance issue
Thanks a lot for your very informative answer Richard, it's really helpful to know when writing queries! It seems this is a case where some optimizations might be implemented? (I'm afraid this isn't something I could contribute though, sorry) Best, -- Elie
Re: puzzling performance issue
Queries of the form CONSTRUCT {...} WHERE {...} are evaluated with a three-stage pipeline. First, the query SELECT * WHERE {...} is executed. Second, the CONSTRUCT template is applied to each result row (producing no triple for any triple pattern that has a variable without value in that row). Third, any duplicate triples produced in step 2 are removed. If you run both versions of your query in the SELECT * WHERE {...} form, you will see that the version without UNION produces a much larger intermediate result. It is only due to the duplicate removal in step 3 that you get the same final result from both versions. This larger intermediate result explains why the version without UNION is much slower. Hope that helps, Richard > On 6 Oct 2021, at 21:59, Élie Roux wrote: > > After long hours of anxiety, I discovered that using unions as in > > CONSTRUCT > { > bdr:MW23703_1183 ?instp ?insto . > ?t ?tp ?to . > ?ancestor :hasPart ?ancestorPart . > } > WHERE > { > { > bdr:MW23703_1183 ?instp ?insto . # 200ms alone > } union { > bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone > } union { > bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart > ?ancestorPart . # 200ms alone > } > } > is much faster than using the bgp in the same group... is there any > reason for that? > > Best, > -- > Elie
Re: puzzling performance issue
After long hours of anxiety, I discovered that using unions as in CONSTRUCT { bdr:MW23703_1183 ?instp ?insto . ?t ?tp ?to . ?ancestor :hasPart ?ancestorPart . } WHERE { { bdr:MW23703_1183 ?instp ?insto . # 200ms alone } union { bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone } union { bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart ?ancestorPart . # 200ms alone } } is much faster than using the bgp in the same group... is there any reason for that? Best, -- Elie