Re: puzzling performance issue

2021-10-19 Thread Rob Vesse
Because it isn't a valid semantics preserving optimization.  ARQ only applies 
optimizations that preserve the semantics of the query, the fact that this is 
ultimately a CONSTRUCT query doesn't change the semantics of the query 
evaluation itself, merely the final RDF produced.  Making the rewrite you 
suggest changes the semantics of the query significantly so ARQ is never going 
to automatically apply that for you.

ARQ allows you to write and use custom optimizers/query rewriters/query engines 
if you so wish.  If you know your data and the types of queries that 
should/shouldn't be being written then you can provide your own custom 
components to handle this "optimisation" for your use case.

Thanks,

Rob

On 19/10/2021, 10:09, "Élie Roux"  wrote:

> As others pointed out semantically the evaluation of the two query forms 
yields very different intermediate results.  It's only the presence of the 
post-processing CONSTRUCT stage that happens to strip out the 
duplicates/unusable results.  Any optimizations MUST preserve the overall 
semantics of the query to ensure they yield the same results, so no you 
couldn't apply an optimization in this case.  In this specific case CONSTRUCT 
happens as a post-processing step as part of producing the final query results, 
it's not actually within the portion of query evaluation that is subject to 
ARQs optimizer.

Thanks! I still don't understand though... is there a reason why the
ARQs optimizer can't have a boolean flag telling it it can optimize
things for a CONSTRUCT?

Best,
-- 
Elie






Re: puzzling performance issue

2021-10-19 Thread Élie Roux
> As others pointed out semantically the evaluation of the two query forms 
> yields very different intermediate results.  It's only the presence of the 
> post-processing CONSTRUCT stage that happens to strip out the 
> duplicates/unusable results.  Any optimizations MUST preserve the overall 
> semantics of the query to ensure they yield the same results, so no you 
> couldn't apply an optimization in this case.  In this specific case CONSTRUCT 
> happens as a post-processing step as part of producing the final query 
> results, it's not actually within the portion of query evaluation that is 
> subject to ARQs optimizer.

Thanks! I still don't understand though... is there a reason why the
ARQs optimizer can't have a boolean flag telling it it can optimize
things for a CONSTRUCT?

Best,
-- 
Elie


Re: puzzling performance issue

2021-10-19 Thread Rob Vesse
Not really

This pattern of unconnected BGPs has legitimate use cases.  A common one is 
doing similarity calculations where you use unconnected BGPs to create every 
possible combination of results and then use BIND and/or FILTER to compute some 
metric and use that to filter/rank the combinations.

As others pointed out semantically the evaluation of the two query forms yields 
very different intermediate results.  It's only the presence of the 
post-processing CONSTRUCT stage that happens to strip out the 
duplicates/unusable results.  Any optimizations MUST preserve the overall 
semantics of the query to ensure they yield the same results, so no you 
couldn't apply an optimization in this case.  In this specific case CONSTRUCT 
happens as a post-processing step as part of producing the final query results, 
it's not actually within the portion of query evaluation that is subject to 
ARQs optimizer.

You could write a query analyzer that would highlight these kinds of potential 
issues but ultimately it comes down to the query author understanding what they 
are trying to achieve and making the appropriate choice for their use cases.  
As I said at the start this pattern has legitimate use cases even if it has 
performance implications.

Rob

On 07/10/2021, 12:57, "Élie Roux"  wrote:

> Overall, it whether the WHERE answer is 16*26*2636 rows (all one BGP) or
> 16+26+2636 rows (union).

Yes, I understand better now, thanks!

Do you think there might be some optimization at some point for that
case? I suspect this is very common in SPARQL queries out there...

Best,
-- 
Elie






Re: puzzling performance issue

2021-10-07 Thread Élie Roux
> Overall, it whether the WHERE answer is 16*26*2636 rows (all one BGP) or
> 16+26+2636 rows (union).

Yes, I understand better now, thanks!

Do you think there might be some optimization at some point for that
case? I suspect this is very common in SPARQL queries out there...

Best,
-- 
Elie


Re: puzzling performance issue

2021-10-07 Thread Andy Seaborne




On 07/10/2021 12:30, Élie Roux wrote:

if you take this expression

WHERE
{
  {
 bdr:MW23703_1183 ?instp ?insto . # 200ms alone
  } union {
 bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone
} union {
   bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart
?ancestorPart . # 200ms alone
  }
}

separate it into three distinct queries and aggregate the results with 
respective (count(*) as ?count) select bindings, what numbers result?


first bgp: 16
second: 26
third: 2636 (same with count unique)


The patterns are not connected.

Ovaerll, it whether the WHERE answer is 16*26*2636 rows (all one BGP) or 
16+26+2636 rows (union).


Andy



Best,



Re: puzzling performance issue

2021-10-07 Thread Andy Seaborne
When there are different parts of pattern going to make up different 
parts of the CONSTRUCT template, splitting it up into UNION makes sense. 
It is using the fact that in a CONSTRUCT template, if variables are 
unbound, the triple pattern isn't substantiated but the rest of the 
triples from the template are still generated.


Following on from Richard, usin UNION turns a expontial growth into 
linear growth.


A better solution is to support the idiom better.

That is have: a single request that is

CONSTRUCT { template1 } WHERE { pattern1 } ;
CONSTRUCT { template2 } WHERE { pattern2 } ;
CONSTRUCT { template3 } WHERE { pattern3 }
...

(c.f SPARQL Update).

to form one single result graph.

Andy

On 07/10/2021 11:57, Élie Roux wrote:

Thanks a lot for your very informative answer Richard, it's really
helpful to know when writing queries!

It seems this is a case where some optimizations might be implemented?
(I'm afraid this isn't something I could contribute though, sorry)

Best,



Re: puzzling performance issue

2021-10-07 Thread Élie Roux
> if you take this expression
>
> WHERE
> {
>  {
> bdr:MW23703_1183 ?instp ?insto . # 200ms alone
>  } union {
> bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone
> } union {
>   bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart
> ?ancestorPart . # 200ms alone
>  }
> }
>
> separate it into three distinct queries and aggregate the results with 
> respective (count(*) as ?count) select bindings, what numbers result?

first bgp: 16
second: 26
third: 2636 (same with count unique)

Best,
-- 
Elie


Re: puzzling performance issue

2021-10-07 Thread Élie Roux
Thanks a lot for your very informative answer Richard, it's really
helpful to know when writing queries!

It seems this is a case where some optimizations might be implemented?
(I'm afraid this isn't something I could contribute though, sorry)

Best,
-- 
Elie


Re: puzzling performance issue

2021-10-07 Thread Richard Cyganiak
Queries of the form

CONSTRUCT {...} WHERE {...}

are evaluated with a three-stage pipeline. First, the query

SELECT * WHERE {...}

is executed. Second, the CONSTRUCT template is applied to each result row 
(producing no triple for any triple pattern that has a variable without value 
in that row). Third, any duplicate triples produced in step 2 are removed.

If you run both versions of your query in the SELECT * WHERE {...} form, you 
will see that the version without UNION produces a much larger intermediate 
result. It is only due to the duplicate removal in step 3 that you get the same 
final result from both versions. This larger intermediate result explains why 
the version without UNION is much slower.

Hope that helps,
Richard



> On 6 Oct 2021, at 21:59, Élie Roux  wrote:
> 
> After long hours of anxiety, I discovered that using unions as in
> 
> CONSTRUCT
> {
>   bdr:MW23703_1183 ?instp ?insto .
>   ?t ?tp ?to .
>   ?ancestor :hasPart ?ancestorPart .
> }
> WHERE
> {
>  {
> bdr:MW23703_1183 ?instp ?insto . # 200ms alone
>  } union {
> bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone
> } union {
>   bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart
> ?ancestorPart . # 200ms alone
>  }
> }
> is much faster than using the bgp in the same group... is there any
> reason for that?
> 
> Best,
> -- 
> Elie



Re: puzzling performance issue

2021-10-06 Thread Élie Roux
After long hours of anxiety, I discovered that using unions as in

CONSTRUCT
{
   bdr:MW23703_1183 ?instp ?insto .
   ?t ?tp ?to .
   ?ancestor :hasPart ?ancestorPart .
}
WHERE
{
  {
 bdr:MW23703_1183 ?instp ?insto . # 200ms alone
  } union {
 bdr:MW23703_1183 :hasTitle ?t . ?t ?tp ?to . #245ms alone
 } union {
   bdr:MW23703_1183 :partOf+ ?ancestor . ?ancestor :hasPart
?ancestorPart . # 200ms alone
  }
}
is much faster than using the bgp in the same group... is there any
reason for that?

Best,
-- 
Elie