Re: Semantics of SERVICE w.r.t. slicing

2022-06-03 Thread Andy Seaborne

JENA-2332 and PR 1364.

Andy

https://issues.apache.org/jira/browse/JENA-2332

https://github.com/apache/jena/pull/1364

On 03/06/2022 18:29, Andy Seaborne wrote:

Probably a bug then.

Are you going to be making improvements to query 
transformation/optimization as part of your work on the enhanced SERVICE 
handling on the active PR?


     Andy

On 03/06/2022 10:39, Claus Stadler wrote:

Hi again,


I think the point was missed; what I was actually after is that in the 
following query a "join" is optimized into a "sequence"


and I wonder whether this is the correct behavior if a LIMIT/OFFSET is 
present.


So running the following query with optimize enabled/disabled gives 
different results:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE  { SELECT * { ?s 
 ?x } LIMIT 1 }

}


➜  bin ./arq --query service-query.rq

   (sequence !

 (service 
   (slice _ 5
 (bgp (triple ?s 
 


 (service 
   (slice _ 1
 (bgp (triple ?s  
?x)


-----------------------------
| s | x                     |
=============================
|   | "Aarti Mukherjee"@en  |
|   | "Abatte Barihun"@en   |
|   | "Abby Abadi"@en       |
|   | "Abd al Malik"@de     |
|   | "Abdul Wahid Khan"@en |
-----------------------------




./arq --explain --optimize=no --query service-query.rq
   (join !
 (service 
   (slice _ 5
 (bgp (triple ?s 
 


 (service 
   (slice _ 1
 (bgp (triple ?s  
?x)

---------
| s | x |
=========
---------


Cheers,

Claus


On 03.06.22 10:22, Andy Seaborne wrote:



On 02/06/2022 21:19, Claus Stadler wrote:

Hi,

I noticed some interesting results when using SERVICE with a sub 
query with a slice (limit / offset).



Preliminary Remark:

Because SPARQL semantics is bottom up, a query such as the following 
will not yield bindings for ?x:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }

   SERVICE  { BIND(?s AS ?x) }
}


The query plan for that is:

(join
  (service 
    (slice _ 5
  (bgp (triple ?s 
 


  (service 
    (extend ((?x ?s))
  (table unit

which has not had any optimization applied.  ARQ checks scopes before 
doing any transformation.


Change BIND(?s AS ?x) to BIND(?s1 AS ?x)

and it will have (join) replaced by (sequence)

---------
| s | x |
=========
|   |   |
|   |   |
|   |   |
|   |   |
|   |   |
---------

LIMIT 1 is a no-op - the second SERVICE always evals to one row of no 
columns. Which makes the second SERVICE the join identity and the 
result is the first SERVICE.


Column ?x is only in the display because it is in "SELECT *"

Query engines, such as Jena, attempt to optimize execution. For 
instance, in the following query, instead of retrieving all labels, 
Jena uses each binding for a Musical Artist to perform a lookup at the 
service.


The result is semantically equivalent to bottom up evaluation 
(without result set limits) - just much faster.


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE  { ?s 
 ?x }

}


The main point:

However, the following query with ARQ interestingly yields one 
binding for every musical artist - which 

Re: Semantics of SERVICE w.r.t. slicing

2022-06-03 Thread Andy Seaborne

In the email headers:

List-Unsubscribe: 

More details:
https://jena.apache.org/help_and_support/index.html

Andy


On 03/06/2022 18:32, Adrian Walker wrote:

unsubscribe me please.


Re: Semantics of SERVICE w.r.t. slicing

2022-06-03 Thread Adrian Walker
unsubscribe me please.


Re: Semantics of SERVICE w.r.t. slicing

2022-06-03 Thread Andy Seaborne

Probably a bug then.

Are you going to be making improvements to query 
transformation/optimization as part of your work on the enhanced SERVICE 
handling on the active PR?


Andy

On 03/06/2022 10:39, Claus Stadler wrote:

Hi again,


I think the point was missed; what I was actually after is that in the 
following query a "join" is optimized into a "sequence"


and I wonder whether this is the correct behavior if a LIMIT/OFFSET is 
present.


So running the following query with optimize enabled/disabled gives 
different results:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE  { SELECT * { ?s 
 ?x } LIMIT 1 }

}


➜  bin ./arq --query service-query.rq

   (sequence !

     (service 
   (slice _ 5
     (bgp (triple ?s 
 


     (service 
   (slice _ 1
     (bgp (triple ?s  
?x)


-----------------------------
| s | x                     |
=============================
|   | "Aarti Mukherjee"@en  |
|   | "Abatte Barihun"@en   |
|   | "Abby Abadi"@en       |
|   | "Abd al Malik"@de     |
|   | "Abdul Wahid Khan"@en |
-----------------------------




./arq --explain --optimize=no --query service-query.rq
   (join !
     (service 
   (slice _ 5
     (bgp (triple ?s 
 


     (service 
   (slice _ 1
     (bgp (triple ?s  
?x)

---------
| s | x |
=========
---------


Cheers,

Claus


On 03.06.22 10:22, Andy Seaborne wrote:



On 02/06/2022 21:19, Claus Stadler wrote:

Hi,

I noticed some interesting results when using SERVICE with a sub 
query with a slice (limit / offset).



Preliminary Remark:

Because SPARQL semantics is bottom up, a query such as the following 
will not yield bindings for ?x:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }

   SERVICE  { BIND(?s AS ?x) }
}


The query plan for that is:

(join
  (service 
    (slice _ 5
  (bgp (triple ?s 
 


  (service 
    (extend ((?x ?s))
  (table unit

which has not had any optimization applied.  ARQ checks scopes before 
doing any transformation.


Change BIND(?s AS ?x) to BIND(?s1 AS ?x)

and it will have (join) replaced by (sequence)

---------
| s | x |
=========
|   |   |
|   |   |
|   |   |
|   |   |
|   |   |
---------

LIMIT 1 is a no-op - the second SERVICE always evals to one row of no 
columns. Which makes the second SERVICE the join identity and the 
result is the first SERVICE.


Column ?x is only in the display because it is in "SELECT *"

Query engines, such as Jena, attempt to optimize execution. For 
instance, in the following query, instead of retrieving all labels, 
Jena uses each binding for a Musical Artist to perform a lookup at the 
service.


The result is semantically equivalent to bottom up evaluation 
(without result set limits) - just much faster.


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE  { ?s 
 ?x }

}


The main point:

However, the following query with ARQ interestingly yields one 
binding for every musical artist - which contradicts the bottom-up 
paradigm:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE 

Re: Re: Semantics of SERVICE w.r.t. slicing

2022-06-03 Thread Claus Stadler

Hi again,


I think the point was missed; what I was actually after is that in the 
following query a "join" is optimized into a "sequence"


and I wonder whether this is the correct behavior if a LIMIT/OFFSET is 
present.


So running the following query with optimize enabled/disabled gives 
different results:


SELECT * {
  SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
  SERVICE  { SELECT * { ?s 
 ?x } LIMIT 1 }

}


➜  bin ./arq --query service-query.rq

  (sequence !

    (service 
  (slice _ 5
    (bgp (triple ?s 
 


    (service 
  (slice _ 1
    (bgp (triple ?s  
?x)


-----------------------------
| s | x                     |
=============================
|   | "Aarti Mukherjee"@en  |
|   | "Abatte Barihun"@en   |
|   | "Abby Abadi"@en       |
|   | "Abd al Malik"@de     |
|   | "Abdul Wahid Khan"@en |
-----------------------------


./arq --explain --optimize=no --query service-query.rq
  (join !
    (service 
  (slice _ 5
    (bgp (triple ?s 
 


    (service 
  (slice _ 1
    (bgp (triple ?s  
?x)

---------
| s | x |
=========
---------
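
For illustration only, the difference between the two plans above can be 
reproduced with a toy model in Python. This is a sketch under 
assumptions, not Jena code: the data and the names left, right, join and 
sequence are all made up here, and (sequence) is modelled as 
substituting each left-hand binding into the right-hand side before its 
LIMIT is applied.

```python
# Toy model: a solution is a dict from variable name to value.
# ARTISTS and LABELS are hypothetical stand-ins for the remote data.
ARTISTS = ["a1", "a2", "a3", "a4", "a5", "a6"]
LABELS = [("other", "Zed"), ("a1", "Aarti"), ("a1", "Aarti (alt)"),
          ("a2", "Abatte"), ("a3", "Abby"), ("a4", "Abd"), ("a5", "Abdul")]

def left():
    """Models: SELECT * { ?s a :MusicalArtist } LIMIT 5"""
    return [{"s": s} for s in ARTISTS][:5]

def right(binding=None):
    """Models: SELECT * { ?s rdfs:label ?x } LIMIT 1.
    If a binding for ?s is passed in, it is substituted before the LIMIT."""
    binding = binding or {}
    rows = [{"s": s, "x": x} for s, x in LABELS
            if "s" not in binding or binding["s"] == s]
    return rows[:1]

def compatible(a, b):
    """Two solutions are compatible if they agree on shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(lhs, rhs):
    """Bottom-up join: both sides are evaluated independently first."""
    return [{**a, **b} for a in lhs for b in rhs if compatible(a, b)]

def sequence(lhs, rhs_fn):
    """Correlated evaluation: the right side is evaluated once per left row."""
    out = []
    for a in lhs:
        out.extend({**a, **b} for b in rhs_fn(a))
    return out

bottom_up = join(left(), right())       # LIMIT 1 picks one global row
correlated = sequence(left(), right)    # LIMIT 1 applies per left row
print(len(bottom_up), len(correlated))
```

In this model the bottom-up join is empty because the single row kept by 
LIMIT 1 does not match any of the five artists, while the correlated 
evaluation yields one label per artist.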


Cheers,

Claus


On 03.06.22 10:22, Andy Seaborne wrote:



On 02/06/2022 21:19, Claus Stadler wrote:

Hi,

I noticed some interesting results when using SERVICE with a sub 
query with a slice (limit / offset).



Preliminary Remark:

Because SPARQL semantics is bottom up, a query such as the following 
will not yield bindings for ?x:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }

   SERVICE  { BIND(?s AS ?x) }
}


The query plan for that is:

(join
  (service 
    (slice _ 5
  (bgp (triple ?s 
 


  (service 
    (extend ((?x ?s))
  (table unit

which has not had any optimization applied.  ARQ checks scopes before 
doing any transformation.


Change BIND(?s AS ?x) to BIND(?s1 AS ?x)

and it will have (join) replaced by (sequence)

---------
| s | x |
=========
|   |   |
|   |   |
|   |   |
|   |   |
|   |   |
---------

LIMIT 1 is a no-op - the second SERVICE always evals to one row of no 
columns. Which makes the second SERVICE the join identity and the 
result is the first SERVICE.


Column ?x is only in the display because it is in "SELECT *"

Query engines, such as Jena, attempt to optimize execution. For 
instance, in the following query, instead of retrieving all labels, 
Jena uses each binding for a Musical Artist to perform a lookup at the 
service.


The result is semantically equivalent to bottom up evaluation 
(without result set limits) - just much faster.


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE  { ?s 
 ?x }

}


The main point:

However, the following query with ARQ interestingly yields one 
binding for every musical artist - which contradicts the bottom-up 
paradigm:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE  { SELECT * { ?s 
 ?x } LIMIT 1 }

}


 "Aarti Mukherjee"@en
 "Abatte Barihun"@en
... 3 

Re: Semantics of SERVICE w.r.t. slicing

2022-06-03 Thread Andy Seaborne




On 02/06/2022 21:19, Claus Stadler wrote:

Hi,

I noticed some interesting results when using SERVICE with a sub query 
with a slice (limit / offset).



Preliminary Remark:

Because SPARQL semantics is bottom up, a query such as the following 
will not yield bindings for ?x:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }

   SERVICE  { BIND(?s AS ?x) }
}


The query plan for that is:

(join
  (service 
(slice _ 5
  (bgp (triple ?s  


  (service 
(extend ((?x ?s))
  (table unit

which has not had any optimization applied.  ARQ checks scopes before 
doing any transformation.


Change BIND(?s AS ?x) to BIND(?s1 AS ?x)

and it will have (join) replaced by (sequence)

---------
| s | x |
=========
|   |   |
|   |   |
|   |   |
|   |   |
|   |   |
---------

LIMIT 1 is a no-op: the second SERVICE always evaluates to one row with 
no columns. That makes the second SERVICE the join identity, so the 
result is that of the first SERVICE.
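
A small sketch of the join-identity argument above, in Python. Solutions 
are modelled as dicts; the artist names are placeholders, not values 
from the real endpoint, and the helper names are made up for this 
example.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(lhs, rhs):
    """Merge every compatible pair of solutions from the two sides."""
    return [{**a, **b} for a in lhs for b in rhs if compatible(a, b)]

# First SERVICE: five rows binding ?s (placeholder values).
first_service = [{"s": f"artist{i}"} for i in range(1, 6)]

# Second SERVICE: one row with no columns, since ?x stays unbound.
unit_table = [{}]

result = join(first_service, unit_table)
unchanged = result == first_service
print(unchanged)
```

The empty row shares no variables with any left-hand row, so it is 
compatible with all of them and adds nothing to them.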


Column ?x is only in the display because it is in "SELECT *"

Query engines, such as Jena, attempt to optimize execution. For 
instance, in the following query, instead of retrieving all labels, 
Jena uses each binding for a Musical Artist to perform a lookup at the 
service.


The result is semantically equivalent to bottom up evaluation (without 
result set limits) - just much faster.


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE  { ?s 
 ?x }

}


The main point:

However, the following query with ARQ interestingly yields one binding 
for every musical artist - which contradicts the bottom-up paradigm:


SELECT * {
   SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
   SERVICE  { SELECT * { ?s 
 ?x } LIMIT 1 }

}


 "Aarti Mukherjee"@en
 "Abatte Barihun"@en
... 3 more results ...


With bottom-up semantics, the second service clause would only fetch a 
single binding so in the unlikely event that it happens to join with a 
musical artist I'd expect at most one binding


in the overall result set.

Now I wonder whether this is a bug or a feature.

I know that Jena's VarFinder is used to decide whether to perform a 
bottom-up evaluation using OpJoin or a correlated join using OpSequence 
which results in the different outcomes.
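
To make the concern concrete, here is a purely hypothetical guard, 
written in Python and not taken from Jena's VarFinder or optimizer: it 
keeps (join) whenever the right-hand side contains a (slice), and 
rewrites to (sequence) otherwise. The tuple representation of the 
algebra and the names contains_slice and linearize are assumptions made 
for this sketch; a real check would have to consider more than slices.

```python
# Algebra expressions as nested tuples: (operator, *arguments).
# This representation is hypothetical and only mirrors the printed plans.

def contains_slice(op):
    """Return True if a (slice ...) occurs anywhere inside op."""
    if not isinstance(op, tuple):
        return False
    if op[0] == "slice":
        return True
    return any(contains_slice(arg) for arg in op[1:])

def linearize(op):
    """Rewrite (join l r) to (sequence l r) unless r contains a slice."""
    if isinstance(op, tuple) and op[0] == "join":
        _, lhs, rhs = op
        if not contains_slice(rhs):
            return ("sequence", lhs, rhs)
    return op

artists = ("service", ("slice", None, 5, ("bgp", "?s a :MusicalArtist")))
labels = ("service", ("bgp", "?s rdfs:label ?x"))
one_label = ("service", ("slice", None, 1, ("bgp", "?s rdfs:label ?x")))

plan_without_limit = linearize(("join", artists, labels))
plan_with_limit = linearize(("join", artists, one_label))
print(plan_without_limit[0], plan_with_limit[0])
```

Under this guard the query without LIMIT on the second SERVICE is still 
evaluated as a correlated lookup, while the LIMIT 1 variant stays a 
bottom-up join.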


The SPARQL spec doesn't say much about the semantics of Service 
(https://www.w3.org/TR/sparql11-query/#sparqlAlgebraEval)


It isn't about the semantics of SERVICE.  It's the (join) on the local side.

So I wonder which behavior is expected when using SERVICE with SLICE'd 
queries.


"SERVICE { pattern }" executes "SELECT * { pattern }" at the far end, 
LIMITS and all.


Andy




Cheers,

Claus




Re: Semantics of SERVICE w.r.t. slicing

2022-06-03 Thread Lorenz Buehmann
The semantics should be in a separate document: 
https://www.w3.org/TR/sparql11-federated-query/#fedSemantics

On 02.06.22 22:19, Claus Stadler wrote:

Hi,

I noticed some interesting results when using SERVICE with a sub query 
with a slice (limit / offset).



Preliminary Remark:

Because SPARQL semantics is bottom up, a query such as the following 
will not yield bindings for ?x:


SELECT * {
  SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }

  SERVICE  { BIND(?s AS ?x) }
}


Query engines, such as Jena, attempt to optimize execution. For 
instance, in the following query, instead of retrieving all labels, 
Jena uses each binding for a Musical Artist to perform a lookup at the 
service.


The result is semantically equivalent to bottom up evaluation (without 
result set limits) - just much faster.


SELECT * {
  SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
  SERVICE  { ?s 
 ?x }

}


The main point:

However, the following query with ARQ interestingly yields one binding 
for every musical artist - which contradicts the bottom-up paradigm:


SELECT * {
  SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
  SERVICE  { SELECT * { ?s 
 ?x } LIMIT 1 }

}


 "Aarti Mukherjee"@en
 "Abatte Barihun"@en
... 3 more results ...


With bottom-up semantics, the second service clause would only fetch a 
single binding so in the unlikely event that it happens to join with a 
musical artist I'd expect at most one binding


in the overall result set.

Now I wonder whether this is a bug or a feature.

I know that Jena's VarFinder is used to decide whether to perform a 
bottom-up evaluation using OpJoin or a correlated join using 
OpSequence which results in the different outcomes.


The SPARQL spec doesn't say much about the semantics of Service 
(https://www.w3.org/TR/sparql11-query/#sparqlAlgebraEval)


So I wonder which behavior is expected when using SERVICE with SLICE'd 
queries.



Cheers,

Claus