Re: Semantics of SERVICE w.r.t. slicing

Andy Seaborne Fri, 03 Jun 2022 10:30:21 -0700

Probably a bug then.

Are you going to be making improvements to query transformation/optimization as part of your work on the enhanced SERVICE handling on the active PR?


    Andy

On 03/06/2022 10:39, Claus Stadler wrote:

Hi again,
I think the point was missed; what I was actually after is that in the following query a "join" is optimized into a "sequence",
and I wonder whether this is the correct behavior if a LIMIT/OFFSET is present.
Running the following query with optimization enabled and disabled gives different results:
SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x } LIMIT 1 }
}


➜  bin ./arq --query service-query.rq

   (sequence !!!!!

     (service <https://dbpedia.org/sparql>
       (slice _ 5
         (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
     (service <https://dbpedia.org/sparql>
       (slice _ 1
         (bgp (triple ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x)))))
-------------------------------------------------------------------------------
| s                                                   | x                     |
===============================================================================
| <http://dbpedia.org/resource/Aarti_Mukherjee>       | "Aarti Mukherjee"@en  |
| <http://dbpedia.org/resource/Abatte_Barihun>        | "Abatte Barihun"@en   |
| <http://dbpedia.org/resource/Abby_Abadi>            | "Abby Abadi"@en       |
| <http://dbpedia.org/resource/Abd_al_Malik_(rapper)> | "Abd al Malik"@de     |
| <http://dbpedia.org/resource/Abdul_Wahid_Khan>      | "Abdul Wahid Khan"@en |
-------------------------------------------------------------------------------
./arq --explain --optimize=no --query service-query.rq
   (join !!!!!
     (service <https://dbpedia.org/sparql>
       (slice _ 5
         (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
     (service <https://dbpedia.org/sparql>
       (slice _ 1
         (bgp (triple ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x)))))
---------
| s | x |
=========
---------


Cheers,

Claus


On 03.06.22 10:22, Andy Seaborne wrote:
On 02/06/2022 21:19, Claus Stadler wrote:
Hi,
I noticed some interesting results when using SERVICE with a subquery with a slice (limit / offset).
Preliminary Remark:
Because SPARQL semantics is bottom-up, a query such as the following will not yield bindings for ?x:
SELECT * {
   SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
   SERVICE <https://dbpedia.org/sparql> { BIND(?s AS ?x) }
}
The query plan for that is:

(join
  (service <https://dbpedia.org/sparql>
    (slice _ 5
      (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/MusicalArtist>))))
  (service <https://dbpedia.org/sparql>
    (extend ((?x ?s))
      (table unit))))
which has not had any optimization applied. ARQ checks scopes before doing any transformation.
Change BIND(?s AS ?x) to BIND(?s1 AS ?x)

and it will have (join) replaced by (sequence)

-----------------------------------------------------------
| s                                                   | x |
===========================================================
| <http://dbpedia.org/resource/Aarti_Mukherjee>       |   |
| <http://dbpedia.org/resource/Abatte_Barihun>        |   |
| <http://dbpedia.org/resource/Abby_Abadi>            |   |
| <http://dbpedia.org/resource/Abd_al_Malik_(rapper)> |   |
| <http://dbpedia.org/resource/Abdul_Wahid_Khan>      |   |
-----------------------------------------------------------
LIMIT 1 is a no-op: the second SERVICE always evaluates to one row of no columns, which makes the second SERVICE the join identity, so the result is the first SERVICE.
Column ?x is only in the display because it is in "SELECT *"
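
The "join identity" remark can be illustrated with a toy model. The following is only a sketch in Python, not ARQ code: solution mappings are modelled as dicts, and the helper names (compatible, join, extend) and the ex: identifiers are hypothetical.

```python
# Toy model of SPARQL solution mappings as dicts.
# Helper names and the ex: identifiers are hypothetical, not Jena/ARQ APIs.

def compatible(a, b):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Bottom-up join: merge every compatible pair of mappings."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def extend(rows, var, source_var):
    """BIND(?source_var AS ?var): leave ?var unbound when the source is unbound."""
    return [{**r, var: r[source_var]} if source_var in r else dict(r) for r in rows]

UNIT = [{}]  # (table unit): one row with no columns

artists = [{"s": "ex:Aarti_Mukherjee"}, {"s": "ex:Abatte_Barihun"}]
second_service = extend(UNIT, "x", "s")  # ?s is not in scope inside the second SERVICE
result = join(artists, second_service)

print(second_service)     # [{}]
print(result == artists)  # True
```

Because the second operand is a single empty mapping, it is compatible with every row on the left and adds no bindings, so the join returns the left side unchanged.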
Query engines, such as Jena, attempt to optimize execution. For instance, in the following query,
instead of retrieving all labels, Jena uses each binding for a Musical Artist to perform a lookup at the service.
The result is semantically equivalent to bottom-up evaluation (without result set limits), just much faster.
SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x }
}


The main point:
However, the following query with ARQ interestingly yields one binding for every musical artist, which contradicts the bottom-up paradigm:
SELECT * {
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s a <http://dbpedia.org/ontology/MusicalArtist> } LIMIT 5 }
  SERVICE <https://dbpedia.org/sparql> { SELECT * { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?x } LIMIT 1 }
}


<http://dbpedia.org/resource/Aarti_Mukherjee> "Aarti Mukherjee"@en
<http://dbpedia.org/resource/Abatte_Barihun> "Abatte Barihun"@en
... 3 more results ...
With bottom-up semantics, the second service clause would only fetch a single binding, so in the unlikely event that it happens to join with a musical artist I'd expect at most one binding in the overall result set.
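
To make the difference concrete, here is a small Python sketch contrasting the two evaluation strategies. It is a toy model only: the data, the ex: identifiers and the function names are hypothetical, and the correlated variant simply assumes that the left-hand bindings are substituted into the remote sub-select before its LIMIT applies, which is what the observed output suggests.

```python
# Toy model; data and helper names are hypothetical, not Jena/ARQ APIs.
LABELS = [
    {"s": "ex:Other", "x": "Other"},
    {"s": "ex:A", "x": "A"},
    {"s": "ex:B", "x": "B"},
]
ARTISTS = [{"s": "ex:A"}, {"s": "ex:B"}]


def compatible(left, right):
    """Two solution mappings are compatible if shared variables agree."""
    return all(right[v] == left[v] for v in left.keys() & right.keys())


def remote_labels(limit, bound=None):
    """Simulate SERVICE { SELECT * { ?s rdfs:label ?x } LIMIT n }.

    If `bound` is given, its bindings are substituted before the LIMIT
    applies (assumption: this mimics a correlated, sequence-style call).
    """
    rows = LABELS if bound is None else [r for r in LABELS if compatible(bound, r)]
    return rows[:limit]


def eval_join(left_rows, limit):
    """Bottom-up: evaluate the right side once, independently, then join."""
    right_rows = remote_labels(limit)
    return [{**l, **r} for l in left_rows for r in right_rows if compatible(l, r)]


def eval_sequence(left_rows, limit):
    """Correlated: re-evaluate the right side once per left binding."""
    return [{**l, **r} for l in left_rows for r in remote_labels(limit, bound=l)]


join_result = eval_join(ARTISTS, limit=1)
sequence_result = eval_sequence(ARTISTS, limit=1)
print(join_result)      # []
print(sequence_result)  # one row per artist
```

In this model the bottom-up join is empty because the single label row fetched under LIMIT 1 does not happen to belong to an artist, while the correlated evaluation applies LIMIT 1 per artist and returns one label for each.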

Now I wonder whether this is a bug or a feature.
I know that Jena's VarFinder is used to decide whether to perform a bottom-up evaluation using OpJoin or a correlated join using OpSequence, which results in the different outcomes.
The SPARQL spec doesn't say much about the semantics of SERVICE (https://www.w3.org/TR/sparql11-query/#sparqlAlgebraEval)
It isn't about the semantics of SERVICE.  It's the (join) on the local side.
So I wonder which behavior is expected when using SERVICE with SLICE'd queries.
"SERVICE { pattern }" executes "SELECT * { pattern }" at the far end, LIMITs and all.
    Andy
Cheers,

Claus
