Re: Possible Bug Roundtripping SPARQL to an SSE and back

Rick Moynihan Thu, 04 Jun 2015 08:25:18 -0700

On 4 June 2015 at 14:08, Andy Seaborne <[email protected]> wrote:

> Hi Rick,
>
> Are you in a position where you can give some background on what you're
> trying to achieve overall?
>


Sure.  Basically we're porting some existing work of ours which was based
on Sesame, which would rewrite queries (and then the results on the way
out) using Sesame's AST.  We're not doing anything especially fancy...
mostly simple URI substitutions on URI constants, both on the query and
then the results; so from the query side the transformation is essentially
invisible.

This worked great whilst we were using the Sesame native store, as we'd
essentially go from SPARQL -> AST -> rewritten AST -> execution & results.
However we now wish to make this same code work with remote stores, which
essentially means we have to convert the rewritten queries back into valid
SPARQL queries.
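
The rewriting described above can be sketched without any RDF library. This
is only an illustration, assuming the query tree is modelled as nested Python
tuples with URI constants written as <...>. The names rewrite_uris and
rewrite_results and the example.org URIs are hypothetical and are not part of
Sesame or ARQ.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def rewrite_uris(node, mapping):
    """Return a copy of the tree with mapped URI constants substituted."""
    if isinstance(node, tuple):
        return tuple(rewrite_uris(child, mapping) for child in node)
    if isinstance(node, str) and node.startswith("<") and node.endswith(">"):
        uri = node[1:-1]
        return "<" + mapping.get(uri, uri) + ">"
    return node  # operator names and variables pass through untouched


def rewrite_results(rows, mapping):
    """Apply the inverse mapping to every bound value in the result rows."""
    inverse = {internal: public for public, internal in mapping.items()}
    return [{var: inverse.get(val, val) for var, val in row.items()}
            for row in rows]


# Hypothetical mapping from the URIs callers use to the URIs actually stored.
mapping = {"http://example.org/public/Thing": "http://example.org/internal/Thing"}

query_tree = ("project", ("?s",),
              ("bgp", ("triple", "?s", RDF_TYPE,
                       "<http://example.org/public/Thing>")))

rewritten = rewrite_uris(query_tree, mapping)
rows = rewrite_results([{"?o": "http://example.org/internal/Thing"}], mapping)
print(rewritten)
print(rows)
```

Because the same mapping is applied forwards to the query and backwards to the
results, the caller only ever sees the URIs it asked about.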

We don't care about syntactic or structural differences between the original
and rewritten queries, but the queries need to be semantically equivalent.  We
originally tried using Sesame's query renderer, but it doesn't support SPARQL
1.1, so we thought we'd port this code to use ARQ.

> and inline ...
>
> On 04/06/15 13:49, Rick Moynihan wrote:
>
>> Hi Rob,
>>
>> Firstly thanks for filing the bug for me.
>>
>> Secondly in the case you cite, I don't understand why the query you cite:
>>
>> SELECT * {
>>    SELECT ?x { ?x a ?type }
>> }
>>
>> Isn't converted into the SSE:
>>
>> (project (?x ?type)
>>   (project (?x)
>>    (bgp
>>     (triple ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> ?type))))
>>
>>
> I hope not - that is the algebra for:
>
> SELECT ?x ?type {
>    SELECT ?x { ?x a ?type }
> }
>
> same effect, different way of writing it.
>

I expanded the * partly because I didn't know whether SSEs can represent
wildcards.  For my particular use case either way is fine.
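
A toy model of why SELECT * over the sub-select does not expand to ?x ?type:
the wildcard expands to the variables in scope, and the inner projection
leaves only ?x in scope. The tuple representation and the helpers
visible_vars and expand_star are assumptions made for this sketch, not ARQ's
data structures.

```python
def visible_vars(op):
    """Variables in scope after evaluating a toy algebra node."""
    kind = op[0]
    if kind == "bgp":
        # Every variable mentioned in any triple pattern is in scope.
        return sorted({term for triple in op[1] for term in triple
                       if term.startswith("?")})
    if kind == "project":
        # A projection hides everything except the listed variables.
        return list(op[1])
    raise ValueError("unsupported operator: " + kind)


def expand_star(pattern):
    """Model SELECT * { pattern } as a projection onto the variables in scope."""
    return ("project", visible_vars(pattern), pattern)


bgp = ("bgp", [("?x", "rdf:type", "?type")])
sub_select = ("project", ["?x"], bgp)

print(expand_star(sub_select)[1])  # ['?x']
print(expand_star(bgp)[1])         # ['?type', '?x']
```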

> Rewriting does no "optimization" however simple.
>
>> Or the semantically equivalent:
>>
>> (project (?x ?type)
>>    (bgp
>>     (triple ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
>>
>> Both of which will OpAsQuery.asQuery back into a semantically equivalent
>> SPARQL query.
>>
>> I clearly don't quite understand the role of the SSE's and the various
>> representations of Query inside ARQ.
>>
>
> SSE is the syntax in which the ARQ internal algebra can be written out and
> read in.  We do loosely talk about SSE for the algebra but in fact there
> are lots of other things that can be written in SSE.  Including RDF data!
>
> http://jena.apache.org/documentation/notes/sse.html
>
> SSE is just a syntax.
>
>
>> Essentially I was wanting to use the SSE's to convert a SPARQL query into a
>> tree, rewrite the query tree, and convert it back into SPARQL with
>> OpAsQuery.
>>
>
> If you want to do transforms that restructure the query, that's the way
> I'd do it.
>

That's a relief, as I've already spent 2 days porting my rewriting code to
ARQ and this style, and my only failing test case is currently with nested
sub queries.

I think I'd misread Rob's answer with regard to the semantics of * and
subqueries, and thought it was returning a different query!

So am I right that once JENA-954 is fixed, round tripping should always be
possible, and should always yield a semantically equivalent query?


> If you want more of a surface manipulation, and only such manipulations,
> e.g. rename a variable, then a transformation of the query syntax tree
> might be easier.  I have some code for this I can contribute (written as an
> alternative to the query builder / parameterized query code).
>
> https://github.com/afs/AFS-Dev/tree/master/src/main/java/element
>
> If you think you are ever going to need the complicated part, investing in
> setting up the op based one is better. AST manipulation is a bit of a dead
> end.
>
> The OpAsQuery contract can't be query=>op=>exactly the same query because
> there are ways to write two queries that lead to "the same" algebra.
>
> ARQ executes the algebra so the Op=>Query step isn't needed (or desirable).
>
>> I had assumed that aside from handling the trivialities around restoring
>> the outer query's type (e.g. CONSTRUCT/ASK/DESCRIBE etc...) that this would
>> work, and that SPARQL queries would be round tripable.
>>
>> What is the reasoning for this property not holding?  I understand that
>> SSE's can express things that can't be expressed in SPARQL, presumably
>> useful once the SSE has been optimized; but why isn't every valid SPARQL
>> query round tripable, as suggested above?
>>
>
> We hope to have that contract (essentially, reverse the syntax to algebra
> algorithm in the SPARQL spec).
>

Amazing!  I think this answers my above question.

I noticed that JENA-954 has been scheduled for 3.0.0; is that release
expected anytime soon?  I'm not sure I know enough about Jena to provide a
patch for this issue yet, but if we could assemble one, could it be
included in an earlier release?


R.


> Thanks again for filing the bug report for me, and answering my questions.
>
> Kind regards,
>
> R.
>
> On 4 June 2015 at 12:28, Rob Vesse <[email protected]> wrote:
>
>> Rick
>>
>> Yes this does look like a bug
>>
>> Please bear in mind however that conversion to algebra does NOT guarantee
>> to round trip because some parts of a query do not end up in the algebra
>> and so OpAsQuery has simply no way to reconstruct the exact original query
>>
>> For example:
>>
>> SELECT * {
>>    SELECT ?x { ?x a ?type }
>> }
>>
>> Would round trip back to just:
>>
>> SELECT ?x { ?x a ?type }
>>
>>
>> There are also other cases where things could move around slightly, for
>> example a BIND is potentially indistinguishable from a SELECT expression
>> depending on the structure of the query.
>>
>> I have filed this as JENA-954 -
>> https://issues.apache.org/jira/browse/JENA-954
>>
>> Thanks for reporting this,
>>
>> Rob
>>
>> On 04/06/2015 11:45, "Rick Moynihan" <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I have been playing around using ARQ to rewrite queries with Jena 2.13.0
>>> and have encountered what appears to be a bug when roundtripping a valid
>>> SPARQL query through to an SSE and back out as SPARQL.
>>>
>>> The original SPARQL query is this:
>>>
>>> SELECT (COUNT(*) as ?count) {
>>>   SELECT DISTINCT ?uri ?graph WHERE {
>>>     GRAPH ?graph {
>>>       ?uri ?p ?o .
>>>       }
>>>     } LIMIT 1
>>> }
>>>
>>> This parses into the following SSE by going through QueryFactory.create ->
>>> Algebra.compile:
>>>
>>> #<OpProject (project (?tripod_count_var)
>>>   (extend ((?tripod_count_var ?.0))
>>>     (group () ((?.0 (count)))
>>>       (distinct
>>>         (project (?uri ?graph)
>>>           (graph ?graph
>>>             (bgp (triple ?uri ?p ?o))))))))
>>>
>>> To my eye this looks correct so far... next we round trip it back into a
>>> SPARQL query by using OpAsQuery.asQuery that results in:
>>>
>>> #<Query SELECT DISTINCT  (count(*) AS ?tripod_count_var)
>>> WHERE
>>>   { { SELECT  ?uri ?graph
>>>       WHERE
>>>         { GRAPH ?graph
>>>             { ?uri ?p ?o}
>>>         }
>>>     }
>>>   }
>>>
>>>>
>>>>
>>> This now seems broken...  asQuery has moved the DISTINCT from the inner
>>> select onto the outer one.  This appears to happen with all sub selects.
>>> I suspect it might be due to OpAsQuery.asQuery building only one Query
>>> object which is somehow being reused for all sub queries.
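
The suspicion above can be illustrated with a toy renderer. This is a sketch
of one way such a symptom could arise, not a description of OpAsQuery's
actual code: the tuple tree, render_shared and render_per_level are all
hypothetical, and the aggregate and LIMIT are simplified away. The first
renderer records modifiers on a single shared state, the second gives every
projection its own state.

```python
def render_shared(op):
    """Buggy variant: one state object records modifiers for the whole tree."""
    state = {"distinct": False, "vars": None}

    def visit(node):
        kind = node[0]
        if kind == "bgp":
            return node[1]
        if kind == "distinct":
            state["distinct"] = True  # lands on the shared, outermost state
            return visit(node[1])
        _, variables, sub = node  # kind == "project"
        if state["vars"] is None:
            state["vars"] = variables  # outermost projection
            return visit(sub)
        # Nested projection: rendered without any modifiers of its own.
        return "{ SELECT " + " ".join(variables) + " { " + visit(sub) + " } }"

    pattern = visit(op)
    head = "SELECT " + ("DISTINCT " if state["distinct"] else "")
    return head + " ".join(state["vars"]) + " { " + pattern + " }"


def render_per_level(op):
    """Each projection consumes only the modifiers sitting directly above it."""
    distinct = False
    while op[0] == "distinct":
        distinct = True
        op = op[1]
    _, variables, sub = op  # expects a "project" node here
    if sub[0] == "bgp":
        body = sub[1]
    else:
        body = "{ " + render_per_level(sub) + " }"  # sub-select, fresh state
    head = "SELECT " + ("DISTINCT " if distinct else "")
    return head + " ".join(variables) + " { " + body + " }"


tree = ("project", ["(COUNT(*) AS ?count)"],
        ("distinct",
         ("project", ["?uri", "?graph"],
          ("bgp", "GRAPH ?graph { ?uri ?p ?o }"))))

print(render_shared(tree))
print(render_per_level(tree))
```

With the shared state the DISTINCT ends up on the outer SELECT, as in the
output reported above; with per-level state it stays on the inner SELECT.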
>>>
>>> I took a look in the unit tests and found that some of the test queries in
>>> TestOpAsQuery are also subject to this bug e.g. the query on line 223:
>>>
>>> SELECT ?key ?agg WHERE { { SELECT ?key (COUNT(*) AS ?agg) { ?key ?p ?o }
>>> GROUP BY ?key } }
>>>
>>> Though the tests don't seem to currently test for this kind of thing.
>>>
>>> Can anyone confirm that this is a bug?
>>>
>>> Kind regards,
>>>
>>> R.
>>>
>>
>>
>>
>>
>>
>>
>
