Re: Bug Round-tripping HAVING clauses to an SSE and back

Andy Seaborne Mon, 22 Jun 2015 11:24:43 -0700

Hi Rick,

I've put in a new OpAsQuery (and new tests); I also ran against the SELECT queries in "sparql-corpus". Because the outcome of OpAsQuery is an equivalent query, not the .equals same query, some of the checking was manual but it looks good to me. Currently, as in the old OpAsQuery, it puts in {} in patterns quite often to be safe.

I may look at a separate syntax cleaning step to remove unnecessary ones; putting it in the translation process seems wrong to me.


The development snapshot is rebuilding at the moment and will contain this fix.

It even "optimized" the query in one case (!!!)

SELECT  *
WHERE
  { SELECT DISTINCT  ?uri ?label
    WHERE
      {...}
    ORDER BY ?uri
  }
OFFSET  0
LIMIT   5000

became

SELECT DISTINCT  ?uri ?label
WHERE
    {...}
    ORDER BY ?uri
OFFSET  0
LIMIT   5000

and some other SELECT * WHERE got removed (if there are no modifiers like LIMIT it's a no-op and is not visible in the algebra). The 'optimization' above is to collapse the levels.


(sparql-corpus/cabi/opendatascotland.sparql is a compendium of queries)

        Andy


On 18/06/15 12:30, Rick Moynihan wrote:

On 17 June 2015 at 14:13, Andy Seaborne wrote:

On 17/06/15 10:16, Rick Moynihan wrote:

Hi Andy,

Thanks for raising JENA-963 for me - I'll raise the issues directly in the
future.  Sometimes it's hard to know whether things are intended (or at
least accepted) behaviours though.


Point taken. It's those unfunded volunteers - can't rely on them!
The project takes whatever channels work; we're not, I hope, dogmatic.
When stuff gets detailed, email isn't so good, whether for basic formatting
or just as a record over time; JIRA is better, at least I find so.
Helps people see what they can contribute as well.


I completely agree about your channels point.  Which is precisely why I'll
often go to the mailing list before the bug tracker.  If you're unsure
about the behaviour, or whether it's a bug you can get quicker feedback by
going to the mailing list first, and when you're satisfied it's a bug;
filing it.

Regardless, I think 963 was clearly a bug, and I should have directly filed
it for you in JIRA, and will do in the future.

Unfortunately I haven't got an exhaustive set of queries we need to
support; but we're basically hoping to have all arbitrary SPARQL 1.1
queries round-trip back to a query which is at least equivalent when
evaluated on any compliant SPARQL 1.1 database to what went in.

Most of the problems I've run into have been uncovered either by using it,
writing unit tests for my domain code, by integration testing with some of
our other components, or in this particular case by a colleague trying to
generate some stats on data we have.

Would every example query from the SPARQL 1.1 spec be a good start?

http://www.w3.org/TR/sparql11-query/

I also have a small collection of about 28 different real world queries
(mostly for handling RDF data cubes) which were generated via some of our
systems that may be useful.  If you'd like me to provide them as potential
test cases I'm sure I can do that.


That would be great.


Ok, I'm not sure how useful these will be for this bug, but I've created a
repo with 56 real world SPARQL queries (no data), which you're more than
welcome to use as you please.

I've licensed the repo as MIT, which I think should work with Apache; but
I'm happy to grant you an Apache license to the queries as they are too.
Many of the queries were auto generated, so might not be what a user would
write.

https://github.com/Swirrl/sparql-corpus

Let me know if you need anything else.

I've done some analysis on JENA-963 and written in the cases I think turn
out for GROUP BY, and it would be good to validate that analysis with real
world queries of interest.


Ok, there happen to be 11 real world GROUP BY queries in that repo:

12:07 $ git grep GROUP
cabi/cabi-calculate-level.sparql:} GROUP BY ?leafConcept ?topConcept
cabi/cabi-count-documents-countries.sparql:} GROUP BY ?countable ?countryLabel LIMIT 10 OFFSET 0
cabi/cabi-count-documents-regions.sparql:} GROUP BY ?countable ?countableId ?countableName
cabi/cabi-count-documents-themes.sparql:} GROUP BY ?countable ?countableId
cabi/cabi-graphs.sparql:} GROUP BY ?o ?g
cabi/cabi-research-outputs.sparql:    } GROUP BY ?resource ?title ?projectUri ?outputTitle ?outputDate
cabi/opendatascotland.sparql:} GROUP BY
cabi/spog.sparql:} GROUP BY ?g
cabi/test-sparql.sparql:     } GROUP BY ?resource ?title ?projectUri ?projectId ?outputTitle ?outputDate
cabi/test.sparql:} GROUP BY ?countable ?countableId ?countableName LIMIT 10 OFFSET 10
pmd/dataset_period_row_labels.sparql:            GROUP BY ?row

It looks to me like the top-down visit-driven translation is good for the
WHERE{} part of the algebra to query but spotting group, and all its
details, is more of a pattern matching task.  In fact, having pattern
matching for the parts outside WHERE{}, all the modifiers in SPARQL, looks
good.
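
As a rough illustration of the pattern-matching idea for modifiers, here is a
minimal Python sketch. It is not the OpAsQuery code: the algebra is modelled
as plain nested lists shaped like the SSE in this thread, and the helper names
(substitute, render, match_having) are made up for the example. It recognises
project/filter/group and puts the aggregate back into the HAVING expression.

```python
# Toy algebra as nested Python lists, mirroring the SSE shape
# (project vars (filter expr (group group-vars agg-bindings pattern))).
# Illustrative sketch only; all function names here are hypothetical.

def substitute(expr, bindings):
    """Replace internal aggregate variables (e.g. '?.0') with their aggregate."""
    if isinstance(expr, list):
        return [substitute(e, bindings) for e in expr]
    return bindings.get(expr, expr)

def render(expr):
    """Render a prefix expression like ['>', ['count', '?value'], '1'] as text."""
    if not isinstance(expr, list):
        return expr
    head, args = expr[0], [render(a) for a in expr[1:]]
    if head in {">", "<", "=", ">=", "<=", "!="}:
        return f"{args[0]} {head} {args[1]}"
    return f"{head.upper()}({', '.join(args)})"

def match_having(op):
    """Return (group_vars, having_text) if op is project/filter/group, else None."""
    if not (isinstance(op, list) and len(op) == 3 and op[0] == "project"):
        return None
    flt = op[2]
    if not (isinstance(flt, list) and len(flt) == 3 and flt[0] == "filter"):
        return None
    grp = flt[2]
    if not (isinstance(grp, list) and len(grp) == 4 and grp[0] == "group"):
        return None
    # Map each internal variable to the aggregate expression it stands for.
    bindings = {var: agg for var, agg in grp[2]}
    return grp[1], render(substitute(flt[1], bindings))

algebra = ["project", ["?obs"],
           ["filter", [">", "?.0", "1"],
            ["group", ["?obs"], [["?.0", ["count", "?value"]]],
             ["bgp", ["triple", "?obs", "?measure", "?value"]]]]]

group_vars, having = match_having(algebra)
print("GROUP BY", " ".join(group_vars))
print(f"HAVING ( {having} )")
```

The sketch handles only this one shape; anything else returns None, which is
where a visitor-style translation of the WHERE{} part would take over.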

Algebra that is not in the shape originally generated by the query needs
to be factored in (not that the contract of OpAsQuery can promise
perfection there), it's just that, my guess, algebra-like-queries is the
major use case.


I can't say I understand all the details here, but it sounds good.  If you
let me know when the code lands in a SNAPSHOT jar, I'll happily integrate
it with our stuff and see if anything else falls out.

(Yes, clojure would be perfect for this!)


It's funny you should say that!  Our systems are actually written in
Clojure, and rather than make use of the visitors JENA provides - I wrote a
small functional zipper in just 9 lines of clojure.zip that you can use to
trivially traverse SSE trees in a few lines.  Obviously from a clojure
perspective it would be better if SSE items, lists and nodes were actually
clojure data - but the SSE idea made the whole thing a joy.  Bravo!
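
For readers without Clojure to hand, the same idea of treating SSE as plain
nested data can be sketched in Python. This is not the zipper described above,
just a simplified stand-in: parse_sse and walk are hypothetical helpers, and
the tokenizer ignores string literals and comments.

```python
import re

# Tokens: a parenthesis, an <IRI>, or any run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|<[^\s<>]*>|[^\s()]+")

def parse_sse(text):
    """Parse a simplified SSE string into nested lists of string atoms."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        return tok

    return read()

def walk(node, path=()):
    """Yield (path, node) for every node, depth first, parents before children."""
    yield path, node
    if isinstance(node, list):
        for i, child in enumerate(node):
            yield from walk(child, path + (i,))

sse = """
(project (?obs)
  (filter (> ?.0 1)
    (group (?obs) ((?.0 (count ?value)))
      (bgp (triple ?obs ?measure ?value)))))
"""

OPERATORS = {"project", "filter", "group", "bgp"}
tree = parse_sse(sse)
ops = [n[0] for _, n in walk(tree)
       if isinstance(n, list) and n and isinstance(n[0], str) and n[0] in OPERATORS]
print(ops)
```

Each path is a tuple of list indexes, so a node found by walk can be located
again in the tree, which is roughly what a zipper gives you for free.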

R.

On 15 June 2015 at 18:31, Andy Seaborne wrote:


  Hi Rick,


Sorry, you're not having a good time of it here.

Not one but 2 related bugs (filter in wrong place, lost the aggregate
function) this time.  HAVING is particularly hard because it isn't a
simple mapping to one algebra form.

If split up:
--------------
PREFIX  qb:   <http://purl.org/linked-data/cube#>

SELECT  ?obs (COUNT(?value) AS ?C)
WHERE
    { ?obs a qb:Observation .
      ?obs qb:measureType ?measure .
      ?obs ?measure ?value
    }
GROUP BY ?obs
HAVING ( ?C > 1 )
--------------
it goes wrong as well.

I've recorded it as

https://issues.apache.org/jira/browse/JENA-963

A couple of things would be good:

You can raise JIRA directly - I attached code to the JIRA like it was
from JENA-954.  Prefixes etc. - query-in, query-out.

What would be really good is to fix the test coverage.  "TestOpAsQuery" is
the test class. Do you have a complete (nearly complete ...) list of
features? What's missing in TestOpAsQuery?

If we can get the coverage up, we'll be in a better position long term.

          Andy


On 15/06/15 16:57, Rick Moynihan wrote:

  Hi all,


I've been using the recent fixes to ARQ (made in JENA-954) around
rendering SPARQL queries and have encountered another problem where a
valid query appears to roundtrip to an invalid one.

The problematic query is this:

SELECT ?obs
WHERE {
     ?obs a qb:Observation ;
            qb:measureType ?measure ;
            ?measure ?value ;
            .
}
GROUP BY ?obs
HAVING (COUNT(?value) > 1)

Which generates this SSE:

(project
     (?obs)
     (filter
       (> ?.0 1)
       (group
         (?obs)
         ((?.0
           (count ?value)))
         (bgp
           (triple ?obs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#Observation>)
           (triple ?obs <http://purl.org/linked-data/cube#measureType> ?measure)
           (triple ?obs ?measure ?value)))))

But when round tripped back into SPARQL with OpAsQuery.asQuery, leads to
this invalid query:

SELECT  ?obs
WHERE
     { ?obs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> qb:Observation .
       ?obs qb:measureType ?measure .
       ?obs ?measure ?value
       FILTER ( ?.0 > 1 )
     }
GROUP BY ?obs
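
A cheap way to catch this kind of output in a regression test is to look for
leaked internal variables in the rendered text. The sketch below is a textual
heuristic only, assuming that names like ?.0 are the algebra's internal
aggregate variables; it does not parse the query and would be fooled by the
same characters inside a string literal or comment. leaked_internal_vars is a
hypothetical helper name.

```python
import re

# Names such as ?.0 are not legal SPARQL variables (a name cannot start with '.').
# Assumption: they are internal names allocated for aggregates in the algebra.
INTERNAL_VAR = re.compile(r"\?\.\d+")

def leaked_internal_vars(query_text):
    """Return the sorted, de-duplicated internal-looking variables in a query string."""
    return sorted(set(INTERNAL_VAR.findall(query_text)))

bad = "SELECT ?obs WHERE { ?obs ?measure ?value FILTER ( ?.0 > 1 ) } GROUP BY ?obs"
good = "SELECT ?obs WHERE { ?obs ?measure ?value } GROUP BY ?obs HAVING ( COUNT(?value) > 1 )"

print(leaked_internal_vars(bad))
print(leaked_internal_vars(good))
```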


R.
