Re: Bug Round-tripping HAVING clauses to an SSE and back

Rick Moynihan Thu, 18 Jun 2015 04:31:43 -0700

On 17 June 2015 at 14:13, Andy Seaborne <[email protected]> wrote:

> On 17/06/15 10:16, Rick Moynihan wrote:
>
>> Hi Andy,
>>
>> Thanks for raising JENA-963 for me - I'll raise the issues directly in the
>> future.  Sometimes it's hard to know whether things are intended (or at
>> least accepted) behaviours though.
>>
>
> Point taken. It's those unfunded volunteers - can't rely on them!
> The project takes whatever channels work; we're not, I hope, dogmatic.
> When stuff gets detailed, email isn't so good, whether basic formatting
> stuff or just as a record over time, JIRA is better, at least I find so.
> Helps people see what they can contribute as well.
>


I completely agree about your channels point.  Which is precisely why I'll
often go to the mailing list before the bug tracker.  If you're unsure
about the behaviour, or whether it's a bug, you can get quicker feedback by
going to the mailing list first and then, once you're satisfied it's a bug,
filing it.

Regardless, I think 963 was clearly a bug, and I should have directly filed
it for you in JIRA, and will do in the future.

>> Unfortunately I haven't got an exhaustive set of queries we need to
>> support; but we're basically hoping to have all arbitrary SPARQL 1.1
>> queries round-trip back to a query which is at least equivalent when
>> evaluated on any compliant SPARQL 1.1 database to what went in.
>>
>> Most of the problems I've run into have been uncovered either by using it,
>> writing unit tests for my domain code, by integration testing with some of
>> our other components, or in this particular case by a colleague trying to
>> generate some stats on data we have.
>>
>> Would every example query from the SPARQL 1.1 spec be a good start?
>>
>> http://www.w3.org/TR/sparql11-query/
>>
>> I also have a small collection of about 28 different real world queries
>> (mostly for handling RDF data cubes) which were generated via some of our
>> systems that may be useful.  If you'd like me to provide them as potential
>> test cases I'm sure I can do that.
>>
>
> That would be great.
>

Ok, I'm not sure how useful these will be for this bug, but I've created a
repo with 56 real world SPARQL queries (no data), which you're more than
welcome to use as you please.

I've licensed the repo as MIT, which I think should work with Apache; but
I'm happy to grant you an Apache license to the queries as they are too.
Many of the queries were auto generated, so might not be what a user would
write.

https://github.com/Swirrl/sparql-corpus

Let me know if you need anything else.

> I've done some analysis on JENA-963 and written in the cases I think turn
> out for GROUP BY, and it would be good to validate that analysis with real
> world queries of interest.
>

Ok, there happen to be 11 real world GROUP BY queries in that repo:

12:07 $ git grep GROUP
cabi/cabi-calculate-level.sparql:} GROUP BY ?leafConcept ?topConcept
cabi/cabi-count-documents-countries.sparql:} GROUP BY ?countable ?countryLabel LIMIT 10 OFFSET 0
cabi/cabi-count-documents-regions.sparql:} GROUP BY ?countable ?countableId ?countableName
cabi/cabi-count-documents-themes.sparql:} GROUP BY ?countable ?countableId
cabi/cabi-graphs.sparql:} GROUP BY ?o ?g
cabi/cabi-research-outputs.sparql:    } GROUP BY ?resource ?title ?projectUri ?outputTitle ?outputDate
cabi/opendatascotland.sparql:} GROUP BY
cabi/spog.sparql:} GROUP BY ?g
cabi/test-sparql.sparql:     } GROUP BY ?resource ?title ?projectUri ?projectId ?outputTitle ?outputDate
cabi/test.sparql:} GROUP BY ?countable ?countableId ?countableName LIMIT 10 OFFSET 10
pmd/dataset_period_row_labels.sparql:            GROUP BY ?row


> It looks to me like the top-down visit-driven translation is good for the
> WHERE{} part of the algebra to query but spotting group, and all its
> details, is more of a pattern matching task.  In fact, having pattern
> matching for the parts outside WHERE{}, all the modifiers in SPARQL, looks
> good.
>
> Algebra that is not in the shape originally generated by the query needs
> to be factored in (not that the contract of OpAsQuery can promise
> perfection there), it's just that, my guess, algebra-like-queries is the
> major use case.
>

I can't say I understand all the details here, but it sounds good.  If you
let me know when the code lands in a SNAPSHOT jar, I'll happily integrate
it with our stuff and see if anything else falls out.
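
A rough sketch of the pattern-matching idea, in Python and purely
illustrative: this is not how OpAsQuery is implemented, and the nested-list
tree and the names match_having and substitute are made up for the example.
It assumes the algebra has the shape shown in the SSE quoted further down,
i.e. a filter sitting directly over a group, and puts the aggregate back in
place of the internal ?.0 variable, which is what a HAVING clause needs:

```python
def substitute(expr, aggregates):
    """Replace internal aggregate variables (e.g. ?.0) with their aggregate expressions."""
    if isinstance(expr, str):
        return aggregates.get(expr, expr)
    return [substitute(e, aggregates) for e in expr]


def match_having(op):
    """Return (group_vars, having_expr, pattern) if op is (filter expr (group ...)), else None."""
    if not (isinstance(op, list) and len(op) == 3 and op[0] == "filter"):
        return None
    _, expr, sub = op
    # Only the four-element form (group vars aggregates pattern) is handled here.
    if not (isinstance(sub, list) and len(sub) == 4 and sub[0] == "group"):
        return None
    _, group_vars, agg_bindings, pattern = sub
    aggregates = {var: agg for var, agg in agg_bindings}
    return group_vars, substitute(expr, aggregates), pattern


# Hand-written stand-in for the filter/group part of the SSE in this thread.
bgp = ["bgp", ["triple", "?obs", "?measure", "?value"]]
op = ["filter", [">", "?.0", "1"],
      ["group", ["?obs"], [["?.0", ["count", "?value"]]], bgp]]

group_vars, having, pattern = match_having(op)
```

Here having comes out as the nested list for (> (count ?value) 1), which
corresponds to HAVING (COUNT(?value) > 1) in the original query.  A filter
that is not directly over a group returns None and would be left as an
ordinary FILTER.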


> (Yes, clojure would be perfect for this!)
>

It's funny you should say that!  Our systems are actually written in
Clojure, and rather than make use of the visitors JENA provides - I wrote a
small functional zipper in just 9 lines of clojure.zip that you can use to
trivially traverse SSE trees in a few lines.  Obviously from a clojure
perspective it would be better if SSE items, lists and nodes were actually
clojure data - but the SSE idea made the whole thing a joy.  Bravo!
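
For anyone not using Clojure, the same kind of traversal can be sketched in
Python.  This is not the zipper code mentioned above, just a minimal
illustration: parse_sse and walk are hypothetical names, and the tokenizer
only handles parentheses, <...> IRIs and whitespace-free atoms (no quoted
string literals).

```python
import re

# Tokens: "(", ")", an IRI in angle brackets, or any run without spaces/parens.
TOKEN = re.compile(r"\(|\)|<[^<>\s]*>|[^\s()]+")


def parse_sse(text):
    """Parse a simplified SSE string into nested Python lists of string atoms."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one top-level expression")
    return stack[0][0]


def walk(node):
    """Yield every node of the tree, depth-first, parents before children."""
    yield node
    if isinstance(node, list):
        for child in node:
            yield from walk(child)


sse = """
(project (?obs)
  (filter (> ?.0 1)
    (group (?obs) ((?.0 (count ?value)))
      (bgp (triple ?obs ?measure ?value)))))
"""
tree = parse_sse(sse)
# Collect variables named like ?.0 (the aggregate variable in the SSE above).
internal_vars = sorted(
    {n for n in walk(tree) if isinstance(n, str) and n.startswith("?.")}
)
```

With the tree as plain nested lists, finding every ?.0-style variable (or
every group node) is a one-line comprehension over walk(tree).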

R.


> On 15 June 2015 at 18:31, Andy Seaborne <[email protected]> wrote:
>>
>>  Hi Rick,
>>>
>>> Sorry, you're not having a good time of it here.
>>>
>>> Not one but 2 related bugs (filter in wrong place, lost the aggregate
>>> function) this time.  HAVING is particularly hard because it isn't a
>>> simple mapping to one algebra form.
>>>
>>> If split up:
>>> --------------
>>> PREFIX  qb:   <http://purl.org/linked-data/cube#>
>>>
>>> SELECT  ?obs (COUNT(?value) AS ?C)
>>> WHERE
>>>    { ?obs a qb:Observation .
>>>      ?obs qb:measureType ?measure .
>>>      ?obs ?measure ?value
>>>    }
>>> GROUP BY ?obs
>>> HAVING ( ?C > 1 )
>>> --------------
>>> it goes wrong as well.
>>>
>>> I've recorded it as
>>>
>>> https://issues.apache.org/jira/browse/JENA-963
>>>
>>> A couple of things would be good:
>>>
>>> You can raise JIRA directly - I attached code to the JIRA like it was
>>> from JENA-954.  Prefixes etc. - query-in, query-out.
>>>
>>> What would be really good is to fix the test coverage.  "TestOpAsQuery" is
>>> the test class. Do you have a complete (nearly complete ...) list of
>>> features? What's missing in TestOpAsQuery?
>>>
>>> If we can get the coverage up, we'll be in a better position long term.
>>>
>>>          Andy
>>>
>>>
>>> On 15/06/15 16:57, Rick Moynihan wrote:
>>>
>>>  Hi all,
>>>>
>>>> I've been using the recent fixes to ARQ (made in JENA-954) around
>>>> rendering SPARQL queries and have encountered another problem where a
>>>> valid query appears to roundtrip to an invalid one.
>>>>
>>>> The problematic query is this:
>>>>
>>>> SELECT ?obs
>>>> WHERE {
>>>>     ?obs a qb:Observation ;
>>>>            qb:measureType ?measure ;
>>>>            ?measure ?value ;
>>>>            .
>>>> }
>>>> GROUP BY ?obs
>>>> HAVING (COUNT(?value) > 1)
>>>>
>>>> Which generates this SSE:
>>>>
>>>> (project
>>>>     (?obs)
>>>>     (filter
>>>>       (> ?.0 1)
>>>>       (group
>>>>         (?obs)
>>>>         ((?.0
>>>>           (count ?value)))
>>>>         (bgp
>>>>           (triple ?obs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#Observation>)
>>>>           (triple ?obs <http://purl.org/linked-data/cube#measureType> ?measure)
>>>>           (triple ?obs ?measure ?value)))))
>>>>
>>>> But when round tripped back into SPARQL with OpAsQuery.asQuery, leads to
>>>> this invalid query:
>>>>
>>>> SELECT  ?obs
>>>> WHERE
>>>>     { ?obs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> qb:Observation .
>>>>       ?obs qb:measureType ?measure .
>>>>       ?obs ?measure ?value
>>>>       FILTER ( ?.0 > 1 )
>>>>     }
>>>> GROUP BY ?obs
>>>>
>>>>
>>>> R.
>>>>
>>>>
>>>>
>>
>
