On 30/07/13 18:01, Joshua TAYLOR wrote:
On Tue, Jul 30, 2013 at 12:58 PM, Joshua TAYLOR wrote:
This query
prefix : <http://example.org/>
select ?x ?y where {
values (?x ?y) {
(1 2)
(1 UNDEF )
}
}
order by ?x ?y
produces
---------
| x | y |
=========
| 1 |   |
| 1 | 2 |
---------
because in the ORDER BY ordering, the unbound ?y comes before ?y = 2.
So I'd expect that if I group by ?x, so that the multiset of ?y values
for the group is { UNDEF, 2 }, and select max(?y), I should get 2.
However, this query
prefix : <http://example.org/>
select ?x (max(?y) as ?maxY) where {
values (?x ?y) {
(1 2)
(1 UNDEF)
}
}
group by ?x
order by ?x
which does just that, produces:
------------
| x | maxY |
============
| 1 |      |
------------
Am I misunderstanding how max [1] works? The spec says that max makes
use of the ORDER BY ordering, so I'm surprised by these results. Have
I missed something?
Thanks,
//JT
[1] http://www.w3.org/TR/sparql11-query/#defn_aggMax
Hi there,
Is it a bug? Probably, maybe.
It's worth noting that COUNT(?var) says:
"remove error elements from N"
so COUNT(?var) specifically handles the case of ?var unbound.
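To make the COUNT rule concrete, here is a minimal Java sketch. It is not
ARQ's implementation: it assumes, hypothetically, that an evaluation error
such as an unbound ?y is represented as null, and the class and method
names are made up for illustration. COUNT drops the error elements before
counting, so the ?x = 1 group from the query above counts one value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical model, not Jena/ARQ code: null stands for an evaluation
// error, e.g. an unbound ?y.
public class CountSketch {

    // COUNT(?var): "remove error elements from N", then count what is left.
    public static long count(List<Integer> evaluated) {
        return evaluated.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        // ?y values for the group ?x = 1: 2 in one row, UNDEF in the other.
        List<Integer> group = Arrays.asList(2, null);
        System.out.println("COUNT(?y) = " + count(group)); // prints "COUNT(?y) = 1"
    }
}
```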
I think the intent was that MAX(... error ...) is an error.
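Treating error as a value leads to odd situations.
For example:
-1 * MIN(1,2,err) != MAX(-1,-2,err)
If err sorts lowest, MIN(1,2,err) is err, so the left side is an error,
while the right side is -1.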
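The MIN/MAX example can be checked with a small Java sketch. It is a
hypothetical model, not ARQ code: null stands for an evaluation error and
the names are invented. It compares two readings: "error is a value that
sorts lowest" (as unbound does under ORDER BY) versus "any error makes the
whole aggregate an error".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical model, not Jena/ARQ code: null stands for an evaluation error.
// All methods assume a non-empty list.
public class ErrorAsValue {

    // Reading 1: errors are values that sort lowest, like unbound under ORDER BY.
    static final Comparator<Integer> ORDER_BY =
            Comparator.nullsFirst(Comparator.<Integer>naturalOrder());

    public static Integer minErrorAsValue(List<Integer> xs) {
        return Collections.min(xs, ORDER_BY);
    }

    public static Integer maxErrorAsValue(List<Integer> xs) {
        return Collections.max(xs, ORDER_BY);
    }

    // Reading 2: any error in the input makes the aggregate an error.
    public static Integer minPropagating(List<Integer> xs) {
        return xs.contains(null) ? null : Collections.min(xs);
    }

    public static Integer maxPropagating(List<Integer> xs) {
        return xs.contains(null) ? null : Collections.max(xs);
    }

    // Arithmetic on an error is itself an error.
    public static Integer negate(Integer x) {
        return x == null ? null : -x;
    }

    public static void main(String[] args) {
        List<Integer> pos = Arrays.asList(1, 2, null);
        List<Integer> neg = Arrays.asList(-1, -2, null);

        // Reading 1: prints "null vs -1", so the identity fails.
        System.out.println(negate(minErrorAsValue(pos)) + " vs " + maxErrorAsValue(neg));
        // Reading 2: prints "null vs null", both sides are errors.
        System.out.println(negate(minPropagating(pos)) + " vs " + maxPropagating(neg));
    }
}
```

Only the error-propagating reading keeps -1 * MIN(...) and MAX(-1 * ...)
in agreement when an error is present.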
One way of arguing it is as you have: MAX() should be 2 because, when the
values are sorted descending by the SPARQL ORDER BY rules, the first
element is 2.
But the expression ?y is evaluated inside MAX(?y). ?y is as much an
expression as ?y+1 is. There isn't a value to represent unbound; it's
just an error on evaluation.
The definition relies on "18.5.1 Aggregate Algebra"; ListEval retains
errors resulting from the evaluation of the list elements. That does
sort of hint that "error" is a value, but it's not supposed to be.
The multiset of values passed as an argument is converted to a sequence
S, and this sequence is ordered as per the ORDER BY DESC clause. So the
"error" is raised when ?y is evaluated, before we get to the sequence S
and the ORDER BY bit.
OK - that's a pretty weaselly argument, but I think the intent is that
MAX() is an error if one of the things it's the MAX of is an undef or
other evaluation error. (It's even hinted at in the ARQ code!)
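The two readings of the spec's steps (ListEval, then the sequence S
ordered by ORDER BY DESC, then its first element) can be sketched in Java
and applied to the ?x = 1 group from the original query. This is a
hypothetical model, not ARQ code: null standing for an evaluation error,
the class and method names, and treating an empty group as an error are
all assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of two readings of MAX; not Jena/ARQ code.
// null models an evaluation error (e.g. unbound ?y) in the ListEval output;
// Optional.empty() models an aggregate result that is an error (unbound).
public class MaxReadings {

    // ORDER BY DESC, with errors sorting lowest (so last when descending).
    static final Comparator<Integer> ORDER_BY_DESC =
            Comparator.nullsLast(Comparator.<Integer>reverseOrder());

    // Reading 1: keep errors in S, sort S by ORDER BY DESC, return S0.
    public static Optional<Integer> maxErrorAsValue(List<Integer> listEval) {
        List<Integer> s = new ArrayList<>(listEval);
        s.sort(ORDER_BY_DESC);
        if (s.isEmpty()) {
            return Optional.empty(); // assumption: empty group is an error
        }
        return Optional.ofNullable(s.get(0));
    }

    // Reading 2: an error during evaluation is raised before S is built,
    // so any error makes the whole aggregate an error.
    public static Optional<Integer> maxErrorRaised(List<Integer> listEval) {
        if (listEval.contains(null)) {
            return Optional.empty();
        }
        return maxErrorAsValue(listEval);
    }

    public static void main(String[] args) {
        // ?y values in the group for ?x = 1: 2 and UNDEF.
        List<Integer> group = Arrays.asList(2, null);
        // Prints "error as value: 2"
        System.out.println("error as value: "
                + maxErrorAsValue(group).map(Object::toString).orElse("(unbound)"));
        // Prints "error raised:   (unbound)"
        System.out.println("error raised:   "
                + maxErrorRaised(group).map(Object::toString).orElse("(unbound)"));
    }
}
```

Reading 1 gives 2, which is what Joshua expected; reading 2 gives an
unbound ?maxY, which is what ARQ actually returned.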
So there is something to check with the spec and, if necessary, raise an
erratum.
Andy
PS
Thanks for the code. It really helps to have something to cut and paste.
I use arq.sparql from the command line to investigate queries and I
trust you to be able to execute a query correctly!
And here's minimal working code, in case you want to reproduce this.
Sorry, I meant to include this in the first email:
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class GroupAndIf {

    final static String orderQuery = "" +
        "prefix : <http://example.org/> \n" +
        "select ?x ?y where {\n" +
        "  values (?x ?y) {\n" +
        "    (1 2)\n" +
        "    (1 UNDEF )\n" +
        "  }\n" +
        "}\n" +
        "order by ?x ?y\n";

    final static String groupQuery = "" +
        "prefix : <http://example.org/> \n" +
        "select ?x (max(?y) as ?maxY) where {\n" +
        "  values (?x ?y) {\n" +
        "    (1 2)\n" +
        "    (1 UNDEF)\n" +
        "  }\n" +
        "}\n" +
        "group by ?x\n" +
        "order by ?x\n";

    public static void main(String[] args) {
        // The data comes from VALUES, so an empty default model is enough.
        final Model model = ModelFactory.createDefaultModel();
        System.out.println( orderQuery );
        System.out.println( groupQuery );

        // Note that in the output, ordered by ?x ?y, the UNDEF < 2
        ResultSetFormatter.out( QueryExecutionFactory.create(
            orderQuery,
            model ).execSelect() );

        // Yet when we group by ?x and ask for (max(?y) as ?maxY), we get an
        // undefined ?maxY.
        ResultSetFormatter.out( QueryExecutionFactory.create(
            groupQuery,
            model ).execSelect() );
    }
}