On 03/12/13 15:10, Emanuele Della Valle wrote:
Dear all,
Marco and I would like to implement a new aggregation in ARQ.
We understand that a textual query is translated into expressions and
then into an algebraic execution plan. We found the aggregation
expressions [1], but we cannot find the operators that implement them
in [2]. Can you help us?
Aggregates are only used in (group), so they appear as arguments to the
(group) operator. They aren't top-level operators per se because they
only have meaning as part of the grouping process that feeds them
their inputs.
arq.qparse can help here: it prints the algebra for a query.
For example:
~/tmp >> qparse --print=query --print=op --file Q.rq
SELECT (count(*) AS ?C)
WHERE
  { ?s ?p ?o }
GROUP BY ?s
HAVING ( count(*) > 5 )
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(project (?C)
  (filter (> ?.0 5)
    (extend ((?C ?.0))
      (group (?s) ((?.0 (count)))
        (bgp (triple ?s ?p ?o))))))
or, when there is no GROUP BY clause:
SELECT (count(*) AS ?C)
WHERE
  { ?s ?p ?o }
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(project (?C)
  (extend ((?C ?.0))
    (group () ((?.0 (count)))
      (bgp (triple ?s ?p ?o)))))
There is also an online version of the validator (also included in Fuseki):
http://www.sparql.org/query-validator.html
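
To make the point about (group) concrete, here is a toy sketch in plain Java. It is not ARQ code: GroupCountSketch, Accumulator, CountAll and groupCount are names made up for this illustration, and rows are simplified to maps from variable name to a string value. It only shows the shape of the idea: the grouping step partitions the rows and feeds each partition to its own accumulator, which is why an aggregate has no meaning outside that step.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: these types are hypothetical and are not ARQ classes.
final class GroupCountSketch {

    // One accumulator instance exists per group and is fed one row at a time.
    interface Accumulator {
        void accumulate(Map<String, String> row);
        long value();
    }

    // count(*): counts every row that reaches the group.
    static final class CountAll implements Accumulator {
        private long n = 0;
        @Override public void accumulate(Map<String, String> row) { n++; }
        @Override public long value() { return n; }
    }

    // Plays the role of (group (?s) ((?.0 (count))) ...): partition the rows
    // by the grouping variable and feed each partition to its own accumulator.
    static Map<String, Long> groupCount(List<Map<String, String>> rows, String groupVar) {
        Map<String, Accumulator> accumulators = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            accumulators
                .computeIfAbsent(row.get(groupVar), key -> new CountAll())
                .accumulate(row);
        }
        Map<String, Long> result = new LinkedHashMap<>();
        accumulators.forEach((key, acc) -> result.put(key, acc.value()));
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("s", "a", "p", "p1", "o", "x"),
            Map.of("s", "a", "p", "p2", "o", "y"),
            Map.of("s", "b", "p", "p1", "o", "z"));
        System.out.println(groupCount(rows, "s")); // prints {a=2, b=1}
    }
}
```

A new aggregate would, in this picture, be another Accumulator implementation; how that maps onto ARQ's own aggregate classes in [1] is not shown here.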
Aggregates are only meaningful in certain places: the SELECT clause,
HAVING, and ORDER BY (the latter being obscure).
In SPARQL, a custom aggregate named by URI is not called out in the
syntax. Custom aggregates look like functions, except that they allow
the word DISTINCT in the arguments.
What ARQ does not have is a registry of aggregates: the set it
supports is limited by the parser. The execution engine doesn't have a
fixed set. You'll need to tweak the parsing process; one way is to look
up any function URI in a new AggregationRegistry to see whether it is a
function or an aggregate and proceed accordingly.
Example:
SELECT (my:something(?x) AS ?X) { ... }
From the syntax alone, you can't tell whether that's an aggregate or a
plain custom extension function [*].
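
As a rough sketch of the registry idea, in plain Java and without any ARQ types: no such class exists in ARQ, so AggregationRegistrySketch, Kind, register and classify are hypothetical names, and http://example/ns#something is just a placeholder for whatever my:something expands to. The parser would consult something like this when it meets a URI-named call and cannot tell from the syntax what it is.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: ARQ has no such class; all names are illustrative.
final class AggregationRegistrySketch {

    enum Kind { AGGREGATE, FUNCTION }

    // URIs that have been registered as custom aggregates.
    private final Set<String> aggregateUris = ConcurrentHashMap.newKeySet();

    // Declare that calls to this URI should be treated as an aggregate.
    void register(String uri) {
        aggregateUris.add(uri);
    }

    // The decision the parser cannot make from syntax alone:
    // a registered URI is an aggregate, anything else is a plain function.
    Kind classify(String uri) {
        return aggregateUris.contains(uri) ? Kind.AGGREGATE : Kind.FUNCTION;
    }

    public static void main(String[] args) {
        AggregationRegistrySketch registry = new AggregationRegistrySketch();
        registry.register("http://example/ns#something");

        System.out.println(registry.classify("http://example/ns#something")); // prints AGGREGATE
        System.out.println(registry.classify("http://example/ns#other"));     // prints FUNCTION
    }
}
```

In a real change the registry would presumably map each URI to a factory for the aggregate rather than just record its name; that part depends on the ARQ aggregate classes and is left out of the sketch.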
There are other aggregates that could usefully be added to the general
distribution, with more statistical ones being the obvious candidates
(to me!).
Andy
[*]
Personally, I think that aggregates and functions should have separate
syntax, e.g. AGG(uri, args, ...) or AGG(uri(args, ...)), and if that works
better for you, we can add it to the extended language.
Bests,
Emanuele
[1]
http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/sparql/expr/aggregate/package-frame.html
[2]
http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/sparql/algebra/op/package-frame.html
--
prof. Emanuele Della Valle
DEIB - Politecnico di Milano
m. +393389375810
w. http://emanueledellavalle.org