Re: Aggregation over unbound variables

Arthur Keen Fri, 14 Feb 2014 08:22:13 -0800

Andy,

Thanks, your explanation makes this crystal clear. I can see this now by re-reading the spec on COUNT versus SUM.

The COUNT spec specifically says that it counts only bound values:

"counts the number of times a given expression has a bound, and non-error value",

and the spec for SUM says:

"the numeric value obtained by summing the values within the aggregate group".

If the SQL interpretation of SUM had been intended, the spec would have said something like:

"the numeric value obtained by summing the bound values within the aggregate group".

Thanks very much for the clarification.

Arthur

On Feb 14, 2014, at 6:44 AM, Andy Seaborne <a...@apache.org> wrote:

Arthur,

What does the spec say?

http://www.w3.org/TR/sparql11-query/#defn_aggSum

SUM is defined using op:numeric-add, applied to the evaluation of the summed expression over the grouped rows.

If one of the expressions is an error, then the whole aggregate is an error. Unbound is not null; trying to get the value of an unbound variable is an error.

You can sum over expressions where items may be unbound with either of:

SUM(IF(bound(?x),?x,0))

SUM(COALESCE(?x,0))
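
To make the error rule concrete, here is a rough Python model of it. This is only a sketch, not engine code: `UNBOUND`, `ExprError`, `value_of`, `coalesce` and `sparql_sum` are hypothetical names invented for the illustration, and an aggregate that ends in error is modelled as `None` (an unbound result cell).

```python
UNBOUND = None  # hypothetical stand-in for an unbound variable in a row


class ExprError(Exception):
    """Models a SPARQL expression evaluation error."""


def value_of(x):
    # Reading an unbound variable is an error, not a null.
    if x is UNBOUND:
        raise ExprError("variable is unbound")
    return x


def coalesce(*args):
    # COALESCE yields the first argument that evaluates without error.
    for a in args:
        try:
            return value_of(a)
        except ExprError:
            continue
    raise ExprError("no argument evaluated without error")


def sparql_sum(exprs):
    # exprs: one zero-argument callable per row in the group.
    # A single error makes the whole aggregate an error (None here).
    total = 0
    try:
        for e in exprs:
            total += e()
    except ExprError:
        return None
    return total


bob_ages = [5, UNBOUND]
plain = sparql_sum([lambda a=a: value_of(a) for a in bob_ages])
patched = sparql_sum([lambda a=a: coalesce(a, 0) for a in bob_ages])
print(plain, patched)
```

In this model `plain` corresponds to SUM(?x) and `patched` to SUM(COALESCE(?x,0)) over a group with one bound and one unbound value.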

   Andy

PS
> We have found that at least two SPARQL implementations use the SQL
> semantics

If you are naming implementations, could you name all of them?

On 14/02/14 00:29, Arthur Keen wrote:
We are developing a SPARQL 1.1 implementation, and are hoping for
some guidance on how the SPARQL 1.1 specification deals with
aggregation over unbound variables.

I believe the COUNT functions are the same in SQL and SPARQL, but the
other aggregates (sum/min/max/avg) seem to have different semantics
with respect to nulls (at least according to Jena).

In SQL, the sum/min/max/avg of a nullable column is the sum/min/max/avg
of the *non-null* values. For example, suppose you have the following
data in table "foo":

      name    | age
   -----------+------
      "Bob"   | 5
      "Bob"   |
      "Alice" | 3
      "Alice" | 4

Then the query "select name, sum(age) from foo group by name" gives this result:

      name    | sum
   -----------+-------
      "Bob"   | 5
      "Alice" | 7

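The SQL behaviour described here can be checked with a small script. This is a sketch that assumes SQLite (through Python's standard `sqlite3` module) as the SQL engine; the table and data follow the "foo" example, with the missing age stored as NULL. An ORDER BY is added only to make the row order predictable.

```python
import sqlite3

# In-memory database holding the "foo" table from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [("Bob", 5), ("Bob", None), ("Alice", 3), ("Alice", 4)],  # None -> NULL
)

# SUM and COUNT(age) skip NULLs; COUNT(*) counts every row.
rows = conn.execute(
    "SELECT name, SUM(age), COUNT(age), COUNT(*) "
    "FROM foo GROUP BY name ORDER BY name"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Bob's group still sums to 5 because the NULL age is skipped rather than treated as an error.
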
In contrast, Jena returns NULL (i.e. unbound) if there are any nulls in
the data:

   -----------------------------------------------------
   | name    | total | cnt | cntstar | avg | min | max |
   =====================================================
   | "Bob"   |       | 1   | 2       |     |     |     |
   | "Alice" | 7     | 2   | 2       | 3.5 | 3   | 4   |
   -----------------------------------------------------

   Data:

       @prefix foaf:       <http://xmlns.com/foaf/0.1/> .

       _:a foaf:name       "Alice" .
       _:a foaf:age        4 .
       _:b foaf:name       "Alice" .
       _:b foaf:age        3 .
       _:c foaf:name       "Bob" .
       _:c foaf:age        5 .
       _:d foaf:name       "Bob" .

   Query:

       PREFIX foaf: <http://xmlns.com/foaf/0.1/>

       SELECT ?name
               (sum(?age) as ?total)
               (count(?age) as ?cnt)
               (count(*) as ?cntstar)
               (avg(?age) as ?avg)
               (min(?age) as ?min)
               (max(?age) as ?max)
       WHERE {
          ?x foaf:name ?name .
          OPTIONAL { ?x foaf:age ?age }
       }
       group by ?name
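
For reference, the result table above can be reproduced with a small Python model of the grouping. This is only a sketch of the reported behaviour, not Jena code: the `aggregate` helper and the use of `None` for an unbound cell are assumptions made for the illustration, and treating avg/min/max like sum simply mirrors the table rather than an independent reading of the spec.

```python
from collections import defaultdict

# (name, age) solution rows; None marks ?age left unbound by OPTIONAL.
rows = [("Alice", 4), ("Alice", 3), ("Bob", 5), ("Bob", None)]

# GROUP BY ?name
groups = defaultdict(list)
for name, age in rows:
    groups[name].append(age)


def aggregate(ages):
    # Hypothetical helper: one group's aggregates as a dict.
    bound = [a for a in ages if a is not None]
    has_unbound = len(bound) != len(ages)
    return {
        "cnt": len(bound),     # count(?age): bound values only
        "cntstar": len(ages),  # count(*): every row in the group
        # An unbound input leaves these aggregates unbound (None).
        "total": None if has_unbound else sum(bound),
        "avg": None if has_unbound else sum(bound) / len(bound),
        "min": None if has_unbound else min(bound),
        "max": None if has_unbound else max(bound),
    }


result = {name: aggregate(ages) for name, ages in groups.items()}
for name, aggs in result.items():
    print(name, aggs)
```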

We have found that at least two SPARQL implementations use the SQL
semantics for this, so it would benefit the SPARQL community to
have a consistent way to handle aggregation over unbound variables.

Best regards

Arthur Keen

SPARQL City, Inc.

NoSQL, NoPain, NoWaiting…

Arthur Keen, Ph.D.

Vice President Solutions Architecture

Mobile: 512-433-9537

arthur.k...@sparqlcity.com
