Re: Sessionizing raw events / does Calcite support ARRAY_AGG?

2018-12-15 Thread Kenneth Knowles
Super! I was not aware of COLLECT. That sounds like an even better fit.
Great timing on the release, for my sake :-)

Kenn

On Sat, Dec 15, 2018, 13:51 Julian Hyde  As of https://issues.apache.org/jira/browse/CALCITE-2224 (to be
> released shortly in 1.18) Calcite supports the WITHIN GROUP clause
> (that allows you to specify the order in which values are supplied to
> an aggregate function) and the COLLECT aggregate function (similar to
> ARRAY_AGG but returns a nested relation rather than an array).
>
> LISTAGG (an aggregate function that concatenates its arguments into a
> string) and ARRAY_AGG are not implemented but would be straightforward
> follow-ups to that work.
>
> As I noted in CALCITE-2224, the SQL standard says that ARRAY_AGG may
> optionally have an ORDER BY clause inside its parentheses, e.g.
>
>   SELECT deptno, ARRAY_AGG(empno ORDER BY sal DESC) AS emps
>   FROM emp
>   GROUP BY deptno
>
> I think WITHIN GROUP could and should be used instead, viz
>
>   SELECT deptno, ARRAY_AGG(empno) WITHIN GROUP (ORDER BY sal DESC) AS emps
>   FROM emp
>   GROUP BY deptno
>
> because that is more consistent with other aggregate functions, and
> would allow us to supply ARRAY_AGG without extra parser work.
>
> Julian
>
>
> On Fri, Dec 14, 2018 at 5:58 PM Kenneth Knowles  wrote:
> >
> > Hello!
> >
> > My use case is sessionizing raw events without an aggregation function.
> > Approximate code that I tried out:
> >
> > SELECT ARRAY_AGG(ROW(...))
> > FROM ...
> > GROUP BY SESSION(...)
> >
> > (followed by UNNEST to get the raw events, tagged with session info, back
> > out into a stream)
> >
> > I get a parser error on the paren after ARRAY_AGG, presumably because it
> is
> > an identifier treated as a column name?
> >
> > So I was digging through Calcite's code and my conclusion is that there
> is
> > no implementation of ARRAY_AGG. Is there an alternative? Is there another
> > way to use Calcite's streaming extensions to do sessionization of raw
> > events?
> >
> > Kenn
>


Re: Sessionizing raw events / does Calcite support ARRAY_AGG?

2018-12-15 Thread Julian Hyde
As of https://issues.apache.org/jira/browse/CALCITE-2224 (to be
released shortly in 1.18) Calcite supports the WITHIN GROUP clause
(that allows you to specify the order in which values are supplied to
an aggregate function) and the COLLECT aggregate function (similar to
ARRAY_AGG but returns a nested relation rather than an array).

LISTAGG (an aggregate function that concatenates its arguments into a
string) and ARRAY_AGG are not implemented but would be straightforward
follow-ups to that work.

As I noted in CALCITE-2224, the SQL standard says that ARRAY_AGG may
optionally have an ORDER BY clause inside its parentheses, e.g.

  SELECT deptno, ARRAY_AGG(empno ORDER BY sal DESC) AS emps
  FROM emp
  GROUP BY deptno

I think WITHIN GROUP could and should be used instead, viz

  SELECT deptno, ARRAY_AGG(empno) WITHIN GROUP (ORDER BY sal DESC) AS emps
  FROM emp
  GROUP BY deptno

because that is more consistent with other aggregate functions, and
would allow us to supply ARRAY_AGG without extra parser work.

Julian


On Fri, Dec 14, 2018 at 5:58 PM Kenneth Knowles  wrote:
>
> Hello!
>
> My use case is sessionizing raw events without an aggregation function.
> Approximate code that I tried out:
>
> SELECT ARRAY_AGG(ROW(...))
> FROM ...
> GROUP BY SESSION(...)
>
> (followed by UNNEST to get the raw events, tagged with session info, back
> out into a stream)
>
> I get a parser error on the paren after ARRAY_AGG, presumably because it is
> an identifier treated as a column name?
>
> So I was digging through Calcite's code and my conclusion is that there is
> no implementation of ARRAY_AGG. Is there an alternative? Is there another
> way to use Calcite's streaming extensions to do sessionization of raw
> events?
>
> Kenn


Sessionizing raw events / does Calcite support ARRAY_AGG?

2018-12-14 Thread Kenneth Knowles
Hello!

My use case is sessionizing raw events without an aggregation function.
Approximate code that I tried out:

SELECT ARRAY_AGG(ROW(...))
FROM ...
GROUP BY SESSION(...)

(followed by UNNEST to get the raw events, tagged with session info, back
out into a stream)

I get a parser error on the paren after ARRAY_AGG, presumably because it is
an identifier treated as a column name?

So I was digging through Calcite's code and my conclusion is that there is
no implementation of ARRAY_AGG. Is there an alternative? Is there another
way to use Calcite's streaming extensions to do sessionization of raw
events?

Kenn