-----Original Message-----
From: Dmitry Lychagin <[email protected]>
Sent: Wednesday, August 4, 2021 3:22 AM
To: [email protected]
Subject: Re: Producing combinations of array elements in SQL++
Hi Ingo,
I think positional vars should probably work fine in your usecase (unnesting
array
fields within an object).
There are issues with this feature when it's used in other contexts though (e.g.
position of an object inside a dataset) These are open bugs in JIRA that track
problems with this feature: [1][2][3]
[1] https://issues.apache.org/jira/browse/ASTERIXDB-1050
[2] https://issues.apache.org/jira/browse/ASTERIXDB-1052
[3] https://issues.apache.org/jira/browse/ASTERIXDB-2595 (this one is relevant
for array unnesting too)
Thanks,
-- Dmitry
On 8/3/21, 10:20 AM, "Müller Ingo" <[email protected]> wrote:
EXTERNAL EMAIL: Use caution when opening attachments or clicking on links
Quick update: The "AT posVar" construct seems to produce the correct result
in some simple test cases and are much cleaner that my previous construct (see
[6])! In particular, the fact that I don't even need UNNEST makes some queries
really concise :)
Small follow-up question: Is there an issue tracking the status of this
feature? I
couldn't find it in JIRA...
Cheers,
Ingo
[6] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/commit/1023d97
> -----Original Message-----
> From: Müller Ingo <[email protected]>
> Sent: Tuesday, August 3, 2021 6:03 PM
> To: [email protected]
> Subject: RE: Producing combinations of array elements in SQL++
>
> Dear Mike, dear Dmitry,
>
> Thanks a lot for the quick and helpful answers! I think what I was
looking for
is
> the "AT posVar". We are using similar constructs in several other SQL
> dialects/query languages. I will try it out right away, but I am not
sure yet
how
> we will deal with such a "beta" feature in our study.
>
> @Mike: I indeed "just" need all combinations and the position of array
elements
> is just a means to get that -- their order isn't important. (In fact,
queries
typically
> compute some metric for each combination and then the combination that
> minimizes or maximizes that metric.) If you know a better way to express
that, I
> am very interested!
>
> All the best,
> Ingo
>
>
> > -----Original Message-----
> > From: Mike Carey <[email protected]>
> > Sent: Tuesday, August 3, 2021 5:48 PM
> > To: [email protected]
> > Subject: Re: Producing combinations of array elements in SQL++
> >
> > Ingo,
> >
> > ALSO: Perhaps clarify what you mean by "unique"? Suppose an array
> > has [1, 2, 2, 3] as its contents. What would the desired result be?
> > The best approach to this will depend on what you mean by that.
> >
> >
> > Cheers,
> >
> > Mike
> >
> >
> > On 8/3/21 8:38 AM, Dmitry Lychagin wrote:
> >
> >
> > Hi Ingo,
> >
> >
> >
> > 1) SQL++ supports positional variables in the FROM clause, as
follows:
> >
> > FROM dataset AS ds, ds.array_field AS elementVar AT posVar
> >
> > (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178
> > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178> --
> FromTerm)
> >
> > or using UNNEST instead of “,”
> >
> > FROM dataset AS ds UNNEST ds.array_field AS elementVar AT posVar
> >
> > (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181
> > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181> –
> > UnnestClause)
> >
> >
> >
> > Positional variable is bound to a position of an element inside an
> > array which is being unnested.
> >
> >
> >
> > Here’s how your query could look like using these variables:
> >
> > SELECT …
> >
> > FROM base_table, base_table.array_field AS array_element1 AT pos1,
> > base_table.array_field AS array_element2 AT pos2
> >
> > WHERE pos1 < pos2
> >
> >
> >
> > 2) As for window function calls. I can confirm that if there is no
> > ORDER BY sub-clause inside OVER clause then the order of the elements
> > processed by window functions (row_number(), rank(), etc) is not
guaranteed.
> >
> >
> >
> > Thanks,
> >
> > -- Dmitry
> >
> >
> >
> >
> >
> > From: Müller Ingo <[email protected]>
> > <mailto:[email protected]>
> > Reply-To: "[email protected]"
> > <mailto:[email protected]> <[email protected]>
> > <mailto:[email protected]>
> > Date: Tuesday, August 3, 2021 at 1:39 AM
> > To: "[email protected]"
> > <mailto:[email protected]> <[email protected]>
> > <mailto:[email protected]>
> > Subject: Producing combinations of array elements in SQL++
> >
> >
> >
> > EXTERNAL EMAIL: Use caution when opening attachments or clicking on
> > links
> >
> >
> >
> >
> >
> > Dear SQL++ experts,
> >
> >
> >
> > I am working on an academic study [1] comparing the expressiveness
of
> > various query languages for nested data in a high-energy physics use
case.
> > Among other systems, I have implemented the queries of the use case in
> > SQL++ [2]. In general, SQL++ seems to be quite well suited for those
> > queries, but one frequent pattern seems to have only cumbersome
> formulations.
> >
> >
> >
> > In short, I am missing the functionality to produce (deterministic)
> > element identifiers for array elements. In BigQuery’s SQL dialect [3],
> > this can be done with “UNNEST(array_field) WITH OFFSET AS idx”. For
> > now, I am using a construct similar to this: “FROM (SELECT
> > array_field.*, row_number() OVER () AS idx FROM
> > base_table.array_field) AS array_element” (see full query here [4]). I
> > have two issues with this: First, it needs undebatably more characters
> > than the BigQuery version and it is arguably also more cumbersome.
> > Second, I don’t think that it is even correct since the element
> > numbers are not guaranteed to be deterministic (i.e., if I use that
> > construct in several places in the query, the same array element may
> > get different values for row_number()). The documentation on
> > row_number says “If the window order clause is omitted, the return
> > values may be unpredictable.“ However, introducing an order clause not
> > only makes the pattern even more cumbersome (the elements are objects
> > with at least four fields), it also does not guarantee a deterministic
> > order still. The documentation of the Window Order Clause says: “The
> > row_number() function returns a distinct number for each tuple. If
> > tuples are tied, the results may be unpredictable.” (To be fair, if
> > two array elements agree on all fields, I can probably treat them
> > interchangeably in my use case.)
> >
> >
> >
> > With that introduction, two questions: (1) Can you confirm that
there
> > is really no language construct better than what I currently use (such
> > as BigQuery’s “OFFSET”)? (2) Can you confirm that row_number() is
> > really undeterministic in the way I use it?
> >
> >
> >
> > I need these element identifiers for producing combinations (in the
> > mathematical sense [5]) of the elements in an array field, but maybe
> > there is a better way. In the simple case of producing all (unique)
> > pairs of elements of a particular array, I use a construct like the
following:
> >
> >
> >
> > FROM (SELECT array_field.*, row_number() OVER () AS idx FROM
> > base_table.array_field) AS array_element1,
> >
> > (SELECT array_field.*, row_number() OVER () AS idx FROM
> > base_table. array_field) AS array_element2
> >
> > WHERE
> >
> > array_element1.idx < array_element2.idx
> >
> >
> >
> > So a last question: (3) Can you think of a better way to compute
> > combinations?
> >
> >
> >
> > Thanks a lot in advance and best regards,
> >
> > Ingo
> >
> >
> >
> >
> >
> > [1] Dan Graur, Ingo Müller, Ghislain Fourny, Gordon T. Watts, Mason
> > Proffitt, Gustavo Alonso. "Evaluating Query Languages and Systems for
> > High- Energy Physics Data." arXiv: 2104.12615 [cs.DB], 2021.
> > https://arxiv.org/abs/2104.12615
> >
> > [2] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp
> > <https://github.com/RumbleDB/iris-hep-benchmark-sqlpp>
> >
> > [3] https://cloud.google.com/bigquery/docs/reference/standard-
> > sql/arrays#flattening_arrays
> > <https://cloud.google.com/bigquery/docs/reference/standard-
> > sql/arrays#flattening_arrays>
> >
> > [4] https://github.com/RumbleDB/iris-hep-benchmark-
> > sqlpp/blob/master/queries/query-5/query.sqlpp
> > <https://github.com/RumbleDB/iris-hep-benchmark-
> > sqlpp/blob/master/queries/query-5/query.sqlpp>
> >
> > [5] https://en.wikipedia.org/wiki/Combination
> >
> >
> >
> >