Dear Mike, dear Dmitry, Thanks a lot for the quick and helpful answers! I think what I was looking for is the "AT posVar". We are using similar constructs in several other SQL dialects/query languages. I will try it out right away, but I am not sure yet how we will deal with such a "beta" feature in our study.
@Mike: I indeed "just" need all combinations and the position of array elements is just a means to get that -- their order isn't important. (In fact, queries typically compute some metric for each combination and then the combination that minimizes or maximizes that metric.) If you know a better way to express that, I am very interested! All the best, Ingo > -----Original Message----- > From: Mike Carey <[email protected]> > Sent: Tuesday, August 3, 2021 5:48 PM > To: [email protected] > Subject: Re: Producing combinations of array elements in SQL++ > > Ingo, > > ALSO: Perhaps clarify what you mean by "unique"? Suppose an array has [1, 2, > 2, 3] as its contents. What would the desired result be? The best approach > to > this will depend on what you mean by that. > > > Cheers, > > Mike > > > On 8/3/21 8:38 AM, Dmitry Lychagin wrote: > > > Hi Ingo, > > > > 1) SQL++ supports positional variables in the FROM clause, as follows: > > FROM dataset AS ds, ds.array_field AS elementVar AT posVar > > (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178 > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178> -- FromTerm) > > or using UNNEST instead of “,” > > FROM dataset AS ds UNNEST ds.array_field AS elementVar AT posVar > > (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181 > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181> – > UnnestClause) > > > > Positional variable is bound to a position of an element inside an array > which is being unnested. > > > > Here’s how your query could look like using these variables: > > SELECT … > > FROM base_table, base_table.array_field AS array_element1 AT pos1, > base_table.array_field AS array_element2 AT pos2 > > WHERE pos1 < pos2 > > > > 2) As for window function calls. I can confirm that if there is no ORDER > BY sub-clause inside OVER clause then the order of the elements processed by > window functions (row_number(), rank(), etc) is not guaranteed. > > > > Thanks, > > -- Dmitry > > > > > > From: Müller Ingo <[email protected]> > <mailto:[email protected]> > Reply-To: "[email protected]" > <mailto:[email protected]> <[email protected]> > <mailto:[email protected]> > Date: Tuesday, August 3, 2021 at 1:39 AM > To: "[email protected]" > <mailto:[email protected]> <[email protected]> > <mailto:[email protected]> > Subject: Producing combinations of array elements in SQL++ > > > > EXTERNAL EMAIL: Use caution when opening attachments or clicking on links > > > > > > Dear SQL++ experts, > > > > I am working on an academic study [1] comparing the expressiveness of > various query languages for nested data in a high-energy physics use case. > Among other systems, I have implemented the queries of the use case in SQL++ > [2]. In general, SQL++ seems to be quite well suited for those queries, but > one > frequent pattern seems to have only cumbersome formulations. > > > > In short, I am missing the functionality to produce (deterministic) > element identifiers for array elements. In BigQuery’s SQL dialect [3], this > can be > done with “UNNEST(array_field) WITH OFFSET AS idx”. For now, I am using a > construct similar to this: “FROM (SELECT array_field.*, row_number() OVER () > AS > idx FROM base_table.array_field) AS array_element” (see full query here [4]). > I > have two issues with this: First, it needs undebatably more characters than > the > BigQuery version and it is arguably also more cumbersome. Second, I don’t > think > that it is even correct since the element numbers are not guaranteed to be > deterministic (i.e., if I use that construct in several places in the query, > the same > array element may get different values for row_number()). The documentation > on row_number says “If the window order clause is omitted, the return values > may be unpredictable.“ However, introducing an order clause not only makes > the pattern even more cumbersome (the elements are objects with at least four > fields), it also does not guarantee a deterministic order still. The > documentation > of the Window Order Clause says: “The row_number() function returns a distinct > number for each tuple. If tuples are tied, the results may be unpredictable.” > (To > be fair, if two array elements agree on all fields, I can probably treat them > interchangeably in my use case.) > > > > With that introduction, two questions: (1) Can you confirm that there is > really no language construct better than what I currently use (such as > BigQuery’s > “OFFSET”)? (2) Can you confirm that row_number() is really undeterministic in > the way I use it? > > > > I need these element identifiers for producing combinations (in the > mathematical sense [5]) of the elements in an array field, but maybe there is > a > better way. In the simple case of producing all (unique) pairs of elements of > a > particular array, I use a construct like the following: > > > > FROM (SELECT array_field.*, row_number() OVER () AS idx FROM > base_table.array_field) AS array_element1, > > (SELECT array_field.*, row_number() OVER () AS idx FROM > base_table. array_field) AS array_element2 > > WHERE > > array_element1.idx < array_element2.idx > > > > So a last question: (3) Can you think of a better way to compute > combinations? > > > > Thanks a lot in advance and best regards, > > Ingo > > > > > > [1] Dan Graur, Ingo Müller, Ghislain Fourny, Gordon T. Watts, Mason > Proffitt, Gustavo Alonso. "Evaluating Query Languages and Systems for High- > Energy Physics Data." arXiv: 2104.12615 [cs.DB], 2021. > https://arxiv.org/abs/2104.12615 > > [2] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp > <https://github.com/RumbleDB/iris-hep-benchmark-sqlpp> > > [3] https://cloud.google.com/bigquery/docs/reference/standard- > sql/arrays#flattening_arrays > <https://cloud.google.com/bigquery/docs/reference/standard- > sql/arrays#flattening_arrays> > > [4] https://github.com/RumbleDB/iris-hep-benchmark- > sqlpp/blob/master/queries/query-5/query.sqlpp > <https://github.com/RumbleDB/iris-hep-benchmark- > sqlpp/blob/master/queries/query-5/query.sqlpp> > > [5] https://en.wikipedia.org/wiki/Combination > > > >
