Quick update: The "AT posVar" construct seems to produce the correct result in some simple test cases and are much cleaner that my previous construct (see [6])! In particular, the fact that I don't even need UNNEST makes some queries really concise :)
Small follow-up question: Is there an issue tracking the status of this feature? I couldn't find it in JIRA... Cheers, Ingo [6] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/commit/1023d97 > -----Original Message----- > From: Müller Ingo <[email protected]> > Sent: Tuesday, August 3, 2021 6:03 PM > To: [email protected] > Subject: RE: Producing combinations of array elements in SQL++ > > Dear Mike, dear Dmitry, > > Thanks a lot for the quick and helpful answers! I think what I was looking > for is > the "AT posVar". We are using similar constructs in several other SQL > dialects/query languages. I will try it out right away, but I am not sure yet > how > we will deal with such a "beta" feature in our study. > > @Mike: I indeed "just" need all combinations and the position of array > elements > is just a means to get that -- their order isn't important. (In fact, queries > typically > compute some metric for each combination and then the combination that > minimizes or maximizes that metric.) If you know a better way to express > that, I > am very interested! > > All the best, > Ingo > > > > -----Original Message----- > > From: Mike Carey <[email protected]> > > Sent: Tuesday, August 3, 2021 5:48 PM > > To: [email protected] > > Subject: Re: Producing combinations of array elements in SQL++ > > > > Ingo, > > > > ALSO: Perhaps clarify what you mean by "unique"? Suppose an array > > has [1, 2, 2, 3] as its contents. What would the desired result be? > > The best approach to this will depend on what you mean by that. > > > > > > Cheers, > > > > Mike > > > > > > On 8/3/21 8:38 AM, Dmitry Lychagin wrote: > > > > > > Hi Ingo, > > > > > > > > 1) SQL++ supports positional variables in the FROM clause, as follows: > > > > FROM dataset AS ds, ds.array_field AS elementVar AT posVar > > > > (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178 > > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178> -- > FromTerm) > > > > or using UNNEST instead of “,” > > > > FROM dataset AS ds UNNEST ds.array_field AS elementVar AT posVar > > > > (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181 > > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181> – > > UnnestClause) > > > > > > > > Positional variable is bound to a position of an element inside an > > array which is being unnested. > > > > > > > > Here’s how your query could look like using these variables: > > > > SELECT … > > > > FROM base_table, base_table.array_field AS array_element1 AT pos1, > > base_table.array_field AS array_element2 AT pos2 > > > > WHERE pos1 < pos2 > > > > > > > > 2) As for window function calls. I can confirm that if there is no > > ORDER BY sub-clause inside OVER clause then the order of the elements > > processed by window functions (row_number(), rank(), etc) is not guaranteed. > > > > > > > > Thanks, > > > > -- Dmitry > > > > > > > > > > > > From: Müller Ingo <[email protected]> > > <mailto:[email protected]> > > Reply-To: "[email protected]" > > <mailto:[email protected]> <[email protected]> > > <mailto:[email protected]> > > Date: Tuesday, August 3, 2021 at 1:39 AM > > To: "[email protected]" > > <mailto:[email protected]> <[email protected]> > > <mailto:[email protected]> > > Subject: Producing combinations of array elements in SQL++ > > > > > > > > EXTERNAL EMAIL: Use caution when opening attachments or clicking on > > links > > > > > > > > > > > > Dear SQL++ experts, > > > > > > > > I am working on an academic study [1] comparing the expressiveness of > > various query languages for nested data in a high-energy physics use case. > > Among other systems, I have implemented the queries of the use case in > > SQL++ [2]. In general, SQL++ seems to be quite well suited for those > > queries, but one frequent pattern seems to have only cumbersome > formulations. > > > > > > > > In short, I am missing the functionality to produce (deterministic) > > element identifiers for array elements. In BigQuery’s SQL dialect [3], > > this can be done with “UNNEST(array_field) WITH OFFSET AS idx”. For > > now, I am using a construct similar to this: “FROM (SELECT > > array_field.*, row_number() OVER () AS idx FROM > > base_table.array_field) AS array_element” (see full query here [4]). I > > have two issues with this: First, it needs undebatably more characters > > than the BigQuery version and it is arguably also more cumbersome. > > Second, I don’t think that it is even correct since the element > > numbers are not guaranteed to be deterministic (i.e., if I use that > > construct in several places in the query, the same array element may > > get different values for row_number()). The documentation on > > row_number says “If the window order clause is omitted, the return > > values may be unpredictable.“ However, introducing an order clause not > > only makes the pattern even more cumbersome (the elements are objects > > with at least four fields), it also does not guarantee a deterministic > > order still. The documentation of the Window Order Clause says: “The > > row_number() function returns a distinct number for each tuple. If > > tuples are tied, the results may be unpredictable.” (To be fair, if > > two array elements agree on all fields, I can probably treat them > > interchangeably in my use case.) > > > > > > > > With that introduction, two questions: (1) Can you confirm that there > > is really no language construct better than what I currently use (such > > as BigQuery’s “OFFSET”)? (2) Can you confirm that row_number() is > > really undeterministic in the way I use it? > > > > > > > > I need these element identifiers for producing combinations (in the > > mathematical sense [5]) of the elements in an array field, but maybe > > there is a better way. In the simple case of producing all (unique) > > pairs of elements of a particular array, I use a construct like the > > following: > > > > > > > > FROM (SELECT array_field.*, row_number() OVER () AS idx FROM > > base_table.array_field) AS array_element1, > > > > (SELECT array_field.*, row_number() OVER () AS idx FROM > > base_table. array_field) AS array_element2 > > > > WHERE > > > > array_element1.idx < array_element2.idx > > > > > > > > So a last question: (3) Can you think of a better way to compute > > combinations? > > > > > > > > Thanks a lot in advance and best regards, > > > > Ingo > > > > > > > > > > > > [1] Dan Graur, Ingo Müller, Ghislain Fourny, Gordon T. Watts, Mason > > Proffitt, Gustavo Alonso. "Evaluating Query Languages and Systems for > > High- Energy Physics Data." arXiv: 2104.12615 [cs.DB], 2021. > > https://arxiv.org/abs/2104.12615 > > > > [2] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp > > <https://github.com/RumbleDB/iris-hep-benchmark-sqlpp> > > > > [3] https://cloud.google.com/bigquery/docs/reference/standard- > > sql/arrays#flattening_arrays > > <https://cloud.google.com/bigquery/docs/reference/standard- > > sql/arrays#flattening_arrays> > > > > [4] https://github.com/RumbleDB/iris-hep-benchmark- > > sqlpp/blob/master/queries/query-5/query.sqlpp > > <https://github.com/RumbleDB/iris-hep-benchmark- > > sqlpp/blob/master/queries/query-5/query.sqlpp> > > > > [5] https://en.wikipedia.org/wiki/Combination > > > > > > > >
