Hi Dmitry, Thanks a lot for the pointers! (In the snippets there, I also learned about the LET clause. Great feature -- I'll need to review the query implementations again to use it!)
Cheers, Ingo > -----Original Message----- > From: Dmitry Lychagin <[email protected]> > Sent: Wednesday, August 4, 2021 3:22 AM > To: [email protected] > Subject: Re: Producing combinations of array elements in SQL++ > > Hi Ingo, > > I think positional vars should probably work fine in your usecase (unnesting > array > fields within an object). > There are issues with this feature when it's used in other contexts though > (e.g. > position of an object inside a dataset) These are open bugs in JIRA that track > problems with this feature: [1][2][3] > > [1] https://issues.apache.org/jira/browse/ASTERIXDB-1050 > [2] https://issues.apache.org/jira/browse/ASTERIXDB-1052 > [3] https://issues.apache.org/jira/browse/ASTERIXDB-2595 (this one is > relevant > for array unnesting too) > > Thanks, > -- Dmitry > > > On 8/3/21, 10:20 AM, "Müller Ingo" <[email protected]> wrote: > > EXTERNAL EMAIL: Use caution when opening attachments or clicking on > links > > > > > > Quick update: The "AT posVar" construct seems to produce the correct > result > in some simple test cases and are much cleaner that my previous construct (see > [6])! In particular, the fact that I don't even need UNNEST makes some queries > really concise :) > > Small follow-up question: Is there an issue tracking the status of this > feature? I > couldn't find it in JIRA... > > Cheers, > Ingo > > > [6] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/commit/1023d97 > > > > -----Original Message----- > > From: Müller Ingo <[email protected]> > > Sent: Tuesday, August 3, 2021 6:03 PM > > To: [email protected] > > Subject: RE: Producing combinations of array elements in SQL++ > > > > Dear Mike, dear Dmitry, > > > > Thanks a lot for the quick and helpful answers! I think what I was > looking for > is > > the "AT posVar". We are using similar constructs in several other SQL > > dialects/query languages. I will try it out right away, but I am not > sure yet > how > > we will deal with such a "beta" feature in our study. > > > > @Mike: I indeed "just" need all combinations and the position of array > elements > > is just a means to get that -- their order isn't important. (In fact, > queries > typically > > compute some metric for each combination and then the combination that > > minimizes or maximizes that metric.) If you know a better way to express > that, I > > am very interested! > > > > All the best, > > Ingo > > > > > > > -----Original Message----- > > > From: Mike Carey <[email protected]> > > > Sent: Tuesday, August 3, 2021 5:48 PM > > > To: [email protected] > > > Subject: Re: Producing combinations of array elements in SQL++ > > > > > > Ingo, > > > > > > ALSO: Perhaps clarify what you mean by "unique"? Suppose an array > > > has [1, 2, 2, 3] as its contents. What would the desired result be? > > > The best approach to this will depend on what you mean by that. > > > > > > > > > Cheers, > > > > > > Mike > > > > > > > > > On 8/3/21 8:38 AM, Dmitry Lychagin wrote: > > > > > > > > > Hi Ingo, > > > > > > > > > > > > 1) SQL++ supports positional variables in the FROM clause, as > follows: > > > > > > FROM dataset AS ds, ds.array_field AS elementVar AT posVar > > > > > > (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178 > > > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178> -- > > FromTerm) > > > > > > or using UNNEST instead of “,” > > > > > > FROM dataset AS ds UNNEST ds.array_field AS elementVar AT posVar > > > > > > (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181 > > > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181> – > > > UnnestClause) > > > > > > > > > > > > Positional variable is bound to a position of an element inside an > > > array which is being unnested. > > > > > > > > > > > > Here’s how your query could look like using these variables: > > > > > > SELECT … > > > > > > FROM base_table, base_table.array_field AS array_element1 AT > pos1, > > > base_table.array_field AS array_element2 AT pos2 > > > > > > WHERE pos1 < pos2 > > > > > > > > > > > > 2) As for window function calls. I can confirm that if there is no > > > ORDER BY sub-clause inside OVER clause then the order of the elements > > > processed by window functions (row_number(), rank(), etc) is not > guaranteed. > > > > > > > > > > > > Thanks, > > > > > > -- Dmitry > > > > > > > > > > > > > > > > > > From: Müller Ingo <[email protected]> > > > <mailto:[email protected]> > > > Reply-To: "[email protected]" > > > <mailto:[email protected]> <[email protected]> > > > <mailto:[email protected]> > > > Date: Tuesday, August 3, 2021 at 1:39 AM > > > To: "[email protected]" > > > <mailto:[email protected]> <[email protected]> > > > <mailto:[email protected]> > > > Subject: Producing combinations of array elements in SQL++ > > > > > > > > > > > > EXTERNAL EMAIL: Use caution when opening attachments or clicking on > > > links > > > > > > > > > > > > > > > > > > Dear SQL++ experts, > > > > > > > > > > > > I am working on an academic study [1] comparing the > expressiveness of > > > various query languages for nested data in a high-energy physics use > case. > > > Among other systems, I have implemented the queries of the use case in > > > SQL++ [2]. In general, SQL++ seems to be quite well suited for those > > > queries, but one frequent pattern seems to have only cumbersome > > formulations. > > > > > > > > > > > > In short, I am missing the functionality to produce > (deterministic) > > > element identifiers for array elements. In BigQuery’s SQL dialect [3], > > > this can be done with “UNNEST(array_field) WITH OFFSET AS idx”. For > > > now, I am using a construct similar to this: “FROM (SELECT > > > array_field.*, row_number() OVER () AS idx FROM > > > base_table.array_field) AS array_element” (see full query here [4]). I > > > have two issues with this: First, it needs undebatably more characters > > > than the BigQuery version and it is arguably also more cumbersome. > > > Second, I don’t think that it is even correct since the element > > > numbers are not guaranteed to be deterministic (i.e., if I use that > > > construct in several places in the query, the same array element may > > > get different values for row_number()). The documentation on > > > row_number says “If the window order clause is omitted, the return > > > values may be unpredictable.“ However, introducing an order clause not > > > only makes the pattern even more cumbersome (the elements are objects > > > with at least four fields), it also does not guarantee a deterministic > > > order still. The documentation of the Window Order Clause says: “The > > > row_number() function returns a distinct number for each tuple. If > > > tuples are tied, the results may be unpredictable.” (To be fair, if > > > two array elements agree on all fields, I can probably treat them > > > interchangeably in my use case.) > > > > > > > > > > > > With that introduction, two questions: (1) Can you confirm that > there > > > is really no language construct better than what I currently use (such > > > as BigQuery’s “OFFSET”)? (2) Can you confirm that row_number() is > > > really undeterministic in the way I use it? > > > > > > > > > > > > I need these element identifiers for producing combinations (in > the > > > mathematical sense [5]) of the elements in an array field, but maybe > > > there is a better way. In the simple case of producing all (unique) > > > pairs of elements of a particular array, I use a construct like the > following: > > > > > > > > > > > > FROM (SELECT array_field.*, row_number() OVER () AS idx FROM > > > base_table.array_field) AS array_element1, > > > > > > (SELECT array_field.*, row_number() OVER () AS idx FROM > > > base_table. array_field) AS array_element2 > > > > > > WHERE > > > > > > array_element1.idx < array_element2.idx > > > > > > > > > > > > So a last question: (3) Can you think of a better way to compute > > > combinations? > > > > > > > > > > > > Thanks a lot in advance and best regards, > > > > > > Ingo > > > > > > > > > > > > > > > > > > [1] Dan Graur, Ingo Müller, Ghislain Fourny, Gordon T. Watts, > Mason > > > Proffitt, Gustavo Alonso. "Evaluating Query Languages and Systems for > > > High- Energy Physics Data." arXiv: 2104.12615 [cs.DB], 2021. > > > https://arxiv.org/abs/2104.12615 > > > > > > [2] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp > > > <https://github.com/RumbleDB/iris-hep-benchmark-sqlpp> > > > > > > [3] https://cloud.google.com/bigquery/docs/reference/standard- > > > sql/arrays#flattening_arrays > > > <https://cloud.google.com/bigquery/docs/reference/standard- > > > sql/arrays#flattening_arrays> > > > > > > [4] https://github.com/RumbleDB/iris-hep-benchmark- > > > sqlpp/blob/master/queries/query-5/query.sqlpp > > > <https://github.com/RumbleDB/iris-hep-benchmark- > > > sqlpp/blob/master/queries/query-5/query.sqlpp> > > > > > > [5] https://en.wikipedia.org/wiki/Combination > > > > > > > > > > > > >
