RE: Producing combinations of array elements in SQL++

Müller Ingo Wed, 04 Aug 2021 00:23:10 -0700

Hi Dmitry,

Thanks a lot for the pointers! (In the snippets there, I also learned about the 
LET clause. Great feature -- I'll need to review the query implementations 
again to use it!)


Cheers,
Ingo


> -----Original Message-----
> From: Dmitry Lychagin <[email protected]>
> Sent: Wednesday, August 4, 2021 3:22 AM
> To: [email protected]
> Subject: Re: Producing combinations of array elements in SQL++
> 
> Hi Ingo,
> 
> I think positional vars should probably work fine in your usecase (unnesting 
> array
> fields within an object).
> There are issues with this feature when it's used in other contexts though 
> (e.g.
> position of an object inside a dataset) These are open bugs in JIRA that track
> problems with this feature: [1][2][3]
> 
> [1] https://issues.apache.org/jira/browse/ASTERIXDB-1050
> [2] https://issues.apache.org/jira/browse/ASTERIXDB-1052
> [3] https://issues.apache.org/jira/browse/ASTERIXDB-2595  (this one is 
> relevant
> for array unnesting too)
> 
> Thanks,
> -- Dmitry
> 
> 
> On 8/3/21, 10:20 AM, "Müller  Ingo" <[email protected]> wrote:
> 
>      EXTERNAL EMAIL:  Use caution when opening attachments or clicking on 
> links
> 
> 
> 
> 
> 
>     Quick update: The "AT posVar" construct seems to produce the correct 
> result
> in some simple test cases and are much cleaner that my previous construct (see
> [6])! In particular, the fact that I don't even need UNNEST makes some queries
> really concise :)
> 
>     Small follow-up question: Is there an issue tracking the status of this 
> feature? I
> couldn't find it in JIRA...
> 
>     Cheers,
>     Ingo
> 
> 
>     [6] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/commit/1023d97
> 
> 
>     > -----Original Message-----
>     > From: Müller Ingo <[email protected]>
>     > Sent: Tuesday, August 3, 2021 6:03 PM
>     > To: [email protected]
>     > Subject: RE: Producing combinations of array elements in SQL++
>     >
>     > Dear Mike, dear Dmitry,
>     >
>     > Thanks a lot for the quick and helpful answers! I think what I was 
> looking for
> is
>     > the "AT posVar". We are using similar constructs in several other SQL
>     > dialects/query languages. I will try it out right away, but I am not 
> sure yet
> how
>     > we will deal with such a "beta" feature in our study.
>     >
>     > @Mike: I indeed "just" need all combinations and the position of array
> elements
>     > is just a means to get that -- their order isn't important. (In fact, 
> queries
> typically
>     > compute some metric for each combination and then the combination that
>     > minimizes or maximizes that metric.) If you know a better way to express
> that, I
>     > am very interested!
>     >
>     > All the best,
>     > Ingo
>     >
>     >
>     > > -----Original Message-----
>     > > From: Mike Carey <[email protected]>
>     > > Sent: Tuesday, August 3, 2021 5:48 PM
>     > > To: [email protected]
>     > > Subject: Re: Producing combinations of array elements in SQL++
>     > >
>     > > Ingo,
>     > >
>     > > ALSO:  Perhaps clarify what you mean by "unique"?  Suppose an array
>     > > has [1, 2, 2, 3] as its contents.  What would the desired result be?
>     > > The best approach to this will depend on what you mean by that.
>     > >
>     > >
>     > > Cheers,
>     > >
>     > > Mike
>     > >
>     > >
>     > > On 8/3/21 8:38 AM, Dmitry Lychagin wrote:
>     > >
>     > >
>     > >     Hi Ingo,
>     > >
>     > >
>     > >
>     > >     1) SQL++ supports positional variables in the FROM clause, as 
> follows:
>     > >
>     > >     FROM dataset AS ds, ds.array_field AS elementVar AT posVar
>     > >
>     > >     (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178
>     > > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178>   --
>     > FromTerm)
>     > >
>     > >     or using UNNEST instead of “,”
>     > >
>     > >     FROM dataset AS ds UNNEST ds.array_field AS elementVar AT posVar
>     > >
>     > >     (see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181
>     > > <https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181>  –
>     > > UnnestClause)
>     > >
>     > >
>     > >
>     > >     Positional variable is bound to a position of an element inside an
>     > > array which is being unnested.
>     > >
>     > >
>     > >
>     > >     Here’s how your query could look like using these variables:
>     > >
>     > >     SELECT …
>     > >
>     > >     FROM base_table, base_table.array_field AS  array_element1 AT 
> pos1,
>     > > base_table.array_field AS  array_element2 AT pos2
>     > >
>     > >     WHERE pos1 < pos2
>     > >
>     > >
>     > >
>     > >     2) As for window function calls. I can confirm that if there is no
>     > > ORDER BY sub-clause inside OVER clause then the order of the elements
>     > > processed by window functions (row_number(), rank(), etc) is not
> guaranteed.
>     > >
>     > >
>     > >
>     > >     Thanks,
>     > >
>     > >     -- Dmitry
>     > >
>     > >
>     > >
>     > >
>     > >
>     > >     From: Müller Ingo <[email protected]>
>     > > <mailto:[email protected]>
>     > >     Reply-To: "[email protected]"
>     > > <mailto:[email protected]>  <[email protected]>
>     > > <mailto:[email protected]>
>     > >     Date: Tuesday, August 3, 2021 at 1:39 AM
>     > >     To: "[email protected]"
>     > > <mailto:[email protected]>  <[email protected]>
>     > > <mailto:[email protected]>
>     > >     Subject: Producing combinations of array elements in SQL++
>     > >
>     > >
>     > >
>     > >  EXTERNAL EMAIL:  Use caution when opening attachments or clicking on
>     > > links
>     > >
>     > >
>     > >
>     > >
>     > >
>     > >     Dear SQL++ experts,
>     > >
>     > >
>     > >
>     > >     I am working on an academic study [1] comparing the 
> expressiveness of
>     > > various query languages for nested data in a high-energy physics use 
> case.
>     > > Among other systems, I have implemented the queries of the use case in
>     > > SQL++ [2]. In general, SQL++ seems to be quite well suited for those
>     > > queries, but one frequent pattern seems to have only cumbersome
>     > formulations.
>     > >
>     > >
>     > >
>     > >     In short, I am missing the functionality to produce 
> (deterministic)
>     > > element identifiers for array elements. In BigQuery’s SQL dialect [3],
>     > > this can be done with “UNNEST(array_field) WITH OFFSET AS idx”. For
>     > > now, I am using a construct similar to this: “FROM (SELECT
>     > > array_field.*, row_number() OVER () AS idx FROM
>     > > base_table.array_field) AS array_element” (see full query here [4]). I
>     > > have two issues with this: First, it needs undebatably more characters
>     > > than the BigQuery version and it is arguably also more cumbersome.
>     > > Second, I don’t think that it is even correct since the element
>     > > numbers are not guaranteed to be deterministic (i.e., if I use that
>     > > construct in several places in the query, the same array element may
>     > > get different values for row_number()). The documentation on
>     > > row_number says “If the window order clause is omitted, the return
>     > > values may be unpredictable.“ However, introducing an order clause not
>     > > only makes the pattern even more cumbersome (the elements are objects
>     > > with at least four fields), it also does not guarantee a deterministic
>     > > order still. The documentation of the Window Order Clause says: “The
>     > > row_number() function returns a distinct number for each tuple. If
>     > > tuples are tied, the results may be unpredictable.” (To be fair, if
>     > > two array elements agree on all fields, I can probably treat them
>     > > interchangeably in my use case.)
>     > >
>     > >
>     > >
>     > >     With that introduction, two questions: (1) Can you confirm that 
> there
>     > > is really no language construct better than what I currently use (such
>     > > as BigQuery’s “OFFSET”)? (2) Can you confirm that row_number() is
>     > > really undeterministic in the way I use it?
>     > >
>     > >
>     > >
>     > >     I need these element identifiers for producing combinations (in 
> the
>     > > mathematical sense [5]) of the elements in an array field, but maybe
>     > > there is a better way. In the simple case of producing all (unique)
>     > > pairs of elements of a particular array, I use a construct like the 
> following:
>     > >
>     > >
>     > >
>     > >           FROM (SELECT array_field.*, row_number() OVER () AS idx FROM
>     > > base_table.array_field) AS array_element1,
>     > >
>     > >                (SELECT array_field.*, row_number() OVER () AS idx FROM
>     > > base_table. array_field) AS array_element2
>     > >
>     > >           WHERE
>     > >
>     > >             array_element1.idx < array_element2.idx
>     > >
>     > >
>     > >
>     > >     So a last question: (3) Can you think of a better way to compute
>     > > combinations?
>     > >
>     > >
>     > >
>     > >     Thanks a lot in advance and best regards,
>     > >
>     > >     Ingo
>     > >
>     > >
>     > >
>     > >
>     > >
>     > >     [1] Dan Graur, Ingo Müller, Ghislain Fourny, Gordon T. Watts, 
> Mason
>     > > Proffitt, Gustavo Alonso. "Evaluating Query Languages and Systems for
>     > > High- Energy Physics Data." arXiv: 2104.12615 [cs.DB], 2021.
>     > > https://arxiv.org/abs/2104.12615
>     > >
>     > >     [2] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp
>     > > <https://github.com/RumbleDB/iris-hep-benchmark-sqlpp>
>     > >
>     > >     [3] https://cloud.google.com/bigquery/docs/reference/standard-
>     > > sql/arrays#flattening_arrays
>     > > <https://cloud.google.com/bigquery/docs/reference/standard-
>     > > sql/arrays#flattening_arrays>
>     > >
>     > >     [4] https://github.com/RumbleDB/iris-hep-benchmark-
>     > > sqlpp/blob/master/queries/query-5/query.sqlpp
>     > > <https://github.com/RumbleDB/iris-hep-benchmark-
>     > > sqlpp/blob/master/queries/query-5/query.sqlpp>
>     > >
>     > >     [5] https://en.wikipedia.org/wiki/Combination
>     > >
>     > >
>     > >
>     > >
>

RE: Producing combinations of array elements in SQL++

Reply via email to