Re: Producing combinations of array elements in SQL++

Mike Carey Tue, 03 Aug 2021 08:48:27 -0700

Ingo,

ALSO: Perhaps clarify what you mean by "unique"? Suppose an array has[1, 2, 2, 3] as its contents. What would the desired result be? Thebest approach to this will depend on what you mean by that.


Cheers,

Mike

On 8/3/21 8:38 AM, Dmitry Lychagin wrote:

Hi Ingo,

1) SQL++ supports positional variables in the FROM clause, as follows:

FROM dataset AS ds, ds.array_field AS elementVar AT posVar
(see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178<https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178> -- FromTerm)
or using UNNEST instead of “,”

FROM dataset AS ds UNNEST ds.array_field AS elementVar AT posVar
(see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181<https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181> –UnnestClause)
Positional variable is bound to a position of an element inside anarray which is being unnested.
Here’s how your query could look like using these variables:

SELECT …
FROM base_table, base_table.array_field AS array_element1 AT pos1, base_table.array_field AS array_element2 AT pos2
WHERE pos1 < pos2
2) As for window function calls. I can confirm that if there is noORDER BY sub-clause inside OVER clause then the order of the elementsprocessed by window functions (row_number(), rank(), etc) is notguaranteed.
Thanks,

-- Dmitry

*From: *Müller Ingo <[email protected]>
*Reply-To: *"[email protected]" <[email protected]>
*Date: *Tuesday, August 3, 2021 at 1:39 AM
*To: *"[email protected]" <[email protected]>
*Subject: *Producing combinations of array elements in SQL++
*EXTERNAL EMAIL**: *Use caution when opening attachments or clickingon links
Dear SQL++ experts,
I am working on an academic study [1] comparing the expressiveness ofvarious query languages for nested data in a high-energy physics usecase. Among other systems, I have implemented the queries of the usecase in SQL++ [2]. In general, SQL++ seems to be quite well suited forthose queries, but one frequent pattern seems to have only cumbersomeformulations.
In short, I am missing the functionality to produce (deterministic)element identifiers for array elements. In BigQuery’s SQL dialect [3],this can be done with “UNNEST(array_field) WITH OFFSET AS idx”. Fornow, I am using a construct similar to this: “FROM (SELECTarray_field.*, row_number() OVER () AS idx FROMbase_table.array_field) AS array_element” (see full query here [4]). Ihave two issues with this: First, it needs undebatably more charactersthan the BigQuery version and it is arguably also more cumbersome.Second, I don’t think that it is even correct since the elementnumbers are not guaranteed to be deterministic (i.e., if I use thatconstruct in several places in the query, the same array element mayget different values for row_number()). The documentation onrow_number says “If the window order clause is omitted, the returnvalues may be unpredictable.“ However, introducing an order clause notonly makes the pattern even more cumbersome (the elements are objectswith at least four fields), it also does not guarantee a deterministicorder still. The documentation of the Window Order Clause says: “Therow_number() function returns a distinct number for each tuple. Iftuples are tied, the results may be unpredictable.” (To be fair, iftwo array elements agree on all fields, I can probably treat theminterchangeably in my use case.)
With that introduction, two questions: (1) Can you confirm that thereis really no language construct better than what I currently use (suchas BigQuery’s “OFFSET”)? (2) Can you confirm that row_number() isreally undeterministic in the way I use it?
I need these element identifiers for producing combinations (in themathematical sense [5]) of the elements in an array field, but maybethere is a better way. In the simple case of producing all (unique)pairs of elements of a particular array, I use a construct like thefollowing:
FROM (SELECT array_field.*, row_number() OVER () AS idx FROMbase_table.array_field) AS array_element1,
(SELECT array_field.*, row_number() OVER () AS idx FROMbase_table. array_field) AS array_element2
      WHERE

        array_element1.idx < array_element2.idx
So a last question: (3) Can you think of a better way to computecombinations?
Thanks a lot in advance and best regards,

Ingo
[1] Dan Graur, Ingo Müller, Ghislain Fourny, Gordon T. Watts, MasonProffitt, Gustavo Alonso. "Evaluating Query Languages and Systems forHigh-Energy Physics Data." arXiv: 2104.12615 [cs.DB], 2021.https://arxiv.org/abs/2104.12615 <https://arxiv.org/abs/2104.12615>
[2] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp<https://github.com/RumbleDB/iris-hep-benchmark-sqlpp>
[3]https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arrays<https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arrays>
[4]https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/blob/master/queries/query-5/query.sqlpp<https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/blob/master/queries/query-5/query.sqlpp>
[5] https://en.wikipedia.org/wiki/Combination<https://en.wikipedia.org/wiki/Combination>

Re: Producing combinations of array elements in SQL++

Reply via email to