Re: Producing combinations of array elements in SQL++

Dmitry Lychagin Tue, 03 Aug 2021 08:39:37 -0700

Hi Ingo,

1) SQL++ supports positional variables in the FROM clause, as follows:
FROM dataset AS ds, ds.array_field AS elementVar AT posVar
(see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod178  -- FromTerm)
or using UNNEST instead of “,”
FROM dataset AS ds UNNEST ds.array_field AS elementVar AT posVar
(see https://asterixdb.apache.org/docs/0.9.7/SQLPP.html#prod181 – UnnestClause)


Positional variable is bound to a position of an element inside an array which 
is being unnested.

Here’s how your query could look like using these variables:
SELECT …
FROM base_table, base_table.array_field AS  array_element1 AT pos1,  
base_table.array_field AS  array_element2 AT pos2
WHERE pos1 < pos2

2) As for window function calls. I can confirm that if there is no ORDER BY 
sub-clause inside OVER clause then the order of the elements processed by 
window functions (row_number(), rank(), etc) is not guaranteed.

Thanks,
-- Dmitry


From: Müller Ingo <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 3, 2021 at 1:39 AM
To: "[email protected]" <[email protected]>
Subject: Producing combinations of array elements in SQL++

 EXTERNAL EMAIL:  Use caution when opening attachments or clicking on links




Dear SQL++ experts,

I am working on an academic study [1] comparing the expressiveness of various 
query languages for nested data in a high-energy physics use case. Among other 
systems, I have implemented the queries of the use case in SQL++ [2]. In 
general, SQL++ seems to be quite well suited for those queries, but one 
frequent pattern seems to have only cumbersome formulations.

In short, I am missing the functionality to produce (deterministic) element 
identifiers for array elements. In BigQuery’s SQL dialect [3], this can be done 
with “UNNEST(array_field) WITH OFFSET AS idx”. For now, I am using a construct 
similar to this: “FROM (SELECT array_field.*, row_number() OVER () AS idx FROM 
base_table.array_field) AS array_element” (see full query here [4]). I have two 
issues with this: First, it needs undebatably more characters than the BigQuery 
version and it is arguably also more cumbersome. Second, I don’t think that it 
is even correct since the element numbers are not guaranteed to be 
deterministic (i.e., if I use that construct in several places in the query, 
the same array element may get different values for row_number()). The 
documentation on row_number says “If the window order clause is omitted, the 
return values may be unpredictable.“ However, introducing an order clause not 
only makes the pattern even more cumbersome (the elements are objects with at 
least four fields), it also does not guarantee a deterministic order still. The 
documentation of the Window Order Clause says: “The row_number() function 
returns a distinct number for each tuple. If tuples are tied, the results may 
be unpredictable.” (To be fair, if two array elements agree on all fields, I 
can probably treat them interchangeably in my use case.)

With that introduction, two questions: (1) Can you confirm that there is really 
no language construct better than what I currently use (such as BigQuery’s 
“OFFSET”)? (2) Can you confirm that row_number() is really undeterministic in 
the way I use it?

I need these element identifiers for producing combinations (in the 
mathematical sense [5]) of the elements in an array field, but maybe there is a 
better way. In the simple case of producing all (unique) pairs of elements of a 
particular array, I use a construct like the following:

      FROM (SELECT array_field.*, row_number() OVER () AS idx FROM 
base_table.array_field) AS array_element1,
           (SELECT array_field.*, row_number() OVER () AS idx FROM base_table. 
array_field) AS array_element2
      WHERE
        array_element1.idx < array_element2.idx

So a last question: (3) Can you think of a better way to compute combinations?

Thanks a lot in advance and best regards,
Ingo


[1] Dan Graur, Ingo Müller, Ghislain Fourny, Gordon T. Watts, Mason Proffitt, 
Gustavo Alonso. "Evaluating Query Languages and Systems for High-Energy 
Physics Data." arXiv: 2104.12615 [cs.DB], 2021. https://arxiv.org/abs/2104.12615
[2] https://github.com/RumbleDB/iris-hep-benchmark-sqlpp
[3] 
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening_arrays
[4] 
https://github.com/RumbleDB/iris-hep-benchmark-sqlpp/blob/master/queries/query-5/query.sqlpp
[5] https://en.wikipedia.org/wiki/Combination

Re: Producing combinations of array elements in SQL++

Reply via email to