Hey Mike,

Thanks a lot for following up! I realize that one aspect wasn't very explicit 
in my description: I only want one permutation of each combination, e.g., I 
only want one pair out of (a1, a2) and (a2, a1). The Cartesian product contains 
both. In your example data, for "pkey": 2, I only want one of (3, 5) and (5, 3).

(This is really just a performance optimization: of all of the pairs, I 
eventually pick the one that maximizes some metric, and that metric is 
symmetric, so I can could pick any of the equivalent permutations. However, I 
doubt that any query optimizer can detect that, and for 3-combinations, that 
optimization makes a really significant difference.)

Off topic: ROOT [1], the standard framework used by physicists today, and 
interestingly PrestoDB as well [2], have a built-in function "combinations" for 
that purpose.

Cheers,
Ingo


[1] 
https://root.cern/doc/master/group__vecops.html#gabbbf82257156b750c7b9bdd39a53ee33
[2] https://prestodb.io/docs/current/functions/array.html#combinations


> -----Original Message-----
> From: Mike Carey <dtab...@gmail.com>
> Sent: Thursday, August 5, 2021 1:03 AM
> To: users@asterixdb.apache.org
> Subject: Re: Producing combinations of array elements in SQL++
> 
> Ingo,
> 
> So if I'm understanding the use case right, and order is irrelevant, and it's 
> just
> about computing unique pairs, here is a potentially simpler version:
> 
>       SELECT pkey,
>              (SELECT DISTINCT a1, a2
>               FROM ds.array_field AS a1,
>                    ds.array_field AS a2
>               ) AS upairs
>       FROM my_dataset AS ds;
> 
> I.e., do a cartesian product of the array with itself and eliminate duplicate 
> pairs.
> No positional info needed.  Does that meet the problem statement?  (I may be
> missing a part of it.)
> 
> 
> Cheers,
> 
> Mike
> 
> PS - Here is an example that can be run to test it w/o creating data (just
> pretending to do so with WITH):
> 
>       WITH my_dataset AS
>          ([{"pkey": 1, "array_field":[1, 2, 2, 4]},
>           {"pkey": 2, "array_field":[3, 5]}])
>       SELECT pkey,
>              (SELECT DISTINCT a1, a2
>               FROM ds.array_field AS a1,
>                    ds.array_field AS a2
>               ) AS upairs
>       FROM my_dataset AS ds;

Reply via email to