Hi Ioannis,

With a flatMap operation that replicates elements and assigns each replica a
suitable key, followed by a keyBy operation, you can generate practically any
kind of partitioning.
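
To illustrate the idea outside of Flink, here is a minimal plain-Java sketch that simulates the two steps. `flatMapReplicate` plays the role of the flatMap by emitting one keyed copy of an element per target key. `keyBy` plays the role of the keyBy by grouping the copies by key. These names, the `broadcast` helper, and the `Keyed` record are hypothetical stand-ins, not Flink's DataStream API. In a real job the routing logic would live in a `FlatMapFunction` and the grouping would be done by `keyBy`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ReplicateAndKey {
    // Hypothetical (key, element) tuple, standing in for what a flatMap would emit.
    record Keyed<T>(int key, T value) {}

    // flatMap stand-in: emits one copy of each element per target key returned
    // by `routing`. The routing function alone decides the partitioning.
    static <T> List<Keyed<T>> flatMapReplicate(List<T> input, Function<T, List<Integer>> routing) {
        List<Keyed<T>> out = new ArrayList<>();
        for (T value : input) {
            for (int key : routing.apply(value)) {
                out.add(new Keyed<>(key, value));
            }
        }
        return out;
    }

    // keyBy stand-in: gathers all copies that carry the same key into one group.
    static <T> Map<Integer, List<T>> keyBy(List<Keyed<T>> records) {
        Map<Integer, List<T>> groups = new TreeMap<>();
        for (Keyed<T> r : records) {
            groups.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
        }
        return groups;
    }

    // Example partitioning: broadcast every element to all `parallelism` groups.
    static <T> Map<Integer, List<T>> broadcast(List<T> input, int parallelism) {
        List<Integer> all = IntStream.range(0, parallelism).boxed().collect(Collectors.toList());
        return keyBy(flatMapReplicate(input, v -> all));
    }

    public static void main(String[] args) {
        System.out.println(broadcast(List.of(1, 2, 3, 4), 2));
        // prints {0=[1, 2, 3, 4], 1=[1, 2, 3, 4]}

        // A different routing: element 1 goes to both groups, the rest by parity.
        System.out.println(keyBy(flatMapReplicate(List.of(1, 2, 3, 4),
                v -> v == 1 ? List.of(0, 1) : List.of(v % 2))));
        // prints {0=[1, 2, 4], 1=[1, 3]}
    }
}
```

Only the routing function changes between the two examples. That is what makes the flatMap + keyBy combination general enough for arbitrary partitionings.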

So if you first collect the data in parallel windows, you can then
replicate half of each window's data to every other window (giving the
replicas destined for each other window a distinct key). Next, you group on
this key and compute the Cartesian product for each resulting group. This
should give you a parallel Cartesian product.
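
Here is a rough plain-Java simulation of this kind of scheme. It assumes each parallel window holds a disjoint subset of the keys. The names (`ParallelCartesian`, `GroupKey`, `Routed`, `replicate`, `pairsForGroup`, `parallelCartesian`) are hypothetical. The sketch uses a simple block-pairing variant: every element of window `i` is sent once to each group `(min(i, j), max(i, j))`. As a result, every pair of windows meets in exactly one group, and every unordered pair of keys is produced exactly once. Note that it replicates each element once per window rather than following the exact "half of the data" split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParallelCartesian {
    // Hypothetical group key: the pair of parallel windows (a <= b) that meet in this group.
    record GroupKey(int a, int b) {}

    // Hypothetical routed element: the window it came from, its target group, and its value.
    record Routed(int source, GroupKey group, int value) {}

    // flatMap stand-in: window `source` sends each element to every group
    // (min(source, j), max(source, j)) for j in [0, p), i.e. p copies per element.
    static List<Routed> replicate(int source, List<Integer> elems, int p) {
        List<Routed> out = new ArrayList<>();
        for (int v : elems) {
            for (int j = 0; j < p; j++) {
                GroupKey g = new GroupKey(Math.min(source, j), Math.max(source, j));
                out.add(new Routed(source, g, v));
            }
        }
        return out;
    }

    // Normalizes a pair so that "3-1" and "1-3" are the same combination.
    static String pair(int u, int v) {
        return Math.min(u, v) + "-" + Math.max(u, v);
    }

    // Per-group work after keyBy: pairs within one window if a == b,
    // otherwise the cross product of window a's and window b's elements.
    static List<String> pairsForGroup(GroupKey g, List<Routed> members) {
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (Routed r : members) {
            if (r.source() == g.a()) {
                left.add(r.value());
            } else {
                right.add(r.value());
            }
        }
        List<String> out = new ArrayList<>();
        if (g.a() == g.b()) {
            for (int x = 0; x < left.size(); x++) {
                for (int y = x + 1; y < left.size(); y++) {
                    out.add(pair(left.get(x), left.get(y)));
                }
            }
        } else {
            for (int u : left) {
                for (int w : right) {
                    out.add(pair(u, w));
                }
            }
        }
        return out;
    }

    // Runs the whole pipeline: replicate, group by key, compute each group's pairs.
    static List<String> parallelCartesian(List<List<Integer>> windows) {
        int p = windows.size();
        Map<GroupKey, List<Routed>> groups = new HashMap<>();
        for (int i = 0; i < p; i++) {
            for (Routed r : replicate(i, windows.get(i), p)) {
                groups.computeIfAbsent(r.group(), k -> new ArrayList<>()).add(r);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<GroupKey, List<Routed>> e : groups.entrySet()) {
            result.addAll(pairsForGroup(e.getKey(), e.getValue()));
        }
        Collections.sort(result); // sorted only to make the output deterministic
        return result;
    }

    public static void main(String[] args) {
        // Keys 1..4 from the question, split over two parallel windows.
        System.out.println(parallelCartesian(List.of(List.of(1, 3), List.of(2, 4))));
        // prints [1-2, 1-3, 1-4, 2-3, 2-4, 3-4]
    }
}
```

With `p` parallel windows this yields p(p + 1)/2 groups. Each group could then be processed by a different parallel instance after the keyBy.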

Cheers,
Till

On Thu, Feb 16, 2017 at 2:09 PM, Ioannis Kontopoulos <kls.yan...@gmail.com>
wrote:

> Hello everyone,
>
> Given a stream of events (each event has a timestamp and a key), I want to
> create all possible combinations of the keys in a window (sliding, event
> time) and then process those combinations in parallel.
>
> For example, if the stream contains events with keys 1, 2, 3, 4 in a given
> window, the possible combinations are:
>
> 1-2
> 1-3
> 1-4
> 2-3
> 2-4
> 3-4
>
> If the parallelism is set to 2, I want the two parallel instances to receive these key combinations:
>
> 1-2    2-3
> 1-3    2-4
> 1-4    3-4
>
> You can see that there is some replication (e.g., key 2 appears on both sides).
> So when I call the apply method on a window, each parallel instance should
> receive its key combinations split as in the example above.
>
> Is there a way to do that?
>
