Hi Dmitry,
the third version is the way to go, IMO.
You might want to have a larger number of partitions if you are planning to
later increase the parallelism of the job.
Also note, that it is not guaranteed that 4 keys are uniformly distributed
to 4 tasks. It might happen that one task ends up wi
Hi Dmitry,
In all cases, the result of the countWindow will be also grouped by key because
of
the keyBy() that you are using.
If you want to have a non-keyed stream and then split it in count windows,
remove
the keyBy() and instead of countWindow(), use countWindowAll(). This will have
paral
Hi,
I'm looking for the right way to do the following scheme:
1. Read data
2. Split it into partitions for parallel processing
3. In every partition group data in N elements batches
4. Process these batches
My first attempt was:
*dataStream.keyBy(_.key).countWindow(..)*
But countWindow groups by