Re: Count window on partition

2017-01-23 Thread Fabian Hueske
Hi Dmitry, the third version is the way to go, IMO. You might want a larger number of partitions if you plan to increase the parallelism of the job later. Also note that 4 keys are not guaranteed to be uniformly distributed across 4 tasks. It might happen that one task ends up wi
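
To illustrate the point about uneven key distribution, here is a minimal sketch in plain Scala. It does not reproduce Flink's actual key assignment: `taskFor` is a hypothetical stand-in that hashes each key and takes the result modulo the parallelism, which is enough to show that a handful of keys need not land on distinct tasks.

```scala
import scala.util.hashing.MurmurHash3

object KeyDistributionSketch {
  // Hypothetical stand-in for the runtime's key-to-task assignment
  // (assumption: hash the key, then take it modulo the parallelism).
  def taskFor(key: Int, parallelism: Int): Int =
    java.lang.Math.floorMod(MurmurHash3.stringHash(key.toString), parallelism)

  // Groups the given keys by the task index they would be routed to.
  def distribution(keys: Seq[Int], parallelism: Int): Map[Int, Seq[Int]] =
    keys.groupBy(k => taskFor(k, parallelism))

  def main(args: Array[String]): Unit = {
    val byTask = distribution(0 until 4, 4)
    // Tasks that received no key print an empty list.
    (0 until 4).foreach { t =>
      println(s"task $t -> keys ${byTask.getOrElse(t, Seq.empty).mkString(", ")}")
    }
  }
}
```

With only as many keys as tasks, any hash collision leaves at least one task idle, which is why a larger number of partition keys tends to spread the load more evenly.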

Re: Count window on partition

2017-01-23 Thread Kostas Kloudas
Hi Dmitry, in all cases the result of the countWindow will also be grouped by key, because of the keyBy() that you are using. If you want a non-keyed stream that is then split into count windows, remove the keyBy() and use countWindowAll() instead of countWindow(). This will have paral
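
The difference between the keyed and non-keyed count windows described above can be sketched without Flink at all. The following plain-Scala simulation is only an illustration: `Event`, `keyedCountWindows` and `countWindowsAll` are hypothetical names, and the code models the batching semantics on an in-memory sequence rather than calling the DataStream API.

```scala
object CountWindowSketch {
  final case class Event(key: String, value: Int)

  // Keyed behaviour: a batch is emitted once a single key
  // has collected `size` elements, so every batch holds one key only.
  def keyedCountWindows(events: Seq[Event], size: Int): Seq[(String, Seq[Int])] = {
    val buffers = scala.collection.mutable.Map.empty[String, Vector[Int]]
    val out = Seq.newBuilder[(String, Seq[Int])]
    for (e <- events) {
      val buf = buffers.getOrElse(e.key, Vector.empty[Int]) :+ e.value
      if (buf.size == size) {
        out += (e.key -> buf)   // window is full: emit it and start over for this key
        buffers.remove(e.key)
      } else {
        buffers.update(e.key, buf)
      }
    }
    out.result()
  }

  // Non-keyed behaviour: every `size` consecutive elements form a batch,
  // regardless of key. Incomplete trailing batches are dropped.
  def countWindowsAll(events: Seq[Event], size: Int): Seq[Seq[Int]] =
    events.map(_.value).grouped(size).filter(_.size == size).toList

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("a", 1), Event("b", 2), Event("a", 3), Event("b", 4), Event("a", 5))
    println(keyedCountWindows(events, 2))
    println(countWindowsAll(events, 2))
  }
}
```

In the keyed variant the elements of keys "a" and "b" never share a batch, whereas the non-keyed variant batches purely by arrival order.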

Count window on partition

2017-01-23 Thread Dmitry Golubets
Hi, I'm looking for the right way to implement the following scheme:
1. Read data
2. Split it into partitions for parallel processing
3. In every partition, group the data into batches of N elements
4. Process these batches
My first attempt was: *dataStream.keyBy(_.key).countWindow(..)* But countWindow groups by