
Re: Count window on partition

2017-01-23 Thread Fabian Hueske

Hi Dmitry, the third version is the way to go, IMO. You might want a larger number of partitions if you plan to increase the parallelism of the job later. Also note that 4 keys are not guaranteed to be distributed uniformly across 4 tasks. It might happen that one task ends up wi
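The point about uneven key distribution can be sketched in plain Scala. This is only an illustration: `taskFor` is a hypothetical stand-in that assigns a key to a task by hash modulo parallelism, which is an assumption and not Flink's actual key-to-task mapping. The names `KeySkewSketch`, `taskFor` and `keysPerTask` are made up for this example.

```scala
object KeySkewSketch {
  // Stand-in assignment (an assumption, NOT Flink's real key-to-task mapping):
  // the task index is the key's hash code modulo the parallelism.
  def taskFor(key: String, parallelism: Int): Int =
    math.floorMod(key.hashCode, parallelism)

  // Group the keys by the task they would be sent to.
  def keysPerTask(keys: Seq[String], parallelism: Int): Map[Int, Seq[String]] =
    keys.groupBy(taskFor(_, parallelism))

  def main(args: Array[String]): Unit = {
    // "a", "e", "i", "m" hash to 97, 101, 105, 109, all congruent to 1 mod 4,
    // so all four keys land on the same task and three tasks stay idle.
    println(keysPerTask(Seq("a", "e", "i", "m"), parallelism = 4))

    // With many more distinct keys than tasks, the hash has more chances
    // to even out; print how many keys each task receives.
    val many = (0 until 64).map(i => s"key-$i")
    println(keysPerTask(many, parallelism = 4).map { case (t, ks) => t -> ks.size })
  }
}
```

Under this stand-in, four distinct keys can all collapse onto one task, which is one reason to use more partitions than tasks.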

Re: Count window on partition

2017-01-23 Thread Kostas Kloudas

Hi Dmitry, In all cases, the result of the countWindow will also be grouped by key because of the keyBy() that you are using. If you want a non-keyed stream split into count windows, remove the keyBy() and use countWindowAll() instead of countWindow(). This will have paral
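The difference between keyed and non-keyed count windows can be simulated with ordinary Scala collections. The sketch below does not use the Flink API; `keyedCountWindows` and `countWindowsAll` are hypothetical helpers that only mimic the grouping behaviour described above, under the assumption that a window is emitted once it holds exactly `size` elements.

```scala
import scala.collection.mutable

object CountWindowSketch {
  // Keyed variant: every key fills its own buffer, and a window is emitted
  // only when that key has collected `size` elements.
  def keyedCountWindows[K, T](input: Seq[T], size: Int)(key: T => K): List[(K, List[T])] = {
    val buffers = mutable.LinkedHashMap.empty[K, List[T]]
    val out = List.newBuilder[(K, List[T])]
    for (x <- input) {
      val k = key(x)
      val buf = x :: buffers.getOrElse(k, Nil)   // newest element first
      if (buf.size == size) {
        out += ((k, buf.reverse))                // restore arrival order
        buffers.remove(k)
      } else {
        buffers(k) = buf
      }
    }
    out.result()
  }

  // Non-keyed variant: one sequence of windows over the whole stream,
  // regardless of key; an incomplete trailing window is dropped.
  def countWindowsAll[T](input: Seq[T], size: Int): List[List[T]] =
    input.toList.grouped(size).filter(_.size == size).toList
}
```

For the input `Seq("a1", "b1", "a2", "b2", "a3")` with the first character as key and `size = 2`, the keyed variant yields one window per key (`a1, a2` and `b1, b2`), while the non-keyed variant yields `a1, b1` and `a2, b2`.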

Count window on partition

2017-01-23 Thread Dmitry Golubets

Hi, I'm looking for the right way to implement the following scheme:
1. Read data
2. Split it into partitions for parallel processing
3. In every partition, group data into batches of N elements
4. Process these batches
My first attempt was: *dataStream.keyBy(_.key).countWindow(..)* But countWindow groups by
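The four steps of the scheme can be sketched with plain Scala collections to make the intended result concrete. This is not Flink code: the `Event` type, the hash-modulo partitioning and the name `batchPerPartition` are assumptions made for illustration. Unlike a count window, this sketch also returns a shorter trailing batch.

```scala
object PartitionBatchSketch {
  // Hypothetical record type; `key` mirrors the `_.key` field in the question.
  final case class Event(key: String, value: Int)

  // Steps 2 and 3: split the records into `partitions` groups (by hash of the
  // key, an assumption), then cut each group into batches of at most `n`
  // elements, keeping arrival order within a partition.
  def batchPerPartition(data: Seq[Event], partitions: Int, n: Int): Map[Int, List[List[Event]]] =
    data
      .groupBy(e => math.floorMod(e.key.hashCode, partitions))
      .map { case (p, events) => p -> events.toList.grouped(n).toList }

  def main(args: Array[String]): Unit = {
    // Step 1: "read" some data.
    val data = (1 to 10).map(i => Event(s"k$i", i))
    // Step 4: process every batch, here by summing its values.
    batchPerPartition(data, partitions = 2, n = 3).foreach { case (p, batches) =>
      batches.foreach(b => println(s"partition $p: sum=${b.map(_.value).sum}"))
    }
  }
}
```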

