Re: Spark Streaming reduceByKeyAndWindow with inverse function seems to iterate over all the keys in the window even though they are not present in the current batch

2017-06-26 Thread Tathagata Das
Unfortunately the way reduceByKeyAndWindow is implemented, it does iterate through all the counts. To have something more efficient, you may have to implement your own windowing logic using mapWithState. Something like eventDStream.flatmap { event => // find the windows each even maps to, and

Spark Streaming reduceByKeyAndWindow with inverse function seems to iterate over all the keys in the window even though they are not present in the current batch

2017-06-26 Thread SRK
Hi, We have reduceByKeyAndWindow with inverse function feature in our Streaming job to calculate rolling counts for the past hour and for the past 24 hours. It seems that the functionality is iterating over all the keys in the window even though they are not present in the current batch causing