Hi,

Memory consumption and checkpointed data seem to increase incrementally
when reduceByKeyAndWindow with an inverse function is used with mapWithState.

My application uses stateful streaming with mapWithState. The keys generated
by mapWithState are then used by reduceByKeyAndWindow to compute rolling
counts over 24 hours. The MapWithStateRDD seems to stay persisted forever,
even though I have checkpointing enabled every 10 minutes, and the
ShuffledRDD generated by reduceByKeyAndWindow seems to grow linearly in
memory. Any idea why this happens?

Is it possible that the ShuffledRDD is caching some data from mapWithState,
since it depends on it for keys?
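
For illustration only, here is a minimal pure-Python sketch of a rolling
count maintained with an inverse function. It is not Spark code and not the
application's actual code; the names rolling_counts, window_len and drop_zero
are hypothetical. It shows one way windowed state could keep growing: a key
whose count has fallen back to zero stays in the state unless it is dropped
explicitly. Spark's reduceByKeyAndWindow accepts an optional filterFunc for
that purpose, though whether this explains the growth described above is an
assumption.

```python
def rolling_counts(batches, window_len, drop_zero=False):
    """Simulate per-key rolling counts over the last `window_len` batches.

    batches    -- list of batches, each a list of keys seen in that interval
    window_len -- window size, measured in batches (hypothetical parameter)
    drop_zero  -- if True, remove keys whose count has returned to zero
    Returns one snapshot of the state per batch.
    """
    state = {}
    snapshots = []
    for i, batch in enumerate(batches):
        # "Reduce" step: add the counts of the batch entering the window.
        for key in batch:
            state[key] = state.get(key, 0) + 1
        # "Inverse reduce" step: subtract the batch that just left the window.
        if i >= window_len:
            for key in batches[i - window_len]:
                state[key] -= 1
                if drop_zero and state[key] == 0:
                    del state[key]
        snapshots.append(dict(state))
    return snapshots


batches = [["a", "b"], ["a"], ["c"], ["c"]]

# Without pruning, keys "a" and "b" linger with a count of zero.
print(rolling_counts(batches, 2)[-1])                  # {'a': 0, 'b': 0, 'c': 2}

# With pruning, only keys still inside the window remain.
print(rolling_counts(batches, 2, drop_zero=True)[-1])  # {'c': 2}
```

In this sketch the number of retained keys grows with every distinct key ever
seen unless drop_zero is set, which is the kind of growth to check for.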



Thanks,
Swetha