Hi Takeshi, thanks for the answer. It looks like Spark should free up old RDDs, but in the admin UI we can see the blocks staying around. A Block ID corresponds to a receiver and a timestamp; for example, block input-0-1485275695898 is from receiver 0 and was created at 1485275695898 (1/24/2017, 11:34:55 AM GMT-5:00), which corresponds to the start time. Even after running the whole day, that block is still not released.

The RDDs in our scenario are Strings coming from a Kinesis stream. Is there a way to explicitly purge an RDD after the last step in the M/R process, once and for all? Thanks much!
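For illustration, here is roughly the shape of what I have in mind (a rough, untested sketch rather than our actual app: the socket source stands in for the Kinesis receiver, and the word count stands in for our real M/R step):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object PurgeSketch {
  def main(args: Array[String]): Unit = {
    // Ask Spark Streaming to clean up input/persisted RDD blocks automatically
    // (this is already the default; set here only to make the intent explicit).
    val conf = new SparkConf()
      .setAppName("kinesis-block-cleanup-sketch")
      .set("spark.streaming.unpersist", "true")

    val ssc = new StreamingContext(conf, Seconds(10))

    // Keep generated RDDs (and their blocks) around for at most one minute,
    // per the StreamingContext#remember suggestion.
    ssc.remember(Minutes(1))

    // Stand-in source: the real app uses the Kinesis receiver, but a socket
    // stream keeps this sketch self-contained.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      // The "M/R" step on the batch of Strings.
      val counts = rdd.map(word => (word, 1L)).reduceByKey(_ + _)
      counts.foreach { case (word, n) => println(s"$word -> $n") }

      // After the batch is fully processed, drop its blocks and wait until
      // they are actually removed (blocking = true), rather than the
      // fire-and-forget unpersist(false) we tried before.
      rdd.unpersist(blocking = true)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Is a blocking unpersist like that at the end of each batch the right way to force the input-0-* blocks out, or is the remember window the only thing that governs when they get dropped?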
On Fri, Jan 20, 2017 at 2:35 AM, Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Hi,
>
> AFAIK, the blocks of minibatch RDDs are checked every time a job finishes, and
> older blocks are automatically removed (see:
> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L463).
>
> You can control this behaviour with StreamingContext#remember to some extent.
>
> // maropu
>
>
> On Fri, Jan 20, 2017 at 3:17 AM, Andrew Milkowski <amgm2...@gmail.com> wrote:
>
>> hello
>>
>> using spark 2.0.2, while running a sample streaming app with kinesis I
>> noticed (in the admin UI Storage tab) that "Stream Blocks" for each worker
>> keeps climbing
>>
>> then also (on the same UI page), in the Blocks section, I see blocks such as
>>
>> input-0-1484753367056
>>
>> that are marked as Memory Serialized
>>
>> and that do not seem to be "released"
>>
>> the above eventually consumes executor memory, leading to out-of-memory
>> exceptions on some executors
>>
>> is there a way to "release" these blocks and free them up? the app is a
>> sample m/r
>>
>> I attempted rdd.unpersist(false) in the code but that did not lead to
>> memory being freed up
>>
>> thanks much in advance!
>>
>
>
> --
> ---
> Takeshi Yamamuro