Hi Takeshi, thanks for the answer. It looks like Spark should free up old RDDs;
however, in the admin UI we see, for example:

Each block ID corresponds to a receiver and a timestamp. For example, block
input-0-1485275695898 is from receiver 0 and was created at 1485275695898
(1/24/2017, 11:34:55 AM GMT-5:00), which corresponds to the start time.
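
For reference, a small sketch (mine, not from the thread) of decoding such an
ID, assuming the input-<receiverId>-<epochMillis> naming shown above:

  import java.time.Instant

  val blockId = "input-0-1485275695898"
  val Array(_, receiverId, millis) = blockId.split("-")
  // => receiver 0, created 2017-01-24T16:34:55.898Z (i.e. 11:34:55 AM GMT-5)
  println(s"receiver $receiverId, created ${Instant.ofEpochMilli(millis.toLong)}")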

That block is still not being released even after the app has run for a whole
day! The RDDs in our scenario are Strings coming from the Kinesis stream.

Is there a way to explicitly purge an RDD once and for all after the last step
of the map/reduce process?
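
For reference, a minimal sketch (mine, not a confirmed fix) of what we could
try, combining your StreamingContext#remember suggestion with a blocking
unpersist after the last step; stream, ssc and process() are placeholders for
our Kinesis DStream, the streaming context and the final step of our pipeline:

  import org.apache.spark.streaming.Minutes

  // forget generated RDDs older than this window after each batch completes
  ssc.remember(Minutes(2))

  stream.foreachRDD { rdd =>
    process(rdd)                    // last step of the map/reduce pipeline
    rdd.unpersist(blocking = true)  // blocking, unlike the unpersist(false) we tried
  }

Whether this actually drops the receiver-generated stream blocks, or whether
Spark's own cleanup (plus spark.streaming.unpersist, which defaults to true)
is the only thing that can, is exactly what I am trying to confirm.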

thanks much!

On Fri, Jan 20, 2017 at 2:35 AM, Takeshi Yamamuro <linguin....@gmail.com>
wrote:

> Hi,
>
> AFAIK, the blocks of minibatch RDDs are checked after every job finishes, and
> older blocks are automatically removed (see:
> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L463).
>
> You can control this behaviour to some extent via StreamingContext#remember.
>
> // maropu
>
>
> On Fri, Jan 20, 2017 at 3:17 AM, Andrew Milkowski <amgm2...@gmail.com>
> wrote:
>
>> hello
>>
>> While running a sample streaming app with Kinesis on Spark 2.0.2, I noticed
>> (in the admin UI Storage tab) that "Stream Blocks" for each worker keeps
>> climbing.
>>
>> Also on the same UI page, in the Blocks section, I see blocks such as the one below
>>
>> input-0-1484753367056
>>
>> that are marked as Memory Serialized and do not seem to be "released".
>>
>> This eventually consumes executor memory, leading to out-of-memory
>> exceptions on some executors.
>>
>> Is there a way to "release" these blocks and free them up? The app is a
>> sample map/reduce job.
>>
>> I attempted rdd.unpersist(false) in the code, but that did not free up the
>> memory.
>>
>> thanks much in advance!
>>
>
>
>
> --
> ---
> Takeshi Yamamuro
>
