Do we have any guarantees on the maximum duration?
I've seen RDDs kept around for 7-10 minutes with batches of 20 seconds and a
checkpoint interval of 100 seconds. No windows, just updateStateByKey.
It's not a memory issue, but on checkpoint recovery it goes back to Kafka for 10
minutes of data. Any idea why?
It's just that, in the same thread, for a particular RDD I need to uncache it
every 2 minutes to clear out the data that is present in a Map inside it.
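For reference, here is a minimal sketch of the kind of setup I'm describing
(class name, input source, path, and state logic are made up; only the
20-second batch interval and 100-second checkpoint interval are from above):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StateExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StateExample")
    // 20-second batches, as described above
    val ssc = new StreamingContext(conf, Seconds(20))
    ssc.checkpoint("hdfs:///tmp/checkpoints")  // hypothetical path

    // Hypothetical input stream mapped to (key, count) pairs
    val pairs = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))

    // Running count per key via updateStateByKey
    val state = pairs.updateStateByKey[Int] { (values: Seq[Int], prev: Option[Int]) =>
      Some(values.sum + prev.getOrElse(0))
    }
    // Checkpoint the state DStream every 100 seconds, as above
    state.checkpoint(Seconds(100))

    state.print()
    ssc.start()
    ssc.awaitTermination()
  }
}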
On Wed, Nov 4, 2015 at 11:54 PM, Saisai Shao wrote:
> Hi Swetha,
>
> Would you mind elaborating your usage scenario of DStream unpersisting?
Spark Streaming automatically takes care of unpersisting any RDDs generated
by a DStream. You can call StreamingContext.remember() to set the minimum
persistence duration; any persisted RDD older than that will be
automatically unpersisted.
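For example, something like this (a minimal sketch; the two-minute duration is
just illustrative, and the 20-second batch interval is taken from the thread):

import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val ssc = new StreamingContext(sparkConf, Seconds(20))
// Keep RDDs generated by DStreams around for at least 2 minutes before
// Spark Streaming automatically unpersists them (illustrative value).
ssc.remember(Minutes(2))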
On Thu, Nov 5, 2015 at 9:12 AM, swetha kasireddy wrote:
Hi,
How do we unpersist a DStream in Spark Streaming? I know that we can persist
using dStream.persist() or dStream.cache(), but I don't see any method to
unpersist.
Thanks,
Swetha
Hi Swetha,
Would you mind elaborating your usage scenario of DStream unpersisting?
From my understanding:
1. Spark Streaming will automatically unpersist outdated data (you already
mentioned the configurations).
2. Once the streaming job is started, I think you may lose control of the
job,
Hi,
A DStream (Discretized Stream) is made up of multiple RDDs.
You can unpersist each RDD by accessing the individual RDDs with foreachRDD:

dstream.foreachRDD { rdd =>
  // explicitly unpersist the RDD backing this batch
  rdd.unpersist()
}
Other than setting the following?
sparkConf.set("spark.streaming.unpersist", "true")
sparkConf.set("spark.cleaner.ttl", "7200s")
On Wed, Nov 4, 2015 at 5:03 PM, swetha wrote:
> Hi,
>
> How do we unpersist a DStream in Spark Streaming? I know that we can persist
> using