Re: Fault tolerant broadcast in updateStateByKey
I'm updating the Broadcast between batches, but I've ended up doing it in a listener. Thanks!

On Wed, Feb 8, 2017 at 12:31 AM, Tathagata Das wrote:
> Broadcasts are not saved in checkpoints, so you have to save them
> externally yourself and recover them before restarting the stream from
> the checkpoint.
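For anyone finding this thread later, the listener approach can be sketched roughly as below. This is a minimal, Spark-free sketch: `BroadcastHolder` and `on_batch_completed` are hypothetical names, not part of the Spark API. In the real version the holder would live on the driver, and the swap would happen inside a `StreamingListener.onBatchCompleted` callback that unpersists the old `Broadcast` and calls `sc.broadcast()` on fresh data.

```python
import threading

class BroadcastHolder:
    """Hypothetical driver-side holder that swaps in a fresh value
    between batches, mimicking re-broadcasting from a batch listener."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial

    def value(self):
        # Tasks would read the current Broadcast through this accessor.
        with self._lock:
            return self._value

    def on_batch_completed(self, load_latest):
        # In Spark this logic would sit in StreamingListener.onBatchCompleted:
        # unpersist the old Broadcast, then sc.broadcast(load_latest()).
        with self._lock:
            self._value = load_latest()

holder = BroadcastHolder({"rate": 1})
holder.on_batch_completed(lambda: {"rate": 2})  # simulate end of a batch
```

The lock only matters if something other than the listener thread reads the holder; in a driver-only listener it could be dropped.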
Re: Fault tolerant broadcast in updateStateByKey
Broadcasts are not saved in checkpoints, so you have to save them externally yourself and recover them before restarting the stream from the checkpoint.

On Tue, Feb 7, 2017 at 3:55 PM, Amit Sela wrote:
> I know this approach; the only thing is, it relies on the transformation
> being an RDD transformation as well, so it can be applied via foreachRDD
> using the RDD's context to avoid a stale context after recovery/resume.
> My question is how to avoid a stale context in a DStream-only
> transformation such as updateStateByKey / mapWithState?
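A minimal sketch of that save-externally/recover approach, assuming a local JSON file stands in for durable external storage (HDFS, S3, a database, etc.); the function names are hypothetical, and the actual `sc.broadcast(...)` call is only referenced in a comment:

```python
import json
import os
import tempfile

def save_broadcast_payload(path, payload):
    # Persist the data *behind* the broadcast (not the Broadcast object
    # itself, which is not serializable across restarts) somewhere durable.
    with open(path, "w") as f:
        json.dump(payload, f)

def recover_broadcast_payload(path, default):
    # On restart, reload the payload *before* recreating the stream from
    # the checkpoint, then call sc.broadcast(payload) on the driver.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return default

path = os.path.join(tempfile.mkdtemp(), "broadcast.json")
save_broadcast_payload(path, {"threshold": 5})   # before/while the job runs
recovered = recover_broadcast_payload(path, default={})  # after a restart
```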
Re: Fault tolerant broadcast in updateStateByKey
I know this approach; the only thing is, it relies on the transformation being an RDD transformation as well, so it can be applied via foreachRDD using the RDD's context to avoid a stale context after recovery/resume. My question is how to avoid a stale context in a DStream-only transformation such as updateStateByKey / mapWithState?

On Tue, Feb 7, 2017 at 9:19 PM, Shixiong (Ryan) Zhu wrote:
> It's documented here:
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-broadcast-variables-and-checkpoints
Re: Fault tolerant broadcast in updateStateByKey
It's documented here:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-broadcast-variables-and-checkpoints

On Tue, Feb 7, 2017 at 8:12 AM, Amit Sela wrote:
> Hi all,
>
> I was wondering if anyone has ever used a broadcast variable within an
> updateStateByKey op? Using it is straightforward, but I was wondering how
> it will work after resuming from a checkpoint (the rdd.context() trick is
> not possible here).
>
> Thanks,
> Amit
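The workaround described at that link is a lazily instantiated singleton: after a restart from a checkpoint the static instance is gone, so the first use on the restarted process re-creates the broadcast from live data instead of holding a stale reference. The guide's example is Scala/Java; here is the same pattern as a plain-Python sketch, where `factory` stands in for a call like `sc.broadcast(load_data())`:

```python
class BroadcastSingleton:
    """Lazily instantiated singleton holder. After a checkpoint restart
    the class-level instance is None again, so the first access
    re-creates the broadcast rather than reusing a stale one."""
    _instance = None

    @classmethod
    def get_instance(cls, factory):
        if cls._instance is None:
            cls._instance = factory()  # e.g. sc.broadcast(load_data())
        return cls._instance

calls = []
def factory():
    calls.append(1)  # count how many times we actually (re)create it
    return {"blacklist": ["a", "b"]}

first = BroadcastSingleton.get_instance(factory)
second = BroadcastSingleton.get_instance(factory)  # reuses the instance
```

Note the guide's version also synchronizes `getInstance`, which matters when multiple tasks race to create the instance on an executor; that detail is omitted here for brevity.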
Fault tolerant broadcast in updateStateByKey
Hi all,

I was wondering if anyone has ever used a broadcast variable within an updateStateByKey op? Using it is straightforward, but I was wondering how it will work after resuming from a checkpoint (the rdd.context() trick is not possible here).

Thanks,
Amit