Re: Streaming: which code is (not) executed at every batch interval?

2014-11-04 Thread Steve Reinhardt
From: Sean Owen > Maybe you are looking for updateStateByKey? > http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams > > You can use broadcast to efficiently send info to all the workers, if you have some other data that's immutable, like in a local fil

Re: Streaming: which code is (not) executed at every batch interval?

2014-11-04 Thread Sean Owen
On Tue, Nov 4, 2014 at 8:02 PM, spr wrote: > To state this another way, it seems like there's no way to straddle the > streaming world and the non-streaming world; to get input from both a > (vanilla, Linux) file and a stream. Is that true? > > If so, it seems I need to turn my (vanilla file) da

Re: Streaming: which code is (not) executed at every batch interval?

2014-11-04 Thread Steve Reinhardt
-----Original Message----- From: Sean Owen > On Tue, Nov 4, 2014 at 8:02 PM, spr wrote: >> To state this another way, it seems like there's no way to straddle the >> streaming world and the non-streaming world; to get input from both a >> (vanilla, Linux) file and a stream. Is that true? >> >>

Re: Streaming: which code is (not) executed at every batch interval?

2014-11-04 Thread Sean Owen
Maybe you are looking for updateStateByKey? http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams You can use broadcast to efficiently send info to all the workers, if you have some other data that's immutable, like in a local file, that needs to be distr
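A minimal sketch of the two suggestions above, assuming PySpark with the Python streaming API available. The file path, host, port, checkpoint directory, and all function names here are hypothetical and do not come from the thread. The intent is to show which parts would run once on the driver (reading the local file, broadcasting it, wiring up the DStream) and which functions Spark would call again for each batch (the filter/map lambdas and the updateStateByKey update function).

```python
# Sketch only. LOOKUP_PATH, the host/port, and the checkpoint directory are
# hypothetical placeholders, not values taken from the thread.
LOOKUP_PATH = "/tmp/lookup.csv"


def load_lookup(lines):
    """Parse 'key,value' lines from a plain local file into a dict.

    Intended to run once, on the driver, before the stream starts.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(",", 1)
        table[key] = value
    return table


def update_count(new_values, running_count):
    """Update function for updateStateByKey.

    new_values holds this batch's values for one key; running_count is the
    previous state for that key, or None the first time the key is seen.
    """
    return sum(new_values) + (running_count or 0)


def build_pipeline(ssc, lookup_bc, host="localhost", port=9999):
    """Define the DStream graph. This function itself runs once; the lambdas
    and update_count passed to the transformations run for every batch."""
    lines = ssc.socketTextStream(host, port)
    known = lines.filter(lambda word: word in lookup_bc.value)
    counts = known.map(lambda word: (word, 1)).updateStateByKey(update_count)
    counts.pprint()
    return counts


def main():
    # Imported here so the helper functions above can be used without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="BatchIntervalSketch")
    ssc = StreamingContext(sc, 10)  # 10-second batch interval
    ssc.checkpoint("/tmp/checkpoint-sketch")  # updateStateByKey needs a checkpoint dir

    # Runs once on the driver: read the plain file and broadcast its contents.
    with open(LOOKUP_PATH) as f:
        lookup_bc = sc.broadcast(load_lookup(f))

    build_pipeline(ssc, lookup_bc)
    ssc.start()
    ssc.awaitTermination()
```

Calling main() on a machine with PySpark installed would start the stream. The update function can be checked on its own without Spark, for example update_count([1, 1, 1], None) for a key's first batch.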

Re: Streaming: which code is (not) executed at every batch interval?

2014-11-04 Thread spr
ach batch interval > on an RDD.

Re: Streaming: which code is (not) executed at every batch interval?

2014-11-04 Thread Sean Owen
ot repeatedly. So, two questions: > > 1) Is it correct that Spark code does not get executed per batch interval? > > 2) Is there a definition somewhere of what code will and will not get > executed per batch interval? (I didn't find it in either the Spark or Spark > Streaming pr

Streaming: which code is (not) executed at every batch interval?

2014-11-04 Thread spr
amming guides.)