Yes, that's correct. Spark Streaming will generate several jobs per batch interval, depending on how many actions (output operations) you have.
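As a minimal sketch of this (names, port, and the socket source are illustrative assumptions, not from the thread): two actions inside foreachRDD mean Spark schedules two jobs for every batch's RDD.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative sketch only: a 1-second batch interval with two actions
// applied to each batch's RDD, so each batch launches two jobs.
val conf = new SparkConf().setMaster("local[2]").setAppName("TwoJobsPerBatch")
val ssc = new StreamingContext(conf, Seconds(1))

val lines = ssc.socketTextStream("localhost", 9999) // assumed data source

lines.foreachRDD { rdd =>
  if (!rdd.isEmpty) {
    val total = rdd.count()  // action 1 -> one job for this batch
    val first = rdd.first()  // action 2 -> a second job for this batch
    println(s"count=$total first=$first")
  }
}

ssc.start()
ssc.awaitTermination()
```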
On Sun, Sep 6, 2015 at 10:31 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:

> Hi All,
>
> Thanks for the info. I have one more doubt -
> When writing a streaming application, I specify a batch interval. Let's say
> the interval is 1 sec: for every 1-sec batch, an RDD is formed and a job is
> launched. If there is more than one action specified on an RDD, how many
> jobs would it launch?
>
> I mean, every 1-sec batch launches a job, and suppose there are two actions -
> then internally 2 more jobs are launched?
>
> On Sun, Sep 6, 2015 at 1:15 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> "... Here in job2, when calculating rdd.first..."
>>
>> If you mean rdd2.first, then it uses the rdd2 already computed by
>> rdd2.count, because it is already available. If some partitions are not
>> available due to GC, then only those partitions are recomputed.
>>
>> On Sun, Sep 6, 2015 at 5:11 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> If you want to reuse the data, you need to call rdd2.cache
>>>
>>> On Sun, Sep 6, 2015 at 2:33 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> In Spark, each action results in launching a job. Let's say my Spark
>>>> app looks like:
>>>>
>>>> val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
>>>> val rdd1 = baseRDD.map(x => x + 2)
>>>> val rdd2 = rdd1.filter(x => x % 2 == 0)
>>>> val count = rdd2.count
>>>> val firstElement = rdd2.first
>>>>
>>>> println("Count is " + count)
>>>> println("First is " + firstElement)
>>>>
>>>> Now, rdd2.count launches job0 with 1 task and rdd2.first launches job1
>>>> with 1 task. Here in job2, when calculating rdd.first, is the entire
>>>> lineage computed again, or, as job0 already computes rdd2, is it reused?
>>>>
>>>> Thanks,
>>>> Padma Ch
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>
>> --
>> Best Regards,
>> Ayan Guha
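Jeff's caching advice applied to the example from the thread might look like the sketch below (assumes an existing SparkContext `sc`): marking rdd2 with cache() lets the second action read the partitions materialized by the first instead of re-running the lineage.

```scala
val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
val rdd1 = baseRDD.map(x => x + 2)
val rdd2 = rdd1.filter(x => x % 2 == 0).cache() // mark rdd2 for reuse across actions

val count = rdd2.count        // job 0: runs the lineage and populates the cache
val firstElement = rdd2.first // job 1: reads cached partitions where available

println("Count is " + count)
println("First is " + firstElement)
```

Without cache(), each action re-evaluates the lineage (parallelize → map → filter) for the partitions it needs; with it, recomputation only happens for partitions evicted from memory.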