Yes, that's correct. Spark Streaming generates one job per action (output
operation) in each batch interval, so the number of jobs per batch depends
on how many actions you have.
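
For example, here is a minimal sketch (the socket source, host, and port
are placeholders I picked for illustration) with two output operations,
each of which schedules its own job every batch:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("TwoActionsPerBatch")
val ssc = new StreamingContext(conf, Seconds(1)) // 1-sec batch interval

val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
val evens = lines.flatMap(_.split(" ")).map(_.toInt).filter(_ % 2 == 0)

// Two output operations => two jobs scheduled per 1-sec batch
evens.count().print()
evens.foreachRDD(rdd => println("first: " + rdd.take(1).mkString(",")))

ssc.start()
ssc.awaitTermination()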

On Sun, Sep 6, 2015 at 10:31 PM, Priya Ch <learnings.chitt...@gmail.com>
wrote:

> Hi All,
>
>  Thanks for the info. I have one more doubt -
> When writing a streaming application, I specify a batch interval. Let's
> say the interval is 1 sec; then for every 1-sec batch, an RDD is formed
> and a job is launched. If there is more than one action specified on an
> RDD, how many jobs would it launch?
>
> I mean, every 1-sec batch launches a job, and if there are two actions,
> are two more jobs launched internally?
>
> On Sun, Sep 6, 2015 at 1:15 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> "... Here in job2, when calculating rdd.first..."
>>
>> If you mean if rdd2.first, then it uses rdd2 already computed by
>> rdd2.count, because it is already available. If some partitions are not
>> available due to GC, then only those partitions are recomputed.
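>>
>> As a sketch (reusing rdd2 from the code below; the storage level choice
>> is just an illustration), a disk-backed level avoids even that
>> recomputation, because evicted blocks spill to disk instead of being
>> dropped:
>>
>> import org.apache.spark.storage.StorageLevel
>>
>> rdd2.persist(StorageLevel.MEMORY_AND_DISK) // spill to disk, don't drop
>> rdd2.count // job 0: materializes and stores rdd2's partitions
>> rdd2.first // job 1: reads stored partitions; no lineage re-run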
>>
>> On Sun, Sep 6, 2015 at 5:11 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> If you want to reuse the data, you need to call rdd2.cache
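>>>
>>> For example, a minimal sketch against the code below:
>>>
>>> val rdd2 = rdd1.filter(x => x % 2 == 0).cache() // mark rdd2 for caching
>>> val count = rdd2.count // job 0: computes rdd2 and caches its partitions
>>> val firstElement = rdd2.first // job 1: served from the cached partitions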
>>>
>>>
>>>
>>> On Sun, Sep 6, 2015 at 2:33 PM, Priya Ch <learnings.chitt...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>>  In Spark, each action results in launching a job. Lets say my spark
>>>> app looks as-
>>>>
>>>> val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
>>>> val rdd1 = baseRDD.map(x => x + 2)
>>>> val rdd2 = rdd1.filter(x => x % 2 == 0)
>>>> val count = rdd2.count
>>>> val firstElement = rdd2.first
>>>>
>>>> println("Count is " + count)
>>>> println("First is " + firstElement)
>>>>
>>>> Now, rdd2.count launches job0 (one task per partition) and rdd2.first
>>>> launches job1 with 1 task. Here in job1, when calculating rdd2.first, is
>>>> the entire lineage computed again, or, since job0 already computed rdd2,
>>>> is it reused?
>>>>
>>>> Thanks,
>>>> Padma Ch
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
