So essentially the driver/client program needs to explicitly have two
threads to ensure concurrency?

What happens when the program is sequential, i.e. I execute function A
and then function B? Does this mean that each RDD first goes through
function A, and then stream X is persisted, but processed in function B
only after the RDD has been processed by A?
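(A rough sketch of the two-thread-plus-semaphore pattern from Andy's reply, in plain Python rather than the Spark API — process_with_A and process_with_B here are hypothetical stand-ins for the two output actions submitted from the driver:)

```python
import threading

# Shared list standing in for "output written to files".
results = []
a_done = threading.Semaphore(0)  # released once A's action finishes

def process_with_A(batch):
    # Stand-in for the transform/filter chain X -> Y -> Z plus the
    # foreach that writes Z out.
    results.append(("A", sum(batch)))
    a_done.release()

def process_with_B(batch):
    # Stand-in for the reduce X -> X2 (min of the batch). The acquire
    # forces B to wait for A; drop it if ordering doesn't matter.
    a_done.acquire()
    results.append(("B", min(batch)))

batch = [3, 1, 2]
t_b = threading.Thread(target=process_with_B, args=(batch,))
t_a = threading.Thread(target=process_with_A, args=(batch,))
t_b.start()  # B is started first, but blocks on the semaphore...
t_a.start()
t_a.join()
t_b.join()
# ...so A's result always lands before B's: [("A", 6), ("B", 1)]
```

Without the semaphore both threads run freely and either ordering can be observed, which matches Andy's point that the two collect steps execute in parallel unless you serialize them yourself.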

Thanks
Nipun
On Sat, Oct 24, 2015 at 5:32 PM Andy Dang <nam...@gmail.com> wrote:

> If you execute the collect step (foreach in 1, possibly reduce in 2) in
> two threads in the driver then both of them will be executed in parallel.
> Whichever gets submitted to Spark first gets executed first - you can use a
> semaphore if you need to ensure the ordering of execution, though I would
> assume that the ordering wouldn't matter.
>
> -------
> Regards,
> Andy
>
> On Sat, Oct 24, 2015 at 10:08 PM, Nipun Arora <nipunarora2...@gmail.com>
> wrote:
>
>> I wanted to understand something about the internals of Spark Streaming
>> execution.
>>
>> If I have a stream X, and in my program I send stream X to function A and
>> function B:
>>
>> 1. In function A, I do a few transform/filter operations on X -> Y -> Z
>> to create stream Z. Then I do a foreach operation on Z and print the
>> output to a file.
>>
>> 2. In function B, I reduce stream X -> X2 (say, the min value of each
>> RDD), and print the output to a file.
>>
>> Are both functions being executed for each RDD in parallel? How does it
>> work?
>>
>> Thanks
>> Nipun
>>
>>
>
