So essentially the driver/client program needs to explicitly have two threads to ensure concurrency?
What happens when the program is sequential, i.e. I execute function A and
then function B? Does this mean that each RDD first goes through function A,
and then stream X is persisted, but processed in function B only after the
RDD has been processed by A?

Thanks
Nipun

On Sat, Oct 24, 2015 at 5:32 PM Andy Dang <nam...@gmail.com> wrote:

> If you execute the collect step (foreach in 1, possibly reduce in 2) in
> two threads in the driver, then both of them will be executed in parallel.
> Whichever gets submitted to Spark first gets executed first - you can use a
> semaphore if you need to ensure the ordering of execution, though I would
> assume that the ordering wouldn't matter.
>
> -------
> Regards,
> Andy
>
> On Sat, Oct 24, 2015 at 10:08 PM, Nipun Arora <nipunarora2...@gmail.com>
> wrote:
>
>> I wanted to understand something about the internals of Spark Streaming
>> executions.
>>
>> If I have a stream X, and in my program I send stream X to function A and
>> function B:
>>
>> 1. In function A, I do a few transform/filter operations etc. on X->Y->Z
>> to create stream Z. Now I do a forEach operation on Z and print the output
>> to a file.
>>
>> 2. Then in function B, I reduce stream X -> X2 (say the min value of each
>> RDD), and print the output to a file.
>>
>> Are both functions being executed for each RDD in parallel? How does it
>> work?
>>
>> Thanks
>> Nipun
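Andy's suggestion above (two driver-side threads, with a semaphore if ordering matters) can be sketched with plain threads. This is not Spark code: the list `batch` is a hypothetical stand-in for one micro-batch RDD, and the two functions mimic A (transform/filter X->Y->Z, then write) and B (reduce X->X2). The semaphore forces B to wait until A's "action" has completed, which is the ordering guarantee a semaphore buys you when both jobs are submitted concurrently:

```python
import threading

# Hypothetical stand-in for one micro-batch of stream X.
batch = [5, 3, 9, 1]

results = {}
a_done = threading.Semaphore(0)  # released once function A's "action" finishes


def function_a(rdd):
    # X -> Y -> Z: a transform and a filter, then an "action".
    y = [x * 2 for x in rdd]      # transform (stand-in for map)
    z = [x for x in y if x > 4]   # filter
    results["A"] = z              # stands in for the forEach that writes Z to a file
    a_done.release()              # signal that A's job has completed


def function_b(rdd):
    a_done.acquire()              # enforce ordering: block until A is done
    results["B"] = min(rdd)       # X -> X2: reduce the batch to its min value


# Two driver-side threads submit both jobs concurrently; the semaphore,
# not submission order, decides which effect happens first.
t_a = threading.Thread(target=function_a, args=(batch,))
t_b = threading.Thread(target=function_b, args=(batch,))
t_b.start()
t_a.start()
t_a.join()
t_b.join()

print(results["A"])  # [10, 6, 18]
print(results["B"])  # 1
```

Without the semaphore the two threads would race, which is Andy's point: whichever action reaches the scheduler first runs first, and that is fine whenever the two outputs are independent.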