Is there a way I can queue several stages at once?
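A minimal sketch of one way to have several jobs in flight at once. It assumes a recent Spark release with the Scala API and spark-core on the classpath. Spark's scheduler is thread-safe and accepts jobs submitted from separate driver threads, so independent `collect` calls wrapped in Scala `Future`s do not have to wait for one another. Setting `spark.scheduler.mode` to `FAIR` lets concurrent jobs share executors instead of running first-in-first-out. The object `ConcurrentCollects`, the method `runConcurrently`, and the multiply-by-`k` map function are hypothetical stand-ins for the elided loop body. The approach only helps if the iterations are independent of one another.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ConcurrentCollects {
  /** Runs one map+collect job per factor (hypothetical query), with at most
    * `maxInFlight` jobs submitted to Spark at the same time. */
  def runConcurrently(data: RDD[Int], factors: Seq[Int], maxInFlight: Int): Seq[Array[Int]] = {
    // Each pool thread submits its own job; Spark accepts jobs from many driver threads.
    val pool = Executors.newFixedThreadPool(maxInFlight)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val jobs = factors.map(k => Future(data.map(_ * k).collect()))
      // Block until every job finishes; results keep the order of `factors`.
      Await.result(Future.sequence(jobs), Duration.Inf)
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("concurrent-collects")
      .set("spark.scheduler.mode", "FAIR") // let concurrent jobs share executors
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(1 to 1000).cache() // reused by every job, so keep it in memory
      val results = runConcurrently(data, 1 to 20, maxInFlight = 4)
      println(results.map(_.sum).mkString(","))
    } finally sc.stop()
  }
}
```

This overlaps the per-job scheduling delay rather than removing it, and every job still ships its full result back to the driver, so it does not address the network cost raised in the reply.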

On Mon, Feb 17, 2014 at 12:08 PM, Mark Hamstra <[email protected]> wrote:

> With so little information about what your code is actually doing, what
> you have shared looks likely to be an anti-pattern to me.  Doing many
> collect actions is something to be avoided if at all possible, since this
> forces a lot of network communication to materialize the results back
> within the driver process, and network communication severely constrains
> performance.
>
>
> On Mon, Feb 17, 2014 at 9:51 AM, David Thomas <[email protected]> wrote:
>
>> I have a Spark application that has the following structure:
>>
>> while(...) { // 10-100k iterations
>>   rdd.map(...).collect
>> }
>>
>> Basically, I have an RDD and I need to query it multiple times.
>>
>> Now when I run this, Spark creates a new stage for each iteration (each
>> stage having multiple tasks). What I find is that each stage takes about
>> 1 second to execute, and most of that time is spent scheduling the tasks.
>> Since a stage is not submitted until the previous stage has completed,
>> this loop takes a long time to finish. So my question is: is there a way
>> to interleave the execution of multiple stages? Any other suggestions for
>> improving the above query pattern?
>>
>
>
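Along the lines of the reply above, here is a sketch of answering a whole batch of queries with a single job instead of one `map(...).collect` per iteration. It again assumes the Scala API of a recent Spark release, and it assumes the iterations do not depend on each other's results (the original loop body is elided). The `Query` type (sum of all values above a threshold), `BatchedQueries`, and `answerAll` are hypothetical. The idea is to broadcast the batch once, tag each partial result with its query id, and combine with `reduceByKey` on the cluster so that only one number per query returns to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BatchedQueries {
  // Hypothetical query: the sum of all values strictly greater than `threshold`.
  final case class Query(id: Int, threshold: Int)

  /** Answers every query in one Spark job rather than one job per query. */
  def answerAll(data: RDD[Int], queries: Seq[Query]): Map[Int, Long] = {
    // Ship the whole batch of queries to the executors once.
    val qs = data.sparkContext.broadcast(queries)
    // Emit (queryId, value) for every query a record contributes to.
    val partial: RDD[(Int, Long)] = data.flatMap { x =>
      qs.value.filter(q => x > q.threshold).map(q => (q.id, x.toLong))
    }
    // Combine on the cluster so only one number per query reaches the driver.
    val sums = partial.reduceByKey(_ + _).collectAsMap()
    // Queries that matched no record still get an answer of 0.
    queries.map(q => q.id -> sums.getOrElse(q.id, 0L)).toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("batched-queries"))
    try {
      val data = sc.parallelize(1 to 1000).cache()
      // 10k hypothetical queries answered by a single job instead of 10k jobs.
      val queries = (0 until 10000).map(i => Query(i, i % 1000))
      val answers = answerAll(data, queries)
      println(answers(0))
    } finally sc.stop()
  }
}
```

Because every record is checked against every query in the batch, per-record work grows with the batch size; for 10-100k queries it may be worth splitting them into a handful of chunks, which still means a few jobs rather than tens of thousands. Caching `data` matters whenever it is scanned more than once.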
