I have a spark application that has the below structure:
while(...) { // 10-100k iterations
rdd.map(...).collect
}
Basically, I have an RDD and I need to query it multiple times.
When I run this, Spark creates a new stage for each iteration (each stage having multiple tasks). Each stage takes about 1 second to execute, and most of that time is spent scheduling the tasks. Since a stage is not submitted until the previous stage has completed, the loop takes a long time to finish. So my question is: is there a way to interleave the execution of multiple stages? Any other suggestions for improving this query pattern?
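One thing worth noting: Spark's scheduler does accept jobs submitted concurrently from separate threads within the same application, so independent `collect()` calls do not have to run strictly one after another. Below is a minimal sketch of that idea using Scala `Future`s; the `queries` list and the function bodies are placeholders standing in for the per-iteration `map` logic, and the RDD is cached so each job reuses it instead of recomputing it.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.{SparkConf, SparkContext}

object ConcurrentJobs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("concurrent-jobs"))

    // Cache the RDD so repeated jobs read it from memory instead of
    // recomputing the lineage on every iteration.
    val rdd = sc.parallelize(1 to 1000000).cache()

    // Placeholder per-iteration queries; in the real application these
    // would be the functions passed to rdd.map(...) inside the loop.
    val queries: Seq[Int => Int] = Seq(_ + 1, _ * 2, _ - 3)

    // Submit each job from its own thread. Spark schedules the resulting
    // jobs concurrently, so their stages can overlap instead of waiting
    // for the previous collect() to finish.
    val futures = queries.map { f =>
      Future { rdd.map(f).collect() }
    }

    val results = Await.result(Future.sequence(futures), Duration.Inf)
    println(results.map(_.length))

    sc.stop()
  }
}
```

By default Spark schedules concurrent jobs FIFO; setting `spark.scheduler.mode=FAIR` lets them share executor resources more evenly. This is a sketch, not a drop-in fix: with 10-100k iterations you would want to submit jobs in bounded batches (or restructure the loop to batch many queries into one job) rather than launch one thread per iteration.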