Sean, a related question: when should the RDD be persisted, after step 2 or after step 3? (I assume nothing actually executes before step 3.)
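To make the question concrete, here is a minimal sketch of the ordering I have in mind, not a definitive answer. It assumes steps 3 and 4 are simple `count()` actions and that the row type is the first comma-separated field; the names `PersistSketch`, `rowType`, `countAB` and the input path are made up for illustration. As far as I understand, `persist()` is itself lazy and only marks the RDD, so it would have to be called on the step 2 RDD before the step 3 action runs; calling it after step 3 has finished would not save any work for step 4.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical record format: the row type is the first comma-separated field.
  def rowType(line: String): String = line.takeWhile(_ != ',')

  // Takes the step (1) RDD and returns (number of A rows, number of B rows).
  def countAB(lines: RDD[String]): (Long, Long) = {
    // (2) Transformation: lazy, nothing is computed yet.
    val abOnly = lines.filter { l =>
      val t = rowType(l)
      t == "A" || t == "B"
    }

    // Mark the step (2) result for caching. This is also lazy: the data is
    // only materialized when the first action below computes it.
    abOnly.persist(StorageLevel.MEMORY_ONLY)

    // (3) First action: computes (1) and (2), caching the partitions of abOnly.
    val countA = abOnly.filter(l => rowType(l) == "A").count()

    // (4) Second action: can read abOnly from the cache instead of redoing
    // (1) and (2). With MEMORY_ONLY, partitions that did not fit in memory
    // are recomputed.
    val countB = abOnly.filter(l => rowType(l) == "B").count()

    // Release the cached blocks once both actions are done.
    abOnly.unpersist()
    (countA, countB)
  }

  def main(args: Array[String]): Unit = {
    // Local master and the fallback path are assumptions for this sketch.
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val path = if (args.nonEmpty) args(0) else "data.txt"
      val (a, b) = countAB(sc.textFile(path)) // (1) lazy read
      println(s"A rows: $a, B rows: $b")
    } finally {
      sc.stop()
    }
  }
}
```

Here the two actions run one after the other, since each blocks the driver until it completes.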
On Mon, Jan 19, 2015 at 5:17 PM, Sean Owen <so...@cloudera.com> wrote:
> From the OP:
>
> (1) val lines = Import full dataset using sc.textFile
> (2) val ABonly = Filter out all rows from "lines" that are not of type A or B
> (3) val processA = Process only the A rows from ABonly
> (4) val processB = Process only the B rows from ABonly
>
> I assume that 3 and 4 are actions, or else nothing happens here at all.
>
> When 3 is invoked, it will compute 1, then 2, then 3. 4 will happen
> after 3, and may even cause 1 and 2 to happen again if nothing is
> persisted.
>
> You can invoke 3 and 4 in parallel on the driver if you like. That's
> fine. But actions are blocking in the driver.
>
> On Mon, Jan 19, 2015 at 8:21 AM, davidkl <davidkl...@hotmail.com> wrote:
>> Hi Jon, I am looking for an answer for a similar question in the doc now, so
>> far no clue.
>>
>> I would need to know what is spark behaviour in a situation like the example
>> you provided, but taking into account also that there are multiple
>> partitions/workers.
>>
>> I could imagine it's possible that different spark workers are not
>> synchronized in terms of waiting for each other to progress to the next
>> step/stage for the partitions of data they get assigned, while I believe in
>> streaming they would wait for the current batch to complete before they
>> start working on a new one.
>>
>> In the code I am working on, I need to make sure a particular step is
>> completed (in all workers, for all partitions) before next transformation is
>> applied.
>>
>> Would be great if someone could clarify or point to these issues in the doc!
>> :-)
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Does-Spark-automatically-run-different-stages-concurrently-when-possible-tp21075p21227.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org