Re: Does Spark automatically run different stages concurrently when possible?

2015-01-20 Thread Sean Owen
You can persist the RDD in (2) right after it is created. That will not cause it to be persisted immediately, but rather the first time it is materialized. If you instead persist after (3) has been calculated, then the RDD will be re-calculated (and persisted) when (4) is calculated.

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-20 Thread Ashish
Thanks Sean!

RE: Does Spark automatically run different stages concurrently when possible?

2015-01-20 Thread Bob Tiernay
I found the following to be a good discussion of the same topic: http://apache-spark-user-list.1001560.n3.nabble.com/The-concurrent-model-of-spark-job-stage-task-td13083.html

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread Ashish
Sean, a related question: when should one persist the RDD, after step 2 or after step 3? (Nothing would happen before step 3, I assume.)

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread davidkl
Hi Jon, I am looking for an answer to a similar question in the docs now; so far no clue. I would need to know what Spark's behaviour is in a situation like the example you provided, but also taking into account that there are multiple partitions/workers. I could imagine it's possible that

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread critikaled
+1, I too need to know. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-Spark-automatically-run-different-stages-concurrently-when-possible-tp21075p21233.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread Sean Owen
From the OP:
(1) val lines = Import full dataset using sc.textFile
(2) val ABonly = Filter out all rows from lines that are not of type A or B
(3) val processA = Process only the A rows from ABonly
(4) val processB = Process only the B rows from ABonly
I assume that (3) and (4) are actions, or else
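In Spark code, the pipeline above with a persist right after (2) might look like the following sketch. This is not runnable on its own: it assumes a live SparkContext `sc`, and the path and filter predicates are hypothetical stand-ins for the OP's actual logic.

```scala
val lines  = sc.textFile("hdfs://...")                                  // (1)
val ABonly = lines.filter(l => l.startsWith("A") || l.startsWith("B"))  // (2)
ABonly.persist()  // lazy: nothing is cached until the first action runs

val processA = ABonly.filter(_.startsWith("A")).count()  // (3) first action:
                                                         //     materializes and caches ABonly
val processB = ABonly.filter(_.startsWith("B")).count()  // (4) reads ABonly from
                                                         //     the cache, not the file
```

With the default FIFO scheduler the two actions still run as two sequential jobs, but (4) avoids re-reading and re-filtering the input.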

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread critikaled
Hi John and David, I tried this to run them concurrently:
List(rdd1, rdd2, ...).par.foreach { rdd => rdd.collect().foreach(println) }
This successfully registered the jobs, but the parallelism of the stages is limited: sometimes it was able to run 4 of them, sometimes only one
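An alternative to `.par` is to submit each action from its own driver thread using Futures. Here is a minimal, Spark-free sketch of that driver-side pattern; `processA`/`processB` are hypothetical placeholders for RDD actions, and in a real program each Future body would trigger its own Spark job while the scheduler decides how much actually runs concurrently.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-ins for two independent actions.
def processA(): Int = (1 to 100).filter(_ % 2 == 0).sum
def processB(): Int = (1 to 100).filter(_ % 2 == 1).sum

// Submit both concurrently; the driver thread only blocks at the end.
val fa = Future(processA())
val fb = Future(processB())

val a = Await.result(fa, 30.seconds)
val b = Await.result(fb, 30.seconds)
println(s"A=$a B=$b")
```

Unlike `.par`, this keeps the degree of driver-side concurrency explicit (one Future per action) rather than tied to the parallel-collections thread pool.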

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread Sean Owen
Keep in mind that your executors will be able to run some fixed number of tasks in parallel, given your configuration. You should not necessarily expect that arbitrarily many RDDs and tasks would schedule simultaneously.
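For reference, the number of task slots per executor is determined by configuration; a sketch with illustrative values, not recommendations:

```properties
# Each executor can run roughly (spark.executor.cores / spark.task.cpus)
# tasks at the same time.
spark.executor.cores  4
spark.task.cpus       1
```

Once all slots across executors are busy, additional tasks queue regardless of how many jobs were submitted concurrently.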

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-10 Thread Benyi Wang
You may try changing the schedulingMode to FAIR; the default is FIFO. Take a look at this page: https://spark.apache.org/docs/1.1.0/job-scheduling.html#scheduling-within-an-application
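Concretely, per the job-scheduling page linked above, switching the scheduler mode is a one-line configuration change:

```properties
# Default is FIFO; FAIR lets jobs submitted from different threads
# share cluster resources instead of queueing strictly in order.
spark.scheduler.mode  FAIR
```

The same setting can be applied programmatically with `conf.set("spark.scheduler.mode", "FAIR")` on the SparkConf before the context is created.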

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-10 Thread Stéphane Verlet
From your pseudo code, it would be sequential and done twice: 1+2+3, then 1+2+4. If you do a .cache() in step 2, then you would have 1+2+3, then 4. I ran several steps in parallel from the same program, but never using the same source RDD, so I do not know the limitations there. I simply started