Are storage resources counted during scheduling?
Hi Spark users/experts,

I'm wondering how the Spark scheduler works. What kinds of resources are considered during scheduling? Does that include disk or I/O resources, e.g., the number of I/O ports? Are network resources considered? My understanding is that only CPU is considered, right?

Best,
Jialin

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
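(Not an authoritative answer, but as far as I understand it, the task scheduler hands out tasks against CPU "slots", with memory accounted per executor; disk and network capacity do not appear in that accounting. A back-of-the-envelope model of the concurrency the scheduler sees — purely illustrative, the numbers and function name are mine, not Spark internals:)

```python
# Back-of-the-envelope model (illustrative only, not Spark source code):
# the scheduler assigns tasks to free CPU slots; disk and network
# bandwidth are not scheduled resources.
def concurrent_task_slots(num_executors, executor_cores, task_cpus=1):
    """How many tasks can run at once given the CPU resources."""
    return num_executors * (executor_cores // task_cpus)

# Hypothetical cluster: 100 executors with 32 cores each, 1 CPU per task.
print(concurrent_task_slots(100, 32))  # 3200
```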
Re: Are storage resources counted during scheduling?
Thanks Ted, but that page seems to cover scheduling policy; it doesn't say which resources are considered during scheduling. And on scheduling: in the case of just one application, is there still a scheduling process? Otherwise, why do I see some launch delay in the tasks? (Well, this might be another question.) Thanks.

Best,
Jialin

> On Apr 11, 2016, at 3:18 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> See
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
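(On the single-application question: my reading of that job-scheduling page is that the answer is yes — jobs within one application are scheduled FIFO by default, and within a stage tasks launch in waves as cores free up, which by itself produces some launch delay when there are more tasks than cores. A toy model, my own illustration rather than Spark internals:)

```python
import math

# Toy model (illustrative only): with more tasks than free cores,
# tasks in a stage launch in waves, so the last tasks start
# noticeably later than the first ones.
def launch_waves(num_tasks, num_slots):
    """Minimum number of launch waves for a stage."""
    return math.ceil(num_tasks / num_slots)

# Hypothetical stage: 6000 tasks on 3200 core slots.
print(launch_waves(6000, 3200))  # 2
```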
Re: Spark task launching range is 10 mins
Hi,

I have set the number of partitions to 6000 and requested 100 nodes, with 32 cores on each node; the number of executors is 32 per node:

spark-submit --master $SPARKURL --executor-cores 32 --driver-memory 20G --executor-memory 80G single-file-test.py

I'm reading 2.2 TB, and the code has just two simple steps:

rdd = sc.read
rdd.count

Then I checked the log file and the history server, and they show that the count stage has a really large task-launching range: from 16/03/19 22:30:56 to 16/03/19 22:40:17, which is about 10 minutes. Has anyone experienced this before? Could you please let me know the reason, the Spark internals relating to this issue, and how to resolve it? Thanks much.

Best,
Jialin
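(For what it's worth, the spread between the two timestamps quoted above can be checked directly — this only quantifies the symptom, not its cause:)

```python
from datetime import datetime

# The two task-launch timestamps quoted from the history server above.
fmt = "%y/%m/%d %H:%M:%S"
first = datetime.strptime("16/03/19 22:30:56", fmt)
last = datetime.strptime("16/03/19 22:40:17", fmt)

spread = last - first
print(spread)  # 0:09:21, i.e. roughly ten minutes between first and last launch
```

One contributor I would look at is delay scheduling: spark.locality.wait (3s by default, applied per locality level) makes tasks wait for a preferred node before launching elsewhere. I can't say that's the cause here, but it's a cheap thing to check.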