Re: Are storage resources counted during scheduling

2016-04-11 Thread Jialin Liu
Thanks Ted, 
but that page seems to cover the scheduling policy; it doesn't tell me which 
resources are actually considered during scheduling. 

And for scheduling, I'm wondering: in the case of just one application, is there 
still a scheduling process? Otherwise, why do I see some launch delay in the tasks? 
(Well, this might be another question.) Thanks. 

Best,
Jialin
> On Apr 11, 2016, at 3:18 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> See 
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
> 
> On Mon, Apr 11, 2016 at 3:15 PM, Jialin Liu <jaln...@lbl.gov> wrote:
> Hi Spark users/experts,
> 
> I’m wondering how the Spark scheduler works.
> What kinds of resources are considered during scheduling? Does that 
> include disk or I/O resources, e.g., the number of I/O ports?
> Are network resources considered as well?
> 
> My understanding is that only CPU is considered; is that right?
> 
> Best,
> Jialin
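
For reference, here is a minimal PySpark configuration sketch (the values are
hypothetical) of the knobs the scheduler works with: cores and memory are
requested per executor, each task occupies spark.task.cpus core slots, and
spark.scheduler.mode controls FIFO vs. FAIR scheduling within a single
application. As far as I know, disk and network are not scheduled resources
in this model.

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("scheduling-resources-sketch")
        .set("spark.executor.cores", "32")      # CPU cores per executor
        .set("spark.executor.memory", "80g")    # memory per executor
        .set("spark.task.cpus", "1")            # core slots one task occupies
        .set("spark.scheduler.mode", "FIFO"))   # within-app job scheduling: FIFO or FAIR
sc = SparkContext(conf=conf)

# With 32 cores per executor and spark.task.cpus = 1, at most 32 tasks run
# concurrently on each executor; any further tasks in the stage wait for a
# free core slot, which shows up as task launch delay.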



Are storage resources counted during scheduling

2016-04-11 Thread Jialin Liu
Hi Spark users/experts, 

I’m wondering how the Spark scheduler works.
What kinds of resources are considered during scheduling? Does that 
include disk or I/O resources, e.g., the number of I/O ports?
Are network resources considered as well?

My understanding is that only CPU is considered; is that right?

Best,
Jialin



Re: Spark task launching range is 10 mins

2016-03-20 Thread Jialin Liu
Hi,
I have set the number of partitions to 6000 and requested 100 nodes with
32 cores per node, and the executor cores are set to 32 per node:

spark-submit --master $SPARKURL --executor-cores 32 \
  --driver-memory 20G --executor-memory 80G single-file-test.py
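
(If I understand correctly, with one 32-core executor per node and the default
spark.task.cpus=1, that is roughly 100 x 32 = 3200 concurrent task slots, so
the 6000 tasks of the count stage should run in about two waves.)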


And I'm reading a 2.2 TB file. The code has just two simple steps:
rdd = sc.read
rdd.count
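
For reference, here is a minimal runnable PySpark sketch of those two steps
(the actual single-file-test.py is not shown; the input path below is an
assumption, and the partition hint reuses the 6000 mentioned above):

from pyspark import SparkContext

sc = SparkContext(appName="single-file-count-sketch")

# Step 1: read the large input into an RDD, hinting at ~6000 partitions.
rdd = sc.textFile("/path/to/2.2TB-input", minPartitions=6000)

# Step 2: count the records; this action launches the stage and its tasks.
print(rdd.count())

sc.stop()
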
Then I checked the log file and the history server, and they show that the
count stage has a really large task launch range, e.g., from

16/03/19 22:30:56 to 16/03/19 22:40:17

which is about 10 minutes.
Has anyone experienced this before?
Could you please let me know the reason, the Spark internals related to
this issue, and how to resolve it? Thanks very much.

Best,
Jialin
