[jira] [Commented] (SPARK-8881) Scheduling fails if num_executors < num_workers

2015-07-08 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618313#comment-14618313 ]

Sean Owen commented on SPARK-8881:
--

Yes, the punchline is that each worker is asked for 48/4 = 12 cores, but 12 is 
less than the 16 cores each executor needs, so every worker gets 0 executors. 
Grabbing cores in chunks of 16 works in this case, as does considering only 3 
workers when allocating 3 executors, since the problem is that it never makes 
sense to try allocating N executors over M > N workers.

 Scheduling fails if num_executors < num_workers
 -----------------------------------------------

 Key: SPARK-8881
 URL: https://issues.apache.org/jira/browse/SPARK-8881
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 1.4.0, 1.5.0
Reporter: Nishkam Ravi

 Current scheduling algorithm (in Master.scala) has two issues:
 1. cores are allocated one at a time instead of spark.executor.cores at a time
 2. when spark.cores.max/spark.executor.cores < num_workers, executors are not 
 launched and the app hangs (due to 1)






[jira] [Commented] (SPARK-8881) Scheduling fails if num_executors < num_workers

2015-07-08 Thread Nishkam Ravi (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618362#comment-14618362 ]

Nishkam Ravi commented on SPARK-8881:
-

There's more to it. Consider the following: three workers with num_cores (8, 8, 
2), spark.cores.max = 12, spark.executor.cores = 4. Core allocation would be 
(5, 5, 2): num_executors = num_workers, and nothing gets launched!

The problem isn't that num_workers > num_executors (that's just one place it 
manifests in practice). The problem is that we are allocating one core at a 
time and ignoring spark.executor.cores during allocation.
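
A minimal simulation of that one-core-at-a-time spread (a toy helper, not the 
actual scheduler code) reproduces the (5, 5, 2) split:

object OneCoreAtATime {
  // Round-robin one core at a time across workers that still have free cores.
  def spread(free: Array[Int], toAssign: Int): Array[Int] = {
    val assigned = Array.fill(free.length)(0)
    var left = math.min(toAssign, free.sum) // never assign more than exists
    var pos = 0
    while (left > 0) {
      if (free(pos) - assigned(pos) > 0) {
        assigned(pos) += 1
        left -= 1
      }
      pos = (pos + 1) % free.length
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    // Workers with (8, 8, 2) free cores, spark.cores.max = 12:
    println(spread(Array(8, 8, 2), 12).mkString("(", ", ", ")")) // (5, 5, 2)
    // Neither 5 is a multiple of spark.executor.cores = 4, and 2 < 4,
    // so cores are stranded on every worker.
  }
}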




[jira] [Commented] (SPARK-8881) Scheduling fails if num_executors < num_workers

2015-07-08 Thread Nishkam Ravi (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618369#comment-14618369 ]

Nishkam Ravi commented on SPARK-8881:
-

This isn't the best example, because the third worker would get screened out. 
Consider the following instead: three workers with num_cores (8, 8, 3), 
spark.cores.max = 8, spark.executor.cores = 2. Core allocation would be 
(3, 3, 2): 3 executors launched instead of 4. You get the drift.
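
To spell out the arithmetic: spark.cores.max / spark.executor.cores = 8 / 2 = 4 
executors are requested, but spreading 8 cores one at a time over (8, 8, 3) 
yields (3, 3, 2). Assuming each worker launches floor(assigned / 
spark.executor.cores) executors, that is 1 + 1 + 1 = 3 executors, with one core 
stranded on each of the two 8-core workers.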




[jira] [Commented] (SPARK-8881) Scheduling fails if num_executors < num_workers

2015-07-07 Thread Nishkam Ravi (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617732#comment-14617732 ]

Nishkam Ravi commented on SPARK-8881:
-

No, that's not the problem.

You have 4 workers with 16 cores each. You request 3 executors (spark.cores.max 
= 48, spark.executor.cores = 16). The app hangs, because the following condition 
is never satisfied: while (coresLeft >= coresPerExecutor && worker.memoryFree >= 
memoryPerExecutor). You will have to stare at the scheduling algorithm for a 
good 5 minutes to understand what's happening. Try to simulate the case stated 
above.
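
To make that simulation easy, here is a stripped-down paraphrase of the 
per-worker launch loop (simplified names; a sketch of the logic the comment 
quotes, not the exact Master.scala source). Each worker's share is 48/4 = 12 
cores, and 12 >= 16 is false on the first check, so launchExecutor is never 
reached on any worker and the app hangs:

object LaunchLoopSketch {
  def main(args: Array[String]): Unit = {
    var coresLeft = 12          // this worker's share of the 48 requested cores
    val coresPerExecutor = 16   // spark.executor.cores
    var memoryFree = 65536      // assume memory is plentiful (MB)
    val memoryPerExecutor = 4096

    var launched = 0
    while (coresLeft >= coresPerExecutor && memoryFree >= memoryPerExecutor) {
      coresLeft -= coresPerExecutor
      memoryFree -= memoryPerExecutor
      launched += 1             // the real code calls launchExecutor(...) here
    }
    println(s"executors launched on this worker: $launched") // 0
  }
}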




[jira] [Commented] (SPARK-8881) Scheduling fails if num_executors < num_workers

2015-07-07 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617712#comment-14617712 ]

Sean Owen commented on SPARK-8881:
--

I think this needs a better explanation. So you are asking for 8 cores per 
executor while all workers have 7 cores available, and the result is that no 
executors are allocated and the app is still waiting for executors. That seems 
like correct behavior, right?

Cores aren't really allocated one at a time; in spreadOut mode the target 
allocation amount is spread around, but executors only launch with the number 
of cores desired. Grabbing 8 cores at that phase in each pass wouldn't help, 
since none of the workers have 8 cores available.
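
Concretely, in that hypothetical: with 7 free cores per worker and 
spark.executor.cores = 8, each worker can host floor(7/8) = 0 executors no 
matter how the target allocation is carved up, so waiting for resources would 
be the right outcome there.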

What does it have to do with the number of workers?




[jira] [Commented] (SPARK-8881) Scheduling fails if num_executors < num_workers

2015-07-07 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617714#comment-14617714 ]

Apache Spark commented on SPARK-8881:
-

User 'nishkamravi2' has created a pull request for this issue:
https://github.com/apache/spark/pull/7274
