[jira] [Commented] (SPARK-8881) Scheduling fails if num_executors < num_workers
[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618313#comment-14618313 ]

Sean Owen commented on SPARK-8881:
----------------------------------

Yes, the punchline is that each worker is asked for 48/4 = 12 cores, but 12 is less than the 16 cores each executor needs, so for every worker, 0 executors are allocated. Grabbing cores in chunks of 16 works in this case, as does only considering 3 workers to allocate 3 executors, since the problem is that it never makes sense to try allocating N executors over M > N workers.

> Scheduling fails if num_executors < num_workers
> -----------------------------------------------
>
>                 Key: SPARK-8881
>                 URL: https://issues.apache.org/jira/browse/SPARK-8881
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: Nishkam Ravi
>
> The current scheduling algorithm (in Master.scala) has two issues:
> 1. Cores are allocated one at a time instead of spark.executor.cores at a time.
> 2. When spark.cores.max / spark.executor.cores < num_workers, executors are not launched and the app hangs (due to 1).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
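The failure mode described above can be reproduced with a small simulation. This is a hypothetical Python sketch of the pre-fix behavior, not Spark's actual Master.scala code: 48 cores spread one at a time over 4 workers gives each worker 12 cores, short of the 16 an executor needs, so nothing launches.

```python
# Hypothetical sketch of the pre-fix allocation (not Spark's actual code):
# cores are spread one at a time across workers, then each worker launches
# as many executors as its assigned cores allow.

def spread_out_cores(cores_max, free_cores):
    """Round-robin one core at a time across workers until cores_max is used."""
    assigned = [0] * len(free_cores)
    to_assign = min(cores_max, sum(free_cores))
    pos = 0
    while to_assign > 0:
        if assigned[pos] < free_cores[pos]:
            assigned[pos] += 1
            to_assign -= 1
        pos = (pos + 1) % len(free_cores)
    return assigned

def executors_launched(assigned, cores_per_executor):
    """Each worker launches only executors of exactly cores_per_executor cores."""
    return sum(cores // cores_per_executor for cores in assigned)

# 4 workers x 16 cores, spark.cores.max = 48, spark.executor.cores = 16
assigned = spread_out_cores(48, [16, 16, 16, 16])
print(assigned)                          # [12, 12, 12, 12]
print(executors_launched(assigned, 16))  # 0 -- no worker reaches 16; app hangs
```

Allocating in chunks of 16, or spreading over only 3 of the 4 workers, would give at least one worker the 16 cores it needs, which is the fix direction discussed above.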
[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618362#comment-14618362 ]

Nishkam Ravi commented on SPARK-8881:
-------------------------------------

There's more to it. Consider the following: three workers with num_cores (8, 8, 2), spark.cores.max = 12, spark.executor.cores = 4. Core allocation would be (5, 5, 2): num_executors = num_workers and nothing gets launched! The problem isn't that num_workers > num_executors (that's just one place it manifests in practice). The problem is that we are allocating one core at a time and ignoring spark.executor.cores during allocation.
[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618369#comment-14618369 ]

Nishkam Ravi commented on SPARK-8881:
-------------------------------------

This isn't the best example, because the third worker will get screened out. Consider the following instead: three workers with num_cores (8, 8, 3), spark.cores.max = 8, spark.executor.cores = 2. Core allocation would be (3, 3, 2): 3 executors launched instead of 4. You get the drift.
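This corrected example can also be walked through with a hypothetical Python simulation of the one-core-at-a-time round robin (again, not Spark's actual code): 8 cores handed out over workers with (8, 8, 3) free cores land as (3, 3, 2), so only 3 two-core executors fit even though 8 / 2 = 4 were requested.

```python
# Hypothetical simulation of one-at-a-time core allocation (not Spark's code).
def spread_out_cores(cores_max, free_cores):
    """Round-robin one core at a time across workers until cores_max is used."""
    assigned = [0] * len(free_cores)
    to_assign = min(cores_max, sum(free_cores))
    pos = 0
    while to_assign > 0:
        if assigned[pos] < free_cores[pos]:
            assigned[pos] += 1
            to_assign -= 1
        pos = (pos + 1) % len(free_cores)
    return assigned

# Three workers with (8, 8, 3) free cores, spark.cores.max = 8,
# spark.executor.cores = 2.
assigned = spread_out_cores(8, [8, 8, 3])
print(assigned)  # [3, 3, 2]

executors = sum(cores // 2 for cores in assigned)
print(executors)  # 3 two-core executors, though 8 // 2 = 4 were requested
```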
[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617732#comment-14617732 ]

Nishkam Ravi commented on SPARK-8881:
-------------------------------------

No, that's not the problem. You have 4 workers with 16 cores each. You request 3 executors (spark.cores.max = 48, spark.executor.cores = 16). The app hangs, because the following condition is never satisfied: while (coresLeft >= coresPerExecutor && worker.memoryFree >= memoryPerExecutor). You will have to stare at the scheduling algorithm for a good 5 minutes to understand what's happening. Try to simulate the case stated above.
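The guard quoted above can be simulated directly. This is a hypothetical Python sketch (the memory figures are illustrative assumptions, not from the issue): each worker is assigned 48 / 4 = 12 cores, so coresLeft >= coresPerExecutor is false from the start and the loop body never runs.

```python
# Hypothetical sketch of the launch guard; memory values are assumed.
cores_per_executor = 16
memory_per_executor = 4096  # MB; illustrative value, not from the issue
workers = [{"cores_left": 48 // 4, "memory_free": 65536} for _ in range(4)]

launched = 0
for w in workers:
    # The guard from the scheduling loop quoted above:
    # while (coresLeft >= coresPerExecutor && worker.memoryFree >= memoryPerExecutor)
    while (w["cores_left"] >= cores_per_executor
           and w["memory_free"] >= memory_per_executor):
        w["cores_left"] -= cores_per_executor
        w["memory_free"] -= memory_per_executor
        launched += 1

print(launched)  # 0 -- 12 < 16 on every worker, so nothing ever launches
```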
[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617712#comment-14617712 ]

Sean Owen commented on SPARK-8881:
----------------------------------

I think this needs a better explanation. So you are asking for 8 cores per executor, all workers have 7 cores available, and the result is that no executors are allocated and the app is still waiting for executors. That seems like correct behavior, right? Cores aren't really allocated one at a time; in spreadOut mode the target allocation amount is spread around, but executors (only) launch with the # of cores desired. Grabbing 8 cores at that phase in each pass wouldn't help, since none have 8 cores available. What does it have to do with the number of workers?
[ https://issues.apache.org/jira/browse/SPARK-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617714#comment-14617714 ]

Apache Spark commented on SPARK-8881:
-------------------------------------

User 'nishkamravi2' has created a pull request for this issue:
https://github.com/apache/spark/pull/7274