[jira] [Commented] (SPARK-11181) Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.
[ https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968648#comment-14968648 ]

Saisai Shao commented on SPARK-11181:

OK, got it.

> Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.
> ---
>
> Key: SPARK-11181
> URL: https://issues.apache.org/jira/browse/SPARK-11181
> Project: Spark
> Issue Type: Bug
> Components: Scheduler, Spark Core, YARN
> Affects Versions: 1.3.1
> Environment: Spark-1.3.1 on a hadoop-yarn-2.4.0 cluster.
> All servers in the cluster run Linux kernel 2.6.32.
> Job runs in yarn-client mode.
> Reporter: prakhar jauhari
> Fix For: 1.3.2
>
> The Spark driver reduces the total executor count even when Dynamic Allocation is not enabled.
> To reproduce this:
> 1. A 2-node YARN setup: each DN has ~20 GB of memory and 4 cores.
> 2. After the application launches and gets its required executors, one of the DNs loses connectivity and is timed out.
> 3. Spark issues a killExecutor for the executor on the DN that timed out.
> 4. Even with dynamic allocation off, Spark's scheduler reduces "targetNumExecutors".
> 5. Thus the job runs with a reduced executor count.
> Note: The severity of the issue increases if some of the DNs running the job's executors lose connectivity intermittently: the Spark scheduler reduces "targetNumExecutors" and therefore does not ask for new executors on any other nodes, causing the job to hang.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
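The failure mode described in the reproduction steps above can be sketched as a small standalone model (hypothetical class and method names; this is not actual Spark source, just an illustration of the reported control flow under the stated assumptions):

```scala
// Toy model of the reported bug: on YARN, the "does this deployment support
// dynamic allocation?" check passes regardless of whether dynamic allocation
// is actually enabled, so an expired heartbeat permanently lowers the
// executor target. Names (SchedulerModel, expireDeadHost) are hypothetical.
class SchedulerModel(initialTarget: Int, dynamicAllocationEnabled: Boolean) {
  var targetNumExecutors: Int = initialTarget
  val master = "yarn-client"

  // Mirrors the reported 1.3.1 behaviour: keyed off the cluster manager.
  // Note that dynamicAllocationEnabled is never consulted — that is the bug.
  def supportDynamicAllocation: Boolean = master.contains("yarn")

  // Models what happens when a DN's heartbeat times out.
  def expireDeadHost(executorId: String): Unit = {
    if (supportDynamicAllocation) {
      targetNumExecutors -= 1 // the lost executor is never re-requested
    }
  }
}

object Repro extends App {
  val sched = new SchedulerModel(initialTarget = 4, dynamicAllocationEnabled = false)
  sched.expireDeadHost("executor-2")
  // Even though dynamic allocation is off, the target drops from 4 to 3.
  println(sched.targetNumExecutors) // prints 3
}
```

With intermittent timeouts, each expiry decrements the target again, which is why the job can eventually hang with no executors being re-requested.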
[ https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968655#comment-14968655 ]

prakhar jauhari commented on SPARK-11181:

Yeah, I'll do it for branch 1.3 :)
[ https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968649#comment-14968649 ]

Saisai Shao commented on SPARK-11181:

I think you'd better backport to branch 1.3, not to tag 1.3.1 :).
[ https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968645#comment-14968645 ]

prakhar jauhari commented on SPARK-11181:

Hey [~saisai_shao], this JIRA is for the issue on the spark-1.3.1 branch, and the issue still exists on that branch. We are currently using Spark 1.3.1 in production, so moving to 1.5 is not an option right now. I am currently working on back-porting the fix from SPARK-8119 to spark-1.3.1, and I'll create a PR for it (in case someone needs the fix on spark-1.3.1). I will close this JIRA after that.
[ https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968453#comment-14968453 ]

Saisai Shao commented on SPARK-11181:

Hi [~prakhar088], as discussed on the mailing list (http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-driver-reducing-total-executors-count-even-when-Dynamic-Allocation-is-disabled-td14679.html), this problem is already solved by SPARK-8119. Would you please close this JIRA?
[ https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962961#comment-14962961 ]

prakhar jauhari commented on SPARK-11181:

On analysing the code (Spark 1.3.1):

When my DN goes unreachable, Spark core's HeartbeatReceiver invokes _expireDeadHosts()_, which checks whether dynamic allocation is supported and then invokes _sc.killExecutor()_:

{quote}
if (sc.supportDynamicAllocation) {
  sc.killExecutor(executorId)
}
{quote}

Surprisingly, _supportDynamicAllocation_ in _SparkContext.scala_ is defined to return true if the _dynamicAllocationTesting_ flag is enabled or Spark is running over _yarn_:

{quote}
private[spark] def supportDynamicAllocation =
  master.contains("yarn") || dynamicAllocationTesting
{quote}

_sc.killExecutor()_ dispatches to the configured _schedulerBackend_ (CoarseGrainedSchedulerBackend in this case) and invokes _killExecutors(executorIds)_. CoarseGrainedSchedulerBackend calculates a _newTotal_ for the total number of executors required and sends an update to the application master by invoking _doRequestTotalExecutors(newTotal)_. It then invokes _doKillExecutors(filteredExecutorIds)_ for the lost executors.

This is how the total number of executors gets reduced when a host is intermittently unreachable.
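The remedy referenced earlier in this thread (SPARK-8119) can be sketched with the same kind of toy model (hypothetical names, not actual Spark source): instead of lowering the target when a heartbeat expires, the dead executor is killed with a "replace" intent so the requested total stays the same and a substitute is requested from YARN.

```scala
// Hedged sketch of the SPARK-8119-style fix, under the assumption that the
// heartbeat-expiry path kills the executor without shrinking the target.
// Names (FixedSchedulerModel, killExecutor's replace flag) are hypothetical.
class FixedSchedulerModel(initialTarget: Int) {
  var targetNumExecutors: Int = initialTarget

  // replace = true: drop the dead executor but keep the target, so the
  // application master is asked for a substitute container.
  // replace = false: shrink the target (the dynamic-allocation path).
  def killExecutor(executorId: String, replace: Boolean): Unit = {
    if (!replace) targetNumExecutors -= 1
    // (the actual kill request to the application master would go here)
  }

  // Heartbeat expiry now asks for a replacement instead of downsizing.
  def expireDeadHost(executorId: String): Unit =
    killExecutor(executorId, replace = true)
}

object FixDemo extends App {
  val sched = new FixedSchedulerModel(initialTarget = 4)
  sched.expireDeadHost("executor-2")
  // The target is unchanged: a replacement executor will be requested.
  println(sched.targetNumExecutors) // prints 4
}
```

With this separation, jobs on intermittently unreachable hosts no longer drift toward zero executors: each expiry triggers a replacement request rather than a permanent reduction of "targetNumExecutors".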