[jira] [Commented] (SPARK-11181) Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.

2015-10-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968648#comment-14968648
 ] 

Saisai Shao commented on SPARK-11181:
-

OK, got it.

> Spark Yarn : Spark reducing total executors count even when Dynamic 
> Allocation is disabled.
> ---
>
> Key: SPARK-11181
> URL: https://issues.apache.org/jira/browse/SPARK-11181
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core, YARN
>Affects Versions: 1.3.1
> Environment: Spark-1.3.1 on hadoop-yarn-2.4.0 cluster. 
> All servers in cluster running Linux version 2.6.32. 
> Job in yarn-client mode.
>Reporter: prakhar jauhari
>
> The Spark driver reduces the total executor count even when Dynamic 
> Allocation is not enabled.
> To reproduce this:
> 1. A 2-node YARN setup: each DN has ~20 GB memory and 4 cores.
> 2. After the application launches and gets its required executors, one of 
> the DNs loses connectivity and is timed out.
> 3. Spark issues a killExecutor for the executor on the DN that timed out.
> 4. Even with dynamic allocation off, Spark's scheduler reduces 
> "targetNumExecutors".
> 5. Thus the job runs with a reduced executor count.
> Note: The severity of the issue increases: if some of the DNs that were 
> running my job's executors lose connectivity intermittently, the Spark 
> scheduler reduces "targetNumExecutors" and never asks for replacement 
> executors on other nodes, causing the job to hang.
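For context, a yarn-client submission matching this setup might look like the following. This is an illustrative sketch, not taken from the report: the resource values and {{my-job.jar}} are placeholders. The point is that even with {{spark.dynamicAllocation.enabled}} explicitly false, the driver on YARN still lowers its executor target after a node times out.

```shell
# Illustrative yarn-client submission (example values; my-job.jar is a
# placeholder). Dynamic allocation is explicitly disabled, yet the driver
# still reduces its executor target when a node times out.
spark-submit \
  --master yarn-client \
  --num-executors 10 \
  --executor-memory 4g \
  --executor-cores 2 \
  --conf spark.dynamicAllocation.enabled=false \
  my-job.jar
```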



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11181) Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.

2015-10-22 Thread prakhar jauhari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968655#comment-14968655
 ] 

prakhar jauhari commented on SPARK-11181:
-

Yes, I'll do it for branch 1.3 :)




[jira] [Commented] (SPARK-11181) Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.

2015-10-22 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968649#comment-14968649
 ] 

Saisai Shao commented on SPARK-11181:
-

I think you'd better backport it to branch 1.3, not to the tag 1.3.1 :).




[jira] [Commented] (SPARK-11181) Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.

2015-10-22 Thread prakhar jauhari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968645#comment-14968645
 ] 

prakhar jauhari commented on SPARK-11181:
-

Hey [~saisai_shao], this JIRA is for the issue on the spark-1.3.1 branch, and 
the issue still exists there.
We are currently using Spark 1.3.1 in production, so moving to 1.5 is not an 
option right now. I am working on back-porting the fix from SPARK-8119 to 
spark-1.3.1 and will create a PR for it (in case someone else needs the fix 
on Spark 1.3.1).
I will close this JIRA after that.




[jira] [Commented] (SPARK-11181) Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.

2015-10-21 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968453#comment-14968453
 ] 

Saisai Shao commented on SPARK-11181:
-

Hi [~prakhar088], as discussed on the mailing list 
(http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-driver-reducing-total-executors-count-even-when-Dynamic-Allocation-is-disabled-td14679.html),
this problem is already solved by SPARK-8119. Would you please close this JIRA?




[jira] [Commented] (SPARK-11181) Spark Yarn : Spark reducing total executors count even when Dynamic Allocation is disabled.

2015-10-19 Thread prakhar jauhari (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962961#comment-14962961
 ] 

prakhar jauhari commented on SPARK-11181:
-

On analysing the code (Spark 1.3.1), this is what happens when a DN becomes 
unreachable:

Spark core's HeartbeatReceiver invokes _expireDeadHosts()_, which checks 
whether Dynamic Allocation is supported and then invokes _sc.killExecutor()_:
{quote}
if (sc.supportDynamicAllocation) \{
  sc.killExecutor(executorId)
\}
{quote}

Surprisingly, _supportDynamicAllocation_ in _SparkContext.scala_ is defined to 
return true if the _dynamicAllocationTesting_ flag is enabled *or* Spark is 
running over _yarn_:
{quote}
private\[spark\] def supportDynamicAllocation = 
  master.contains("yarn") || dynamicAllocationTesting
{quote}

_sc.killExecutor()_ dispatches to the configured _schedulerBackend_ 
(CoarseGrainedSchedulerBackend in this case), which invokes 
_killExecutors(executorIds)_.

CoarseGrainedSchedulerBackend calculates a _newTotal_ for the total number of 
executors required and sends an update to the application master by invoking 
_doRequestTotalExecutors(newTotal)_.

CoarseGrainedSchedulerBackend then invokes 
_doKillExecutors(filteredExecutorIds)_ for the lost executors.

Thus the total number of executors is reduced whenever a host becomes 
intermittently unreachable.
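The flow above can be sketched as a small Python model. This is illustrative only: these are not the real Spark classes, the names merely mirror the Scala methods, and the accounting is simplified to show how the target drops without a replacement request.

```python
# Minimal Python model (not Spark source) of the Spark 1.3.1 flow described
# above. Class and method names mirror the Scala ones; logic is simplified.

class CoarseGrainedSchedulerBackend:
    def __init__(self, target_num_executors):
        self.target_num_executors = target_num_executors
        self.requests_to_am = []  # totals sent via doRequestTotalExecutors

    def kill_executors(self, executor_ids):
        # Killing executors lowers the target and sends the new, smaller
        # total to the application master -- even when dynamic allocation
        # is off, so the lost executors are never re-requested.
        new_total = self.target_num_executors - len(executor_ids)
        self.do_request_total_executors(new_total)
        self.do_kill_executors(executor_ids)

    def do_request_total_executors(self, new_total):
        self.target_num_executors = new_total
        self.requests_to_am.append(new_total)

    def do_kill_executors(self, executor_ids):
        pass  # stands in for the RPC that actually kills the containers


def expire_dead_hosts(backend, support_dynamic_allocation, dead_executors):
    # HeartbeatReceiver path: the guard checks supportDynamicAllocation,
    # which is true on YARN even when dynamic allocation is disabled.
    if support_dynamic_allocation:
        backend.kill_executors(dead_executors)


backend = CoarseGrainedSchedulerBackend(target_num_executors=10)
# Dynamic allocation is OFF, but master.contains("yarn") makes the guard pass.
expire_dead_hosts(backend, support_dynamic_allocation=True,
                  dead_executors=["exec-3", "exec-7"])
print(backend.target_num_executors)  # 8: reduced, no replacements requested
```

Running the model shows the target falling from 10 to 8 with a single (smaller) total sent to the AM, which matches the observed behaviour: the job simply continues with fewer executors.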

> Spark Yarn : Spark reducing total executors count even when Dynamic 
> Allocation is disabled.
> ---
>
> Key: SPARK-11181
> URL: https://issues.apache.org/jira/browse/SPARK-11181
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core, YARN
>Affects Versions: 1.3.1
> Environment: Spark-1.3.1 on hadoop-yarn-2.4.0 cluster. 
> All servers in cluster running Linux version 2.6.32. 
> Job in yarn-client mode.
>Reporter: prakhar jauhari
> Fix For: 1.3.2
>
>


