[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources
[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738995#comment-14738995 ]

Dan Shechter commented on SPARK-8119:
------------------------------------

Why was the target version moved to 1.5.1? Wasn't this already marked as fixed for 1.5.0? Is it now being pushed back?

> HeartbeatReceiver should not adjust application executor resources
> ------------------------------------------------------------------
>
>                 Key: SPARK-8119
>                 URL: https://issues.apache.org/jira/browse/SPARK-8119
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: SaintBacchus
>            Assignee: Andrew Or
>            Priority: Critical
>              Labels: backport-needed
>             Fix For: 1.5.0
>
>
> DynamicAllocation lowers the total number of executors when it wants to
> kill some executors. But even without DynamicAllocation, Spark also sets
> the total number of executors. This causes the following problem: sometimes
> when an executor fails, Spark does not bring up a replacement executor.
>
> === EDIT by andrewor14 ===
>
> The issue is that the AM forgets about the original number of executors it
> wants after calling sc.killExecutor. Even if dynamic allocation is not
> enabled, this is still possible because of heartbeat timeouts.
>
> I think the problem is that sc.killExecutor is used incorrectly in
> HeartbeatReceiver. The intention of the method is to permanently adjust the
> number of executors the application will get. In HeartbeatReceiver, however,
> it is used as a best-effort mechanism to ensure that the timed-out executor
> is dead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
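The mechanism andrewor14 describes in SPARK-8119, where a kill request used for a heartbeat timeout also permanently lowers the number of executors the AM is trying to keep, can be illustrated with a toy model. This is a minimal sketch, not Spark code: ToyAllocationManager, kill_executor, kill_and_replace_executor and reconcile are hypothetical names, and the model assumes the AM simply launches new executors until the running count reaches its target.

```python
# Toy model of the bookkeeping described in SPARK-8119. This is NOT Spark code:
# ToyAllocationManager, kill_executor, kill_and_replace_executor and reconcile
# are hypothetical names chosen for illustration.


class ToyAllocationManager:
    """Tracks the executor count the AM wants and the executors it has."""

    def __init__(self, target: int) -> None:
        self.target = target               # executors the AM tries to keep alive
        self.running = set(range(target))  # ids of live executors
        self.next_id = target              # id for the next executor launched

    def kill_executor(self, executor_id: int) -> None:
        # Semantics the issue attributes to sc.killExecutor: the kill also
        # permanently lowers the target, so the AM "forgets" the original count.
        if executor_id in self.running:
            self.running.remove(executor_id)
            self.target -= 1

    def kill_and_replace_executor(self, executor_id: int) -> None:
        # Best-effort kill of a timed-out executor that leaves the target alone.
        self.running.discard(executor_id)

    def reconcile(self) -> None:
        # The AM launches new executors until the running count meets the target.
        while len(self.running) < self.target:
            self.running.add(self.next_id)
            self.next_id += 1


# Heartbeat timeout handled with kill_executor: the lost executor is never replaced.
buggy = ToyAllocationManager(target=4)
buggy.kill_executor(2)
buggy.reconcile()
print(len(buggy.running))  # 3

# Heartbeat timeout handled with a kill that keeps the target: a replacement is launched.
fixed = ToyAllocationManager(target=4)
fixed.kill_and_replace_executor(2)
fixed.reconcile()
print(len(fixed.running))  # 4
```

The point of the model is that a heartbeat timeout is a best-effort cleanup of one executor, so it should not change the application's requested resources; only an intentional scale-down (for example, by dynamic allocation) should lower the target.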
[jira] [Commented] (SPARK-8119) HeartbeatReceiver should not adjust application executor resources
[ https://issues.apache.org/jira/browse/SPARK-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693113#comment-14693113 ]

Dan Shechter commented on SPARK-8119:
------------------------------------

Does this mean it's already fixed for the upcoming 1.5.0? Is the only outstanding issue the 1.4.2 backport?