[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169448#comment-15169448 ]

Siddharth Seth commented on TEZ-3128:
-------------------------------------

+1. Committing. Thanks [~ozawa].
[~jlowe] - should this be pulled into 0.7 as well?

> Avoid stopping containers on the AM shutdown thread
> ---------------------------------------------------
>
>                 Key: TEZ-3128
>                 URL: https://issues.apache.org/jira/browse/TEZ-3128
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.8.0-alpha
>            Reporter: Siddharth Seth
>            Assignee: Tsuyoshi Ozawa
>              Labels: newbie
>         Attachments: TEZ-3128.001.patch, TEZ-3128.002.patch, TEZ-3128.003.patch, TEZ-3128.004.patch, amJstack
>
> During an AM shutdown, the TaskCommunicator is also shutdown and it tries to
> stop containers in the shutdown thread itself. This can cause the AM shutdown
> to block if NMs are not available.
> This likely affects 0.7 as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169260#comment-15169260 ]

TezQA commented on TEZ-3128:
----------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12790142/TEZ-3128.004.patch
  against master revision 15d7339.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 1 new or modified test files.
    -1 javac. The applied patch generated 31 javac compiler warnings (more than the master's current 30 warnings).
    +1 javadoc. There were no new javadoc warning messages.
    -1 findbugs. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests in :
                   org.apache.tez.test.TestFaultTolerance

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1518//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1518//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1518//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1518//console

This message is automatically generated.
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169234#comment-15169234 ]

TezQA commented on TEZ-3128:
----------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12790138/TEZ-3128.003.patch
  against master revision 15d7339.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    -1 findbugs. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1517//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1517//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1517//console

This message is automatically generated.
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169083#comment-15169083 ]

Tsuyoshi Ozawa commented on TEZ-3128:
-------------------------------------

Attached v03 patch to address the comment.
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168481#comment-15168481 ]

Siddharth Seth commented on TEZ-3128:
-------------------------------------

[~ozawa] - looking at the scheduler, we already release all held containers as part of the shutdown process (well before we unregister from the RM). Given that, avoiding the container stop completely would be a better option, and a simpler patch.
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167944#comment-15167944 ]

Tsuyoshi Ozawa commented on TEZ-3128:
-------------------------------------

[~sseth] [~hitesh] could you check the patch?
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163235#comment-15163235 ]

TezQA commented on TEZ-3128:
----------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12789566/TEZ-3128.002.patch
  against master revision 701e9aa.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests in :
                   org.apache.tez.history.TestHistoryParser

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1507//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1507//console

This message is automatically generated.
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159263#comment-15159263 ]

Hitesh Shah commented on TEZ-3128:
----------------------------------

We do need to release/stop them before shutdown, as there is no guarantee on when the AM will be killed after unregistering if the AM still has pending work (flushing events, etc.).

My point was whether we can get away with releasing running containers to YARN instead of calling stop on each of them via the NM proxy. If we cannot release them, then we need to reduce the timeout and use a new NM client proxy with the modified timeouts to stop the containers.
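
Editorial sketch (not taken from any of the attached patches): one way to keep per-container stop calls from blocking the shutdown thread indefinitely is to issue them on a small pool and give the whole batch a single deadline. The class name, the `ContainerStopper` interface, and the pool size below are hypothetical stand-ins for illustration; in Tez the real call would go through the NM client proxy mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedContainerStop {

  /** Hypothetical stand-in for the call that asks an NM to stop one container. */
  interface ContainerStopper {
    void stop(String containerId) throws Exception;
  }

  /**
   * Issues stop requests on daemon worker threads and waits at most timeoutMs
   * in total. Returns how many stop requests completed successfully in time.
   */
  static int stopAll(List<String> containerIds, ContainerStopper stopper, long timeoutMs) {
    if (containerIds.isEmpty()) {
      return 0;
    }
    // Assumed pool size of at most 4; daemon threads so a hung call cannot keep the JVM alive.
    ExecutorService pool = Executors.newFixedThreadPool(
        Math.min(4, containerIds.size()),
        r -> {
          Thread t = new Thread(r, "container-stopper");
          t.setDaemon(true);
          return t;
        });
    List<Future<?>> futures = new ArrayList<>();
    for (String id : containerIds) {
      futures.add(pool.submit(() -> {
        stopper.stop(id);
        return null;
      }));
    }
    pool.shutdown();

    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    int stopped = 0;
    for (Future<?> f : futures) {
      long remaining = Math.max(0L, deadline - System.nanoTime());
      try {
        f.get(remaining, TimeUnit.NANOSECONDS);
        stopped++;
      } catch (TimeoutException | ExecutionException e) {
        // Give up on this container; releasing it to the RM remains the fallback.
        f.cancel(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    pool.shutdownNow(); // interrupt anything still stuck on an unreachable NM
    return stopped;
  }
}
```

With this shape the shutdown thread waits no longer than `timeoutMs` regardless of how many NMs are unreachable. Whether a blocked call actually responds to the interrupt depends on the underlying RPC client, which this sketch does not model.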
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158004#comment-15158004 ]

Tsuyoshi Ozawa commented on TEZ-3128:
-------------------------------------

[~hitesh] [~sseth] Thank you for pointing that out.

{quote}
dagappmaster shuts down yarn scheduler service but it does not kill containers on shutdown - just releases them via amrmclient
TezTaskCommunicatorImpl on stop() does nothing to kill containers.
{quote}

Right, that's why I thought the place I fixed was what you mentioned. Could you help me clarify where the fix should go?
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157279#comment-15157279 ]

Hitesh Shah commented on TEZ-3128:
----------------------------------

[~ozawa] I don't think the delayed container manager thread is the issue here.

[~sseth] can you add more details/logs on this?

I see the following as per the code:
  - dagappmaster shuts down yarn scheduler service but it does not kill containers on shutdown - just releases them via amrmclient
  - TezTaskCommunicatorImpl on stop() does nothing to kill containers.

It seems like the container launcher is the one trying to shut down containers for some reason. Maybe we should just release containers via the scheduler service instead of trying to stop them?
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156575#comment-15156575 ]

TezQA commented on TEZ-3128:
----------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12788960/TEZ-3128.001.patch
  against master revision 941d199.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests in :
                   org.apache.tez.dag.app.rm.TestContainerReuse

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1493//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1493//console

This message is automatically generated.
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155125#comment-15155125 ]

Hitesh Shah commented on TEZ-3128:
----------------------------------

We could probably override the shutdown timeouts programmatically for this jira and do a follow-up on how to address YARN timeouts for container launches while an app is running.
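
Editorial sketch of what "override the shutdown timeouts programmatically" could look like: copy the running configuration and shorten only the NM connect settings in the copy, so the values used while the app is running are left alone. A plain `Map` stands in for the Hadoop `Configuration` object here, and the class and method names are hypothetical. The two property names are believed to be the YARN client keys for NM connect wait and retry interval; verify them against the Hadoop version in use.

```java
import java.util.HashMap;
import java.util.Map;

class ShutdownTimeouts {

  // Assumed YARN client property names; confirm against YarnConfiguration before relying on them.
  static final String NM_CONNECT_MAX_WAIT_MS =
      "yarn.client.nodemanager-connect.max-wait-ms";
  static final String NM_CONNECT_RETRY_INTERVAL_MS =
      "yarn.client.nodemanager-connect.retry-interval-ms";

  /**
   * Returns a copy of the running configuration with shorter NM connect
   * timeouts, intended only for the client used during AM shutdown.
   * The input map is not modified.
   */
  static Map<String, String> forShutdown(
      Map<String, String> running, long maxWaitMs, long retryIntervalMs) {
    Map<String, String> copy = new HashMap<>(running);
    copy.put(NM_CONNECT_MAX_WAIT_MS, Long.toString(maxWaitMs));
    copy.put(NM_CONNECT_RETRY_INTERVAL_MS, Long.toString(retryIntervalMs));
    return copy;
  }
}
```

A new NM client built from the returned copy would fail fast against unreachable NMs during shutdown, while clients created from the original configuration keep their existing timeouts.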
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154982#comment-15154982 ]

Siddharth Seth commented on TEZ-3128:
-------------------------------------

Yep. That works. The timeouts while an app is running are also way too high.
[jira] [Commented] (TEZ-3128) Avoid stopping containers on the AM shutdown thread
[ https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154783#comment-15154783 ]

Hitesh Shah commented on TEZ-3128:
----------------------------------

This is done to release containers faster, before other services such as ATS are shut down, which can take a long time. But yes, we need to figure out how to short-circuit the release if the NM cannot be reached. Does it make sense to override the timeouts just for this shutdown phase instead of trying to avoid stopping them?