[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313799#comment-16313799 ]

Rohini Palaniswamy commented on TEZ-160:

Recently noticed that about 5% of Pig jobs launched from Oozie in a cluster had an application status of KILLED even though the DAG succeeded and the Pig scripts completed successfully. This was because Pig calls TezClient.stop() on shutdown. If the application is not stopped within 10 seconds, it calls frameworkClient.killApplication(sessionAppId), which kills the AM. Because of the 5-second sleep after shutdown is issued, whether an application finished as SUCCEEDED or KILLED depended on whether the shutdown completed within the next 5 seconds.

Can we skip this sleep if it is a user-initiated shutdown, or at least lower it to 1 or 2 seconds? In the case of Pig, it is a Tez session and the Pig client is calling shutdown. I think we can skip it in general for a Tez session. The only time a session goes down automatically is when the session timeout expires, and adding another 5 seconds in that case is also wasteful.

> Remove 5 second sleep at the end of AM completion.
> --
>
> Key: TEZ-160
> URL: https://issues.apache.org/jira/browse/TEZ-160
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Siddharth Seth
> Labels: TEZ-0.2.0
> Attachments: test.timeouts.txt
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion
> status from the AM after job completion. It, instead, always relies on the RM
> for this information. The information returned by the AM should be used while
> it's available.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
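The timing race described in the comment above can be illustrated with a small model. This is only a sketch, not Tez code: the class `ShutdownRace`, the method `finalStatus`, and the idea that the AM's exit time is simply "shutdown work plus sleep" are assumptions made for illustration. The 5-second sleep and 10-second client wait are the figures quoted in the comment.

```java
/** Simplified model of the shutdown race; hypothetical, not part of Tez. */
final class ShutdownRace {
    enum FinalStatus { SUCCEEDED, KILLED }

    /**
     * Returns the status the application would end with under this model.
     *
     * @param amShutdownWorkMs time the AM needs for its own shutdown work (assumed input)
     * @param amSleepMs        fixed sleep the AM adds before exiting
     * @param clientWaitMs     how long the client waits before killing the application
     */
    static FinalStatus finalStatus(long amShutdownWorkMs, long amSleepMs, long clientWaitMs) {
        long amExitMs = amShutdownWorkMs + amSleepMs;
        // If the AM is still alive when the client's wait expires, the client kills it.
        return amExitMs <= clientWaitMs ? FinalStatus.SUCCEEDED : FinalStatus.KILLED;
    }

    public static void main(String[] args) {
        long clientWaitMs = 10_000; // client-side wait quoted in the comment
        for (long sleepMs : new long[] {5_000, 1_000, 0}) {
            for (long workMs : new long[] {3_000, 6_000}) {
                System.out.printf("sleep=%dms work=%dms -> %s%n",
                        sleepMs, workMs, finalStatus(workMs, sleepMs, clientWaitMs));
            }
        }
    }
}
```

Under this model, an AM that needs more than 5 seconds of real shutdown work is reported as KILLED with the 5-second sleep, but finishes cleanly once the sleep is lowered or removed.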
[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351757#comment-15351757 ]

Siddharth Seth commented on TEZ-160:

bq. To test this theory, I ran the test suite on my box with the default 5s session timeout and then ran the tests with a 1s session timeout. The results are convincing at 38 minutes vs 29 minutes.

Wow. We have inefficient mini cluster usage in our tests.

[~jeagles] - what do you think the fix should be? Make the timeout configurable, or completely remove it? I believe results will come in from ATS if the AM has gone away. At this point, the switch from the AM to ATS should be working. If data is not available in ATS, limited information would come from the RM.
[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367980#comment-14367980 ]

André Kelpe commented on TEZ-160:

They are independent apps, so the shutdown happens after each test, so that we have a clean test env.
[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361375#comment-14361375 ]

Bikas Saha commented on TEZ-160:

This should affect you if your tests are not using session mode. Is that the case?
[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360423#comment-14360423 ]

André Kelpe commented on TEZ-160:

Could the sleep period be made configurable until this is fixed correctly? We have a test suite with a few thousand DAGs, and waiting 5 extra seconds for every one of them adds a lot of wall-clock time.
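A configurable sleep period, as requested above, could look roughly like the following. This is a minimal sketch under stated assumptions, not the Tez implementation: the property key `example.am.shutdown.delay.ms` and the class `ShutdownDelay` are hypothetical, and `java.util.Properties` stands in for whatever configuration object the AM actually uses. Only the 5-second default comes from the thread.

```java
import java.util.Properties;

/** Hypothetical helper showing a configurable end-of-AM delay; not part of Tez. */
final class ShutdownDelay {
    // Hypothetical key, invented for this example only.
    static final String DELAY_KEY = "example.am.shutdown.delay.ms";
    // Matches the fixed 5-second sleep discussed in this issue.
    static final long DEFAULT_DELAY_MS = 5_000L;

    /** Reads the delay from the configuration; negative values are treated as 0. */
    static long delayMs(Properties conf) {
        String raw = conf.getProperty(DELAY_KEY);
        if (raw == null) {
            return DEFAULT_DELAY_MS;
        }
        return Math.max(0L, Long.parseLong(raw.trim()));
    }

    /** Sleeps for the configured delay; a value of 0 skips the sleep entirely. */
    static void pauseBeforeExit(Properties conf) throws InterruptedException {
        long ms = delayMs(conf);
        if (ms > 0) {
            Thread.sleep(ms);
        }
    }
}
```

With a setting like this, a test suite could set the delay to 0 or 1000 ms while other deployments keep the default until the underlying status-reporting problem is fixed.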
[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349491#comment-14349491 ]

Bikas Saha commented on TEZ-160:

None as you can see. Can you please elaborate on what the effect of this is in your test setup?
[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349424#comment-14349424 ]

Jonathan Eagles commented on TEZ-160:

Any update on this issue? Been hitting this recently in my test setup.