[ https://issues.apache.org/jira/browse/YARN-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587183#comment-14587183 ]
Sangjin Lee commented on YARN-3792:
-----------------------------------

Thanks [~Naganarasimha] for identifying the issues and providing a patch!

I applied the patch on top of the current YARN-2928 branch, rebuilt, and ran the TestDistributedShell test locally. I still see one test failing:

{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 581.546 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testDSShellWithoutDomainV2CustomizedFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 29.651 sec  <<< FAILURE!
java.lang.AssertionError: Application finished event should be published atleast once expected:<1> but was:<0>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyStringExistsSpecifiedTimes(TestDistributedShell.java:483)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:431)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:323)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow(TestDistributedShell.java:209)

Results :

Failed tests:
  TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow:209->testDSShell:323->checkTimelineV2:431->verifyStringExistsSpecifiedTimes:483 Application finished event should be published atleast once expected:<1> but was:<0>

Tests run: 13, Failures: 1, Errors: 0, Skipped: 0
{noformat}

Have you seen this?
Could you kindly look into that? I'll also see if this is reproducible on my end.

Some quick comments:

(TestDistributedShell.java)
- l.71-75: Is this comment necessary here? I'm not sure we want to add a generic comment like this to a specific test...
- l.106: Are the checks for null necessary? I thought that the test name was populated by junit and made available to test methods. Do things fail if we do not check for null?
- l.376: I don't really like the sleep call as it is not completely deterministic; could there be a way to make this completely deterministic (using things like CountDownLatch, etc.)?

(TimelineClientImpl.java)
- l.385: nit: the C-style conditional check is not necessary; I would suggest a more natural check of {{(timelineServiceAddress == null)}}

(ContainersMonitorImpl.java)
- l.96: It is unrelated to this patch itself, but should we rename the variable "threadPool"? It is a completely generic name. We should rename it to something like "timelineWriterThreadPool". Let me know if you have a suggestion.
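To illustrate the l.376 comment, here is a minimal sketch of replacing a fixed sleep with a CountDownLatch. It is not code from the patch: {{LatchWaitSketch}}, {{FakePublisher}}, and {{runAndCount}} are hypothetical names that stand in for whatever asynchronously publishes the application finished event, and the 10-second timeout is an arbitrary choice. How the real publisher would expose such a latch to the test is left open.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchWaitSketch {

    /** Hypothetical stand-in for a component that publishes an event on another thread. */
    static final class FakePublisher {
        private final AtomicInteger finishedEvents = new AtomicInteger();
        // Released exactly once, when the first "finished" event has been recorded.
        private final CountDownLatch published = new CountDownLatch(1);

        void publishAsync() {
            Thread t = new Thread(() -> {
                finishedEvents.incrementAndGet(); // record the event first
                published.countDown();            // then signal any waiting test
            });
            t.setDaemon(true);
            t.start();
        }

        boolean awaitPublished(long timeout, TimeUnit unit) throws InterruptedException {
            return published.await(timeout, unit);
        }

        int finishedEventCount() {
            return finishedEvents.get();
        }
    }

    /** Waits on the latch instead of sleeping, then returns the number of events seen. */
    static int runAndCount() {
        FakePublisher publisher = new FakePublisher();
        publisher.publishAsync();
        try {
            // The timeout only bounds the failure case; success returns as soon as
            // the event is published, so no fixed delay is paid.
            if (!publisher.awaitPublished(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("finished event was not published in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the event", e);
        }
        return publisher.finishedEventCount();
    }

    public static void main(String[] args) {
        System.out.println("finished events: " + runAndCount());
    }
}
```

Because {{countDown()}} happens-before a successful return from {{await()}}, the count read after the wait reflects the published event, which is the guarantee a sleep cannot give.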
> Test case failures in TestDistributedShell and some issue fixes related to ATSV2
> --------------------------------------------------------------------------------
>
>                 Key: YARN-3792
>                 URL: https://issues.apache.org/jira/browse/YARN-3792
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3792-YARN-2928.001.patch
>
>
> # Encountered [testcase failures|https://builds.apache.org/job/PreCommit-YARN-Build/8233/testReport/] which were happening even without the patch modifications in YARN-3044:
> TestDistributedShell.testDSShellWithoutDomainV2CustomizedFlow
> TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow
> TestDistributedShellWithNodeLabels.testDSShellWithNodeLabelExpression
> # Remove the unused {{enableATSV1}} in TestDistributedShell
> # Container metrics need to be published only for the v2 test cases of TestDistributedShell
> # A NullPointerException was thrown in TimelineClientImpl.constructResURI when the Aux service was not configured and {{TimelineClient.putObjects}} was invoked.
> # Race condition between the publishing of the Application events and the test case verification of the RM's ApplicationFinished Timeline Events
> # Application Tags are converted to lowercase in ApplicationSubmissionContextPBImpl, hence RMTimelineCollector was not able to detect the custom flow details of the app

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)