Naganarasimha G R commented on YARN-4350:

Hi [~varun_saxena], as discussed offline, this seems to be a problem with the 
Distributed Shell AM. {{TestDistributedShell.checkTimelineV1}} checks that only 
the 2 requested containers are launched, but in reality more than 2 are being 
launched. 
Possible reasons are:
* The RM has assigned additional containers and the Distributed Shell AM is 
launching them. I have observed similar over-assignment in MR too, but the MR AM 
takes care of returning the extra containers assigned by the RM. The Distributed 
Shell AM should take a similar approach.
* The RM has killed a container for some reason, and an extra container was 
requested as a result.
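
For the first case, a rough sketch of the kind of guard I have in mind is below. 
This is only an illustration: the class {{ExtraContainerGuard}}, its methods and 
the use of plain strings as container ids are hypothetical and are not the actual 
Distributed Shell AM code. In a real AM the surplus containers would be handed 
back to the RM (for example through {{AMRMClientAsync#releaseAssignedContainer}}) 
instead of only being recorded.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the real Distributed Shell ApplicationMaster.
 * Launches at most the requested number of containers and sets the rest aside.
 */
class ExtraContainerGuard {
    private final int requested;   // number of containers the application asked for
    private int launched = 0;      // number of containers accepted for launch so far
    private final List<String> released = new ArrayList<>();

    ExtraContainerGuard(int requested) {
        this.requested = requested;
    }

    /**
     * Called with every batch of containers allocated by the RM.
     * Returns the ids that should be launched; surplus ids are recorded as released.
     */
    synchronized List<String> onContainersAllocated(List<String> allocated) {
        List<String> toLaunch = new ArrayList<>();
        for (String id : allocated) {
            if (launched < requested) {
                launched++;
                toLaunch.add(id);
            } else {
                // Assumption: a real AM would return this container to the RM here.
                released.add(id);
            }
        }
        return toLaunch;
    }

    /** Ids of the surplus containers that were not launched. */
    synchronized List<String> getReleased() {
        return new ArrayList<>(released);
    }
}
```

With 2 requested containers, a batch of 3 allocations would launch only the first 
2, and any allocation that arrives later would be set aside as well.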

I am not sure which of these cases is causing the additional containers to be 
assigned. To analyze this we need more RM and AM logs than the test case logs 
provide. Further, it is not related to the fixes for this issue, and IMO it can 
happen on trunk too. So I think we can raise another JIRA to track this.

> TestDistributedShell fails for V2 scenarios
> -------------------------------------------
>                 Key: YARN-4350
>                 URL: https://issues.apache.org/jira/browse/YARN-4350
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if tested by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published atleast once expected:<1> but was:<0>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>       at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>       at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They started happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).

This message was sent by Atlassian JIRA
