Ahmed Hussein created YARN-10334:
------------------------------------

             Summary: TestDistributedShell leaks resources on timeout/failure
                 Key: YARN-10334
                 URL: https://issues.apache.org/jira/browse/YARN-10334
             Project: Hadoop YARN
          Issue Type: Bug
          Components: distributed-shell, test, yarn
            Reporter: Ahmed Hussein


{{TestDistributedShell}} times out on trunk. I found that the application and 
its containers stay running in the background long after the unit test has 
failed.
This causes other test cases to fail and produces several false-positive 
failures, because:
* Ports stay busy, so other test cases fail to launch.
* Unit tests fail because of memory restrictions.

Although the unit test is already broken on trunk, we do not want its failures 
to cause failures in other unit tests.
{{TestDistributedShell}} needs to be revisited to make sure that all 
{{YarnClients}} and {{YarnApplications}} are closed properly at the end of 
each unit test (including on exceptions and timeouts).
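
A minimal sketch of one way to get that guarantee, using only the JDK: every resource is registered when it is created, and a single {{close()}} call from a {{finally}} block or teardown hook releases all of them even if the test body throws. {{TrackedResources}}, {{TeardownSketch}}, and the fake resources are hypothetical names for illustration and are not part of {{TestDistributedShell}}; a real fix would register the actual clients and kill the submitted applications instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: tracks resources opened by a test so teardown can close all of them. */
final class TrackedResources implements AutoCloseable {
  private final Deque<AutoCloseable> open = new ArrayDeque<>();

  /** Registers a resource and returns it, so creation and tracking happen in one expression. */
  synchronized <T extends AutoCloseable> T track(T resource) {
    open.push(resource);
    return resource;
  }

  /** Closes everything in reverse creation order; one failing close does not skip the rest. */
  @Override
  public synchronized void close() {
    List<Exception> failures = new ArrayList<>();
    while (!open.isEmpty()) {
      AutoCloseable resource = open.pop();
      try {
        resource.close();
      } catch (Exception e) {
        failures.add(e);
      }
    }
    if (!failures.isEmpty()) {
      IllegalStateException summary =
          new IllegalStateException(failures.size() + " resource(s) failed to close");
      failures.forEach(summary::addSuppressed);
      throw summary;
    }
  }
}

final class TeardownSketch {
  /** Stand-in for a client or application handle; records its name when closed. */
  static AutoCloseable fake(String name, List<String> log) {
    return () -> log.add(name);
  }

  /** Simulates a test body that throws, and returns the order in which resources were closed. */
  static List<String> runFailingTest() {
    List<String> closed = new ArrayList<>();
    TrackedResources resources = new TrackedResources();
    try {
      resources.track(fake("yarnClient", closed));
      resources.track(fake("application", closed));
      throw new IllegalStateException("simulated test failure");
    } catch (IllegalStateException expected) {
      // The failure itself is not the point here; cleanup in finally is.
    } finally {
      resources.close();
    }
    return closed;
  }

  public static void main(String[] args) {
    System.out.println(runFailingTest()); // prints [application, yarnClient]
  }
}
```

Closing in reverse order mirrors try-with-resources, and collecting failures instead of stopping at the first one keeps a single bad {{close()}} from leaking everything registered before it.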

Steps to reproduce:



{code:bash}
mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers

## this will time out as follows:
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 90.234 s <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
[ERROR] testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 90.018 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 90000 milliseconds
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117)
        at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089)
        at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.lang.Thread.run(Thread.java:748)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » TestTimedOut
[INFO] 
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{code}


Using the {{ps}} command, you can see that the YARN processes are still 
running in the background:

{code:bash}
/bin/bash -c $JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stdout 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stderr

$JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein
{code}
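
The same check can be done programmatically, for example from a teardown assertion. The sketch below is only an illustration and assumes Java 9 or later for {{ProcessHandle}}; {{LeftoverProcessCheck}} and its methods are hypothetical names. Depending on the operating system and permissions, the command line of another process may not be visible, in which case that process is silently skipped.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that looks for processes a finished test left behind. */
final class LeftoverProcessCheck {
  /** True when a visible command line mentions the marker (for example a main-class name). */
  static boolean matches(String commandLine, String marker) {
    return commandLine != null && commandLine.contains(marker);
  }

  /** Lists live processes, other than this JVM, whose command line contains the marker. */
  static List<ProcessHandle> findLeftovers(String marker) {
    long self = ProcessHandle.current().pid();
    return ProcessHandle.allProcesses()
        .filter(p -> p.pid() != self)
        .filter(p -> matches(p.info().commandLine().orElse(null), marker))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Main class taken from the ps output in this report.
    String marker = "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster";
    for (ProcessHandle p : findLeftovers(marker)) {
      System.out.println(p.pid() + " " + p.info().commandLine().orElse("?"));
    }
  }
}
```

An empty result after the test finishes would indicate that no ApplicationMaster process was left running, at least among the processes whose command lines are readable.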




