Ahmed Hussein created YARN-10334:
------------------------------------
Summary: TestDistributedShell leaks resources on timeout/failure
Key: YARN-10334
URL: https://issues.apache.org/jira/browse/YARN-10334
Project: Hadoop YARN
Issue Type: Bug
Components: distributed-shell, test, yarn
Reporter: Ahmed Hussein
{{TestDistributedShell}} times out on trunk. When it does, the application and
its containers keep running in the background long after the unit test has
failed.
The leaked processes cause other test cases to fail and produce several
false-positive failures, because:
* Ports stay busy, so other test cases fail to launch.
* Unit tests fail because of memory restrictions.
Although the unit test is already broken on trunk, we do not want its failures
to spread to other unit tests.
{{TestDistributedShell}} needs to be revisited to make sure that all
{{YarnClients}} and {{YarnApplications}} are closed properly at the end of
each unit test (including on exceptions and timeouts).
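One way to guarantee that cleanup is a small tracker that each test registers its clients with and that teardown drains. The sketch below is a minimal illustration only: {{ResourceTracker}}, {{register}} and {{closeAll}} are hypothetical names, not part of {{TestDistributedShell}} or of YARN, and it assumes the clients are exposed as {{AutoCloseable}}.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not part of Hadoop): remembers every resource a test
 * opens so that teardown can close all of them, even if some closes fail.
 */
public class ResourceTracker {
    // Stack, so that resources are closed in reverse order of registration.
    private final Deque<AutoCloseable> resources = new ArrayDeque<>();

    /** Registers a resource and returns it so it can be used inline. */
    public <T extends AutoCloseable> T register(T resource) {
        resources.push(resource);
        return resource;
    }

    /** Number of resources that are still registered (not yet closed). */
    public int size() {
        return resources.size();
    }

    /**
     * Closes every registered resource, newest first. A failing close does
     * not stop the loop; the method returns how many closes threw.
     */
    public int closeAll() {
        int failures = 0;
        while (!resources.isEmpty()) {
            try {
                resources.pop().close();
            } catch (Exception e) {
                // Swallowed on purpose in this sketch: the remaining
                // resources must still be closed.
                failures++;
            }
        }
        return failures;
    }
}
```

In a JUnit 4 test, {{closeAll()}} would be called from an {{@After}} method, which JUnit runs even when the test method throws. Whether teardown also gets a chance to run after a timeout depends on how the timeout is applied, so that path would need to be verified separately.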
Steps to reproduce:
{code:bash}
mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers
## this will time out with:
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 90.234 s <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
[ERROR] testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 90.018 s <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 90000 milliseconds
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117)
    at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » TestTimedOut
[INFO]
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{code}
Using the {{ps}} command, you can see that the YARN processes are still running
in the background:
{code:bash}
/bin/bash -c $JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stdout 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stderr
$JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein
{code}
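The same check can be automated so that a test run reports leftovers without anyone running {{ps}} by hand. The following is a rough sketch rather than a proposed fix: {{LeakedProcessFinder}} and its methods are hypothetical names, it relies on {{ProcessHandle}} and therefore needs Java 9 or later (the trace above comes from a Java 8 runtime), and on some platforms the command line of a process is not visible, in which case that process is skipped.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: lists processes that look like leaked DistributedShell AMs. */
public class LeakedProcessFinder {
    // Main class of the ApplicationMaster, as it appears in the ps output.
    public static final String AM_CLASS =
        "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster";

    /** True if the command line mentions the DistributedShell ApplicationMaster. */
    public static boolean isLeakedAppMaster(String commandLine) {
        return commandLine != null && commandLine.contains(AM_CLASS);
    }

    /** Returns handles of all visible processes whose command line matches. */
    public static List<ProcessHandle> findLeaked() {
        return ProcessHandle.allProcesses()
            // commandLine() is an Optional; an empty value means "not visible".
            .filter(p -> p.info().commandLine()
                .map(LeakedProcessFinder::isLeakedAppMaster)
                .orElse(false))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Print the pid and command line of each suspected leftover.
        for (ProcessHandle p : findLeaked()) {
            System.out.println(p.pid() + " " + p.info().commandLine().orElse("?"));
        }
    }
}
```

The sketch only lists processes; it deliberately does not terminate them, since matching on the class name alone could also catch an ApplicationMaster that belongs to a test still in progress.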