[jira] [Created] (YARN-7580) ContainersMonitorImpl logged message lacks detail when exceeding memory limits
Wilfred Spiegelenburg created YARN-7580:
----------------------------------------

             Summary: ContainersMonitorImpl logged message lacks detail when exceeding memory limits
                 Key: YARN-7580
                 URL: https://issues.apache.org/jira/browse/YARN-7580
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
    Affects Versions: 3.1.0
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg

Currently, memory usage for a container that exceeds its memory limit is reported in the RM logs like this:

{code}
2016-06-14 09:15:36,694 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1464251583966_0932_r_000876_0: Container [pid=134938,containerID=container_1464251583966_0932_01_002237] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.9 GB of 2.1 GB virtual memory used. Killing container.
{code}

Two enhancements as part of this jira:
- make it clearer which limit we exceeded
- show exactly how much we exceeded the limit by

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
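The two enhancements above could look roughly like the sketch below. This is an illustrative helper only, with hypothetical class and method names, not the actual ContainersMonitorImpl code; it just shows a message that names the exceeded limit and states the overage explicitly.

```java
// Hypothetical sketch of a more detailed limit-exceeded message.
// Class and method names are illustrative, not actual Hadoop code.
public class MemoryLimitMessage {

    /** Builds a diagnostic line naming the exceeded limit and the overage. */
    static String buildMessage(String containerId, boolean physical,
                               long usedBytes, long limitBytes) {
        long overBytes = usedBytes - limitBytes;
        String kind = physical ? "physical" : "virtual";
        return String.format(
            "Container %s is running %d bytes beyond the '%s' memory limit. "
            + "Current usage: %d bytes of %d bytes %s memory used. "
            + "Killing container.",
            containerId, overBytes, kind, usedBytes, limitBytes, kind);
    }

    public static void main(String[] args) {
        System.out.println(buildMessage(
            "container_1464251583966_0932_01_002237",
            true, 1100L * 1024 * 1024, 1024L * 1024 * 1024));
    }
}
```

Printing the exact byte overage avoids the situation in the original message where rounding ("1.0 GB of 1 GB") hides which limit was crossed and by how much.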
[jira] [Created] (YARN-7579) Add support for FPGA information shown in webUI
Zhankun Tang created YARN-7579:
--------------------------------

             Summary: Add support for FPGA information shown in webUI
                 Key: YARN-7579
                 URL: https://issues.apache.org/jira/browse/YARN-7579
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Zhankun Tang

Supports retrieving FPGA information from REST and viewing it from the webUI.
[jira] [Created] (YARN-7578) Extend TestDiskFailures.waitForDiskHealthCheck() sleeping time.
Guangming Zhang created YARN-7578:
-----------------------------------

             Summary: Extend TestDiskFailures.waitForDiskHealthCheck() sleeping time.
                 Key: YARN-7578
                 URL: https://issues.apache.org/jira/browse/YARN-7578
             Project: Hadoop YARN
          Issue Type: Test
    Affects Versions: 3.1.0
         Environment: ARMv8 AArch64, Ubuntu 16.04
            Reporter: Guangming Zhang
            Priority: Minor
             Fix For: 3.1.0

Thread.sleep() is called to wait for the NodeManager to identify disk failures. But in some cases, for example on lower-end hardware, the sleep time is too short and the NodeManager may not have finished identifying disk failures. This causes test errors:

{code:java}
Running org.apache.hadoop.yarn.server.TestDiskFailures
Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 17.686 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.TestDiskFailures
testLocalDirsFailures(org.apache.hadoop.yarn.server.TestDiskFailures)  Time elapsed: 10.412 sec  <<< FAILURE!
java.lang.AssertionError: NodeManager could not identify disk failure.
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
	at org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:186)
	at org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99)

testLogDirsFailures(org.apache.hadoop.yarn.server.TestDiskFailures)  Time elapsed: 5.99 sec  <<< FAILURE!
java.lang.AssertionError: NodeManager could not identify disk failure.
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
	at org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:186)
	at org.apache.hadoop.yarn.server.TestDiskFailures.testLogDirsFailures(TestDiskFailures.java:111)
{code}

So extend the sleep time from 1000ms to 1500ms to avoid these unit test errors.
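A longer fixed sleep is still a guess about how slow the hardware can be. A common alternative is to poll for the condition with a deadline, so fast machines pass quickly and slow ones get the full timeout. The sketch below is generic and illustrative, not the actual TestDiskFailures code:

```java
import java.util.function.BooleanSupplier;

// Generic poll-until-true helper with a deadline; illustrative only,
// not the actual TestDiskFailures implementation.
public class WaitFor {

    /** Polls the condition every intervalMs until it holds or timeoutMs elapses. */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        // One last check at the deadline before giving up.
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition that becomes true after roughly 200 ms.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 200, 5000, 50);
        System.out.println(ok);
    }
}
```

With this shape, the test would assert on the returned boolean instead of sleeping a fixed 1000ms (or 1500ms) and hoping the NodeManager has caught up.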
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/

[Nov 27, 2017 6:19:58 PM] (jianhe) YARN-6168. Restarted RM may not inform AM about all existing containers.
[Nov 27, 2017 10:31:52 PM] (yufei) YARN-7363. ContainerLocalizer don't have a valid log4j config in case of
[Nov 28, 2017 3:48:55 AM] (yqlin) HDFS-12858. Add router admin commands usage in HDFS commands reference
[Nov 28, 2017 11:52:59 AM] (stevel) HADOOP-15042. Azure PageBlobInputStream.skip() can return negative value
[Nov 28, 2017 1:07:11 PM] (sunilg) YARN-7499. Layout changes to Application details page in new YARN UI.

-1 overall

The following subsystems voted -1:
    asflicense findbugs unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    FindBugs :

       module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
       org.apache.hadoop.yarn.api.records.Resource.getResources() may expose internal representation by returning Resource.resources At Resource.java:[line 213]

    Failed junit tests :

       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure000
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
       hadoop.fs.viewfs.TestViewFileSystemLinkFallback
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070
       hadoop.fs.viewfs.TestViewFsWithXAttrs
       hadoop.hdfs.TestQuota
       hadoop.hdfs.TestMaintenanceState
       hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy
       hadoop.hdfs.TestSetrepIncreasing
       hadoop.hdfs.TestDFSStripedInputStream
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure210
       hadoop.fs.viewfs.TestViewFileSystemHdfs
       hadoop.hdfs.TestDFSStripedOutputStream
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure200
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190
       hadoop.hdfs.TestClientProtocolForPipelineRecovery
       hadoop.hdfs.server.balancer.TestBalancerRPCDelay
       hadoop.hdfs.TestUnsetAndChangeDirectoryEcPolicy
       hadoop.fs.viewfs.TestViewFileSystemLinkMergeSlash
       hadoop.hdfs.web.TestWebHdfsTimeouts
       hadoop.hdfs.TestErasureCodingPolicies
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure060
       hadoop.fs.TestUnbuffer
       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
       hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesAttempts
       hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs
       hadoop.yarn.service.TestServiceAM
       hadoop.yarn.service.TestYarnNativeServices
       hadoop.yarn.sls.nodemanager.TestNMSimulator

   cc:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/diff-compile-javac-root.txt [276K]

   checkstyle:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/diff-checkstyle-root.txt [17M]

   pylint:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/diff-patch-pylint.txt [20K]

   shellcheck:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/diff-patch-shellcheck.txt [20K]

   shelldocs:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/diff-patch-shelldocs.txt [12K]

   whitespace:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/whitespace-eol.txt [8.8M]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/whitespace-tabs.txt [288K]

   findbugs:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html [8.0K]

   javadoc:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/diff-javadoc-javadoc-root.txt [760K]

   unit:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [1.8M]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [80K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/607/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt [104K]
[jira] [Created] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
Miklos Szegedi created YARN-7577:
----------------------------------

             Summary: Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
                 Key: YARN-7577
                 URL: https://issues.apache.org/jira/browse/YARN-7577
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Miklos Szegedi
            Assignee: Miklos Szegedi

This happens if Fair Scheduler is the default. The test should run with both schedulers.
[jira] [Created] (YARN-7576) Findbug warning for Resource exposing internal representation
Jason Lowe created YARN-7576:
------------------------------

             Summary: Findbug warning for Resource exposing internal representation
                 Key: YARN-7576
                 URL: https://issues.apache.org/jira/browse/YARN-7576
             Project: Hadoop YARN
          Issue Type: Bug
          Components: api
    Affects Versions: 3.0.0
            Reporter: Jason Lowe

Precommit builds are complaining about a findbugs warning:

{noformat}
EI  org.apache.hadoop.yarn.api.records.Resource.getResources() may expose internal representation by returning Resource.resources
Bug type EI_EXPOSE_REP (click for details)
In class org.apache.hadoop.yarn.api.records.Resource
In method org.apache.hadoop.yarn.api.records.Resource.getResources()
Field org.apache.hadoop.yarn.api.records.Resource.resources
At Resource.java:[line 213]

Returning a reference to a mutable object value stored in one of the object's fields exposes the internal representation of the object. If instances are accessed by untrusted code, and unchecked changes to the mutable object would compromise security or other important properties, you will need to do something different. Returning a new copy of the object is better approach in many situations.
{noformat}
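The textbook remedy for EI_EXPOSE_REP is a defensive copy on the way in and out, so no caller holds a reference to the internal array. The class below is a generic illustration of that pattern, not the actual org.apache.hadoop.yarn.api.records.Resource (which may reasonably choose to suppress the warning instead, since copying on every getResources() call has a cost on hot scheduler paths):

```java
import java.util.Arrays;

// Illustrative holder class demonstrating defensive copies;
// not the actual org.apache.hadoop.yarn.api.records.Resource.
public class Holder {
    private final long[] resources;

    Holder(long[] resources) {
        // Copy on the way in so callers can't mutate our state later.
        this.resources = Arrays.copyOf(resources, resources.length);
    }

    /** Returns a defensive copy; mutating the result does not affect this object. */
    long[] getResources() {
        return Arrays.copyOf(resources, resources.length);
    }
}
```

Whether to copy or annotate/suppress is a judgment call for a hot-path API; the sketch only shows what findbugs is asking for.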
[jira] [Created] (YARN-7575) When using absolute capacity configuration with no max capacity, scheduler UI NPEs and can't grow queue
Eric Payne created YARN-7575:
------------------------------

             Summary: When using absolute capacity configuration with no max capacity, scheduler UI NPEs and can't grow queue
                 Key: YARN-7575
                 URL: https://issues.apache.org/jira/browse/YARN-7575
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler
            Reporter: Eric Payne

I encountered the following while reviewing and testing branch YARN-5881. The design document from YARN-5881 says this about max-capacity:

{quote}
3) For each queue, we require:
a) if max-resource not set, it automatically set to parent.max-resource
{quote}

When I leave {{yarn.scheduler.capacity.<queue-path>.maximum-capacity}} blank, the RM UI scheduler page refuses to render. It looks like it's in {{CapacitySchedulerPage$LeafQueueInfoBlock}}:

{noformat}
2017-11-28 11:29:16,974 [qtp43473566-220] ERROR webapp.Dispatcher: error handling URI: /cluster/scheduler
java.lang.reflect.InvocationTargetException
...
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:164)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithoutParition(CapacitySchedulerPage.java:129)
{noformat}

Also: a job will run in a leaf queue with no max capacity set and will grow to the max capacity of the cluster, but if I add resources to the node, the job won't grow any more even though it has pending resources.
[jira] [Created] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template
Suma Shivaprasad created YARN-7574:
------------------------------------

             Summary: Add support for Node Labels on Auto Created Leaf Queue Template
                 Key: YARN-7574
                 URL: https://issues.apache.org/jira/browse/YARN-7574
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Suma Shivaprasad
            Assignee: Suma Shivaprasad

YARN-7473 adds support for auto-created leaf queues to inherit node label capacities from parent queues. However, there is no support in the leaf queue template for configuring different capacities for different node labels.
[jira] [Created] (YARN-7573) Gpu Information page could be empty for nodes without GPU
Sunil G created YARN-7573:
---------------------------

             Summary: Gpu Information page could be empty for nodes without GPU
                 Key: YARN-7573
                 URL: https://issues.apache.org/jira/browse/YARN-7573
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: webapp, yarn-ui-v2
            Reporter: Sunil G
            Assignee: Sunil G

In the new YARN UI, the node page is not accessible if that node doesn't have any GPU. Also, under the node page, when we click on "List of Containers/Applications", the Gpu Information left nav disappears.