[jira] [Created] (YARN-10227) Pull YARN-8242 back to branch-2.10
Jim Brennan created YARN-10227:

Summary: Pull YARN-8242 back to branch-2.10
Key: YARN-10227
URL: https://issues.apache.org/jira/browse/YARN-10227
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.10.0, 2.10.1
Reporter: Jim Brennan
Assignee: Jim Brennan

We have recently seen the nodemanager OOM issue reported in YARN-8242 during a rolling upgrade. Our code is currently based on branch-2.8, but we are in the process of moving to 2.10. I checked and YARN-8242 pulls back to branch-2.10 pretty cleanly. The only conflict was a minor one in TestNMLeveldbStateStoreService.java.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
UI2 setup
Hi,

I've followed the instructions for enabling and starting YARN's UI2 at https://hadoop.apache.org/docs/r2.10.0/hadoop-yarn/hadoop-yarn-site/YarnUI2.html

I built using "-Pyarn-ui" and set the following properties in my yarn-site.xml. I am bringing up the daemons on my local Linux host:

yarn.webapp.ui2.enable: true
yarn.timeline-service.http-cross-origin.enabled: true
yarn.resourcemanager.webapp.cross-origin.enabled: true
yarn.nodemanager.webapp.cross-origin.enabled: true

The RM comes up and I don't see any errors. I can access the old UI: rm-address:/cluster ( is my value for yarn.resourcemanager.webapp.address). But I can't access the new UI: rm-address:/ui2. This gets a 404 not found error.

Please provide any insights you may have.

Thanks,
-Eric Payne
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/

[Apr 7, 2020 5:38:09 AM] (github) HDFS-15249 ThrottledAsyncChecker is not thread-safe. (#1922)
[Apr 7, 2020 1:51:55 PM] (snemeth) YARN-10001. Add explanation of unimplemented methods in
[Apr 7, 2020 3:03:17 PM] (snemeth) YARN-10207. CLOSE_WAIT socket connection leaks during rendering of
[Apr 7, 2020 4:55:55 PM] (github) HADOOP-16932. distcp copy calls getFileStatus() needlessly and can fail
[Apr 8, 2020 1:30:03 AM] (wilfreds) YARN-10063. Add container-executor arguments --http/--https to usage.

-1 overall

The following subsystems voted -1:
    asflicense findbugs pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    FindBugs :

       module:hadoop-cloud-storage-project/hadoop-cos
       Redundant nullcheck of dir, which is known to be non-null in org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at BufferPool.java:[line 66]
       org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may expose internal representation by returning CosNInputStream$ReadBuffer.buffer At CosNInputStream.java:[line 87]
       Found reliance on default encoding in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, byte[]): new String(byte[]) At CosNativeFileSystemStore.java:[line 199]
       Found reliance on default encoding in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, InputStream, byte[], long): new String(byte[]) At CosNativeFileSystemStore.java:[line 178]
       org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, String, String, int) may fail to clean up java.io.InputStream Obligation to clean up resource created at CosNativeFileSystemStore.java:[line 252] is not discharged

    FindBugs :

       module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
       org.apache.hadoop.yarn.server.webapp.WebServiceClient.sslFactory should be package protected At WebServiceClient.java:[line 42]

    Failed junit tests :

       hadoop.hdfs.TestRollingUpgrade
       hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider
       hadoop.hdfs.server.federation.router.TestRouterFaultTolerant
       hadoop.yarn.sls.appmaster.TestAMSimulator
       hadoop.yarn.applications.distributedshell.TestDistributedShell
       hadoop.yarn.service.TestYarnNativeServices

   cc:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-compile-cc-root.txt [8.0K]

   javac:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-compile-javac-root.txt [428K]

   checkstyle:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-checkstyle-root.txt [16M]

   pathlen:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/pathlen.txt [12K]

   pylint:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-patch-pylint.txt [24K]

   shellcheck:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-patch-shellcheck.txt [16K]

   shelldocs:
[jira] [Created] (YARN-10226) NPE when using %primary_group queue mapping
Peter Bacsko created YARN-10226:

Summary: NPE when using %primary_group queue mapping
Key: YARN-10226
URL: https://issues.apache.org/jira/browse/YARN-10226
Project: Hadoop YARN
Issue Type: Bug
Components: capacity scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko

If we use the following queue mapping: {{u:%user:%primary_group}} then we get an NPE inside ResourceManager:

{noformat}
2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(881)) - Failed to load/recover state
java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
    at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
    at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
    at org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
    ...
{noformat}

We need to check if parent queue is null in {{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.
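A minimal sketch of the kind of null guard described in this report. It assumes the NPE arises because a mapping such as u:%user:%primary_group carries no parent queue, so a null name reaches a ConcurrentHashMap lookup (the top two stack frames suggest this, but the report does not spell it out). QueueLookupSketch, getQueue, and resolvePrimaryGroupQueue are hypothetical stand-ins for illustration, not the actual CapacitySchedulerQueueManager or UserGroupMappingPlacementRule code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a queue manager backed by a ConcurrentHashMap.
final class QueueLookupSketch {
    // Maps a short queue name to its full path (illustrative data model only).
    private final Map<String, String> queues = new ConcurrentHashMap<>();

    void addQueue(String name, String path) {
        queues.put(name, path);
    }

    // Unguarded lookup: ConcurrentHashMap.get(null) throws NullPointerException.
    String getQueue(String name) {
        return queues.get(name);
    }

    // Guarded resolution: consult the parent queue only when one was configured.
    // Returns the full queue path, or null when the queue cannot be resolved.
    String resolvePrimaryGroupQueue(String parentQueue, String primaryGroup) {
        if (primaryGroup == null) {
            return null;
        }
        if (parentQueue == null) {
            // No parent in the mapping: look the group queue up directly.
            return getQueue(primaryGroup);
        }
        String parentPath = getQueue(parentQueue);
        if (parentPath == null) {
            return null;
        }
        return parentPath + "." + primaryGroup;
    }

    public static void main(String[] args) {
        QueueLookupSketch qm = new QueueLookupSketch();
        qm.addQueue("hadoop", "root.hadoop");
        qm.addQueue("groups", "root.groups");

        try {
            qm.getQueue(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup with a null name throws NPE");
        }
        System.out.println(qm.resolvePrimaryGroupQueue(null, "hadoop"));
        System.out.println(qm.resolvePrimaryGroupQueue("groups", "hadoop"));
    }
}
```

The guard sits in the caller rather than inside getQueue so that a missing parent queue is treated as a normal configuration case and not as a lookup failure; whether the real fix takes that shape is not stated in the report.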
Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/

No changes

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
       hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

    FindBugs :

       module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
       Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]

    Failed junit tests :

       hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
       hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       hadoop.registry.secure.TestSecureLogins

   cc:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]

   javac:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [324K]

   cc:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt [4.0K]

   javac:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt [304K]

   checkstyle:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-checkstyle-root.txt [16M]

   hadolint:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-patch-hadolint.txt [4.0K]

   pathlen:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/pathlen.txt [12K]

   pylint:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-patch-pylint.txt [24K]

   shellcheck:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-patch-shellcheck.txt [56K]

   shelldocs:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-patch-shelldocs.txt [8.0K]

   whitespace:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/whitespace-eol.txt [12M]
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/whitespace-tabs.txt [1.3M]

   xml:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/xml.txt [12K]

   findbugs:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html [8.0K]

   javadoc:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt [16K]
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt [1.1M]

   unit:
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [236K]
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry.txt [12K]
       https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [96K]
[jira] [Created] (YARN-10225) Support of AMD ROCm GPUs in Yarn
Luca Toscano created YARN-10225:

Summary: Support of AMD ROCm GPUs in Yarn
Key: YARN-10225
URL: https://issues.apache.org/jira/browse/YARN-10225
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luca Toscano

Hi! I just watched [1] and it seems that Hops supports AMD GPUs natively in Yarn, so I am wondering if there are any plans for Hadoop to do the same. I work at the Wikimedia Foundation and we are currently using AMD GPUs; it would be really great to have support for them in Hadoop 3.x.

[1] https://databricks.com/session/rocm-and-distributed-deep-learning-on-spark-and-tensorflow
[jira] [Resolved] (YARN-10128) [FederationSecurity] YARN RMAdmin commands fail when Authorization is enabled on router
[ https://issues.apache.org/jira/browse/YARN-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilwa S T resolved YARN-10128.
Resolution: Fixed

> [FederationSecurity] YARN RMAdmin commands fail when Authorization is enabled on router
>
> Key: YARN-10128
> URL: https://issues.apache.org/jira/browse/YARN-10128
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
>
> Exception thrown is
> {quote}Protocol interface org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB is not known., while invoking ResourceManagerAdministrationProtocolPBClientImpl.refreshQueues over rm2 after 1 failover attempts. Trying to failover after sleeping for 44717ms.
> {quote}