[jira] [Created] (YARN-10227) Pull YARN-8242 back to branch-2.10

2020-04-08 Thread Jim Brennan (Jira)
Jim Brennan created YARN-10227:
--

 Summary: Pull YARN-8242 back to branch-2.10
 Key: YARN-10227
 URL: https://issues.apache.org/jira/browse/YARN-10227
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.10.0, 2.10.1
Reporter: Jim Brennan
Assignee: Jim Brennan


We have recently seen the nodemanager OOM issue reported in YARN-8242 during a 
rolling upgrade.  Our code is currently based on branch-2.8, but we are in the 
process of moving to 2.10.  I checked and YARN-8242 pulls back to branch-2.10 
pretty cleanly.  The only conflict was a minor one in 
TestNMLeveldbStateStoreService.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



UI2 setup

2020-04-08 Thread epa...@apache.org
Hi,

I've followed the instructions for enabling and starting YARN's UI2 at 
https://hadoop.apache.org/docs/r2.10.0/hadoop-yarn/hadoop-yarn-site/YarnUI2.html

I built using "-Pyarn-ui" and set the following properties in my yarn-site.xml. 
I am bringing up the daemons on my local Linux host:
yarn.webapp.ui2.enable: true
yarn.timeline-service.http-cross-origin.enabled: true
yarn.resourcemanager.webapp.cross-origin.enabled: true
yarn.nodemanager.webapp.cross-origin.enabled: true
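
For reference, here is a minimal sketch of how the four name/value pairs above look in yarn-site.xml form. The class and method names (Ui2SiteConfigSketch, ui2Properties, toSiteXml) are hypothetical and only used for illustration; the sketch assumes nothing beyond the standard Hadoop site-file layout of property elements inside a configuration element.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Ui2SiteConfigSketch {
    // The four properties listed above, kept in the order given.
    public static Map<String, String> ui2Properties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("yarn.webapp.ui2.enable", "true");
        props.put("yarn.timeline-service.http-cross-origin.enabled", "true");
        props.put("yarn.resourcemanager.webapp.cross-origin.enabled", "true");
        props.put("yarn.nodemanager.webapp.cross-origin.enabled", "true");
        return props;
    }

    // Render name/value pairs as <property> elements inside <configuration>.
    public static String toSiteXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // Prints the XML fragment that would go into yarn-site.xml.
        System.out.print(toSiteXml(ui2Properties()));
    }
}
```

Running main prints one property element per setting, which can be compared against the actual yarn-site.xml to rule out a typo in a property name.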


The RM comes up and I don't see any errors. I can access the old UI at 
rm-address:port/cluster (rm-address:port is my value for 
yarn.resourcemanager.webapp.address).
But I can't access the new UI at rm-address:port/ui2. This returns a 404 Not 
Found error.

Please provide any insights you may have.
Thanks,
-Eric Payne




Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2020-04-08 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/

[Apr 7, 2020 5:38:09 AM] (github) HDFS-15249 ThrottledAsyncChecker is not 
thread-safe. (#1922)
[Apr 7, 2020 1:51:55 PM] (snemeth) YARN-10001. Add explanation of unimplemented 
methods in
[Apr 7, 2020 3:03:17 PM] (snemeth) YARN-10207. CLOSE_WAIT socket connection 
leaks during rendering of
[Apr 7, 2020 4:55:55 PM] (github) HADOOP-16932. distcp copy calls 
getFileStatus() needlessly and can fail
[Apr 8, 2020 1:30:03 AM] (wilfreds) YARN-10063. Add container-executor 
arguments --http/--https to usage.




-1 overall


The following subsystems voted -1:
asflicense findbugs pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

FindBugs :

   module:hadoop-cloud-storage-project/hadoop-cos 
   Redundant nullcheck of dir, which is known to be non-null in 
   org.apache.hadoop.fs.cosn.BufferPool.createDir(String). Redundant null 
   check at BufferPool.java:[line 66] 
   org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may 
   expose internal representation by returning 
   CosNInputStream$ReadBuffer.buffer. At CosNInputStream.java:[line 87] 
   Found reliance on default encoding in 
   org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, 
   byte[]): new String(byte[]). At CosNativeFileSystemStore.java:[line 199] 
   Found reliance on default encoding in 
   org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
   InputStream, byte[], long): new String(byte[]). At 
   CosNativeFileSystemStore.java:[line 178] 
   org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, 
   String, String, int) may fail to clean up java.io.InputStream. Obligation 
   to clean up resource created at CosNativeFileSystemStore.java:[line 252] is 
   not discharged 

FindBugs :

   
   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
   org.apache.hadoop.yarn.server.webapp.WebServiceClient.sslFactory should 
   be package protected. At WebServiceClient.java:[line 42] 

Failed junit tests :

   hadoop.hdfs.TestRollingUpgrade 
   hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider 
   hadoop.hdfs.server.federation.router.TestRouterFaultTolerant 
   hadoop.yarn.sls.appmaster.TestAMSimulator 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.yarn.service.TestYarnNativeServices 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-compile-cc-root.txt
  [8.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-compile-javac-root.txt
  [428K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-checkstyle-root.txt
  [16M]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1463/artifact/out/diff-patch-shellcheck.txt
  [16K]

   shelldocs:

   

[jira] [Created] (YARN-10226) NPE when using %primary_group queue mapping

2020-04-08 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10226:
---

 Summary: NPE when using %primary_group queue mapping
 Key: YARN-10226
 URL: https://issues.apache.org/jira/browse/YARN-10226
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


If we use the following queue mapping:

{{u:%user:%primary_group}}

then we get an NPE inside the ResourceManager:

{noformat}
2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(881)) - Failed to load/recover state
java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
        at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
        at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
        at org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
...
{noformat}

We need to check whether the parent queue is null in 
{{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.
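
To illustrate the failure mode, here is a minimal, self-contained sketch. It assumes the null that reaches ConcurrentHashMap.get() in the trace above is the parent queue name; the class QueueLookupSketch and its methods are hypothetical stand-ins, not the actual CapacitySchedulerQueueManager or UserGroupMappingPlacementRule code, and the eventual patch may guard the call differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QueueLookupSketch {
    // Hypothetical stand-in for the queue registry; the real queue manager's
    // internals are not reproduced here.
    private final Map<String, String> queues = new ConcurrentHashMap<>();

    public void addQueue(String name, String description) {
        queues.put(name, description);
    }

    // Unguarded lookup: ConcurrentHashMap.get(null) throws NullPointerException.
    public String getQueueUnguarded(String name) {
        return queues.get(name);
    }

    // Guarded lookup: treat a null name as "no such queue" instead of throwing.
    public String getQueue(String name) {
        if (name == null) {
            return null;
        }
        return queues.get(name);
    }

    public static void main(String[] args) {
        QueueLookupSketch manager = new QueueLookupSketch();
        manager.addQueue("root.users", "parent queue for user queues");

        // Assumed scenario: the mapping does not name a parent queue.
        String parentQueueName = null;
        try {
            manager.getQueueUnguarded(parentQueueName);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
        System.out.println("guarded lookup returned: " + manager.getQueue(parentQueueName));
    }
}
```

ConcurrentHashMap rejects null keys, so the check has to happen before the map is consulted; whether it belongs in the placement rule or in the queue manager is a design choice for the actual fix.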







Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-04-08 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

FindBugs :

   
   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client 
   Boxed value is unboxed and then immediately reboxed in 
   org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, 
   byte[], byte[], KeyConverter, ValueConverter, boolean). At 
   ColumnRWHelper.java:[line 335] 

Failed junit tests :

   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.registry.secure.TestSecureLogins 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [324K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt
  [304K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-patch-shellcheck.txt
  [56K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/whitespace-tabs.txt
  [1.3M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt
  [1.1M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [236K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/649/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
  [96K]

   

[jira] [Created] (YARN-10225) Support of AMD ROCm GPUs in Yarn

2020-04-08 Thread Luca Toscano (Jira)
Luca Toscano created YARN-10225:
---

 Summary: Support of AMD ROCm GPUs in Yarn
 Key: YARN-10225
 URL: https://issues.apache.org/jira/browse/YARN-10225
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luca Toscano


Hi!

I just watched [1], and it seems that Hops supports AMD GPUs natively in YARN, 
so I am wondering if there are any plans for Hadoop to do the same. I work at 
the Wikimedia Foundation, where we currently use AMD GPUs; it would be really 
great to have support for them in Hadoop 3.x. 

[1] https://databricks.com/session/rocm-and-distributed-deep-learning-on-spark-and-tensorflow






[jira] [Resolved] (YARN-10128) [FederationSecurity] YARN RMAdmin commands fail when Authorization is enabled on router

2020-04-08 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T resolved YARN-10128.
--
Resolution: Fixed

> [FederationSecurity] YARN RMAdmin commands fail when Authorization is enabled 
> on router
> ---
>
> Key: YARN-10128
> URL: https://issues.apache.org/jira/browse/YARN-10128
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> Exception thrown is 
> {quote}Protocol interface 
> org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB is 
> not known., while invoking 
> ResourceManagerAdministrationProtocolPBClientImpl.refreshQueues over rm2 
> after 1 failover attempts. Trying to failover after sleeping for 44717ms.
> {quote}


