Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2020-07-16 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/206/

[Jul 16, 2020 4:29:37 PM] (noreply) HADOOP-17129. Validating storage keys in 
ABFS correctly (#2141)
[Jul 16, 2020 5:09:59 PM] (noreply) HADOOP-17130. Configuration.getValByRegex() 
shouldn't be updating the results while fetching. (#2142)
[Jul 16, 2020 6:06:49 PM] (pjoseph) YARN-10339. Fix TimelineClient in 
NodeManager failing when Simple Http Auth used in Secure Cluster


[Error replacing 'FILE' - Workspace is not accessible]

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org

[jira] [Created] (YARN-10354) deadlock in ContainerMetrics and MetricsSystemImpl

2020-07-16 Thread Lee young gon (Jira)
Lee young gon created YARN-10354:


 Summary: deadlock in ContainerMetrics and MetricsSystemImpl
 Key: YARN-10354
 URL: https://issues.apache.org/jira/browse/YARN-10354
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: hadoop 3.1.2
Reporter: Lee young gon
 Attachments: full_thread_dump.txt

I could not get JMX information from the NodeManager, and a thread dump revealed a deadlock.

Below are the deadlocked threads:
{code:java}
"Timer for 'NodeManager' metrics system" - Thread t@42
   java.lang.Thread.State: BLOCKED
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.getMetrics(ContainerMetrics.java:235)
- waiting to lock <7668d6f0> (a 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics)
 owned by "NM ContainerManager dispatcher" t@299
at 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.snapshotMetrics(MetricsSystemImpl.java:419)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:406)
- locked <3b956878> (a 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.onTimerEvent(MetricsSystemImpl.java:381)
- locked <3b956878> (a 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl$4.run(MetricsSystemImpl.java:368)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)   Locked ownable 
synchronizers:
- None
"NM ContainerManager dispatcher" - Thread t@299
   java.lang.Thread.State: BLOCKED
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.unregisterSource(MetricsSystemImpl.java:247)
- waiting to lock <3b956878> (a 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl) owned by "Timer for 
'NodeManager' metrics system" t@42
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.unregisterContainerMetrics(ContainerMetrics.java:228)
- locked <4e31c3ec> (a java.lang.Class)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.finished(ContainerMetrics.java:255)
- locked <7668d6f0> (a 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.updateContainerMetrics(ContainersMonitorImpl.java:813)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.onStopMonitoringContainer(ContainersMonitorImpl.java:935)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.handle(ContainersMonitorImpl.java:900)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl.handle(ContainersMonitorImpl.java:57)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)   Locked ownable synchronizers:
- None

{code}
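To make the cycle easier to see, here is a minimal, self-contained sketch of the same lock-ordering inversion, using plain Object monitors as stand-ins for the MetricsSystemImpl and ContainerMetrics locks. It is illustrative only, not the actual Hadoop code; with the sleeps in place both threads block forever, mirroring the dump above.
{code:java}
// Hypothetical stand-ins for the two locks in the dump; not the real Hadoop classes.
public class LockOrderingDeadlockSketch {
  static final Object metricsSystemLock = new Object();    // plays the role of MetricsSystemImpl
  static final Object containerMetricsLock = new Object(); // plays the role of a ContainerMetrics instance

  public static void main(String[] args) {
    // "Timer for 'NodeManager' metrics system": locks the metrics system, then each source.
    Thread timer = new Thread(() -> {
      synchronized (metricsSystemLock) {          // like onTimerEvent/sampleMetrics
        sleepQuietly(100);
        synchronized (containerMetricsLock) {     // like ContainerMetrics.getMetrics
          System.out.println("timer sampled container metrics");
        }
      }
    }, "metrics-timer");

    // "NM ContainerManager dispatcher": locks the finished ContainerMetrics, then the system.
    Thread dispatcher = new Thread(() -> {
      synchronized (containerMetricsLock) {       // like ContainerMetrics.finished
        sleepQuietly(100);
        synchronized (metricsSystemLock) {        // like MetricsSystemImpl.unregisterSource
          System.out.println("dispatcher unregistered the source");
        }
      }
    }, "nm-dispatcher");

    timer.start();
    dispatcher.start(); // each thread holds one lock and waits for the other: deadlock
  }

  static void sleepQuietly(long millis) {
    try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }
}
{code}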
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10353) Log vcores used and cumulative cpu in containers monitor

2020-07-16 Thread Jim Brennan (Jira)
Jim Brennan created YARN-10353:
--

 Summary: Log vcores used and cumulative cpu in containers monitor
 Key: YARN-10353
 URL: https://issues.apache.org/jira/browse/YARN-10353
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.4.0
Reporter: Jim Brennan
Assignee: Jim Brennan


We currently log the percentage/cpu and percentage/cpus-used-by-yarn in the 
Containers Monitor log. It would be useful to also log vcores used vs vcores 
assigned, and total accumulated CPU time.

For example, currently we have an audit log that looks like this:
{noformat}
2020-07-16 20:33:51,550 DEBUG [Container Monitor] ContainersMonitorImpl.audit (ContainersMonitorImpl.java:recordUsage(651)) - Resource usage of ProcessTree 809 for container-id container_1594931466123_0002_01_07: 309.5 MB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used CPU:143.0905 CPU/core:35.772625
{noformat}
The proposal is to add two more fields to show vCores and Cumulative CPU ms:
{noformat}
2020-07-16 20:33:51,550 DEBUG [Container Monitor] ContainersMonitorImpl.audit (ContainersMonitorImpl.java:recordUsage(651)) - Resource usage of ProcessTree 809 for container-id container_1594931466123_0002_01_07: 309.5 MB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used CPU:143.0905 CPU/core:35.772625 vCores:2/1 CPU-ms:4180
{noformat}
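As a rough sketch of what producing the proposed line could look like (the helper name, parameters, and message layout below are illustrative assumptions, not the actual ContainersMonitorImpl code):
{code:java}
// Hypothetical helper; parameter names and formatting are illustrative only.
public class AuditLineSketch {
  static String auditLine(String containerId, String memoryPart,
                          double cpuPercent,        // total CPU % across all cores
                          double cpuPercentPerCore, // CPU % normalized per core
                          int vcoresUsed,           // proposed field: vcores actually consumed
                          int vcoresAllocated,      // proposed field: vcores assigned to the container
                          long cumulativeCpuMs) {   // proposed field: accumulated CPU time in ms
    return String.format(
        "Resource usage of ProcessTree for container-id %s: %s CPU:%s CPU/core:%s vCores:%d/%d CPU-ms:%d",
        containerId, memoryPart, cpuPercent, cpuPercentPerCore,
        vcoresUsed, vcoresAllocated, cumulativeCpuMs);
  }

  public static void main(String[] args) {
    // Mirrors the proposed example entry from the description above.
    System.out.println(auditLine("container_1594931466123_0002_01_07",
        "309.5 MB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used",
        143.0905, 35.772625, 2, 1, 4180L));
  }
}
{code}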
This is a snippet of a log from one of our clusters running branch-2.8 with a 
similar change.
{noformat}
2020-07-16 21:00:02,240 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 5267 for container-id container_e04_1594079801456_1397450_01_001992: 1.6 GB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 18 of 10 CPU vCores used. Cumulative CPU time: 157410
2020-07-16 21:00:02,269 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 18801 for container-id container_e04_1594079801456_1390375_01_19: 413.2 MB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 of 10 CPU vCores used. Cumulative CPU time: 113830
2020-07-16 21:00:02,298 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 5279 for container-id container_e04_1594079801456_1397450_01_001991: 2.2 GB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 17 of 10 CPU vCores used. Cumulative CPU time: 128630
2020-07-16 21:00:02,339 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 24189 for container-id container_e04_1594079801456_1390430_01_000415: 392.7 MB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 of 10 CPU vCores used. Cumulative CPU time: 96060
2020-07-16 21:00:02,367 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 6751 for container-id container_e04_1594079801456_1397923_01_003248: 1.3 GB of 3 GB physical memory used; 4.3 GB of 6.3 GB virtual memory used. CPU usage: 12 of 10 CPU vCores used. Cumulative CPU time: 116820
2020-07-16 21:00:02,396 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 12138 for container-id container_e04_1594079801456_1397760_01_44: 4.4 GB of 6 GB physical memory used; 6.9 GB of 12.6 GB virtual memory used. CPU usage: 15 of 10 CPU vCores used. Cumulative CPU time: 45900
2020-07-16 21:00:02,424 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 101918 for container-id container_e04_1594079801456_1391130_01_002378: 2.4 GB of 4 GB physical memory used; 5.8 GB of 8.4 GB virtual memory used. CPU usage: 13 of 10 CPU vCores used. Cumulative CPU time: 2572390
2020-07-16 21:00:02,456 [Container Monitor] DEBUG ContainersMonitorImpl.audit: Memory usage of ProcessTree 26596 for container-id container_e04_1594079801456_1390446_01_000665: 418.6 MB of 2.5 GB physical memory used; 3.8 GB of 5.3 GB virtual memory used. CPU usage: 0 of 10 CPU vCores used. Cumulative CPU time: 101210
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2020-07-16 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/205/

[Jul 15, 2020 4:39:48 AM] (noreply) HDFS-15385 Upgrade boost library to 1.72 
(#2051)
[Jul 15, 2020 4:46:20 AM] (noreply) MAPREDUCE-7284. 
TestCombineFileInputFormat#testMissingBlocks fails (#2136)
[Jul 15, 2020 5:02:25 AM] (Shashikant Banerjee) HDFS-15319. Fix 
INode#isInLatestSnapshot() API. Contributed by Shashikant Banerjee.
[Jul 15, 2020 5:31:34 AM] (Akira Ajisaka) YARN-10350. 
TestUserGroupMappingPlacementRule fails
[Jul 15, 2020 6:24:34 AM] (noreply) MAPREDUCE-7285. Junit class missing from 
hadoop-mapreduce-client-jobclient-*-tests jar. (#2139)
[Jul 15, 2020 2:53:18 PM] (Jonathan Turner Eagles) HADOOP-17101. Replace Guava 
Function with Java8+ Function
[Jul 15, 2020 4:39:06 PM] (Jonathan Turner Eagles) HADOOP-17099. Replace Guava 
Predicate with Java8+ Predicate




-1 overall


The following subsystems voted -1:
asflicense findbugs pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s):
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

findbugs :

   module:hadoop-yarn-project/hadoop-yarn
   Uncallable method org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage$1.getInstance() defined in anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:[line 87]
   Dead store to entities in org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:[line 190]
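As a hedged aside on the "Uncallable method ... defined in anonymous class" finding above: a method declared in an anonymous class that is not part of the implemented interface has no name it can be reached through, so no caller can ever invoke it. The example below is invented to illustrate the pattern and is not the flagged test code.
{code:java}
// Illustration only; not the actual TestTimelineReaderWebServicesHBaseStorage code.
public class UncallableMethodExample {
  interface Provider { String get(); }

  public static void main(String[] args) {
    Provider p = new Provider() {
      @Override
      public String get() { return "value"; }

      // Not declared on Provider, and the anonymous class has no usable type name,
      // so nothing outside this expression can ever call it; findbugs reports such
      // a method as uncallable.
      Object getInstance() { return this; }
    };
    System.out.println(p.get());
  }
}
{code}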

findbugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server
   Uncallable method org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage$1.getInstance() defined in anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:[line 87]
   Dead store to entities in org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:[line 190]

findbugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
   Uncallable method org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage$1.getInstance() defined in anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:[line 87]
   Dead store to entities in org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At TestTimelineReaderHBaseDown.java:[line 190]

findbugs :

   module:hadoop-yarn-project
   Uncallable method org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage$1.getInstance() defined in anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:anonymous class At TestTimelineReaderWebServicesHBaseStorage.java:[line 87]
   Dead store to entities in org.apache.hadoop.yarn.server.timelineservice.storage.TestTimelineReaderHBaseDown.checkQuery(HBaseTimelineReaderImpl) At

[jira] [Created] (YARN-10352) MultiNode Placement assigns containers on stopped NodeManagers

2020-07-16 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-10352:


 Summary: MultiNode Placement assigns containers on stopped NodeManagers
 Key: YARN-10352
 URL: https://issues.apache.org/jira/browse/YARN-10352
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


When node recovery is enabled, stopping an NM does not unregister it from the RM, so the RM's active node list still contains the stopped nodes until the NM liveness monitor expires them after the configured timeout (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During those 10 minutes, Multi Node Placement keeps assigning containers to those nodes. It should instead exclude nodes that have not heartbeated within the configured heartbeat interval (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms = 1000 ms), similar to the asynchronous Capacity Scheduler threads (CapacityScheduler#shouldSkipNodeSchedule), as sketched below.
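A hedged sketch of the kind of staleness check Multi Node Placement could apply, modeled loosely on CapacityScheduler#shouldSkipNodeSchedule; the class, method, and parameter names below are illustrative, not the actual YARN code:
{code:java}
// Illustrative only: skip nodes whose last heartbeat is older than the configured interval.
final class NodeStalenessCheck {
  /**
   * @param lastHeartbeatMillis  time of the node's last heartbeat (epoch ms)
   * @param nowMillis            current time (epoch ms)
   * @param heartbeatIntervalMs  yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
   * @return true if placement should skip this node
   */
  static boolean shouldSkipNode(long lastHeartbeatMillis, long nowMillis, long heartbeatIntervalMs) {
    // A node that has missed its heartbeat window may be stopped or unreachable,
    // so no new containers should be assigned to it.
    return nowMillis - lastHeartbeatMillis > heartbeatIntervalMs;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(shouldSkipNode(now - 5_000, now, 1_000)); // true: stale node, skip it
    System.out.println(shouldSkipNode(now - 500, now, 1_000));   // false: recently heartbeated
  }
}
{code}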


*Repro:*

1. Enable Multi Node Placement (yarn.scheduler.capacity.multi-node-placement-enabled) and Node Recovery (yarn.node.recovery.enabled).

2. Have only one NM running, say worker0.

3. Stop worker0 and start any other NM, say worker1.

4. Submit a sleep job. The containers will time out because they are assigned to the stopped NM worker0.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-07-16 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/749/

[Jul 15, 2020 3:33:42 PM] (ekrogen) HADOOP-17127. Use RpcMetrics.TIMEUNIT to 
initialize rpc queueTime and




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint jshint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s):
   hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

findbugs :

   module:hadoop-yarn-project/hadoop-yarn
   Useless object stored in variable removedNullContainers of method org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List) At NodeStatusUpdaterImpl.java:removedNullContainers of method org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List) At NodeStatusUpdaterImpl.java:[line 664]
   org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeVeryOldStoppedContainersFromCache() makes inefficient use of keySet iterator instead of entrySet iterator At NodeStatusUpdaterImpl.java:keySet iterator instead of entrySet iterator At NodeStatusUpdaterImpl.java:[line 741]
   org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.createStatus() makes inefficient use of keySet iterator instead of entrySet iterator At ContainerLocalizer.java:keySet iterator instead of entrySet iterator At ContainerLocalizer.java:[line 359]
   org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.usageMetrics is a mutable collection which should be package protected At ContainerMetrics.java:which should be package protected At ContainerMetrics.java:[line 134]
   Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]
   org.apache.hadoop.yarn.state.StateMachineFactory.generateStateGraph(String) makes inefficient use of keySet iterator instead of entrySet iterator At StateMachineFactory.java:keySet iterator instead of entrySet iterator At StateMachineFactory.java:[line 505]
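For context on the repeated "keySet iterator instead of entrySet iterator" findings above, here is a small illustration of the flagged pattern and the preferred one; the map and its contents are made up for the example, and this is not the flagged Hadoop code.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustration only: the keySet loop performs an extra map lookup per key, which is
// what findbugs flags; iterating entrySet yields key and value in one step.
public class KeySetVsEntrySet {
  public static void main(String[] args) {
    Map<String, Integer> containers = new HashMap<>();
    containers.put("container_1", 1);
    containers.put("container_2", 2);

    // Flagged pattern: iterate keys, then call get() for every key.
    for (String id : containers.keySet()) {
      Integer value = containers.get(id); // extra lookup on each iteration
      System.out.println(id + " -> " + value);
    }

    // Preferred pattern: iterate entries directly.
    for (Map.Entry<String, Integer> e : containers.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
{code}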

findbugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
   org.apache.hadoop.yarn.state.StateMachineFactory.generateStateGraph(String) makes inefficient use of keySet iterator instead of entrySet iterator At StateMachineFactory.java:keySet iterator instead of entrySet iterator At StateMachineFactory.java:[line 505]

findbugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server
   Useless object stored in variable removedNullContainers of method org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List) At NodeStatusUpdaterImpl.java:removedNullContainers of method org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List) At NodeStatusUpdaterImpl.java:[line 664]
   org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeVeryOldStoppedContainersFromCache() makes inefficient use of keySet iterator instead of entrySet iterator At NodeStatusUpdaterImpl.java:keySet iterator instead of entrySet iterator At NodeStatusUpdaterImpl.java:[line 741]
   org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.createStatus() makes inefficient use of keySet iterator instead of entrySet iterator At ContainerLocalizer.java:keySet iterator instead of entrySet iterator At ContainerLocalizer.java:[line 359]
   org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.usageMetrics is a mutable collection which should be package protected At ContainerMetrics.java:which should be package protected At ContainerMetrics.java:[line 134]
   Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At