Because the application is currently running against local disk, I can't provide
the "top" output from a run against HDFS.
However, we did use "top" and "pidstat" to check the CPU utilization of our
application, and I can confirm that the application's CPU utilization kept
increasing while the CPU utilization of the DataNode, NameNode, ResourceManager
and NodeManager processes remained stable.
Below is the "top" output while the application is accessing local disk:
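For reference, the per-process trend can also be confirmed without relying on
top's instantaneous %CPU column, by sampling a process's cumulative CPU time
(utime + stime) from /proc directly; this is the same counter that "pidstat -u"
turns into a rate. The script below is only an illustrative, Linux-specific
sketch, not part of our application:

```shell
# Print the cumulative CPU time (utime + stime, in clock ticks) of a
# process at regular intervals by reading /proc/<pid>/stat.  A steadily
# growing delta between samples confirms a per-process CPU trend.
sample_cpu_ticks() {
    pid=$1 samples=$2 interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Strip the "pid (comm)" prefix; utime and stime are then the
        # 12th and 13th remaining fields (fields 14 and 15 of stat).
        set -- $(cut -d')' -f2- "/proc/$pid/stat")
        echo "$(date +%s) pid=$pid ticks=$((${12} + ${13}))"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example: watch this shell's own CPU counter, 3 samples 1 second apart.
sample_cpu_ticks $$ 3 1
```

If the printed tick count grows faster and faster between equally spaced
samples, the process's CPU utilization is increasing, which is what we observed
for our application when it reads from HDFS.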
[reporting@ms1 ~]$ top
top - 15:04:58 up 33 days, 24 min, 3 users, load average: 4.05, 4.08, 3.92
Tasks: 361 total, 1 running, 360 sleeping, 0 stopped, 0 zombie
Cpu(s): 34.5%us, 2.3%sy, 0.0%ni, 63.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 66068256k total, 54013596k used, 12054660k free, 3400140k buffers
Swap: 2097144k total, 268376k used, 1828768k free, 41202752k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
33364 reportin 20 0 1628m 745m 17m S 168.2 1.2 0:05.07
/usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
33130 reportin 20 0 1078m 246m 18m S 130.7 0.4 0:08.10
/usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/d
33439 reportin 20 0 1613m 143m 17m S 108.1 0.2 0:03.26
/usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
25690 reportin 20 0 1724m 530m 18m S 8.6 0.8 4:31.44
/usr/java/default/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.di
32879 reportin 20 0 1679m 370m 18m S 6.6 0.6 0:09.13
/usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
32642 reportin 20 0 1662m 372m 18m S 6.0 0.6 0:09.22
/usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
25200 reportin 20 0 1639m 326m 18m S 2.0 0.5 0:42.49
/usr/java/default/bin/java -Dproc_datanode -Xmx1000m -Djava.library.pat
25576 reportin 20 0 1804m 400m 18m S 2.0 0.6 0:53.13
/usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.lo
25058 reportin 20 0 1622m 401m 18m S 1.7 0.6 0:42.88
/usr/java/default/bin/java -Dproc_namenode -Xmx1000m -Djava.library.pat
33262 reportin 20 0 15260 1556 1012 R 0.7 0.0 0:00.04 top
2984 root 20 0 1227m 14m 1324 S 0.3 0.0 52:00.05 /usr/bin/python
/opt/ericsson/nms/litp//bin/landscape_service.py --daem
32019 reportin 20 0 1090m 248m 18m S 0.3 0.4 0:09.21
/usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/de
49403 hyperic 20 0 5333m 216m 15m S 0.3 0.3 37:31.53
/usr/java/default/jre//bin/java -Djava.security.auth.login.config=../..
50715 reportin 20 0 5472m 380m 13m S 0.3 0.6 3:57.15 java -Xmx2048m
-XX:MaxPermSize=128m -Dlogback.configurationFile=/opt/er
1 root 20 0 19228 1100 896 S 0.0 0.0 10:53.91 /sbin/init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 [kthreadd]
3 root RT 0 0 0 0 S 0.0 0.0 0:27.01 [migration/0]
From: Stanley Shi [mailto:[email protected]]
Sent: 1 September 2014 14:32
To: [email protected]
Subject: Re: CPU utilization keeps increasing when using HDFS
Could you please share the output of the "top" command, at least to show that
the HDFS processes actually used that much CPU?
On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao
<[email protected]<mailto:[email protected]>> wrote:
Hi,
We have written a MapReduce application based on Hadoop 2.4 that keeps reading
data from HDFS (pseudo-distributed mode on one node).
We found that the application's CPU system time and user time keep increasing
while it is running. If we change the application to read the data from local
disk, without changing any other business logic, the CPU utilization stays
stable. We therefore conclude that the growing CPU utilization is related to
HDFS.
We would like to know whether this issue really is caused by HDFS, and whether
there is any way to fix it.
[cid:[email protected]]
Thanks a lot!
BR/Shiyuan
--
Regards,
Stanley Shi,