Hi,
We have a performance issue with an HBase map-only job that copies data from
HBase to HDFS. We profiled the job; below are the top six methods ranked by
CPU usage, along with the corresponding traces.
==========================================================================
CPU SAMPLES BEGIN (total = 1162350) Tue May 29 16:54:18 2012
rank   self  accum   count  trace method
   1 16.67% 16.67%  193724 312698 sun.nio.ch.EPollArrayWrapper.epollWait
   2 16.66% 33.32%  193609 314324 sun.nio.ch.EPollArrayWrapper.epollWait
   3 16.65% 49.98%  193573 314044 sun.nio.ch.EPollArrayWrapper.epollWait
   4 16.56% 66.54%  192491 319621 sun.nio.ch.EPollArrayWrapper.epollWait
   5 10.90% 77.44%  126699 319206 sun.nio.ch.EPollArrayWrapper.epollWait
   6  5.64% 83.08%   65565 327226 sun.nio.ch.EPollArrayWrapper.epollWait
TRACE 312698: (thread=200010)
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
TRACE 314324: (thread=200015)
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
TRACE 314044: (thread=200013)
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
TRACE 319621: (thread=200023)
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
TRACE 319206: (thread=200022)
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
TRACE 327226: (thread=200101)
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
===============================================================
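Grouping the sample counts above by the frame at the bottom of each trace shows how much of the profile sits under ZooKeeper versus the Hadoop socket layer. The following is only a small sketch that re-derives those shares from the reported numbers; the class, method, and constant names are hypothetical, and only the counts and the total come from the profile.

```java
import java.util.Arrays;
import java.util.Locale;

public class SampleShare {
    // Total CPU samples reported in the hprof header above.
    static final long TOTAL_SAMPLES = 1162350L;

    // Counts for traces 312698, 314324, 314044, 319621, whose stacks end in
    // org.apache.zookeeper.ClientCnxn$SendThread.run.
    static final long[] ZOOKEEPER_COUNTS = {193724L, 193609L, 193573L, 192491L};

    // Counts for traces 319206, 327226, whose stacks end in
    // org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select.
    static final long[] HADOOP_SOCKET_COUNTS = {126699L, 65565L};

    // Sum the sample counts of one group of traces.
    static long sum(long[] counts) {
        return Arrays.stream(counts).sum();
    }

    // Share of all samples, as a percentage.
    static double percent(long count, long total) {
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        long zk = sum(ZOOKEEPER_COUNTS);
        long io = sum(HADOOP_SOCKET_COUNTS);
        // Locale.ROOT keeps '.' as the decimal separator regardless of JVM locale.
        System.out.printf(Locale.ROOT, "ZooKeeper SendThread:       %d samples (%.2f%%)%n",
                zk, percent(zk, TOTAL_SAMPLES));
        System.out.printf(Locale.ROOT, "Hadoop SocketIOWithTimeout: %d samples (%.2f%%)%n",
                io, percent(io, TOTAL_SAMPLES));
    }
}
```

The ZooKeeper group comes to 773397 samples (about 66.54%, the accum value at rank 4) and the Hadoop socket group to 192264 samples (about 16.54%); together they make up the 83.08% shown at rank 6.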
It looks like the problem may be related to ZooKeeper connections: four of the
six top-ranked traces come from org.apache.zookeeper.ClientCnxn$SendThread.
Does anybody know how to improve the performance of the job? Thanks.
Ey-Chih Chow