Hi all, we just upgraded our HBase cluster from 0.90.3 to 0.92.1 and now we
have a lot of warnings like these in our clients:
2012-11-15 01:31:57,734 | WARN | <our_thread> | HConnectionManager$HConnectionImplementation | Failed all from region=<our_table>,0d9750f9e22628e94dd33a78292d6201,1346224022442.a42b483bb10fbaea70f8616e7f06899c., hostname=<our_host>, port=60020
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to <our_host>:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<some_host>:12492 remote=<our_host>]
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:746)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:715)
        at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:371)
        at <our_client_code>
Caused by: java.net.SocketTimeoutException: Call to <our_host>:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=<some_host>:12492 remote=<our_host>:60020]
        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:949)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:922)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
        at $Proxy7.multi(Unknown Source)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
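For context, the reads that hit this are plain Gets through an HTablePool, roughly like the sketch below (table, family, qualifier and row names are placeholders, not our real schema):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadPathSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // One shared pool per process; the get() below is where the
    // PooledHTable.get frame in the trace above comes from.
    HTablePool pool = new HTablePool(conf, 10);
    HTableInterface table = pool.getTable("my_table");  // placeholder table name
    try {
      Result result = table.get(new Get(Bytes.toBytes("some_row_key")));
      byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
      System.out.println(value == null ? "miss" : Bytes.toString(value));
    } finally {
      table.close();  // returns the table to the pool in 0.92
    }
  }
}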
We see the same on puts, where the timeout seems to be a lower value (20000 ms) and fires while waiting for the connect rather than the read (see the note after this trace):
2012-11-14 20:44:55,320 | WARN | <our_thread> | HConnectionManager$HConnectionImplementation | Failed all from region=<our_table>,,1346224022442.e90f2b7680df46d93d8ecd13eee08265., hostname=<our_server>, port=60020
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=<our_server>:60020]
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:803)
        at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:402)
        at <our_client_code>
Caused by: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=<our_server>:60020]
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
        at $Proxy7.multi(Unknown Source)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
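If I read the 0.92 client (HBaseClient.java) correctly, the 60000 ms on reads is hbase.rpc.timeout and the 20000 ms on the connect is ipc.socket.timeout; as far as I can tell we have not overridden either, so we are on the defaults. A sketch of how we could raise them on the client side while we investigate (key names are my reading of the source and the values below are just examples, so please correct me if I got them wrong):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTablePool;

public class TimeoutSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Per-call timeout: matches the "60000 millis ... ready for read" we see on gets.
    conf.setInt("hbase.rpc.timeout", 120000);
    // Connect timeout: matches the "20000 millis ... ready for connect" we see on puts.
    conf.setInt("ipc.socket.timeout", 60000);
    // Pools and tables built from this conf pick up the new values.
    HTablePool pool = new HTablePool(conf, 10);
  }
}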
These exceptions are retried and eventually succeed, but the retries slow our services down a lot.
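As far as I can tell we are also on the default retry settings (hbase.client.retries.number and hbase.client.pause, again from my reading of the 0.92 client, where I believe the defaults are 10 retries with a 1000 ms base pause), which is presumably why one bad batch can stall a request for so long before it finally goes through. A sketch of capping that on the client side while we dig into the root cause (example values only):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RetrySketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Fewer, shorter retries so a failing region server hurts each request less.
    conf.setInt("hbase.client.retries.number", 3);
    conf.setLong("hbase.client.pause", 500);
  }
}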
What can we do to investigate this issue?
It happens when several tables are being compacted and/or when several scanners are running.
Before the timeouts, we observe an increasing CPU load on a single region server, and even if we add region servers and wait for rebalancing, it is always the same region server that causes problems like these (how we sample the per-server load is sketched after the log):
2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@2c3da1aa), rpc version=1, client version=29, methodsFingerPrint=54742778 from <ip>:45334: output error
2012-11-14 20:47:08,443 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
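For what it's worth, this is roughly how we have been sampling per-server load to see that it is always the same region server that stays hot (a sketch; the ClusterStatus accessors are from my reading of the 0.92 client API, so treat it as approximate):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ServerLoadSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    ClusterStatus status = admin.getClusterStatus();
    // Print request and region counts per region server; the hot server
    // stands out well before the client timeouts start.
    for (ServerName server : status.getServers()) {
      HServerLoad load = status.getLoad(server);
      System.out.println(server.getHostname()
          + " requests=" + load.getNumberOfRequests()
          + " regions=" + load.getNumberOfRegions());
    }
  }
}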
With the same access patterns, we did not have this issue in HBase 0.90.3.