Sort of :( The server-side errors below started showing up on node5. We took node5 down and then they started on node7. Then we brought back node5 and were still seeing the errors on node7. So we bounced node7. The errors went back to node5. We bounced node5 a second time and that cleared the problem.
We are back in working order now. And working hard on upgrading to v0.98.

From: lars hofhansl [mailto:[email protected]]
Sent: Thursday, December 04, 2014 3:42 PM
To: [email protected]
Cc: Development
Subject: Re: client timeout

Only on that one region server? Weird. Does this persist when you bounce it?

________________________________
From: Ted Tuttle <[email protected]>
To: lars hofhansl <[email protected]>; "[email protected]" <[email protected]>
Cc: Development <[email protected]>
Sent: Wednesday, December 3, 2014 1:21 PM
Subject: RE: client timeout

Still on v0.94.16

We are seeing loads of these:

2014-12-03 12:28:32,696 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call multi(org.apache.hadoop.hbase.client.MultiAction@55428f05), rpc version=1, client version=29, methodsFingerPrint=-540141542 from <ip>:<port> after 131914 ms, since caller disconnected
    at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3944)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3854)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3835)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3878)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4804)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4777)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2194)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3754)
    at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

From: lars hofhansl [mailto:[email protected]]
Sent: Wednesday, December 03, 2014 11:31 AM
To: [email protected]
Cc: Development
Subject: Re: client timeout

Bad disk or network? Anything in the logs (HBase, HDFS, and System logs)?
HBase 0.94, still?

The easiest way is to just kill the region servers, the others will pick up the regions.

-- Lars

________________________________
From: Ted Tuttle <[email protected]>
To: "[email protected]" <[email protected]>
Cc: Development <[email protected]>
Sent: Wednesday, December 3, 2014 7:13 AM
Subject: client timeout

Hello-

We are seeing recurring timeouts in communications with one of our RSs. The error we see in our logs is:

Caused by: java.net.SocketTimeoutException: Call to <rs host>./<rs ip>:<port> failed on socket timeout exception: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<client ip>:<port> remote=<rs host>./<rs ip>:<port>]
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1043)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1016)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
    at com.sun.proxy.$Proxy9.multi(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1537)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1535)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:229)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1544)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1532)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Any ideas on what could be wrong w/ this RS? The RS is not unusually busy.

Thanks,
Ted
