On Tuesday, July 3, 2012 at 10:08 AM, Jay Wilson wrote:
> 2012-07-03 09:05:00,530 ERROR org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Couldn't close log at hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/-ROOT-/70236052/recovered.edits/0000000000000000046.temp
> java.net.NoRouteToHostException: No route to host
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:429)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3567)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3522)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2720)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2915)
> 2012-07-03 09:05:00,536 WARN org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting of [devrackA-03,60020,1341328322971, devrackA-04,60020,1341328322988, devrackA-05,60020,1341328322976]
> java.net.NoRouteToHostException: No route to host
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:429)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3567)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3522)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2720)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2915)

NoRouteToHostException hints at some network trouble... Region servers aren't able to talk to the underlying Datanode processes. That would cause grief for sure. The ZK issues that you mentioned earlier also suggest you have been having network trouble. Maybe your top-of-rack (TOR) switch is dropping packets or something like that? A ping won't tell you that though. Do you have any sort of monitoring in place that can give you insight into how the network is performing?
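
If there is no monitoring in place yet, a rough way to catch intermittent failures is to open many TCP connections to each Datanode and tally the outcomes, which exercises the same connect path that fails in the stack trace above. The following is only a minimal sketch: the `probe` function and its parameters are hypothetical names made up for illustration, and the port you pass is an assumption about your setup (50010 is the usual default for `dfs.datanode.address` in Hadoop 1.x; check your own hdfs-site.xml).

```python
import errno
import socket
import time
from collections import Counter


def probe(host, port, attempts=20, timeout=2.0, delay=0.5,
          connect=socket.create_connection):
    """Open `attempts` TCP connections to host:port and tally the outcomes.

    Returns a Counter keyed by "ok" or by an error label such as
    "EHOSTUNREACH" (the errno behind Java's NoRouteToHostException).
    `connect` is injectable so the tallying logic can be exercised offline.
    """
    outcomes = Counter()
    for i in range(attempts):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError as exc:
            # Prefer the symbolic errno name; fall back to the exception
            # class name when there is no errno (e.g. a timeout).
            label = errno.errorcode.get(exc.errno, type(exc).__name__)
            outcomes[label] += 1
        else:
            conn.close()
            outcomes["ok"] += 1
        # Space the attempts out so intermittent drops have a chance to show.
        if delay and i < attempts - 1:
            time.sleep(delay)
    return outcomes
```

Run from each region server host, a call such as `probe("devrackA-03", 50010)` (assuming the Datanodes are colocated with the region servers named in the log) returns a tally of successes and errors. A mix of `ok` and `EHOSTUNREACH` or timeouts would point toward intermittent network trouble rather than a Datanode that is simply down, though it is not a substitute for real switch-level monitoring.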
