Hi David,

I would recommend that you run the following (rough command sketch below):
- FSCK from your OS (fsck.ext4) on this node;
- FSCK from Hadoop on your HDFS;
- HBCK from HBase.
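Something like this, just as a sketch (the device /dev/sdX and the hdfs/hbase service users are assumptions for your setup; only fsck an ext4 volume that is unmounted or mounted read-only):

    # local filesystem check on the suspect node (read-only check, adjust the device)
    fsck.ext4 -n /dev/sdX

    # HDFS-level check, from a node with the Hadoop client configured
    sudo -u hdfs hdfs fsck /

    # HBase consistency check
    sudo -u hbase hbase hbck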
It seems your node has some trouble reading something; I just want to see whether there are related issues.

JM

2013/7/12 David Koch <[email protected]>

> Hello,
>
> NOTE: I posted the same message in the Cloudera group.
>
> Since upgrading from CDH 4.0.1 (HBase 0.92.4) to 4.3.0 (HBase 0.94.6) we
> systematically experience problems with region servers crashing silently
> under workloads which used to pass without problems. More specifically,
> we run about 30 Mapper jobs in parallel which read from HDFS and insert
> into HBase.
>
> region server log
> NOTE: no trace of a crash, but the server is down and shows up as such
> in Cloudera Manager.
>
> 2013-07-12 10:22:12,050 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File
> hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
> might be still open, length is 0
> 2013-07-12 10:22:12,051 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file
> hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
> 2013-07-12 10:22:13,064 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recover attempt for
> hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
> 2013-07-12 10:22:14,819 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> 2013-07-12 10:22:14,824 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> ...
> 2013-07-12 10:22:14,850 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> 2013-07-12 10:22:15,530 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> < -- last log entry, region server is down here -- >
>
> datanode log, same machine
>
> 2013-07-12 10:22:04,811 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XXXXXXX:50010:DataXceiver error processing WRITE_BLOCK operation src: /YYY.YY.YYY.YY:36024 dest: /XXX.XX.XXX.XX:50010
> java.io.IOException: Premature EOF from inputStream
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>     at java.lang.Thread.run(Thread.java:724)
> < -- many repetitions of this -- >
>
> What could have caused this difference in stability?
>
> We did not change any configuration settings with respect to the previous
> CDH 4.0.1 setup. In particular, we left ulimit and
> dfs.datanode.max.xcievers at 32k. If need be, I can provide more complete
> log/configuration information.
>
> Thank you,
>
> /David
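By the way, since you mention dfs.datanode.max.xcievers: just to make sure we are looking at the same thing, that setting normally lives in hdfs-site.xml on the datanodes, something like the sketch below (the 32768 value simply mirrors the 32k you describe; in CDH 4 the same limit is also accepted under the newer property name dfs.datanode.max.transfer.threads):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>32768</value>
    </property>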
