my cluster setup: all 6 machines are virtual machines.
each machine: 4 CPUs, Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz, 16GB memory

192.168.10.48  namenode/jobtracker
192.168.10.47  secondary namenode
192.168.10.45  datanode/tasktracker
192.168.10.46  datanode/tasktracker
192.168.10.49  datanode/tasktracker
192.168.10.50  datanode/tasktracker
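In case it helps reproduce the topology, this is roughly how it is laid out in the Hadoop 1.0 config (a sketch from memory; the RPC ports and file paths here are illustrative, not copied from the cluster):

    conf/masters (secondary namenode):
        192.168.10.47

    conf/slaves (datanodes/tasktrackers):
        192.168.10.45
        192.168.10.46
        192.168.10.49
        192.168.10.50

    core-site.xml:
        <property>
          <name>fs.default.name</name>
          <value>hdfs://192.168.10.48:9000</value>
        </property>

    mapred-site.xml:
        <property>
          <name>mapred.job.tracker</name>
          <value>192.168.10.48:9001</value>
        </property>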
hdfs logs around 20:33
192.168.10.48 namenode log http://pastebin.com/rwgmPEXR
192.168.10.45 datanode log http://pastebin.com/HBgZ8rtV (this is the datanode I found crashed first)
192.168.10.46 datanode log http://pastebin.com/aQ2emnUi
192.168.10.49 datanode log http://pastebin.com/aqsWrrL1
192.168.10.50 datanode log http://pastebin.com/V7C6tjpB

hbase logs around 20:33
192.168.10.48 master log http://pastebin.com/2ZfeYA1p
192.168.10.45 region log http://pastebin.com/idCF2a7Y
192.168.10.46 region log http://pastebin.com/WEh4dA0f
192.168.10.49 region log http://pastebin.com/cGtpbTLz
192.168.10.50 region log http://pastebin.com/bD6h5T6p (very strange: no log entries at 20:33, but there are entries at 20:32 and 20:34)

On Tue, Apr 22, 2014 at 12:25 PM, Ted Yu <[email protected]> wrote:
> Can you post more of the data node log, around 20:33 ?
>
> Cheers
>
>
> On Mon, Apr 21, 2014 at 8:57 PM, Li Li <[email protected]> wrote:
>
>> hadoop 1.0
>> hbase 0.94.11
>>
>> datanode log from 192.168.10.45. Why did it shut itself down?
>>
>> 2014-04-21 20:33:59,309 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>> blk_-7969006819959471805_202154 received exception
>> java.io.InterruptedIOException: Interruped while waiting for IO on
>> channel java.nio.channels.SocketChannel[closed]. 0 millis timeout
>> left.
>> 2014-04-21 20:33:59,310 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(192.168.10.45:50010,
>> storageID=DS-1676697306-192.168.10.45-50010-1392029190949,
>> infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.InterruptedIOException: Interruped while waiting for IO on
>> channel java.nio.channels.SocketChannel[closed]. 0 millis timeout
>> left.
>>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:265)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
>>         at java.lang.Thread.run(Thread.java:722)
>> 2014-04-21 20:33:59,310 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(192.168.10.45:50010,
>> storageID=DS-1676697306-192.168.10.45-50010-1392029190949,
>> infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.InterruptedIOException: Interruped while waiting for IO on
>> channel java.nio.channels.SocketChannel[closed]. 466924 millis timeout
>> left.
>>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:245)
>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>>         at java.lang.Thread.run(Thread.java:722)
>> 2014-04-21 20:34:00,291 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for
>> threadgroup to exit, active threads is 0
>> 2014-04-21 20:34:00,404 INFO
>> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService:
>> Shutting down all async disk service threads...
>> 2014-04-21 20:34:00,405 INFO
>> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All
>> async disk service threads have been shut down.
>> 2014-04-21 20:34:00,413 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
>> 2014-04-21 20:34:00,424 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down DataNode at app-hbase-1/192.168.10.45
>> ************************************************************/
>>
>> On Tue, Apr 22, 2014 at 11:25 AM, Ted Yu <[email protected]> wrote:
>> > bq. one datanode failed
>> >
>> > Was the crash due to an out of memory error ?
>> > Can you post the tail of the data node log on pastebin ?
>> >
>> > Giving us the versions of hadoop and hbase would be helpful.
>> >
>> >
>> > On Mon, Apr 21, 2014 at 7:39 PM, Li Li <[email protected]> wrote:
>> >
>> >> I have a small hbase cluster with 1 namenode, 1 secondary namenode,
>> >> and 4 datanodes.
>> >> The hbase master is on the same machine as the namenode, and the 4
>> >> hbase region servers are on the datanode machines.
>> >> Average requests per second is about 10,000, and the cluster
>> >> crashed. I found the reason is that one datanode failed.
>> >>
>> >> Each datanode has about 4 cpu cores and 10GB memory.
>> >> Is my cluster overloaded?
>> >>
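To follow up on the out-of-memory question: this is roughly how I am checking on 192.168.10.45 whether the JVM died from heap exhaustion or was killed by the kernel OOM killer (a sketch; the /var/log/hadoop path assumes a default log layout and may differ on other installs):

    # look for java heap errors in the datanode log/out files
    grep -i "OutOfMemoryError" /var/log/hadoop/hadoop-*-datanode-*.log
    grep -i "OutOfMemoryError" /var/log/hadoop/hadoop-*-datanode-*.out

    # check whether the kernel OOM killer terminated the process
    dmesg | grep -i "killed process"
    grep -i "out of memory" /var/log/messages

    # confirm the DataNode JVM is gone and check the open-file limit it ran with
    jps | grep DataNode
    ulimit -n

So far neither the datanode logs nor dmesg show an out-of-memory error, which seems consistent with the orderly SHUTDOWN_MSG above rather than a killed process.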
