Are you logging GC activity for the datanodes?
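If not, it is worth turning on: a long stop-the-world collection on the datanode (or on the regionserver it is feeding) would explain both the read timeouts and the web port on 50075 going quiet at the same time. A minimal way to enable it, assuming you set datanode JVM options through hadoop-env.sh (the log path below is just an example):

  export HADOOP_DATANODE_OPTS="-verbose:gc -XX:+PrintGCDetails \
    -XX:+PrintGCTimeStamps -Xloggc:/var/log/hadoop/datanode-gc.log \
    $HADOOP_DATANODE_OPTS"

A pause will show up as a multi-second gap between GC log entries. To tie the 50075 outages to pauses, you could run a rough poller alongside it, something like (adjust host and interval to taste; 000 from curl means no HTTP response within the 5-second limit):

  while true; do
    echo "$(date '+%H:%M:%S') $(curl -s -o /dev/null -m 5 -w '%{http_code}' http://10.101.6.5:50075/)"
    sleep 10
  done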
On Mon, Mar 28, 2011 at 9:28 PM, Jack Levin <[email protected]> wrote:
> Good Evening, anyone seen this in your logs? It could be something
> simple that we are missing. We are also seeing that datanodes can't be
> accessed on web port 50075 every once in a while.
>
> -Jack
>
> On Mon, Mar 28, 2011 at 4:19 PM, Jack Levin <[email protected]> wrote:
>> Hello guys, we are getting these errors:
>>
>> 2011-03-28 15:08:33,485 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51365, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 4191232, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3087497822408705276_723501, duration: 14409579
>> 2011-03-28 15:08:33,492 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51366, bytes: 14964, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 67094016, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3224146686136187733_731011, duration: 8855000
>> 2011-03-28 15:08:33,495 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51368, bytes: 51600, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 0, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-6384334583345199846_731014, duration: 2053969
>> 2011-03-28 15:08:33,503 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:42553, bytes: 462336, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 327680, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-4751283294726600221_724785, duration: 480254862706
>> 2011-03-28 15:08:33,504 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.101.6.5:50010, storageID=DS-1528941561-10.101.6.5-50010-1299713950021, infoPort=50075, ipcPort=50020):Got exception while serving blk_-4751283294726600221_724785 to /10.101.6.5:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.101.6.5:50010 remote=/10.101.6.5:42553]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:110)
>>
>> 2011-03-28 15:08:33,504 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.101.6.5:50010, storageID=DS-1528941561-10.101.6.5-50010-1299713950021, infoPort=50075, ipcPort=50020):DataXceiver
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.101.6.5:50010 remote=/10.101.6.5:42553]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:110)
>> 2011-03-28 15:08:33,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51369, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 4781568, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3087497822408705276_723501, duration: 11478016
>> 2011-03-28 15:08:33,506 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51370, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 66962944, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3224146686136187733_731011, duration: 7643688
>>
>> The RS is talking to the DN and we are getting timeouts. There are no issues like ulimit, AFAIK, as we start the datanodes with a 32k limit. Any ideas what the deal is?
>>
>> -Jack
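For what it is worth, the 480000 millis in that stack trace is the datanode's socket write timeout, dfs.datanode.socket.write.timeout, which defaults to 8 minutes. Hitting it means the client at 10.101.6.5:42553 (the regionserver, per the cliID in the trace) went 8 full minutes without draining the socket, so the stall is on the reader side rather than a descriptor limit; the matching clienttrace line shows duration 480254862706, i.e. roughly 480 seconds if that field is nanoseconds, which lines up. If you want to paper over the symptom while you dig into the pause, the key should be settable in hdfs-site.xml on a 0.20-era HDFS, e.g.:

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <!-- milliseconds; the default is 480000 (8 minutes) -->
    <value>960000</value>
  </property>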
