Hi Jean,

Sorry, I am not a native speaker. What do you mean by "The error message in the datanode log is pretty obvious about the config if you really hit it."?
And a full DataNode log looks like the following. I now think it is more likely that the blocks were moved somewhere else, but the region server didn't know that. Are there any settings I should look at?

2011-05-12 01:23:55,317 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_2440422069461309270_3925117 file /hadoop/dfs/data/current/subdir19/subdir36/blk_2440422069461309270

The data node just deleted the block, and then it tried to serve the same block again:

2011-05-12 01:32:37,444 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004, infoPort=50075, ipcPort=50020):Got exception while serving blk_2440422069461309270_3925117 to /10.0.2.12:
java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
2011-05-12 01:32:37,444 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

And it was still trying to serve the block 12 hours later:

2011-05-12 12:14:40,257 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004, infoPort=50075, ipcPort=50020):Got exception while serving blk_2440422069461309270_3925117 to /10.0.2.12:
java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
2011-05-12 12:14:40,257 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
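To double-check that guess, I was thinking of asking the namenode directly where it currently thinks the file's replicas live, and comparing those hosts against the datanode that throws "is not valid". A minimal sketch using the stock FileSystem API (the class name is mine, and the path argument is just a placeholder for one of our HFiles, not the real one):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.util.Arrays;

// Sketch: print the namenode's current view of a file's block placement.
public class PrintBlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // reads *-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]); // e.g. an HFile path under /hbase
    FileStatus status = fs.getFileStatus(path);
    for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
      // One line per block: offset, length, and the datanodes holding replicas.
      System.out.println(loc.getOffset() + "," + loc.getLength()
          + " -> " + Arrays.toString(loc.getHosts()));
    }
  }
}

If the hosts printed here no longer include the datanode the region server keeps contacting, that would support the theory that the client is holding a stale block location.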
If we go to the name node, we find log entries like the following:

2011-05-12 01:23:54,023 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.0.2.13:50010 to delete blk_-7967494102380684849_3924263
2011-05-12 01:23:54,023 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.0.2.14:50010 to delete *blk_2440422069461309270_3925117* blk_-946722429317009208_3986596
2011-05-12 01:24:06,032 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.0.2.12:50010 to delete blk_232241225919439230_3925136 blk_3644274217045813332_3954711 blk_-5566167900779836043_3946995 blk_7698571737096469832_3922371 blk_6567693665925295269_3925143 blk_-7967494102380684849_3924263 blk_761830835124422304_3954691 blk_22871562856813954_3924967 blk_-108718719454734123_3924500 blk_-5428140231738575121_3924529 *blk_2440422069461309270_3925117* blk_7672999872894289875_3946328 blk_563378129567617626_3924835 blk_9167573765889664081_3924480 blk_-6121243797820261553_3948685
2011-05-12 01:24:06,032 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.0.2.34:50010 to delete blk_8917364772641481342_3924301 *blk_2440422069461309270_3925117* blk_7231518316925385949_3924812 blk_5636355579130421598_3953434 blk_-8159096223500224670_3923072 blk_5958792757745258261_3948355

Best wishes,
Stanley Xu

On Fri, May 13, 2011 at 1:06 PM, Jean-Daniel Cryans <[email protected]> wrote:
> The error message in the datanode log is pretty obvious about the
> config if you really hit it. The error message you pasted from the DN
> doesn't look complete either.
>
> J-D
>
> On Wed, May 11, 2011 at 7:30 AM, Stanley Xu <[email protected]> wrote:
> > Dear all,
> >
> > We are using hadoop 0.20.2 with a couple of patches, and hbase 0.20.6. When
> > we run a MapReduce job that makes a lot of random accesses to an hbase
> > table, we see a lot of log entries like the following at the same time on
> > the region server and the data node.
> >
> > For the RegionServer:
> > "INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block
> > blk_7212216405058183301_3974453 from any node: java.io.IOException: No live
> > nodes contain current block"
> > "WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to
> > /10.0.2.44:50010 for file
> > /hbase/CookieTag/197333923/VisitStrength/151799904199528367 for block
> > 7212216405058183301: java.io.IOException: Got error in response to
> > OP_READ_BLOCK for file
> > /hbase/CookieTag/197333923/VisitStrength/151799904199528367 for block
> > 7212216405058183301"
> >
> > For the DataNode:
> > "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(10.0.2.26:50010,
> > storageID=DS-1332752738-192.168.11.99-50010-1285486780176, infoPort=50075,
> > ipcPort=50020):DataXceiver
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)"
> >
> > We changed the dfs.datanode.max.xcievers parameter to 4096 in the
> > hdfs-site.xml of both the hadoop and hdfs configuration.
> > But when we used VisualVM to connect to a data node, we found fewer than 100
> > DataXceiver threads (close to 100; we counted 97 in a thread dump, and I am
> > guessing the missing 3 had just finished or just started). Thread dump
> > entries look like the following:
> >
> > "org.apache.hadoop.hdfs.server.datanode.DataXceiver@3546286a" - Thread t@2941003
> >    java.lang.Thread.State: RUNNABLE
> >         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)"
> >
> > I guess our setting in the hdfs-site.xml didn't really take effect.
> > We have restarted the hadoop cluster with stop-dfs.sh and start-dfs.sh, and
> > also restarted hbase.
> > I am wondering if anyone could tell me how I can make sure the xceiver
> > parameter is in effect, or what I should do besides restarting the dfs and
> > hbase? Could I do any check in the web interface or anywhere else?
> >
> > Thanks in advance.
> >
> > Best wishes,
> > Stanley Xu
>
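P.S. One more data point I could gather myself: print the value the Configuration machinery actually resolves. Below is a minimal sketch (the class name is mine), assuming it is run with the same conf directory on the classpath that the datanode uses. As far as I can tell the property name really is spelled "xcievers", not "xceivers", and 256 looks like the shipped default in 0.20, so seeing 256 here would mean the 4096 override was never picked up:

import org.apache.hadoop.conf.Configuration;

// Sketch: show the effective xceiver limit as resolved from the classpath.
public class CheckXcievers {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.addResource("hdfs-site.xml"); // loaded from the classpath, like the daemons do
    // Note the deliberate misspelling "xcievers" in the property name;
    // a value set under "dfs.datanode.max.xceivers" would be silently ignored.
    System.out.println("dfs.datanode.max.xcievers = "
        + conf.getInt("dfs.datanode.max.xcievers", 256));
  }
}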
