Hi Jean,

Sorry, I am not a native speaker. What do you mean by "The error message
in the datanode log is pretty obvious about the config if you really hit
it."?

A fuller DataNode log looks like the following. Now I suspect the blocks
were moved somewhere else, but the region server didn't know that.
Are there any settings I should check?

2011-05-12 01:23:55,317 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_2440422069461309270_3925117 file
/hadoop/dfs/data/current/subdir19/subdir36/blk_2440422069461309270
The data node had just deleted the block, and then it tried to serve the
same block again, as follows:

2011-05-12 01:32:37,444 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004,
infoPort=50075, ipcPort=50020):Got exception while serving
blk_2440422069461309270_3925117 to /10.0.2.12:
java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)

2011-05-12 01:32:37,444 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)

And it was still trying to serve the same block 12 hours later:

2011-05-12 12:14:40,257 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004,
infoPort=50075, ipcPort=50020):Got exception while serving
blk_2440422069461309270_3925117 to /10.0.2.12:
java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)

2011-05-12 12:14:40,257 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.0.2.14:50010, storageID=DS-1110633472-127.0.0.1-50010-1274814749004,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_2440422069461309270_3925117 is not valid.
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:734)
at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:722)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)
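
Since the data node keeps throwing "is not valid" for a block it already
deleted, the first thing I would check is whether the block file is really
gone from disk. A quick sketch, with the storage path taken from the
deletion message above:

  find /hadoop/dfs/data -name 'blk_2440422069461309270*'

If that finds nothing, the file really was deleted, yet something keeps
asking this node for the block.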

On the name node, we found log entries like the following:

2011-05-12 01:23:54,023 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
10.0.2.13:50010 to delete blk_-7967494102380684849_3924263

2011-05-12 01:23:54,023 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
10.0.2.14:50010 to delete *blk_2440422069461309270_3925117*
blk_-946722429317009208_3986596

2011-05-12 01:24:06,032 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
10.0.2.12:50010 to delete blk_232241225919439230_3925136
blk_3644274217045813332_3954711 blk_-5566167900779836043_3946995
blk_7698571737096469832_3922371 blk_6567693665925295269_3925143
blk_-7967494102380684849_3924263 blk_761830835124422304_3954691
blk_22871562856813954_3924967 blk_-108718719454734123_3924500
blk_-5428140231738575121_3924529 *blk_2440422069461309270_3925117*
blk_7672999872894289875_3946328
blk_563378129567617626_3924835 blk_9167573765889664081_3924480
blk_-6121243797820261553_3948685

2011-05-12 01:24:06,032 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
10.0.2.34:50010 to delete blk_8917364772641481342_3924301
*blk_2440422069461309270_3925117* blk_7231518316925385949_3924812
blk_5636355579130421598_3953434 blk_-8159096223500224670_3923072
blk_5958792757745258261_3948355
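
To line up the two sides, the block id can be grepped in both logs (the
log file names below are assumptions; adjust them to your installation):

  grep 'blk_2440422069461309270' /var/log/hadoop/*-namenode-*.log
  grep 'blk_2440422069461309270' /var/log/hadoop/*-datanode-*.log

That is how the delete order above can be matched against the later read
attempts on the data node.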

Best wishes,
Stanley Xu



On Fri, May 13, 2011 at 1:06 PM, Jean-Daniel Cryans <[email protected]> wrote:

> The error message in the datanode log is pretty obvious about the
> config if you really hit it. The error message you pasted from the DN
> doesn't look complete either.
>
> J-D
>
> On Wed, May 11, 2011 at 7:30 AM, Stanley Xu <[email protected]> wrote:
> > Dear all,
> >
> > We are using hadoop 0.20.2 with a couple of patches, and hbase 0.20.6.
> > When we run a MapReduce job that does a lot of random access to an
> > hbase table, we see many logs like the following, at the same time, in
> > the region server and the data node:
> >
> > For RegionServer:
> > "INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block
> > blk_7212216405058183301_3974453 from any node:  java.io.IOException: No
> live
> > nodes contain current block"
> > "WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /
> > 10.0.2.44:50010 for file
> > /hbase/CookieTag/197333923/VisitStrength/151799904199528367 for block
> > 7212216405058183301:java.io.IOException: Got error in response to
> > OP_READ_BLOCK for file
> > /hbase/CookieTag/197333923/VisitStrength/151799904199528367 for block
> > 7212216405058183301" and
> >
> > For DataNode:
> > "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(10.0.2.26:50010,
> > storageID=DS-1332752738-192.168.11.99-50010-1285486780176,
> infoPort=50075,
> > ipcPort=50020):DataXceiver
> > at
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
> > at
> >
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)"
> >
> > We changed the dfs.datanode.max.xcievers parameter to 4096 in the
> > hdfs-site.xml of both our hadoop and hdfs configurations. But when we
> > connect VisualVM to a data node, we find fewer than 100 DataXceiver
> > threads (close to 100; we counted 97 in a thread dump, and I am
> > guessing the missing 3 had just finished or just started). A thread
> > dump entry looks like the following:
> >
> > "org.apache.hadoop.hdfs.server.datanode.DataXceiver@3546286a" - Thread
> > t@2941003
> >   java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)"
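> >
> > A quicker way to count them than scrolling through VisualVM is to grep
> > a jstack dump for the thread name shown above (the pid is the data
> > node's, looked up with jps; just a sketch, not verified on every JVM):
> >
> >   jstack <datanode-pid> | grep -c DataXceiver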
> >
> > I guess our setting in hdfs-site.xml didn't really take effect. We
> > have restarted the hadoop cluster with stop-dfs.sh and start-dfs.sh,
> > and restarted hbase as well.
> > I am wondering if anyone could tell me how to make sure the xceiver
> > parameter is in effect, and whether there is anything I should do
> > besides restarting dfs and hbase. Is there any check I can do in the
> > web interface or anywhere else?
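> >
> > Two checks I can think of (the conf path is an assumption; ours lives
> > under $HADOOP_HOME/conf): first, confirm the property is really in the
> > file the datanode reads,
> >
> >   grep -A1 'dfs.datanode.max.xcievers' $HADOOP_HOME/conf/hdfs-site.xml
> >
> > and second, if I remember correctly, the datanode web server exposes a
> > thread dump at http://<datanode>:50075/stacks, which should list the
> > live DataXceiver threads.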
> >
> > Thanks in advance.
> >
> > Best wishes,
> > Stanley Xu
> >
>
