We are rebuilding our cluster and starting from scratch. We are trying to
set up every layer correctly (removing RAID, moving to ext4 + noatime, etc.)
to try to get rid of the large number of java.io.IOExceptions we see per day
on every node (deletes being part of the problem). The hope is that one of
those layers is the root cause of some or most of the issues we have been
having. Once the rebuild is done I will watch for this problem again.
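For reference, the ext4 + noatime change above comes down to a mount option; a sketch of an /etc/fstab entry (the device and mount point here are placeholders, not our actual layout):

```
# illustrative fstab line for an HDFS data disk; device/mount point are assumptions
/dev/sdb1  /data/1  ext4  defaults,noatime  0  0
```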

Thanks for your help.


On Mon, Jan 10, 2011 at 2:55 PM, Stack <[email protected]> wrote:

> Our mailing list strips attachments.  Want to pastebin it? And what
> version of HBase are you on?
> Thanks Wayne,
> St.Ack
>
> On Mon, Jan 10, 2011 at 11:34 AM, Wayne <[email protected]> wrote:
> > Yes it appears blocks are being referenced that have been deleted hours
> > before. The logs below are based on tracing that history.
> >
> > On Mon, Jan 10, 2011 at 2:31 PM, Stack <[email protected]> wrote:
> >
> >> Grep the blocks in namenode log.  This will associate them to files.
> >> The file association should help you build a history.  Is hbase trying
> >> to use files it's already deleted elsewhere?
> >>
> >> St.Ack
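The grep St.Ack suggests might look roughly like this. The sample log lines below are fabricated to mimic 0.20-era NameNode output, and the log path is a stand-in (the real log usually lives under $HADOOP_LOG_DIR), so treat both as assumptions:

```shell
# Fabricated sample NameNode log; the allocateBlock line ties the
# block ID to a file path, the delete line shows when it was invalidated.
cat > /tmp/namenode-sample.log <<'EOF'
2011-01-10 05:30:01,100 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/table/region/file1. blk_-6095972096875566659_1323729
2011-01-10 05:32:00,200 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.delete: blk_-6095972096875566659_1323729 is added to invalidSet of x.x.x.x8:50010
EOF

# Trace one block's history: every line mentioning the block ID,
# which associates it to a file and shows allocation vs. deletion order.
grep 'blk_-6095972096875566659' /tmp/namenode-sample.log
```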
> >>
> >> On Mon, Jan 10, 2011 at 9:20 AM, Wayne <[email protected]> wrote:
> >> > We are seeing a lot of warnings and errors in the HDFS logs (examples
> >> > below). I am looking for any help or recommendations anyone can
> >> > provide. It almost looks like compactions/splits occur and errors
> >> > occur while looking for the old data. Could this be true? Are these
> >> > warnings and errors "normal"?
> >> >
> >> > 2011-01-10 16:55:48,920 WARN
> >> > org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing
> >> > datanode Command java.io.IOException: Error in deleting blocks.
> >> >
> >> >
> >> >
> >> > 2011-01-10 05:33:25,353 WARN
> >> > org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error
> >> > trying to delete block blk_6074091327320053721_1406212. BlockInfo not
> >> > found in volumeMap.
> >> >
> >> > We also see a lot of messages like the one below.
> >> >
> >> > 2011-01-10 17:04:51,593 WARN
> >> > org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> > DatanodeRegistration(x.x.x.x1:50010,
> >> > storageID=DS-1353352923-127.0.0.1-50010-1293566862209, infoPort=50075,
> >> > ipcPort=50020):Got exception while serving
> >> > blk_-6095972096875566659_1323729 to /x.x.x.x8:
> >> > java.io.IOException: Block blk_-6095972096875566659_1323729 is not valid.
> >> >        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:931)
> >> >        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:894)
> >> >        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:904)
> >> >        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
> >> >        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
> >> >        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
> >> >        at java.lang.Thread.run(Unknown Source)
> >> >
> >> > 2011-01-10 17:04:51,593 ERROR
> >> > org.apache.hadoop.hdfs.server.datanode.DataNode:
> >> > DatanodeRegistration(x.x.x.x1:50010,
> >> > storageID=DS-1353352923-127.0.0.1-50010-1293566862209, infoPort=50075,
> >> > ipcPort=50020):DataXceiver
> >> > java.io.IOException: Block blk_-6095972096875566659_1323729 is not valid.
> >> >        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:931)
> >> >        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:894)
> >> >        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:904)
> >> >        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
> >> >        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
> >> >        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
> >> >        at java.lang.Thread.run(Unknown Source)
> >> >
> >> >
> >> > I see that the block in question was deleted several hours earlier in
> >> > the logs on node 8.
> >> >
> >> > 2011-01-10 05:32:04,500 INFO
> >> > org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
> >> > blk_-6095972096875566659_1323729 file
> >> > /mainraid/hadoop/hbase/data/current/subdir2/subdir15/blk_-6095972096875566659
> >> >
> >> >
> >> >
> >> > Thanks.
> >> >
> >>
> >
>
