Hi St.Ack and J-D,
Thanks for looking into this.

It can definitely be a configuration problem, but I seriously doubt it
is a network or infrastructure problem. It's our own operated
infrastructure (not a cloud)  and we have a lot of other services
running on it without any problem. Note that Hbase is complaining
about multiple data nodes (10.1.104.1, 10.1.104.2, 10.1.104.5), I
attached the logs from just one of them but it's more or less the same
on all. Please see the NN log for the same block below.

We are using Hadoop 0.20.2-CDH3B4
and Hbase Version 0.90.2-SNAPSHOT, rUnknown, Wed Mar 23 06:09:51 EDT
2011 (I built that from the 0.90.2 branch to try to fix a problem with
replication)



This is the HDFS config file:
# cat /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
     <name>dfs.permissions</name>
     <value>false</value>
  </property>
  <property>
     <!-- specify this so that running 'hadoop namenode -format'
formats the right dir -->
     <name>dfs.name.dir</name>
     <value>/hadoop/dfs/name,/hadoop/dfs/name2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/dfs/data/</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/hadoop/dfs/namesecondary/</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>


This is the NN log for the same block:
2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.allocateBlock:
/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279.
blk_1213779416283711358_54194
2011-04-10 10:12:55,749 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* blk_1213779416283711358_54194 recovery started,
primary=10.1.104.1:50010
2011-04-10 10:12:58,292 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: Targets updated: block
blk_1213779416283711358_54249 on 10.1.104.1:50010 is added as a target
for block blk_1213779416283711358_54194 with size 162696
2011-04-10 10:12:58,293 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: Targets updated: block
blk_1213779416283711358_54249 on 10.1.104.5:50010 is added as a target
for block blk_1213779416283711358_54194 with size 162696
2011-04-10 10:12:58,294 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: Targets updated: block
blk_1213779416283711358_54249 on 10.1.104.2:50010 is added as a target
for block blk_1213779416283711358_54194 with size 162696
2011-04-10 10:12:58,295 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=54249, newlength=162696,
newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010],
closeFile=true, deleteBlock=false)
2011-04-10 10:12:58,340 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(newblock=blk_1213779416283711358_54249,
file=/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279,
newgenerationstamp=54249, newlength=162696,
newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010])
successful
2011-04-10 10:12:58,773 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54249,
newgenerationstamp=54250, newlength=162696,
newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010],
closeFile=false, deleteBlock=false)
2011-04-10 10:12:58,798 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(blk_1213779416283711358_54250) successful
2011-04-10 10:12:58,833 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010
is added to blk_1213779416283711358_54250 size 162696
2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010
is added to blk_1213779416283711358_54250 size 162696
2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.1:50010
is added to blk_1213779416283711358_54250 size 162696
2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
to invalidSet of 10.1.104.2:50010
2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
to invalidSet of 10.1.104.5:50010
2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
to invalidSet of 10.1.104.1:50010
2011-04-10 10:13:29,811 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* ask 10.1.104.2:50010 to delete  blk_-826484977486799080_54240
blk_-3859382357539098123_54247 blk_6239319017569949674_54242
blk_2498532690928849052_54243 blk_-4133675075382826161_54234
blk_-6299622736350943739_54246 blk_1213779416283711358_54250
blk_-7407505773873780456_54244 blk_1070089292787245089_54237
2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* ask 10.1.104.1:50010 to delete  blk_-826484977486799080_54240
blk_-9184330013088281910_54231 blk_842002398362359267_54239
blk_6239319017569949674_54242 blk_-4133675075382826161_54234
blk_-2420408100786396293_54245 blk_1213779416283711358_54250
blk_1475680055217098335_54241 blk_-8623426430834046353_54236
blk_-3859382357539098123_54247 blk_2498532690928849052_54243
blk_-6299622736350943739_54246 blk_-7407505773873780456_54244
blk_1070089292787245089_54237
2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* ask 10.1.104.5:50010 to delete  blk_-8623426430834046353_54236
blk_-3859382357539098123_54247 blk_-2420408100786396293_54245
blk_-4133675075382826161_54234 blk_1213779416283711358_54250
2011-04-10 10:14:29,820 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:29,820 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@b8682c8) from
10.1.104.2:46012: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:29,835 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:29,836 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@2bf41e61) from
10.1.104.2:46012: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:30,862 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:30,862 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@27f688d2) from
10.1.104.2:46012: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:31,888 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:31,888 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@2a90a77d) from
10.1.104.2:46012: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:32,913 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:32,913 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@7ee4155f) from
10.1.104.2:46012: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:33,943 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:33,943 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@3037510e) from
10.1.104.2:46012: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:34,968 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:34,968 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@7faa86dc) from
10.1.104.1:42703: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:35,992 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:35,993 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@147c16ff) from
10.1.104.1:42703: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:37,005 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:37,005 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@5ec9c9da) from
10.1.104.1:42703: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:38,016 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:38,016 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@5c664878) from
10.1.104.1:42703: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:39,029 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:39,029 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@3f1fedac) from
10.1.104.1:42703: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:40,039 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:40,039 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@1efd5b14) from
10.1.104.1:42703: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:41,050 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:41,050 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@b668df1) from
10.1.104.5:54533: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:42,060 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:42,060 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@79f0cf0f) from
10.1.104.5:54533: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:43,069 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:43,069 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@30aa2054) from
10.1.104.5:54533: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:44,078 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:44,078 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@713083c1) from
10.1.104.5:54533: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:45,088 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:45,088 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@4ebf4464) from
10.1.104.5:54533: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found
2011-04-10 10:14:46,098 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
deleteBlock=true)
2011-04-10 10:14:46,098 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8020, call
commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@6f23299f) from
10.1.104.5:54533: error: java.io.IOException: Block
(=blk_1213779416283711358_54194) not found
java.io.IOException: Block (=blk_1213779416283711358_54194) not found



-eran




On Sun, Apr 10, 2011 at 22:27, Jean-Daniel Cryans <[email protected]> wrote:
> My experience in debugging those kind of issues is that 95% of the
> time it's a configuration issue, 4.99% of the time it's environment
> and network issues (network splits, lost packets, etc), and the
> remaining 0.01% is actual HDFS issues.
>
> The fact that you're saying that you had issues even with no load
> makes me think it's a configuration issue. Can we see your hdfs
> config?
>
> BTW the HBase log was pointing at 10.1.104.1 as the one having an
> issue, is that the log we are looking at? (it doesn't seem so)
>
> Thx,
>
> J-D
>
> On Sun, Apr 10, 2011 at 12:05 PM, Eran Kutner <[email protected]> wrote:
>> This is how it's configured in /etc/security/limits.con on all the
>> slaves in the cluster:
>> hadoop           -       nofile          32768
>> hdfs             -       nofile          32768
>> hbase            -       nofile          32768
>> hadoop           -       nproc           32000
>> hdfs           -       nproc           32000
>> hbase           -       nproc           32000
>>
>> When hbase is loading it prints:
>> ulimit -n 32768
>>
>>
>> -eran
>

Reply via email to