On Sun, Apr 10, 2011 at 11:30 PM, Eran Kutner <[email protected]> wrote:
> Hi St.Ack and J-D,
> Thanks for looking into this.
>
> It can definitely be a configuration problem, but I seriously doubt it
> is a network or infrastructure problem. It's our own operated
> infrastructure (not a cloud)  and we have a lot of other services
> running on it without any problem.

Services that could be stealing i/o and cpu from hbase cluster?  Is
that possible?

> Note that Hbase is complaining
> about multiple data nodes (10.1.104.1, 10.1.104.2, 10.1.104.5), I
> attached the logs from just one of them but it's more or less the same
> on all. Please see the NN log for the same block below.
>

An attachment? Did it come through?  Perhaps pastebin it and then add link here?


> We are using Hadoop 0.20.2-CDH3B4
> and Hbase Version 0.90.2-SNAPSHOT, rUnknown, Wed Mar 23 06:09:51 EDT
> 2011 (I built that from the 0.90.2 branch to try to fix a problem with
> replication)
>




> </configuration>
>

Your configuration looks fine.

Can you associate the block with a file?  I don't see the association
in the below.  I see us trying to delete the block (would like to know
why -- file deleteion?) and then it does exist on .2 for whatever
reason.

St.Ack

>
> This is the NN log for the same block:
> 2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.allocateBlock:
> /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279.
> blk_1213779416283711358_54194
> 2011-04-10 10:12:55,749 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* blk_1213779416283711358_54194 recovery started,
> primary=10.1.104.1:50010
> 2011-04-10 10:12:58,292 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addStoredBlock: Targets updated: block
> blk_1213779416283711358_54249 on 10.1.104.1:50010 is added as a target
> for block blk_1213779416283711358_54194 with size 162696
> 2011-04-10 10:12:58,293 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addStoredBlock: Targets updated: block
> blk_1213779416283711358_54249 on 10.1.104.5:50010 is added as a target
> for block blk_1213779416283711358_54194 with size 162696
> 2011-04-10 10:12:58,294 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addStoredBlock: Targets updated: block
> blk_1213779416283711358_54249 on 10.1.104.2:50010 is added as a target
> for block blk_1213779416283711358_54194 with size 162696
> 2011-04-10 10:12:58,295 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=54249, newlength=162696,
> newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010],
> closeFile=true, deleteBlock=false)
> 2011-04-10 10:12:58,340 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(newblock=blk_1213779416283711358_54249,
> file=/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279,
> newgenerationstamp=54249, newlength=162696,
> newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010])
> successful
> 2011-04-10 10:12:58,773 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54249,
> newgenerationstamp=54250, newlength=162696,
> newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010],
> closeFile=false, deleteBlock=false)
> 2011-04-10 10:12:58,798 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(blk_1213779416283711358_54250) successful
> 2011-04-10 10:12:58,833 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010
> is added to blk_1213779416283711358_54250 size 162696
> 2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010
> is added to blk_1213779416283711358_54250 size 162696
> 2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.1:50010
> is added to blk_1213779416283711358_54250 size 162696
> 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
> to invalidSet of 10.1.104.2:50010
> 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
> to invalidSet of 10.1.104.5:50010
> 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
> to invalidSet of 10.1.104.1:50010
> 2011-04-10 10:13:29,811 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 10.1.104.2:50010 to delete  blk_-826484977486799080_54240
> blk_-3859382357539098123_54247 blk_6239319017569949674_54242
> blk_2498532690928849052_54243 blk_-4133675075382826161_54234
> blk_-6299622736350943739_54246 blk_1213779416283711358_54250
> blk_-7407505773873780456_54244 blk_1070089292787245089_54237
> 2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 10.1.104.1:50010 to delete  blk_-826484977486799080_54240
> blk_-9184330013088281910_54231 blk_842002398362359267_54239
> blk_6239319017569949674_54242 blk_-4133675075382826161_54234
> blk_-2420408100786396293_54245 blk_1213779416283711358_54250
> blk_1475680055217098335_54241 blk_-8623426430834046353_54236
> blk_-3859382357539098123_54247 blk_2498532690928849052_54243
> blk_-6299622736350943739_54246 blk_-7407505773873780456_54244
> blk_1070089292787245089_54237
> 2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* ask 10.1.104.5:50010 to delete  blk_-8623426430834046353_54236
> blk_-3859382357539098123_54247 blk_-2420408100786396293_54245
> blk_-4133675075382826161_54234 blk_1213779416283711358_54250
> 2011-04-10 10:14:29,820 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:29,820 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@b8682c8) from
> 10.1.104.2:46012: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:29,835 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:29,836 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@2bf41e61) from
> 10.1.104.2:46012: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:30,862 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:30,862 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@27f688d2) from
> 10.1.104.2:46012: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:31,888 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:31,888 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@2a90a77d) from
> 10.1.104.2:46012: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:32,913 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:32,913 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@7ee4155f) from
> 10.1.104.2:46012: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:33,943 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:33,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@3037510e) from
> 10.1.104.2:46012: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:34,968 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:34,968 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@7faa86dc) from
> 10.1.104.1:42703: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:35,992 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:35,993 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@147c16ff) from
> 10.1.104.1:42703: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:37,005 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:37,005 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@5ec9c9da) from
> 10.1.104.1:42703: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:38,016 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:38,016 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@5c664878) from
> 10.1.104.1:42703: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:39,029 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:39,029 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@3f1fedac) from
> 10.1.104.1:42703: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:40,039 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:40,039 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@1efd5b14) from
> 10.1.104.1:42703: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:41,050 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:41,050 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@b668df1) from
> 10.1.104.5:54533: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:42,060 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:42,060 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@79f0cf0f) from
> 10.1.104.5:54533: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:43,069 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:43,069 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@30aa2054) from
> 10.1.104.5:54533: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:44,078 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:44,078 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@713083c1) from
> 10.1.104.5:54533: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:45,088 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:45,088 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@4ebf4464) from
> 10.1.104.5:54533: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
> 2011-04-10 10:14:46,098 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false,
> deleteBlock=true)
> 2011-04-10 10:14:46,098 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 8020, call
> commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false,
> true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@6f23299f) from
> 10.1.104.5:54533: error: java.io.IOException: Block
> (=blk_1213779416283711358_54194) not found
> java.io.IOException: Block (=blk_1213779416283711358_54194) not found
>
>
>
> -eran
>
>
>
>
> On Sun, Apr 10, 2011 at 22:27, Jean-Daniel Cryans <[email protected]> wrote:
>> My experience in debugging those kind of issues is that 95% of the
>> time it's a configuration issue, 4.99% of the time it's environment
>> and network issues (network splits, lost packets, etc), and the
>> remaining 0.01% is actual HDFS issues.
>>
>> The fact that you're saying that you had issues even with no load
>> makes me think it's a configuration issue. Can we see your hdfs
>> config?
>>
>> BTW the HBase log was pointing at 10.1.104.1 as the one having an
>> issue, is that the log we are looking at? (it doesn't seem so)
>>
>> Thx,
>>
>> J-D
>>
>> On Sun, Apr 10, 2011 at 12:05 PM, Eran Kutner <[email protected]> wrote:
>>> This is how it's configured in /etc/security/limits.con on all the
>>> slaves in the cluster:
>>> hadoop           -       nofile          32768
>>> hdfs             -       nofile          32768
>>> hbase            -       nofile          32768
>>> hadoop           -       nproc           32000
>>> hdfs           -       nproc           32000
>>> hbase           -       nproc           32000
>>>
>>> When hbase is loading it prints:
>>> ulimit -n 32768
>>>
>>>
>>> -eran
>>
>

Reply via email to