On Sun, Apr 10, 2011 at 11:30 PM, Eran Kutner <[email protected]> wrote: > Hi St.Ack and J-D, > Thanks for looking into this. > > It can definitely be a configuration problem, but I seriously doubt it > is a network or infrastructure problem. It's our own operated > infrastructure (not a cloud) and we have a lot of other services > running on it without any problem.
Services that could be stealing i/o and cpu from hbase cluster? Is that possible? > Note that Hbase is complaining > about multiple data nodes (10.1.104.1, 10.1.104.2, 10.1.104.5), I > attached the logs from just one of them but it's more or less the same > on all. Please see the NN log for the same block below. > An attachment? Did it come through? Perhaps pastebin it and then add link here? > We are using Hadoop 0.20.2-CDH3B4 > and Hbase Version 0.90.2-SNAPSHOT, rUnknown, Wed Mar 23 06:09:51 EDT > 2011 (I built that from the 0.90.2 branch to try to fix a problem with > replication) > > </configuration> > Your configuration looks fine. Can you associate the block with a file? I don't see the association in the below. I see us trying to delete the block (would like to know why -- file deleteion?) and then it does exist on .2 for whatever reason. St.Ack > > This is the NN log for the same block: > 2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.allocateBlock: > /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279. > blk_1213779416283711358_54194 > 2011-04-10 10:12:55,749 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* blk_1213779416283711358_54194 recovery started, > primary=10.1.104.1:50010 > 2011-04-10 10:12:58,292 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addStoredBlock: Targets updated: block > blk_1213779416283711358_54249 on 10.1.104.1:50010 is added as a target > for block blk_1213779416283711358_54194 with size 162696 > 2011-04-10 10:12:58,293 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addStoredBlock: Targets updated: block > blk_1213779416283711358_54249 on 10.1.104.5:50010 is added as a target > for block blk_1213779416283711358_54194 with size 162696 > 2011-04-10 10:12:58,294 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addStoredBlock: Targets updated: block > blk_1213779416283711358_54249 on 10.1.104.2:50010 is added as a target > for block blk_1213779416283711358_54194 with size 162696 > 2011-04-10 10:12:58,295 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=54249, newlength=162696, > newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010], > closeFile=true, deleteBlock=false) > 2011-04-10 10:12:58,340 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(newblock=blk_1213779416283711358_54249, > file=/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279, > newgenerationstamp=54249, newlength=162696, > newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010]) > successful > 2011-04-10 10:12:58,773 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54249, > newgenerationstamp=54250, newlength=162696, > newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010], > closeFile=false, deleteBlock=false) > 2011-04-10 10:12:58,798 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(blk_1213779416283711358_54250) successful > 2011-04-10 10:12:58,833 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010 > is added to blk_1213779416283711358_54250 size 162696 > 2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010 > is added to blk_1213779416283711358_54250 size 162696 > 2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.1:50010 > is added to blk_1213779416283711358_54250 size 162696 > 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added > to invalidSet of 10.1.104.2:50010 > 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added > to invalidSet of 10.1.104.5:50010 > 2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added > to invalidSet of 10.1.104.1:50010 > 2011-04-10 10:13:29,811 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* ask 10.1.104.2:50010 to delete blk_-826484977486799080_54240 > blk_-3859382357539098123_54247 blk_6239319017569949674_54242 > blk_2498532690928849052_54243 blk_-4133675075382826161_54234 > blk_-6299622736350943739_54246 blk_1213779416283711358_54250 > blk_-7407505773873780456_54244 blk_1070089292787245089_54237 > 2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* ask 10.1.104.1:50010 to delete blk_-826484977486799080_54240 > blk_-9184330013088281910_54231 blk_842002398362359267_54239 > blk_6239319017569949674_54242 blk_-4133675075382826161_54234 > blk_-2420408100786396293_54245 blk_1213779416283711358_54250 > blk_1475680055217098335_54241 blk_-8623426430834046353_54236 > blk_-3859382357539098123_54247 blk_2498532690928849052_54243 > blk_-6299622736350943739_54246 blk_-7407505773873780456_54244 > blk_1070089292787245089_54237 > 2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange: > BLOCK* ask 10.1.104.5:50010 to delete blk_-8623426430834046353_54236 > blk_-3859382357539098123_54247 blk_-2420408100786396293_54245 > blk_-4133675075382826161_54234 blk_1213779416283711358_54250 > 2011-04-10 10:14:29,820 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:29,820 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 8 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@b8682c8) from > 10.1.104.2:46012: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:29,835 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:29,836 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 6 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@2bf41e61) from > 10.1.104.2:46012: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:30,862 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:30,862 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 2 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@27f688d2) from > 10.1.104.2:46012: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:31,888 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:31,888 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 5 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@2a90a77d) from > 10.1.104.2:46012: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:32,913 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:32,913 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 0 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@7ee4155f) from > 10.1.104.2:46012: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:33,943 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:33,943 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 3 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@3037510e) from > 10.1.104.2:46012: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:34,968 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:34,968 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 9 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@7faa86dc) from > 10.1.104.1:42703: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:35,992 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:35,993 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 6 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@147c16ff) from > 10.1.104.1:42703: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:37,005 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:37,005 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 9 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@5ec9c9da) from > 10.1.104.1:42703: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:38,016 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:38,016 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 5 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@5c664878) from > 10.1.104.1:42703: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:39,029 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:39,029 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 2 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@3f1fedac) from > 10.1.104.1:42703: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:40,039 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:40,039 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 2 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@1efd5b14) from > 10.1.104.1:42703: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:41,050 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:41,050 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 3 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@b668df1) from > 10.1.104.5:54533: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:42,060 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:42,060 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 6 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@79f0cf0f) from > 10.1.104.5:54533: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:43,069 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:43,069 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 0 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@30aa2054) from > 10.1.104.5:54533: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:44,078 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:44,078 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 8 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@713083c1) from > 10.1.104.5:54533: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:45,088 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:45,088 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 8 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@4ebf4464) from > 10.1.104.5:54533: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > 2011-04-10 10:14:46,098 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194, > newgenerationstamp=0, newlength=0, newtargets=[], closeFile=false, > deleteBlock=true) > 2011-04-10 10:14:46,098 INFO org.apache.hadoop.ipc.Server: IPC Server > handler 5 on 8020, call > commitBlockSynchronization(blk_1213779416283711358_54194, 0, 0, false, > true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@6f23299f) from > 10.1.104.5:54533: error: java.io.IOException: Block > (=blk_1213779416283711358_54194) not found > java.io.IOException: Block (=blk_1213779416283711358_54194) not found > > > > -eran > > > > > On Sun, Apr 10, 2011 at 22:27, Jean-Daniel Cryans <[email protected]> wrote: >> My experience in debugging those kind of issues is that 95% of the >> time it's a configuration issue, 4.99% of the time it's environment >> and network issues (network splits, lost packets, etc), and the >> remaining 0.01% is actual HDFS issues. >> >> The fact that you're saying that you had issues even with no load >> makes me think it's a configuration issue. Can we see your hdfs >> config? >> >> BTW the HBase log was pointing at 10.1.104.1 as the one having an >> issue, is that the log we are looking at? (it doesn't seem so) >> >> Thx, >> >> J-D >> >> On Sun, Apr 10, 2011 at 12:05 PM, Eran Kutner <[email protected]> wrote: >>> This is how it's configured in /etc/security/limits.con on all the >>> slaves in the cluster: >>> hadoop - nofile 32768 >>> hdfs - nofile 32768 >>> hbase - nofile 32768 >>> hadoop - nproc 32000 >>> hdfs - nproc 32000 >>> hbase - nproc 32000 >>> >>> When hbase is loading it prints: >>> ulimit -n 32768 >>> >>> >>> -eran >> >
