Is gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. the region that was having the issue? If so, if you looked in hadoop1-s05's logs, was this region opened around 2011-06-29 16:43:57? Was it also opened hadoop1-s02 not long after? Did you say what version of hbase you are on?
St.Ack On Fri, Jul 1, 2011 at 5:08 AM, Eran Kutner <[email protected]> wrote: > Hi Stack, > I'm not sure what the log means. I do see references to two different > servers, but that would probably happen if there was normal transition I > assume. I'm using version 0.90.3 > Here are the relevant lines from the master logs: > > 2011-06-19 21:39:37,164 INFO org.apache.hadoop.hbase.master.ServerManager: > Received REGION_SPLIT: > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658.: > Daughters; > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > gs_raw_events,GSLoad_1308518810_1249_WEB204,1308533970928.46f876a4e97be04edb35eb8f8959d482. > from hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > 2011-06-19 21:43:12,983 INFO org.apache.hadoop.hbase.catalog.MetaEditor: > Deleted daughter reference > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > qualifier=splitA, from parent > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658. > 2011-06-29 16:29:36,143 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of > region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > (offlining) > 2011-06-29 16:29:36,146 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName= > hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, load=(requests=0, > regions=1654, usedHeap=1870, maxHeap=12483) for region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > 2011-06-29 16:29:38,327 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned > node: /hbase/unassigned/584dac5cc70d8682f71c4675a843c309 > (region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > state=RS_ZK_REGION_CLOSED) > 2011-06-29 16:29:38,327 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_CLOSED, > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > region=584dac5cc70d8682f71c4675a843c309 > 2011-06-29 16:30:53,742 DEBUG > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED > event for 584dac5cc70d8682f71c4675a843c309 > 2011-06-29 16:30:53,742 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so > deleting ZK node and removing from regions in transition, skipping > assignment of region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > 2011-06-29 16:30:53,742 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13004a31d7804c4 Deleting existing unassigned node for > 584dac5cc70d8682f71c4675a843c309 that is in expected state > RS_ZK_REGION_CLOSED > 2011-06-29 16:30:53,801 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13004a31d7804c4 Successfully deleted unassigned node for > region 584dac5cc70d8682f71c4675a843c309 in expected state > RS_ZK_REGION_CLOSED > 2011-06-29 16:34:01,453 INFO org.apache.hadoop.hbase.catalog.MetaEditor: > Updated region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > in META > 2011-06-29 16:37:12,247 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13004a31d7804c4 Creating (or updating) unassigned node for > 584dac5cc70d8682f71c4675a843c309 with OFFLINE state > 2011-06-29 16:37:12,576 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: No previous transition > plan was found (or we are ignoring an existing plan) for > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > so generated a random one; > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > src=, dest=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076; 5 (online=5, > exclude=null) available servers > 2011-06-29 16:37:12,576 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > 2011-06-29 16:37:13,102 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENED, > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > region=584dac5cc70d8682f71c4675a843c309 > 2011-06-29 16:39:54,075 DEBUG > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED > event for 584dac5cc70d8682f71c4675a843c309; deleting unassigned node > 2011-06-29 16:39:54,075 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13004a31d7804c4 Deleting existing unassigned node for > 584dac5cc70d8682f71c4675a843c309 that is in expected state > RS_ZK_REGION_OPENED > 2011-06-29 16:39:54,192 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13004a31d7804c4 Successfully deleted unassigned node for > region 584dac5cc70d8682f71c4675a843c309 in expected state > RS_ZK_REGION_OPENED > 2011-06-29 16:39:54,326 DEBUG > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > on hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > 2011-06-29 16:40:00,598 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13004a31d7804c4 Creating (or updating) unassigned node for > 584dac5cc70d8682f71c4675a843c309 with OFFLINE state > 2011-06-29 16:40:00,877 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: No previous transition > plan was found (or we are ignoring an existing plan) for > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > so generated a random one; > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > src=, dest=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076; 5 (online=5, > exclude=null) available servers > 2011-06-29 16:40:00,877 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > 2011-06-29 16:43:57,879 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition > timed out: > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > state=PENDING_OPEN, ts=1309380052723 > 2011-06-29 16:43:57,880 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Region has been > PENDING_OPEN for too long, reassigning > region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > 2011-06-29 16:43:57,936 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > was=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > state=PENDING_OPEN, ts=1309380052723 > 2011-06-29 16:43:57,936 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: No previous transition > plan was found (or we are ignoring an existing plan) for > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > so generated a random one; > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > src=, dest=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544; 5 (online=5, > exclude=null) available servers > 2011-06-29 16:43:57,936 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > to hadoop1-s02.farm-ny.gigya.com,60020,1306919627544 > 2011-06-29 16:43:59,022 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENING, > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > region=584dac5cc70d8682f71c4675a843c309 > 2011-06-29 16:43:59,221 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENING, > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > region=584dac5cc70d8682f71c4675a843c309 > 2011-06-29 16:43:59,226 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENED, > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > region=584dac5cc70d8682f71c4675a843c309 > 2011-06-29 16:43:59,274 DEBUG > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED > event for 584dac5cc70d8682f71c4675a843c309; deleting unassigned node > 2011-06-29 16:43:59,274 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13004a31d7804c4 Deleting existing unassigned node for > 584dac5cc70d8682f71c4675a843c309 that is in expected state > RS_ZK_REGION_OPENED > 2011-06-29 16:43:59,296 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x13004a31d7804c4 Successfully deleted unassigned node for > region 584dac5cc70d8682f71c4675a843c309 in expected state > RS_ZK_REGION_OPENED > 2011-06-29 16:43:59,375 WARN > org.apache.hadoop.hbase.master.AssignmentManager: Overwriting > 584dac5cc70d8682f71c4675a843c309 on > serverName=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > load=(requests=0, regions=1273, usedHeap=2676, maxHeap=12483) > 2011-06-29 16:43:59,375 DEBUG > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > on hadoop1-s02.farm-ny.gigya.com,60020,1306919627544 > > Thanks. > > -eran > > > > On Fri, Jul 1, 2011 at 09:05, Stack <[email protected]> wrote: > >> So, Eran, it seems as though two RegionServers were carrying the >> region? One deleted a file (compaction on its side)? Can you figure >> if indeed two servers had same region? (Check master logs for this >> regions assignments). >> >> What version of hbase? >> >> St.Ack >> >> >> On Thu, Jun 30, 2011 at 3:58 AM, Eran Kutner <[email protected]> wrote: >> > Hi, >> > I have a cluster of 5 nodes with one large table that currently has >> around >> > 12000 regions. Everything was working fine for relatively long time, >> until >> > now. >> > Yesterday I significantly reduced the TTL on the table and initiated >> major >> > compaction. This should have reduced the table size to about 20% of its >> > original size. >> > Today, I'm getting errors of inaccessible files on HDFS, for example: >> > java.io.IOException: Got error in response to OP_READ_BLOCK self=/ >> > 10.1.104.2:58047, remote=/10.1.104.2:50010 for file >> > >> /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360 >> > for block 3674866614142268536_674205 >> > at >> > >> org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1487) >> > at >> > >> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811) >> > at >> > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) >> > at java.io.DataInputStream.read(DataInputStream.java:132) >> > at >> > >> org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105) >> > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) >> > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) >> > at >> > >> org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:128) >> > at >> > >> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:68) >> > at >> > >> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75) >> > at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) >> > at java.io.BufferedInputStream.read(BufferedInputStream.java:317) >> > at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102) >> > at >> > org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1094) >> > at >> > org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:1036) >> > at >> > >> org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1433) >> > at >> > >> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:139) >> > at >> > >> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:96) >> > at >> > >> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:77) >> > at >> > org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1341) >> > at >> > >> org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.<init>(HRegion.java:2269) >> > at >> > >> org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1126) >> > at >> > >> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1118) >> > at >> > >> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1102) >> > at >> > >> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1781) >> > at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) >> > at >> > >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >> > at java.lang.reflect.Method.invoke(Method.java:597) >> > at >> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) >> > at >> > >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) >> > >> > I checked and the file, indeed doesn't exist on HDFS, here is the name >> node >> > logs for this block, apparently because it was deleted: >> > 2011-06-19 21:39:36,651 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* >> > NameSystem.allocateBlock: >> > >> /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp/2096863423111131624. >> > blk_3674866614142268536_674205 >> > 2011-06-19 21:40:11,954 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010 is added >> to >> > blk_3674866614142268536_674205 size 67108864 >> > 2011-06-19 21:40:11,954 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.3:50010 is added >> to >> > blk_3674866614142268536_674205 size 67108864 >> > 2011-06-19 21:40:11,955 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* >> > NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010 is added >> to >> > blk_3674866614142268536_674205 size 67108864 >> > 2011-06-29 20:20:01,662 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* >> ask >> > 10.1.104.2:50010 to delete blk_3674866614142268536_674205 >> > 2011-06-29 20:20:13,671 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* >> ask >> > 10.1.104.5:50010 to delete blk_-4056387895369608597_675174 >> > blk_-5017882805850873821_672281 blk_702373987100607684_672288 >> > blk_-5357157478043290010_668506 blk_7118175133735412789_674903 >> > blk_-3569812563715986384_675231 blk_8296855057240604851_669285 >> > blk_-6483679172530609101_674268 blk_8738539715363739108_673682 >> > blk_1744841904626813502_675238 blk_-6035315106100051103_674266 >> > blk_-1789501623010070237_674908 blk_1944054629336265129_673689 >> > blk_3674866614142268536_674205 blk_7930425446738143892_647410 >> > blk_-3007186753042268449_669296 blk_-5482302621772778061_647416 >> > blk_-3765735404924932181_672004 blk_7476090998956811081_675169 >> > blk_7862291659285127712_646890 blk_-2666244746343584727_672013 >> > blk_6039172613960915602_674206 blk_-8470884397893086564_646899 >> > blk_4558230221166712802_668510 >> > 2011-06-29 20:20:46,698 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* >> ask >> > 10.1.104.3:50010 to delete blk_-7851606440036350812_671552 >> > blk_9214649160203453845_647566 blk_702373987100607684_672288 >> > blk_5958099369749234073_668143 blk_-5172218034084903173_673109 >> > blk_-2934555181472719276_646476 blk_-1409986679370073931_672552 >> > blk_-2786034325506235869_669086 blk_3674866614142268536_674205 >> > blk_510158930393283118_673225 blk_916244738216205237_677068 >> > blk_-4317027806407316617_670379 blk_8555705688850972639_673485 >> > blk_-3765735404924932181_672004 blk_-5482302621772778061_647416 >> > blk_-2461801145731752623_674605 blk_-8737702908048998927_672549 >> > blk_-8470884397893086564_646899 blk_4558230221166712802_668510 >> > blk_-4056387895369608597_675174 blk_-8675430610673886073_647695 >> > blk_-6642870230256028318_668211 blk_-3890408516362176771_677483 >> > blk_-3569812563715986384_675231 blk_-5007142629771321873_674548 >> > blk_-3345355191863431669_667066 blk_8296855057240604851_669285 >> > blk_-6595462308187757470_672420 blk_-2583945228783203947_674607 >> > blk_-346988625120916345_677063 blk_4449525876338684218_674496 >> > blk_2617172363857549730_668201 blk_8738539715363739108_673682 >> > blk_-208904675456598428_679286 blk_-497549341281882641_646477 >> > blk_-6035315106100051103_674266 blk_-2356539038067297411_672388 >> > blk_-3881703084497103249_668137 blk_2214397881104950315_646643 >> > blk_-5907671443455357710_673223 blk_-2431880309956605679_669204 >> > blk_6039172613960915602_674206 blk_5053643911633142711_669194 >> > blk_-2636977729205236686_674664 >> > >> > I assume the file loss is somehow related to this change and the major >> > compaction that followed because the same scan that is failing now was >> > working fine yesterday and that is the only changed that happened on the >> > cluster. Any suggestions what to do now? >> > >> > Thanks. >> > >> > -eran >> > >> >
