Doesn't the DroppedSnapshotException also include a stack trace? Else it's going to be very hard to guess why it's failing (maybe lzo isn't properly configured, I see you're using it).
It seems the first region in your table split once, but the parent somehow revived. At least that's one way it could happen. You should definitely grep around the master log and figure which came first. If you'd like to do some hardcore debugging, please drop by the IRC channel where we'll gladly give you a hand. J-D On Fri, Nov 12, 2010 at 8:14 AM, Jeff Whiting <[email protected]> wrote: > Something else I'm seeing in the web interface: > > Master has: > surveySession,,1289578320747.edb037e879e0df212556205751f60a39. > ds4.internal:60030 > > But ds4 has: > surveySession,,1289521116693.92f678a946ea7dfc999564c202132ecf. > stores=2, storefiles=0, storefileSizeMB=0, memstoreSizeMB=0, > storefileIndexSizeMB=0 > > Why would the region names be different between the master and the region > server? > > ~Jeff > > On 11/12/2010 8:59 AM, Jeff Whiting wrote: >> >> I'm unit testing some of our internal framework that interacts with hbase >> (version cdh3b3 => 0.89.20100924+28). The unit test is small inserting and >> updating ~100 rows. My unit test expects the hbase table to be empty when >> starting. So I go into hbase shell table disable the table, drop it, and >> then recreate it. However hbase is having problems when I try to disable >> the table "surveySession". In the region server holding the region I see >> the following in the log: >> >> 2010-11-11 16:44:39,387 INFO org.apache.hadoop.hbase. >> regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:39,387 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:39,387 DEBUG >> org.apache.hadoop.hbase.regionserver.HRegion: Closing >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07.: disabling >> compactions & flushes >> 2010-11-11 16:44:39,387 DEBUG >> org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:39,387 DEBUG >> org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for >> region surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07.. >> Current region memstore size 242.7k >> 2010-11-11 16:44:39,388 DEBUG >> org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, >> commencing flushing stores >> 2010-11-11 16:44:39,426 INFO >> com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library >> 2010-11-11 16:44:39,454 INFO com.hadoop.compression.lzo.LzoCodec: >> Successfully loaded & initialized native-lzo library [hadoop-lzo rev >> 6a0ca93ae7b5070f61cca4be2d96a29d184a48ef] >> 2010-11-11 16:44:39,456 INFO org.apache.hadoop.io.compress.CodecPool: Got >> brand-new compressor >> 2010-11-11 16:44:39,478 WARN >> org.apache.hadoop.hbase.regionserver.HRegionServer: >> org.apache.hadoop.hbase.DroppedSnapshotException: region: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:39,478 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:40,391 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:40,391 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:41,395 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:41,395 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:42,399 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:42,399 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:44,408 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:44,408 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:46,416 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:46,416 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:50,436 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:50,436 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:54,454 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:44:54,454 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:45:02,487 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:45:02,488 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:45:18,557 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> 2010-11-11 16:45:18,557 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: >> MSG_REGION_CLOSE: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> >> The only potential problem I'm seeing is the WARN: >> 2010-11-11 16:44:39,478 WARN >> org.apache.hadoop.hbase.regionserver.HRegionServer: >> org.apache.hadoop.hbase.DroppedSnapshotException: region: >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. >> >> Then if I run a hbase hbck I get: >> ERROR: Region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. is not served >> by any region server but is listed in META to be on server >> ds2.internal:60020 >> >> The region server on port 60020 says that it is not servering the region. >> >> At this point the table is "stuck." I can't disable it, I can't insert >> any records into it (RegionNotServingException), nor do anything with it >> really. >> >> I can "fix" the problem by shutting down the region server forcing the >> master reassign all the regions then disable it on the new region server. It >> does seem to be somewhat time dependent because I have to try to disable the >> table after insert the records. Waiting a couple of hours then doing it >> there seems to be no problem. >> >> Master Log entries at the time of disabling: >> 2010-11-11 16:44:38,650 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:44:38,650 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:44:38,651 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:44:39,669 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:44:39,669 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:44:39,670 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:44:40,674 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:44:40,675 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:44:40,675 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:44:41,689 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:44:41,689 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:44:41,689 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:44:43,704 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:44:43,704 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:44:43,704 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:44:45,718 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:44:45,718 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:44:45,719 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:44:49,733 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:44:49,733 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:44:49,733 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:44:53,747 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:44:53,747 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:44:53,747 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:44:54,157 INFO org.apache.hadoop.hbase.master.BaseScanner: >> RegionManager.metaScanner scanning meta region {server: 192.168.1.168:60020 >> <http://192.168.1.168:60020/>, regionname: .META.,,1.1028785192, startKey: >> <>} >> 2010-11-11 16:44:54,167 INFO org.apache.hadoop.hbase.master.BaseScanner: >> RegionManager.metaScanner scan of 21 row(s) of meta region {server: >> 192.168.1.168:60020 <http://192.168.1.168:60020/>, regionname: >> .META.,,1.1028785192, startKey: <>} complete >> 2010-11-11 16:44:54,167 INFO org.apache.hadoop.hbase.master.BaseScanner: >> All 1 .META. region(s) scanned >> 2010-11-11 16:45:01,762 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:45:01,762 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:45:01,762 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:45:06,616 INFO org.apache.hadoop.hbase.master.ServerManager: >> 4 region servers, 0 dead, average load 5.5 >> 2010-11-11 16:45:13,556 DEBUG >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit >> for row <> in tableName surveySession: location server ds1.internal:60020, >> location region name >> surveySession,,1289352541018.ecd63368b7933af078b9a714b715d6d4. >> 2010-11-11 16:45:13,557 DEBUG >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit >> for row <> in tableName .META.: location server ds2.internal:60020, location >> region name .META.,,1.1028785192 >> 2010-11-11 16:45:13,564 DEBUG >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: >> Rowscanned=1, rowsOffline=0 >> 2010-11-11 16:45:16,110 DEBUG >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit >> for row <> in tableName surveySession: location server ds1.internal:60020, >> location region name >> surveySession,,1289352541018.ecd63368b7933af078b9a714b715d6d4. >> 2010-11-11 16:45:16,111 DEBUG >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache hit >> for row <> in tableName .META.: location server ds2.internal:60020, location >> region name .META.,,1.1028785192 >> 2010-11-11 16:45:16,117 DEBUG >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: >> Rowscanned=1, rowsOffline=0 >> 2010-11-11 16:45:17,780 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing unserved regions >> 2010-11-11 16:45:17,780 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Processing regions >> currently being served >> 2010-11-11 16:45:17,780 DEBUG >> org.apache.hadoop.hbase.master.ChangeTableState: Adding region >> surveySession,,1289518197042.29b693fa3ef2edc940fd979ba9ce2c07. to setClosing >> list >> 2010-11-11 16:45:30,993 INFO org.apache.hadoop.hbase.master.BaseScanner: >> RegionManager.rootScanner scanning meta region {server: 192.168.1.169:60020 >> <http://192.168.1.169:60020/>, regionname: -ROOT-,,0.70236052, startKey: <>} >> 2010-11-11 16:45:30,996 INFO org.apache.hadoop.hbase.master.BaseScanner: >> RegionManager.rootScanner scan of 1 row(s) of meta region {server: >> 192.168.1.169:60020 <http://192.168.1.169:60020/>, regionname: >> -ROOT-,,0.70236052, startKey: <>} complete >> 2010-11-11 16:45:54,160 INFO org.apache.hadoop.hbase.master.BaseScanner: >> RegionManager.metaScanner scanning meta region {server: 192.168.1.168:60020 >> <http://192.168.1.168:60020/>, regionname: .META.,,1.1028785192, startKey: >> <>} >> 2010-11-11 16:45:54,172 INFO org.apache.hadoop.hbase.master.BaseScanner: >> RegionManager.metaScanner scan of 21 row(s) of meta region {server: >> 192.168.1.168:60020 <http://192.168.1.168:60020/>, regionname: >> .META.,,1.1028785192, startKey: <>} complete >> 2010-11-11 16:45:54,172 INFO org.apache.hadoop.hbase.master.BaseScanner: >> All 1 .META. region(s) scanned >> 2010-11-11 16:46:06,623 INFO org.apache.hadoop.hbase.master.ServerManager: >> 4 region servers, 0 dead, average load 5.5 >> >> ~Jeff >> >> >> -- >> Jeff Whiting >> Qualtrics Senior Software Engineer >> [email protected] <mailto:[email protected]> > > -- > Jeff Whiting > Qualtrics Senior Software Engineer > [email protected] > >
