We don't have appends enabled (CDH 2). hbase version is '0.90.1, r1069174'. there does not seem to be any data loss.
thanks, Raghu. On Fri, May 13, 2011 at 12:34 AM, Stack <[email protected]> wrote: > Raghu, are you running on an hdfs which has append? > St.Ack > > On Fri, May 13, 2011 at 12:07 AM, Raghu Angadi <[email protected]> wrote: > > This happened twice in two days: > > > > clients can't access one of the regions and consistently fails > > with NotServingRegionException > > "org.apache.hadoop.hbase.NotServingRegionException: > > Region is not online: > > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f." ... > > > > restarting the region server fixes this. > > > > portions from RS and master are below. Please let me know if more logs > are > > needed. we will continue to look into this more. > > > > *on RegionServer* : > > > > seems to happen right after compaction. grep of '6136400' on region > > server smf1-afz-19-sr1 : > > [...] > > 011-05-12 12:05:08,125 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open > > region: users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:05:08,126 DEBUG > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > Processing > > open of users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:05:08,127 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > > > regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62 > > Retrieved 114 byte(s) of data from znode > > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f; > > > data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f., > > server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE > > 2011-05-12 12:05:08,129 DEBUG > org.apache.hadoop.hbase.regionserver.HRegion: > > Opening region: REGION => {NAME => > > 'users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.', > STARTKEY > > => '61364002', ENDKEY => '613930521', ENCODED => > > a0bf035ac417cdd0697464f1c48f387f, TABLE => {{NAME => 'users', > MAX_FILESIZE > > => '130023424', FAMILIES => [{NAME => 'columns', BLOOMFILTER => 'NONE', > > REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'LZO', TTL => > > '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => > > 'true'}, {NAME => 'extracted', BLOOMFILTER => 'NONE', REPLICATION_SCOPE > => > > '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', > BLOCKSIZE > > => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => > > 'protobuf', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => > > '2', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', > > IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'time', BLOOMFILTER > => > > 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '2147483647', COMPRESSION > => > > 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', > > BLOCKCACHE => 'true'}]}} > > 2011-05-12 12:05:08,130 DEBUG > org.apache.hadoop.hbase.regionserver.HRegion: > > Instantiated > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:05:20,977 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.; > next > > sequenceid=208908217 > > 2011-05-12 12:05:20,979 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > > > regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62 > > Retrieved 128 byte(s) of data from znode > > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f; > > > data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f., > > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220, > > state=RS_ZK_REGION_OPENING > > 2011-05-12 12:05:20,982 DEBUG > > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction > > requested for > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > because Region has too many store files; priority=21, compaction queue > > size=0 > > 2011-05-12 12:05:20,982 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > Starting compaction on region > > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:05:20,984 INFO org.apache.hadoop.hbase.catalog.MetaEditor: > > Updated row > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > in region .META.,,1 with server=smf1-afz-19-sr1.prod.twitter.com:60020, > > startcode=1302734205220 > > 2011-05-12 12:05:20,984 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.; > next > > sequenceid=208908217 > > 2011-05-12 12:05:20,985 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > > > regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62 > > Retrieved 128 byte(s) of data from znode > > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f; > > > data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f., > > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220, > > state=RS_ZK_REGION_OPENING > > 2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > > > regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62 > > Retrieved 128 byte(s) of data from znode > > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f; > > > data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f., > > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220, > > state=RS_ZK_REGION_OPENING > > 2011-05-12 12:05:20,986 DEBUG > org.apache.hadoop.hbase.regionserver.HRegion: > > Closing users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.: > > disabling compactions & flushes > > 2011-05-12 12:05:20,986 DEBUG > org.apache.hadoop.hbase.regionserver.HRegion: > > Updates disabled for region > > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:05:20,986 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > Closed users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:05:20,987 DEBUG > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened > > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:05:23,392 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > completed compaction on region > > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. after 2sec > > 2011-05-12 12:40:44,829 DEBUG > > org.apache.hadoop.hbase.regionserver.HRegionServer: > > NotServingRegionException; Region is not online: > > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:40:45,862 DEBUG > > org.apache.hadoop.hbase.regionserver.HRegionServer: > > NotServingRegionException; Region is not online: > > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > > > [...] > > > > *on Master* : from grep '61364002,1297594642368' : > > [...] > > 2011-05-12 12:03:54,786 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > > was=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > state=CLOSED, ts=1305201834784 > > 2011-05-12 12:03:54,790 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > > > master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6 > > Retrieved 114 byte(s) of data from znode > > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set > watcher; > > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f., > > server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE > > [...] > > 2011-05-12 12:04:31,813 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. to > > smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220 > > 2011-05-12 12:04:39,333 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > > > master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6 > > Retrieved 128 byte(s) of data from znode > > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set > watcher; > > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f., > > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220, > > state=RS_ZK_REGION_OPENING > > [...] > > 2011-05-12 12:05:08,122 INFO > > org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition > > timed out: > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > state=OPENING, ts=1305201871850 > > 2011-05-12 12:05:08,122 INFO > > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING > > for too long, reassigning > > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > > 2011-05-12 12:05:08,122 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > > > master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6 > > Retrieved 128 byte(s) of data from znode > > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f; > > > data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f., > > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220, > > state=RS_ZK_REGION_OPENING > > 2011-05-12 12:05:08,124 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > > > master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6 > > Retrieved 114 byte(s) of data from znode > > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set > watcher; > > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f., > > server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE > > 2011-05-12 12:05:08,124 INFO > > org.apache.hadoop.hbase.master.AssignmentManager: Successfully > transitioned > > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. > into > > OFFLINE and forcing a new assignment > > > > Thanks for looking into this. > > > > Raghu. > > >
