This happened twice in two days:

clients can't access one of the regions and consistently fails
with NotServingRegionException
"org.apache.hadoop.hbase.NotServingRegionException:
Region is not online:
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f." ...

restarting the region server fixes this.

portions from RS and master are below. Please let me know if more logs are
needed. we will continue to look into this more.

*on RegionServer* :

    seems to happen right after compaction. grep of  '6136400' on region
server smf1-afz-19-sr1 :
[...]
011-05-12 12:05:08,125 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
region: users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:08,126 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing
open of users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:08,127 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
Retrieved 114 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
2011-05-12 12:05:08,129 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Opening region: REGION => {NAME =>
'users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.', STARTKEY
=> '61364002', ENDKEY => '613930521', ENCODED =>
a0bf035ac417cdd0697464f1c48f387f, TABLE => {{NAME => 'users', MAX_FILESIZE
=> '130023424', FAMILIES => [{NAME => 'columns', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'LZO', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}, {NAME => 'extracted', BLOOMFILTER => 'NONE', REPLICATION_SCOPE =>
'0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
'protobuf', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS =>
'2', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'time', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '2147483647', COMPRESSION =>
'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}}
2011-05-12 12:05:08,130 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Instantiated users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:20,977 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.; next
sequenceid=208908217
2011-05-12 12:05:20,979 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
2011-05-12 12:05:20,982 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
requested for users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
because Region has too many store files; priority=21, compaction queue
size=0
2011-05-12 12:05:20,982 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting compaction on region
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:20,984 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Updated row users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
in region .META.,,1 with server=smf1-afz-19-sr1.prod.twitter.com:60020,
startcode=1302734205220
2011-05-12 12:05:20,984 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.; next
sequenceid=208908217
2011-05-12 12:05:20,985 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.:
disabling compactions & flushes
2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:20,986 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:20,987 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:23,392 INFO org.apache.hadoop.hbase.regionserver.HRegion:
completed compaction on region
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. after 2sec
2011-05-12 12:40:44,829 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; Region is not online:
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:40:45,862 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; Region is not online:
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.

[...]

*on Master* : from grep '61364002,1297594642368' :
[...]
2011-05-12 12:03:54,786 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
was=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
state=CLOSED, ts=1305201834784
2011-05-12 12:03:54,790 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
Retrieved 114 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
[...]
2011-05-12 12:04:31,813 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. to
smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220
2011-05-12 12:04:39,333 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
[...]
2011-05-12 12:05:08,122 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
timed out:  users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
state=OPENING, ts=1305201871850
2011-05-12 12:05:08,122 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING
for too long, reassigning
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:08,122 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
2011-05-12 12:05:08,124 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
Retrieved 114 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
2011-05-12 12:05:08,124 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Successfully transitioned
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. into
OFFLINE and forcing a new assignment

Thanks for looking into this.

Raghu.

Reply via email to