We don't have appends enabled (CDH 2). hbase version is '0.90.1, r1069174'.
there does not seem to be any data loss.

thanks,
Raghu.

On Fri, May 13, 2011 at 12:34 AM, Stack <[email protected]> wrote:

> Raghu, are you running on an hdfs which has append?
> St.Ack
>
> On Fri, May 13, 2011 at 12:07 AM, Raghu Angadi <[email protected]> wrote:
> > This happened twice in two days:
> >
> > clients can't access one of the regions and consistently fails
> > with NotServingRegionException
> > "org.apache.hadoop.hbase.NotServingRegionException:
> > Region is not online:
> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f." ...
> >
> > restarting the region server fixes this.
> >
> > portions from RS and master are below. Please let me know if more logs
> are
> > needed. we will continue to look into this more.
> >
> > *on RegionServer* :
> >
> >    seems to happen right after compaction. grep of  '6136400' on region
> > server smf1-afz-19-sr1 :
> > [...]
> > 011-05-12 12:05:08,125 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to
> open
> > region: users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:05:08,126 DEBUG
> > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
> Processing
> > open of users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:05:08,127 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> >
> regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
> > Retrieved 114 byte(s) of data from znode
> > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> >
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> > server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
> > 2011-05-12 12:05:08,129 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Opening region: REGION => {NAME =>
> > 'users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.',
> STARTKEY
> > => '61364002', ENDKEY => '613930521', ENCODED =>
> > a0bf035ac417cdd0697464f1c48f387f, TABLE => {{NAME => 'users',
> MAX_FILESIZE
> > => '130023424', FAMILIES => [{NAME => 'columns', BLOOMFILTER => 'NONE',
> > REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'LZO', TTL =>
> > '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
> > 'true'}, {NAME => 'extracted', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
> =>
> > '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647',
> BLOCKSIZE
> > => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
> > 'protobuf', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS =>
> > '2', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536',
> > IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'time', BLOOMFILTER
> =>
> > 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '2147483647', COMPRESSION
> =>
> > 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> > BLOCKCACHE => 'true'}]}}
> > 2011-05-12 12:05:08,130 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Instantiated
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:05:20,977 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.;
> next
> > sequenceid=208908217
> > 2011-05-12 12:05:20,979 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> >
> regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
> > Retrieved 128 byte(s) of data from znode
> > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> >
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> > state=RS_ZK_REGION_OPENING
> > 2011-05-12 12:05:20,982 DEBUG
> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> > requested for
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > because Region has too many store files; priority=21, compaction queue
> > size=0
> > 2011-05-12 12:05:20,982 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Starting compaction on region
> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:05:20,984 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> > Updated row
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > in region .META.,,1 with server=smf1-afz-19-sr1.prod.twitter.com:60020,
> > startcode=1302734205220
> > 2011-05-12 12:05:20,984 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.;
> next
> > sequenceid=208908217
> > 2011-05-12 12:05:20,985 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> >
> regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
> > Retrieved 128 byte(s) of data from znode
> > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> >
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> > state=RS_ZK_REGION_OPENING
> > 2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> >
> regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
> > Retrieved 128 byte(s) of data from znode
> > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> >
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> > state=RS_ZK_REGION_OPENING
> > 2011-05-12 12:05:20,986 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closing users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.:
> > disabling compactions & flushes
> > 2011-05-12 12:05:20,986 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Updates disabled for region
> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:05:20,986 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Closed users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:05:20,987 DEBUG
> > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:05:23,392 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > completed compaction on region
> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. after 2sec
> > 2011-05-12 12:40:44,829 DEBUG
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > NotServingRegionException; Region is not online:
> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:40:45,862 DEBUG
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > NotServingRegionException; Region is not online:
> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> >
> > [...]
> >
> > *on Master* : from grep '61364002,1297594642368' :
> > [...]
> > 2011-05-12 12:03:54,786 DEBUG
> > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> > was=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > state=CLOSED, ts=1305201834784
> > 2011-05-12 12:03:54,790 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> >
> master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
> > Retrieved 114 byte(s) of data from znode
> > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set
> watcher;
> > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> > server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
> > [...]
> > 2011-05-12 12:04:31,813 DEBUG
> > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. to
> > smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220
> > 2011-05-12 12:04:39,333 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> >
> master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
> > Retrieved 128 byte(s) of data from znode
> > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set
> watcher;
> > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> > state=RS_ZK_REGION_OPENING
> > [...]
> > 2011-05-12 12:05:08,122 INFO
> > org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
> > timed out:
>  users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > state=OPENING, ts=1305201871850
> > 2011-05-12 12:05:08,122 INFO
> > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING
> > for too long, reassigning
> > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> > 2011-05-12 12:05:08,122 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> >
> master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
> > Retrieved 128 byte(s) of data from znode
> > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> >
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> > server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> > state=RS_ZK_REGION_OPENING
> > 2011-05-12 12:05:08,124 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> >
> master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
> > Retrieved 114 byte(s) of data from znode
> > /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set
> watcher;
> > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> > server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
> > 2011-05-12 12:05:08,124 INFO
> > org.apache.hadoop.hbase.master.AssignmentManager: Successfully
> transitioned
> > region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> into
> > OFFLINE and forcing a new assignment
> >
> > Thanks for looking into this.
> >
> > Raghu.
> >
>

Reply via email to