Thanks for your reply.

Hbase version : 0.90.1

I find other abnormal logs. 
Hmaster receives twice RS_ZK_REGION_OPENED event. 
It reproduces in balance processing.
But I can't root out what brings on ?

Hmaster logs:
2011-04-15 14:54:21,885 INFO org.apache.hadoop.hbase.master.HMaster: balance 
hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., 
src=C4C3.site,60020,1302849920122, dest=C4C5.site,60020,1302849920117
2011-04-15 14:54:21,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. 
(offlining)
2011-04-15 14:54:21,886 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Sent CLOSE to serverName=C4C3.site,60020,1302849920122, load=(requests=0, 
regions=588, usedHeap=122, maxHeap=8175) for region 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
2011-04-15 14:54:25,333 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling new unassigned node: 
/hbase/unassigned/89e2c4452de2864bf3362a8f16e758d0 
(region=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0., 
server=C4C3.site,60020,1302849920122, state=RS_ZK_REGION_CLOSED)
2011-04-15 14:54:25,333 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSED, server=C4C3.site,60020,1302849920122, 
region=89e2c4452de2864bf3362a8f16e758d0
2011-04-15 14:55:14,364 DEBUG 
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
event for 89e2c4452de2864bf3362a8f16e758d0
2011-04-15 14:55:14,380 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Forcing OFFLINE; 
was=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. 
state=CLOSED, ts=1302850513941
2011-04-15 14:55:14,380 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x2f57e8192e0000 Creating (or updating) unassigned node for 
89e2c4452de2864bf3362a8f16e758d0 with OFFLINE state
2011-04-15 14:55:14,614 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Using pre-existing plan for region 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.; 
plan=hri=ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.,
 src=C4C3.site,60020,1302849920122, dest=C4C5.site,60020,1302849920117
2011-04-15 14:55:14,614 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. to 
C4C5.site,60020,1302849920117
2011-04-15 14:55:14,705 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=C4C5.site,60020,1302849920117, 
region=89e2c4452de2864bf3362a8f16e758d0

2011-04-15 14:55:14,714 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302849920117, 
region=89e2c4452de2864bf3362a8f16e758d0

//repetition
2011-04-15 14:55:14,714 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, server=C4C5.site,60020,1302849920117, 
region=89e2c4452de2864bf3362a8f16e758d0

// Hmaster abnormal logs 
2011-04-15 14:55:14,715 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region 89e2c4452de2864bf3362a8f16e758d0 from server 
C4C5.site,60020,1302849920117 but region was in  the state 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. 
state=OPEN, ts=1302850514694 and not in expected PENDING_OPEN or OPENING states

2011-04-15 14:55:42,362 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event for 89e2c4452de2864bf3362a8f16e758d0; deleting unassigned node
2011-04-15 14:55:42,362 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x2f57e8192e0000 Deleting existing unassigned node for 
89e2c4452de2864bf3362a8f16e758d0 that is in expected state RS_ZK_REGION_OPENED
2011-04-15 14:55:42,392 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x2f57e8192e0000 Successfully deleted unassigned node for region 
89e2c4452de2864bf3362a8f16e758d0 in expected state RS_ZK_REGION_OPENED
2011-04-15 14:55:42,411 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. on 
C4C5.site,60020,1302849920117


Region server logs:
2011-04-15 14:55:14,603 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
region: ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
2011-04-15 14:55:14,617 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open 
of ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
2011-04-15 14:55:14,617 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:60020-0x2f57e8192e0004 Attempting to transition node 
89e2c4452de2864bf3362a8f16e758d0 from M_ZK_REGION_OFFLINE to 
RS_ZK_REGION_OPENING
2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 
89e2c4452de2864bf3362a8f16e758d0 from M_ZK_REGION_OFFLINE to 
RS_ZK_REGION_OPENING
2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Opening region: REGION => {NAME => 
'ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.', 
STARTKEY => '1088613823076918', ENDKEY => '1088613826923071', ENCODED => 
89e2c4452de2864bf3362a8f16e758d0, TABLE => {{NAME => 'ufdrA', FAMILIES => 
[{NAME => 'value', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => 
'3', COMPRESSION => 'GZ', TTL => '86400', BLOCKSIZE => '65536', IN_MEMORY => 
'false', BLOCKCACHE => 'true'}]}}
2011-04-15 14:55:14,644 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Instantiated 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.
2011-04-15 14:55:14,648 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Onlined ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.; 
next sequenceid=1
2011-04-15 14:55:14,648 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:60020-0x2f57e8192e0004 Attempting to transition node 
89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to 
RS_ZK_REGION_OPENING
2011-04-15 14:55:14,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 
89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to 
RS_ZK_REGION_OPENING
2011-04-15 14:55:14,680 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
Updated row 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0. in 
region .META.,,1 with server=C4C5.site:60020, startcode=1302849920117
2011-04-15 14:55:14,680 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:60020-0x2f57e8192e0004 Attempting to transition node 
89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to 
RS_ZK_REGION_OPENED
2011-04-15 14:55:14,702 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:60020-0x2f57e8192e0004 Successfully transitioned node 
89e2c4452de2864bf3362a8f16e758d0 from RS_ZK_REGION_OPENING to 
RS_ZK_REGION_OPENED
2011-04-15 14:55:14,702 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
ufdrA,1088613823076918,1301646236458.89e2c4452de2864bf3362a8f16e758d0.



-----邮件原件-----
发件人: [email protected] [mailto:[email protected]] 代表 Jean-Daniel Cryans
发送时间: 2011年4月15日 3:20
收件人: [email protected]
主题: Re: a lots of error about Region has been OPEN for too long

Taking a more deeper look at the code, the assumption I had that the
message was sitting in the queue for too long is erroneous. The reason
is that when the master says "Handling
transition=RS_ZK_REGION_OPENED", it updates the timestamp that we use
to keep track of the timeout which means that unless it stayed more
than 60 seconds in the queue (it took only 3 here it seems), the issue
is elsewhere.

So logically the other reason would be that the master got the
information too late, since we use the timestamp provided by the
region server. Looking in the region server log, I was able to find:

2011-04-05 15:11:12,946 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x22e286f0b9c98f7-0x22e286f0b9c98f7 Successfully
transitioned node 0db7d1f58e4fced0a371aded0ddec281 from
RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED

This is 1 minute 7 seconds before the master got the message (I
verified that our clocks are in perfect sync). This means this is
another queue that's too slow (the ones that processes all the events
coming from zookeeper).

I'll keep digging.

J-D

On Thu, Apr 14, 2011 at 10:58 AM, Jean-Daniel Cryans
<[email protected]> wrote:
> (please don't leave unrelated discussions at the tail of your emails)
>
> So I thought I never got that issue but wanted to make sure so I
> grepped my logs and indeed saw I got it, so I what I did is that I
> grepped the name of one of the regions that got the issue and looked
> at what was happening at that time (which you should do in the
> future). I see something like this:
>
> 2011-04-05 15:12:19,037 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x42ec2cece810b68 Retrieved 115 byte(s) of data from
> znode /prodjobs/unassigned/0db7d1f58e4fced0a371aded0ddec281 and set
> watcher; region=tsdb,�M<��,1297818092053.0db7d1f58e4fced0a371aded0ddec281.,
> server=sv4borg36,60020,1300313562191, state=RS_ZK_REGION_OPENED
> 2011-04-05 15:12:19,037 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_OPENED, server=sv4borg36,60020,1300313562191,
> region=0db7d1f58e4fced0a371aded0ddec281
> ...
> 2011-04-05 15:12:19,585 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in
> transition timed out:
> tsdb,\x00\x03\xCBM<}\x08\x00\x00\x01\x00\x00\x8A\x00\x00\x1D\x00\x01\xD1,1297818092053.0db7d1f58e4fced0a371aded0ddec281.
> state=OPEN, ts=1302041472920
> 2011-04-05 15:12:19,585 ERROR
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN
> for too long, we don't know where region was opened so can't do
> anything
> ...
> 2011-04-05 15:12:22,504 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling
> OPENED event for 0db7d1f58e4fced0a371aded0ddec281; deleting unassigned
> node
>
>
> So if I understand this correctly, the master already got the message
> via ZooKeeper but it stayed in a queue for just long enough that the
> RIT timed out and finally the OpenedRegionHandler was able to process
> it. So in the end nothing looks broken, it just means that the master
> is processing a LOT of regions being opened, while it also took the
> region server a long time to get the region opened.
>
> There are currently a few states that don't get refreshed in ZK, for
> example when a region is sitting in the region server's queue of
> regions to be opened. Very often, when there's a lot of regions to
> open (and worse if the RS has to replay recovered edits), some regions
> in that state will timeout. This needs more thinking.
>
> J-D
>
> 2011/4/13 Gaojinchao <[email protected]>:
>> In hbase version 0.90.1 .
>>
>> Is there any experience ?
>>
>> Hmaster Logs :
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,384 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>> 2011-04-08 16:33:09,385 ERROR 
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for 
>> too long, we don't know where region was opened so can't do anything
>>
>

Reply via email to