(Please don't leave unrelated discussions at the tail of your emails.)

I thought I had never hit that issue, but to make sure I grepped my logs and saw that I had. I then grepped for the name of one of the affected regions and looked at what was happening at that time (which you should do in the future). I see something like this:

2011-04-05 15:12:19,037 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x42ec2cece810b68 Retrieved 115 byte(s) of data from znode /prodjobs/unassigned/0db7d1f58e4fced0a371aded0ddec281 and set watcher; region=tsdb,�M<��,1297818092053.0db7d1f58e4fced0a371aded0ddec281., server=sv4borg36,60020,1300313562191, state=RS_ZK_REGION_OPENED
2011-04-05 15:12:19,037 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg36,60020,1300313562191, region=0db7d1f58e4fced0a371aded0ddec281
...
2011-04-05 15:12:19,585 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: tsdb,\x00\x03\xCBM<}\x08\x00\x00\x01\x00\x00\x8A\x00\x00\x1D\x00\x01\xD1,1297818092053.0db7d1f58e4fced0a371aded0ddec281. state=OPEN, ts=1302041472920
2011-04-05 15:12:19,585 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
...
2011-04-05 15:12:22,504 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 0db7d1f58e4fced0a371aded0ddec281; deleting unassigned node

So if I understand this correctly, the master had already received the message via ZooKeeper, but it sat in a queue just long enough for the RIT (region in transition) to time out before the OpenedRegionHandler was finally able to process it. So in the end nothing looks broken; it just means that the master is processing a LOT of regions being opened, while it also took the region server a long time to get the region opened.

There are currently a few states that don't get refreshed in ZK, for example when a region is sitting in the region server's queue of regions to be opened. Very often, when there are a lot of regions to open (and worse if the RS has to replay recovered edits), some regions in that state will time out. This needs more thinking.

J-D

2011/4/13 Gaojinchao <[email protected]>:
> In hbase version 0.90.1.
>
> Is there any experience?
> > Hmaster Logs : > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything 
> 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,384 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 
16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything > 2011-04-08 16:33:09,385 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for > too long, we don't know where region was opened so can't do anything >
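
P.S. For the "grep one region's name and look at what happened around that time" step described at the top of this reply, here is a rough Python sketch rather than anything shipped with HBase. The function name region_timeline and the SAMPLE list are made up for illustration; the sample lines are copied from the log excerpt in this thread. Note that the ERROR line itself never names the region, so it is the INFO "Regions in transition timed out" line just before it that carries the region name.

```python
import re
from datetime import datetime

# Timestamp prefix used by the master log lines quoted in this thread.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def region_timeline(lines, encoded_name):
    """Return (seconds_since_first_match, line) for lines naming the region."""
    events = []
    for line in lines:
        m = TS.match(line)
        if m is None or encoded_name not in line:
            continue  # no timestamp, or the line is about something else
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        events.append((when, line.rstrip("\n")))
    if not events:
        return []
    start = events[0][0]
    return [((when - start).total_seconds(), text) for when, text in events]


# Lines copied from the excerpt above (raw strings keep the \x escapes as text).
SAMPLE = [
    r"2011-04-05 15:12:19,037 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg36,60020,1300313562191, region=0db7d1f58e4fced0a371aded0ddec281",
    r"2011-04-05 15:12:19,585 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: tsdb,\x00\x03\xCBM<}\x08\x00\x00\x01\x00\x00\x8A\x00\x00\x1D\x00\x01\xD1,1297818092053.0db7d1f58e4fced0a371aded0ddec281. state=OPEN, ts=1302041472920",
    r"2011-04-05 15:12:19,585 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything",
    r"2011-04-05 15:12:22,504 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 0db7d1f58e4fced0a371aded0ddec281; deleting unassigned node",
]

for offset, text in region_timeline(SAMPLE, "0db7d1f58e4fced0a371aded0ddec281"):
    print(f"+{offset:.3f}s  {text[24:110]}")
```

On a real master log you would pass the open file instead of SAMPLE, opened with errors="replace" since region names can contain binary row keys. For the sample above, the timeout line lands 0.548 s after the transition was seen and the OpenedRegionHandler line 3.467 s after it, which is the ordering described earlier: the event was already queued when the timeout fired.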
