That sounds reasonable, Jieshan. Would you mind filing an issue
referring to this mail thread?  If you have a patch, that'd be
excellent.
St.Ack

2011/5/23 bijieshan <[email protected]>:
> There are two references to assignRoot():
>
> 1.
> HMaster# assignRootAndMeta:
>
>    if (!catalogTracker.verifyRootRegionLocation(timeout)) {
>      this.assignmentManager.assignRoot();
>      this.catalogTracker.waitForRoot();
>      assigned++;
>    }
>
> 2.
> ServerShutdownHandler# process:
>
>      if (isCarryingRoot()) { // -ROOT-
>        try {
>           this.services.getAssignmentManager().assignRoot();
>        } catch (KeeperException e) {
>           this.server.abort("In server shutdown processing, assigning root", e);
>           throw new IOException("Aborting", e);
>        }
>      }
>
> I think that each time we call assignRoot(), we should verify the ROOT 
> region's location first, because the ROOT region could already have been 
> assigned from elsewhere before this assignment runs.
> I look forward to any replies.
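
A minimal sketch of the verify-before-assign guard proposed above, in Java. The two interfaces are hypothetical stand-ins that only borrow the method names quoted in this thread (verifyRootRegionLocation and assignRoot); they are not the real HBase types, and the helper name assignRootIfNeeded is an assumption made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins: NOT the real HBase CatalogTracker / AssignmentManager.
interface RootLocationVerifier {
    boolean verifyRootRegionLocation(long timeoutMs);
}

interface RootAssigner {
    void assignRoot();
}

final class GuardedRootAssign {
    private GuardedRootAssign() {}

    /**
     * Assigns -ROOT- only when its current location cannot be verified.
     * Returns true if an assignment was issued, false if it was skipped.
     */
    static boolean assignRootIfNeeded(RootLocationVerifier verifier,
                                      RootAssigner assigner,
                                      long timeoutMs) {
        if (verifier.verifyRootRegionLocation(timeoutMs)) {
            return false; // ROOT is already online somewhere; do not assign again
        }
        assigner.assignRoot();
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger assignCalls = new AtomicInteger();
        RootAssigner assigner = assignCalls::incrementAndGet;

        // ROOT location verifies, so the assignment is skipped.
        assignRootIfNeeded(timeout -> true, assigner, 1000L);
        // ROOT location does not verify, so the assignment is issued.
        assignRootIfNeeded(timeout -> false, assigner, 1000L);

        System.out.println("assignRoot() calls: " + assignCalls.get());
    }
}
```

Note that a check followed by an assign is itself a check-then-act sequence, so if both call sites can run concurrently the guard would presumably also need to be serialized to close the window completely.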
>
> Thanks!
>
> Regards,
> Jieshan Bean
>
>
> -----Original Message-----
> From: bijieshan [mailto:[email protected]]
> Sent: May 20, 2011 15:34
> To: [email protected]
> Cc: Chenjian
> Subject: ROOT region appeared in two regionserver's onlineRegions at the same time
>
> This can happen, with low probability, through the following steps:
> (Suppose the cluster nodes are named RS1/RS2/HM, and there are more than 
> 10,000 regions in the cluster.)
>
> 1. The ROOT region was opened on RS1.
> 2. For some reason (maybe the HDFS process became abnormal), RS1 aborted.
> 3. ServerShutdownHandler processing started.
> 4. HMaster was restarted. During finishInitialization handling, the ROOT 
> region location was unset, and the region was assigned to RS2.
> 5. The ROOT region was opened successfully on RS2.
> 6. After a while, the ROOT region location was unset again by RS1's 
> ServerShutdownHandler, and the region was reassigned. Before that, RS1 had 
> been restarted. So there are two possibilities:
>  Case a:
>   The ROOT region was assigned to RS1.
>   Nothing seemed to be affected, but the ROOT region was still online 
> on RS2.
>
>  Case b:
>   The ROOT region was assigned to RS2.
>   The ROOT region could not be opened until it was reassigned to another 
> regionserver, because it was already shown as online on this regionserver.
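
The two cases above can be walked through with a toy model in Java. Everything here is hypothetical (ToyRegionServer, DoubleAssignDemo and their methods are made up for illustration and do not mirror HBase internals); the only behaviour borrowed from the report is that a regionserver refuses to open a region it already has online.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: hypothetical classes, not HBase code.
final class ToyRegionServer {
    private final Set<String> onlineRegions = new HashSet<>();

    /** Opens the region unless it is already online here; returns whether it was opened. */
    boolean openRegion(String region) {
        return onlineRegions.add(region);
    }

    boolean isOnline(String region) {
        return onlineRegions.contains(region);
    }
}

final class DoubleAssignDemo {
    static final String ROOT = "-ROOT-";

    private DoubleAssignDemo() {}

    /** Case a: the second, unverified assignment goes to the restarted RS1. */
    static boolean caseA() {
        ToyRegionServer rs1 = new ToyRegionServer(); // restarted, nothing online
        ToyRegionServer rs2 = new ToyRegionServer();
        rs2.openRegion(ROOT); // step 5: ROOT opened on RS2
        rs1.openRegion(ROOT); // step 6: reassigned without verifying the location
        return rs1.isOnline(ROOT) && rs2.isOnline(ROOT);
    }

    /** Case b: the second assignment targets RS2 again; returns whether the open was accepted. */
    static boolean caseB() {
        ToyRegionServer rs2 = new ToyRegionServer();
        rs2.openRegion(ROOT);        // step 5: ROOT opened on RS2
        return rs2.openRegion(ROOT); // step 6: second open on the same server
    }

    public static void main(String[] args) {
        System.out.println("case a, ROOT online on both servers: " + caseA());
        System.out.println("case b, second open accepted: " + caseB());
    }
}
```

In this model case a leaves ROOT online on both servers, and case b has the second open rejected, which matches the two outcomes described in step 6.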
>
> This can be confirmed from the logs:
>
> 1. The ROOT region was opened twice:
> 2011-05-17 10:32:59,188 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> -ROOT-,,0.70236052 on 162-2-77-0,20020,1305598359031
> 2011-05-17 10:33:01,536 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> -ROOT-,,0.70236052 on 162-2-16-6,20020,1305597548212
>
> 2. Regionserver 162-2-16-6 aborted, so the region was reassigned to 
> 162-2-77-0, but it was already online on that server:
> 10:49:30,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: 
> Received request to open region: -ROOT-,,0.70236052
> 10:49:30,920 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing 
> open of -ROOT-,,0.70236052
> 10:49:30,920 WARN 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
> open of -ROOT-,,0.70236052 but already online on this server
>
> This could leave the ROOT region offline for a long time, although it only 
> happens under a special scenario. I have checked the code, and it seems to 
> be a small bug there.
>
> Thanks!
>
> Regards,
> Jieshan Bean
