Yes, it seems pretty strange.
You're right, there's so many logs, so I just picked off part of the logs.
I feel a hiding bug here. And I'm still working on it.

I feel the function of AssignmentManager#regionOnline has a problem:
1. ServerShutdownHandler will remove the server from the 
AssignmentManager.servers.
2. regionOnline will add the server into AssignmentManager.servers again.

What about a competition happened here?

  public void regionOnline(HRegionInfo regionInfo, HServerInfo serverInfo) {  
    synchronized (this.regionsInTransition) {
        ---------------
    }
    synchronized (this.regions) {
      // Add check
      ---------------
      addToServers(hsiWithoutLoad, regionInfo);
      this.regions.notifyAll();
    }
    // Remove plan if one.
    clearRegionPlan(regionInfo);
    // Update timers for all regions in transition going against this server.
    updateTimers(serverInfo);
  }
Thanks J-D.

Jieshan Bean



------------------------------------------------

As far as I can tell your server is alive because it opened the region
just fine:

Handling transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000,
region=612342de1fe4733f72299d70addb6d11
Handling transition=RS_ZK_REGION_OPENING,
server=167-6-1-11,20020,1308106004710,
region=612342de1fe4733f72299d70addb6d11
Handling transition=RS_ZK_REGION_OPENING,
server=167-6-1-11,20020,1308106004710,
region=612342de1fe4733f72299d70addb6d11
Handling transition=RS_ZK_REGION_OPENED,
server=167-6-1-11,20020,1308106004710,
region=612342de1fe4733f72299d70addb6d11

It has a different start code from when it was assigned (1308106004710
vs 1308105402003) which is pretty strange, and on top of that the line
with M_ZK_REGION_OFFLINE shows port 20000 instead of 20020.

I believe that this is only a small snippet of a bigger log, right? I
bet there is more we could learn in there, also it'd be nice if you
told us what was happening on your cluster at the same time,
understanding the context would help.

J-D

On Wed, Jun 22, 2011 at 2:31 AM, bijieshan <bijies...@huawei.com> wrote:
> 1.The Rs 167-6-1-11,20020,1308105402003 finished shutdown at "10:46:37,774":
>   10:46:37,774 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished 
> processing of shutdown of 167-6-1-11,20020,1308105402003
>
> 2.Overwriting happened, it seemed the RS was still exist in the set of 
> AssignmentManager#regions:
> 10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
> Overwriting 612342de1fe4733f72299d70addb6d11 on 
> serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, 
> usedHeap=0, maxHeap=0)
>
> 3.Region was assigned to this dead RS at "10:50:20,671":
> 10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
> Assigning region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 
> 167-6-1-11,20020,1308105402003
>
> I'm still working on this issue. I hope anyone can give out any suggestions.
>
> Thanks!
>
> Jieshan Bean
>
>
> Here's the logs tracked the region of "612342de1fe4733f72299d70addb6d11":
>
> 10:39:51,467 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Async create of unassigned node for 
> 612342de1fe4733f72299d70addb6d11 with OFFLINE state
> 10:39:51,994 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
>  
> rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. 
> state=OFFLINE, ts=1308105591467, server=167-6-1-11,20020,1308105402003
> 10:39:52,182 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback:
>  
> rs=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. 
> state=OFFLINE, ts=1308105591467
> 10:41:43,316 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308105402003, 
> region=612342de1fe4733f72299d70addb6d11
> 10:43:22,938 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
> event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
> 10:43:22,938 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Deleting existing unassigned node for 
> 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
> 10:43:23,153 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Successfully deleted unassigned node for 
> region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
> 10:43:23,157 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 
> 167-6-1-11,20020,1308105402003
> 10:43:52,300 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 
> 2994 region(s) that 167-6-1-11,20020,1308105402003 was carrying (skipping 0 
> regions(s) that are already in transition)
> 10:45:54,914 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Creating (or updating) unassigned node for 
> 612342de1fe4733f72299d70addb6d11 with OFFLINE state
> 10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No 
> previous transition plan was found (or we are ignoring an existing plan) for 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so 
> generated a random one; 
> hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
>  src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) 
> available servers
> 10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No 
> previous transition plan was found (or we are ignoring an existing plan) for 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. so 
> generated a random one; 
> hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
>  src=, dest=167-6-1-12,20020,1308105397174; 2 (online=2, exclude=null) 
> available servers
> 10:45:54,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
> Assigning region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 
> 167-6-1-12,20020,1308105397174
> 10:45:55,029 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=167-6-1-12,20020,1308105397174, 
> region=612342de1fe4733f72299d70addb6d11
> 10:45:55,063 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=167-6-1-12,20020,1308105397174, 
> region=612342de1fe4733f72299d70addb6d11
> 10:45:55,063 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
> event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
> 10:45:55,063 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Deleting existing unassigned node for 
> 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
> 10:45:55,072 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Successfully deleted unassigned node for 
> region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
> 10:45:55,081 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
> Overwriting 612342de1fe4733f72299d70addb6d11 on 
> serverName=167-6-1-11,20020,1308105402003, load=(requests=0, regions=0, 
> usedHeap=0, maxHeap=0)
> 10:45:55,081 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 
> 167-6-1-12,20020,1308105397174
> 10:46:37,774 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished 
> processing of shutdown of 167-6-1-11,20020,1308105402003
> 10:48:45,067 INFO org.apache.hadoop.hbase.master.HMaster: balance 
> hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
>  src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
> 10:48:45,067 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting 
> unassignment of region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. 
> (offlining)
> 10:48:45,068 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent 
> CLOSE to serverName=167-6-1-12,20020,1308105397174, load=(requests=0, 
> regions=0, usedHeap=0, maxHeap=0) for region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
> 10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 
> (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
>  server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
> 10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> new unassigned node: /hbase/unassigned/612342de1fe4733f72299d70addb6d11 
> (region=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
>  server=167-6-1-12,20020,1308105397174, state=RS_ZK_REGION_CLOSED)
> 10:48:45,439 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=167-6-1-12,20020,1308105397174, 
> region=612342de1fe4733f72299d70addb6d11
> 10:50:20,527 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 612342de1fe4733f72299d70addb6d11
> 10:50:20,580 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing 
> OFFLINE; 
> was=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.
>  state=CLOSED, ts=1308106125379
> 10:50:20,580 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Creating (or updating) unassigned node for 
> 612342de1fe4733f72299d70addb6d11 with OFFLINE state
> 10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using 
> pre-existing plan for region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; 
> plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
>  src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
> 10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using 
> pre-existing plan for region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.; 
> plan=hri=Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11.,
>  src=167-6-1-12,20020,1308105397174, dest=167-6-1-11,20020,1308105402003
> 10:50:20,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
> Assigning region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. to 
> 167-6-1-11,20020,1308105402003
> 10:50:20,692 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=M_ZK_REGION_OFFLINE, server=167-6-1-11:20000, 
> region=612342de1fe4733f72299d70addb6d11
> 10:50:20,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, 
> region=612342de1fe4733f72299d70addb6d11
> 10:50:20,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=167-6-1-11,20020,1308106004710, 
> region=612342de1fe4733f72299d70addb6d11
> 10:50:20,779 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=167-6-1-11,20020,1308106004710, 
> region=612342de1fe4733f72299d70addb6d11
> 10:51:33,245 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
> event for 612342de1fe4733f72299d70addb6d11; deleting unassigned node
> 10:51:33,245 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Deleting existing unassigned node for 
> 612342de1fe4733f72299d70addb6d11 that is in expected state RS_ZK_REGION_OPENED
> 10:51:33,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x330912836370000 Successfully deleted unassigned node for 
> region 612342de1fe4733f72299d70addb6d11 in expected state RS_ZK_REGION_OPENED
> 10:51:33,312 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> Jeason10,08058613800000030,1308032774777.612342de1fe4733f72299d70addb6d11. on 
> 167-6-1-11,20020,1308106004710
>

Reply via email to