[https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623088#comment-16623088]

Allan Yang edited comment on HBASE-21213 at 9/21/18 5:28 AM:
-------------------------------------------------------------

+1 for the patch. It is my fault that I forgot that the RTP (RegionTransitionProcedure) 
will attach to the RegionState. We should detach it when bypassing. Otherwise, bypassing is 
useless, since later procedures against this region will fail.
As for [~Apache9]'s concern, I think it is safe to do so: before bypassing, 
we already perform a lot of checks to ensure that there is no race condition, and 
bypassing is only for HBCK and experienced operators (it is not exposed to clients 
directly). In my original design, bypass should clean up all the mess. If 
we don't do it in bypassing, we will need to introduce another mechanism to do it, and I 
don't want more 'hacking' mechanisms.
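A minimal sketch of the cleanup described above, assuming a heavily simplified model: `Proc`, `RegionStateNode` and `BypassCleanup.bypass` are hypothetical stand-ins loosely named after the classes in the logs, not HBase's actual API. It only illustrates that bypass must both finish the procedure and detach it from the region's node, so that a later assign can attach.

```java
/** Hypothetical, simplified procedure; not HBase's Procedure class. */
final class Proc {
  final long pid;
  boolean bypassed;
  boolean finished;

  Proc(long pid) { this.pid = pid; }
}

/** Hypothetical, simplified per-region node holding at most one owning procedure. */
final class RegionStateNode {
  final String region;
  private Proc owner; // procedure currently attached to this region, or null

  RegionStateNode(String region) { this.region = region; }

  /** Attach p; refuses if a different procedure is still attached. */
  synchronized boolean setProcedure(Proc p) {
    if (owner != null && owner != p) {
      return false; // "There is already another procedure running on this region"
    }
    owner = p;
    return true;
  }

  /** Detach p, but only if it is the current owner. */
  synchronized void unsetProcedure(Proc p) {
    if (owner == p) {
      owner = null;
    }
  }

  synchronized Proc getProcedure() { return owner; }
}

final class BypassCleanup {
  /** Hypothetical bypass: mark the procedure finished AND detach it from its region. */
  static void bypass(Proc p, RegionStateNode node) {
    p.bypassed = true;
    p.finished = true;      // a bypassed procedure finishes without doing its work
    node.unsetProcedure(p); // the extra cleanup; without it the node keeps a stale owner
  }

  public static void main(String[] args) {
    RegionStateNode node = new RegionStateNode("37cc206fe9c4bc1c0a46a34c5f523d16");
    Proc unassign = new Proc(100449);
    node.setProcedure(unassign);
    bypass(unassign, node);

    Proc assign = new Proc(100450);
    System.out.println("assign attached: " + node.setProcedure(assign));
  }
}
```

Running `main` prints `assign attached: true`. Dropping the `unsetProcedure` call leaves the bypassed procedure as the node's owner, and the second `setProcedure` returns false.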


was (Author: allan163): the same comment, except that the first sentence read 
"It is my fault that I forgot that the RTP will attach to the RegionState before."

> [hbck2] Need more cleanup needed on bypass; old Procedure left in RegionStateNodes
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-21213
>                 URL: https://issues.apache.org/jira/browse/HBASE-21213
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2, hbck2
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>             Fix For: 2.1.1
>
>         Attachments: HBASE-21213.branch-2.1.001.patch
>
>
> This is a follow-on from HBASE-21083, which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying that the UnassignProcedure still exists 
> in the RegionStateNodes.
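The failure in the second log can be reproduced with a similarly simplified, hypothetical model (`RegionNode`, `TransitionProc` and `StaleOwnerDemo` are illustrative names, not HBase classes). Assuming the ownership check compares only procedure identity, as the log suggests (the owner is reported with state=SUCCESS yet still blocks), a bypassed procedure that finished but was never detached rejects every later procedure on the region, so the new AssignProcedure is rolled back.

```java
/** Hypothetical, simplified region-transition procedure. */
final class TransitionProc {
  enum State { RUNNABLE, SUCCESS, ROLLEDBACK }

  final long pid;
  final String type;
  State state = State.RUNNABLE;

  TransitionProc(long pid, String type) {
    this.pid = pid;
    this.type = type;
  }

  @Override
  public String toString() {
    return "pid=" + pid + ", state=" + state + "; " + type;
  }
}

/** Hypothetical region node; the check looks at identity, not at whether the owner finished. */
final class RegionNode {
  private TransitionProc owner;

  synchronized boolean setProcedure(TransitionProc p) {
    if (owner != null && owner != p) {
      return false;
    }
    owner = p;
    return true;
  }

  synchronized TransitionProc getProcedure() { return owner; }
}

final class StaleOwnerDemo {
  /** Pre-fix bypass (assumed): finishes the procedure but leaves it attached to the node. */
  static void bypassWithoutCleanup(TransitionProc p) {
    p.state = TransitionProc.State.SUCCESS;
  }

  /** Try to attach an assign; if the region is already owned, log and roll the assign back. */
  static boolean tryAssign(RegionNode node, TransitionProc assign) {
    if (!node.setProcedure(assign)) {
      System.out.println("There is already another procedure running on this region this="
          + assign + " owner=" + node.getProcedure());
      assign.state = TransitionProc.State.ROLLEDBACK;
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    RegionNode node = new RegionNode();
    TransitionProc unassign = new TransitionProc(100449, "UnassignProcedure");
    node.setProcedure(unassign);
    bypassWithoutCleanup(unassign);

    TransitionProc assign = new TransitionProc(100450, "AssignProcedure");
    tryAssign(node, assign); // rejected: the finished UnassignProcedure still owns the node
  }
}
```

Here the stale owner can only be cleared by detaching it, which is the cleanup the patch adds to bypass rather than a separate mechanism.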



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)