[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582444#comment-16582444 ]

Hudson commented on HBASE-21050:

Results for branch master
[build #436 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/436/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/436//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/436//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/436//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> Exclusive lock may be held by a SUCCESS state procedure forever
> ---
>
> Key: HBASE-21050
> URL: https://issues.apache.org/jira/browse/HBASE-21050
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-21050.branch-2.0.001.patch
>
>
> After HBASE-20846, we restore lock info for procedures. But there is a case
> where the lock can be held by a procedure that has already succeeded. Since
> that procedure won't execute again, it will hold the lock forever.
> 1. All children of pid=1208 had finished, but the master was killed before
> procedure 1208 woke up
> {code}
> 2018-08-05 02:20:14,465 INFO [PEWorker-8] procedure2.ProcedureExecutor(1659): Finished subprocedure(s) of pid=1208, ppid=1206, state=RUNNABLE, hasLock=true; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034; resume parent processing.
> 2018-08-05 02:20:14,466 INFO [PEWorker-8] procedure2.ProcedureExecutor(1296): Finished pid=1232, ppid=1208, state=SUCCESS, hasLock=false; AssignProcedure table=IntegrationTestBigLinkedList, region=c2a23a735f16df57299dba6fd4599f2f, target=e010125050127.bja,60020,1533403109034 in 1.5060sec
> {code}
> 2. The master restarted; since procedure 1208 held the lock before the
> restart, the lock was restored for it
> {code}
> 2018-08-05 02:20:30,803 DEBUG [Thread-15] procedure2.ProcedureExecutor(456): Loading pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034
> 2018-08-05 02:20:30,818 DEBUG [Thread-15] procedure2.Procedure(898): pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 held the lock before restarting, call acquireLock to restore it.
> 2018-08-05 02:20:30,818 INFO [Thread-15] procedure.MasterProcedureScheduler(631): pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 checking lock on c2a23a735f16df57299dba6fd4599f2f
> {code}
> 3. Since procedure 1208 has already succeeded, it won't execute again, so it
> will hold the lock forever
> We need to check the state of the procedure before restoring locks: if the
> procedure is already finished (success or rollback), we do not need to
> acquire the lock for it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
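
The check proposed in the description can be illustrated with a small, self-contained Java sketch. It is a toy model, not the attached patch and not HBase code: the class `LockRestoreSketch`, the `Proc` record, the `lockedWhenLoading` field, and `restoreLocks` are all hypothetical names chosen for illustration. The only thing it is meant to show is that skipping finished procedures during lock restoration leaves the region unlocked after a restart.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of lock restoration at master restart. Hypothetical names; not HBase code. */
public class LockRestoreSketch {
  enum State { RUNNABLE, WAITING, SUCCESS, ROLLEDBACK }

  /** Minimal stand-in for a procedure record loaded from the store (assumed shape). */
  static final class Proc {
    final long pid;
    final State state;
    final boolean lockedWhenLoading; // lock flag as it was persisted before the crash
    final String region;

    Proc(long pid, State state, boolean lockedWhenLoading, String region) {
      this.pid = pid;
      this.state = state;
      this.lockedWhenLoading = lockedWhenLoading;
      this.region = region;
    }

    /** Finished means success or rollback, as in the issue description. */
    boolean isFinished() {
      return state == State.SUCCESS || state == State.ROLLEDBACK;
    }
  }

  /** Replays the loaded procedures and returns region -> pid of the lock owner. */
  static Map<String, Long> restoreLocks(List<Proc> loaded, boolean skipFinished) {
    Map<String, Long> owners = new HashMap<>();
    for (Proc p : loaded) {
      if (!p.lockedWhenLoading) {
        continue; // did not hold the lock before the restart
      }
      if (skipFinished && p.isFinished()) {
        continue; // proposed check: a finished procedure never runs again
      }
      owners.put(p.region, p.pid);
    }
    return owners;
  }

  /** Convenience wrapper: is the region still locked once the store has been replayed? */
  static boolean regionLockedAfterRestart(State state, boolean skipFinished) {
    Proc p = new Proc(1208L, state, true, "c2a23a735f16df57299dba6fd4599f2f");
    return restoreLocks(Collections.singletonList(p), skipFinished).containsKey(p.region);
  }

  public static void main(String[] args) {
    System.out.println(regionLockedAfterRestart(State.SUCCESS, false));  // true: lock leaks
    System.out.println(regionLockedAfterRestart(State.SUCCESS, true));   // false: check applied
    System.out.println(regionLockedAfterRestart(State.RUNNABLE, true));  // true: still needed
  }
}
```

In this model a RUNNABLE procedure still gets its lock back, which matches the intent of HBASE-20846; only procedures that can never release the lock themselves are skipped.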
[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581968#comment-16581968 ]

Hudson commented on HBASE-21050:

Results for branch branch-2.0
[build #683 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/683/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/683//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/683//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/683//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581933#comment-16581933 ]

Hudson commented on HBASE-21050:

Results for branch branch-2.1
[build #195 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/195/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/195//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/195//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/195//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581702#comment-16581702 ]

stack commented on HBASE-21050:
---

Ok. Test is hard. All is happening down inside load procedures. The lock is getting restored post crash and the child is being marked completed because it is 'finished' ... so it is not being rescheduled. Messing, trying to test this, the child procedure evaporates before I can get a hold on it. I had various attempts at an 'entity' lock that had a lifecycle independent of Procedure but what is wanted is exercising the locking we do inside the MasterProcedureScheduler where it, an independent entity, has special mechanism for keeping up region locks. Building up a test case that has MasterProcedureScheduler at its core with Region entities would be a good bit of work. I'm passing on it for now.

Let me commit this patch. At the very least, after being in here a while, patch makes even more sense.
[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580710#comment-16580710 ]

stack commented on HBASE-21050:
---

[~allan163] Let me give it a go then... will be back soon.
[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580707#comment-16580707 ]

Allan Yang commented on HBASE-21050:

Yes, [~stack], you are right: if the master is killed just between updateStoreOnExec and the lock release, the problem will show up. But the procedure needs to be a child procedure; otherwise, it will be marked as completed when the master restarts (only a root procedure can be treated as finished).
[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580691#comment-16580691 ]

stack commented on HBASE-21050:
---

Is this a failure near the end of PE#execProcedure

{code}
...
      // Submit the new subprocedures
      if (subprocs != null && !procedure.isFailed()) {
        submitChildrenProcedures(subprocs);
      }

      // <<<= BEFORE HERE
      // we need to log the release lock operation before waking up the parent procedure, as there
      // could be race that the parent procedure may call updateStoreOnExec ahead of us and remove all
      // the sub procedures from store and cause problems...
      releaseLock(procedure, false);

      // if the procedure is complete and has a parent, count down the children latch.
      // If 'suspended', do nothing to change state -- let other threads handle unsuspend event.
      if (!suspended && procedure.isFinished() && procedure.hasParent()) {
        countDownChildren(procStack, procedure);
      }
{code}

... so child of parent has completed, SUCCESS, and we are exiting the execution of the child... on our way out about to release the lock and then call countDownChildren which makes the parent RUNNABLE again BUT we fail after child completes but BEFORE we get to the release lock?

If so, I can make a test for this. The machinery added to test HBASE-20978 will work for here. Let me know if you think this is what's up [~allan163] and I'll give the test a go.
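
The crash window asked about in this comment can be sketched as a toy simulation. Everything here is an assumption made for illustration: `CrashWindowSketch`, `StoredProc`, and `CrashPoint` are hypothetical names, and the two assignments merely stand in for the store update and for `releaseLock(procedure, false)`. It is not a test against the real `ProcedureExecutor`.

```java
/** Toy model of the crash window between the store update and the lock release. Not HBase code. */
public class CrashWindowSketch {
  /** What survives a restart in this toy model (hypothetical fields). */
  static final class StoredProc {
    String state = "RUNNABLE";
    boolean locked = true; // the lock was taken and persisted before execution
  }

  /** Points at which the simulated master can die. */
  enum CrashPoint { NONE, AFTER_STORE_UPDATE }

  /** Runs the tail of a simplified execProcedure and returns the persisted record. */
  static StoredProc finishChild(CrashPoint crash) {
    StoredProc stored = new StoredProc();
    stored.state = "SUCCESS"; // stands in for persisting the finished state
    if (crash == CrashPoint.AFTER_STORE_UPDATE) {
      return stored; // master killed here: the lock release is never logged
    }
    stored.locked = false; // stands in for releaseLock(procedure, false)
    return stored;
  }

  /** True if the record loaded after restart is finished but still flagged as locked. */
  static boolean staleLockAfterRestart(CrashPoint crash) {
    StoredProc s = finishChild(crash);
    return s.state.equals("SUCCESS") && s.locked;
  }

  public static void main(String[] args) {
    System.out.println(staleLockAfterRestart(CrashPoint.AFTER_STORE_UPDATE)); // true
    System.out.println(staleLockAfterRestart(CrashPoint.NONE));               // false
  }
}
```

A record that is both SUCCESS and flagged as locked is exactly what the DEBUG lines in the issue description show for pid=1208 when it is loaded.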
[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580569#comment-16580569 ]

Duo Zhang commented on HBASE-21050:
---

+1. Let's add a UT?
[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever
[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580031#comment-16580031 ]

Hadoop QA commented on HBASE-21050:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 38s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 8s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 25s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 10s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 15s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 41s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 8s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21050 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935568/HBASE-21050.branch-2.0.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux c908623e0d76 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / 8435f2bc72 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14039/testReport/ |
| Max. process+thread count | 263 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output |