[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582444#comment-16582444
 ] 

Hudson commented on HBASE-21050:


Results for branch master
[build #436 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/436/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/436//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/436//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/436//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}


> Exclusive lock may be held by a SUCCESS state procedure forever
> ---
>
> Key: HBASE-21050
> URL: https://issues.apache.org/jira/browse/HBASE-21050
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 3.0.0, 2.0.2, 2.1.1
>
> Attachments: HBASE-21050.branch-2.0.001.patch
>
>
> After HBASE-20846, we restore lock info for procedures. But there is a case
> where the lock can be held by an already successful procedure. Since the
> procedure won't execute again, the lock will be held by the procedure forever.
> 1. All children of pid=1208 had finished, but before procedure 1208
> woke up, the master was killed
> {code}
> 2018-08-05 02:20:14,465 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1659): Finished subprocedure(s) of pid=1208, ppid=1206, state=RUNNABLE, hasLock=true; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034; resume parent processing.
> 2018-08-05 02:20:14,466 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1296): Finished pid=1232, ppid=1208, state=SUCCESS, hasLock=false; AssignProcedure table=IntegrationTestBigLinkedList, region=c2a23a735f16df57299dba6fd4599f2f, target=e010125050127.bja,60020,1533403109034 in 1.5060sec
> {code}
> 2. The master restarts. Since procedure 1208 held the lock before the
> restart, the lock was restored for it
> {code}
> 2018-08-05 02:20:30,803 DEBUG [Thread-15] procedure2.ProcedureExecutor(456): Loading pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034
> 2018-08-05 02:20:30,818 DEBUG [Thread-15] procedure2.Procedure(898): pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 held the lock before restarting, call acquireLock to restore it.
> 2018-08-05 02:20:30,818 INFO  [Thread-15] procedure.MasterProcedureScheduler(631): pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 checking lock on c2a23a735f16df57299dba6fd4599f2f
> {code}
> 3. Since procedure 1208 is already in the SUCCESS state, it won't execute
> again, so the lock will be held by it forever.
> We need to check the state of the procedure before restoring locks: if the
> procedure is already finished (success or rollback), we do not need to
> acquire the lock for it.
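
The check proposed in the description can be sketched roughly as follows. This is a minimal illustration and not the attached patch: `LockRestoreSketch`, `LoadedProc`, `shouldRestoreLock`, and `pidsToRestore` are hypothetical names, the state enum is simplified, and the pid/state pairs in `main` are only illustrative values modeled on the log excerpts.

```java
import java.util.ArrayList;
import java.util.List;

public class LockRestoreSketch {
  /** Simplified states (assumption); the real procedure state machine has more. */
  enum State { RUNNABLE, WAITING, SUCCESS, ROLLEDBACK }

  /** Hypothetical record of a procedure as loaded from the store after restart. */
  static final class LoadedProc {
    final long pid;
    final State state;
    final boolean heldLockBeforeRestart;

    LoadedProc(long pid, State state, boolean heldLockBeforeRestart) {
      this.pid = pid;
      this.state = state;
      this.heldLockBeforeRestart = heldLockBeforeRestart;
    }

    /** Finished (success or rollback) means it will never be scheduled again. */
    boolean isFinished() {
      return state == State.SUCCESS || state == State.ROLLEDBACK;
    }
  }

  /** Restore a lock only for a procedure that can still run and release it. */
  static boolean shouldRestoreLock(LoadedProc p) {
    return p.heldLockBeforeRestart && !p.isFinished();
  }

  /** Returns the pids whose locks would be re-acquired during load. */
  static List<Long> pidsToRestore(List<LoadedProc> loaded) {
    List<Long> out = new ArrayList<>();
    for (LoadedProc p : loaded) {
      if (shouldRestoreLock(p)) {
        out.add(p.pid);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<LoadedProc> loaded = new ArrayList<>();
    // Illustrative values only: an unfinished parent and a finished child.
    loaded.add(new LoadedProc(1206L, State.WAITING, true));
    loaded.add(new LoadedProc(1208L, State.SUCCESS, true));
    System.out.println(pidsToRestore(loaded));
  }
}
```

With this guard, a procedure loaded in the SUCCESS state is skipped even if it was recorded as holding the lock, so no lock is left with an owner that will never release it.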



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-15 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581968#comment-16581968
 ] 

Hudson commented on HBASE-21050:


Results for branch branch-2.0
[build #683 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/683/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/683//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/683//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/683//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.




[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-15 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581933#comment-16581933
 ] 

Hudson commented on HBASE-21050:


Results for branch branch-2.1
[build #195 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/195/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/195//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/195//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/195//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-15 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581702#comment-16581702
 ] 

stack commented on HBASE-21050:
---

Ok. The test is hard. Everything happens down inside the loading of
procedures. The lock is getting restored post crash and the child is being
marked completed because it is 'finished', so it is not being rescheduled.
While trying to test this, the child procedure evaporates before I can get a
hold of it. I made various attempts at an 'entity' lock that had a lifecycle
independent of Procedure, but what is wanted is exercising the locking we do
inside the MasterProcedureScheduler where it, an independent entity, has a
special mechanism for keeping up region locks. Building up a test case that
has MasterProcedureScheduler at its core with Region entities would be a good
bit of work. I'm passing on it for now.

Let me commit this patch. At the very least, after being in here a while, the
patch makes even more sense.



[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-14 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580710#comment-16580710
 ] 

stack commented on HBASE-21050:
---

[~allan163] Let me give it a go then... will be back soon.



[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-14 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580707#comment-16580707
 ] 

Allan Yang commented on HBASE-21050:


Yes, [~stack], you are right: if the master is killed just between
updateStoreOnExec and the lock release, the problem will show up. But the
procedure needs to be a child procedure; otherwise, it will be marked as
completed when the master restarts (only a root procedure can be treated as
finished).



[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-14 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580691#comment-16580691
 ] 

stack commented on HBASE-21050:
---

Is this a failure near the end of PE#execProcedure?

{code}
...

// Submit the new subprocedures
if (subprocs != null && !procedure.isFailed()) {
  submitChildrenProcedures(subprocs);
}

// <<<= BEFORE HERE

// we need to log the release lock operation before waking up the parent procedure, as there
// could be race that the parent procedure may call updateStoreOnExec ahead of us and remove all
// the sub procedures from store and cause problems...
releaseLock(procedure, false);

// if the procedure is complete and has a parent, count down the children latch.
// If 'suspended', do nothing to change state -- let other threads handle unsuspend event.
if (!suspended && procedure.isFinished() && procedure.hasParent()) {
  countDownChildren(procStack, procedure);
}
{code}

... so the child of the parent has completed with SUCCESS, and we are exiting
the execution of the child. On our way out we are about to release the lock
and then call countDownChildren, which makes the parent RUNNABLE again, BUT we
fail after the child completes and BEFORE we get to the lock release?

If so, I can make a test for this. The machinery added to test HBASE-20978
will work here. Let me know if you think this is what's up, [~allan163], and
I'll give the test a go.
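
The window being asked about can be modeled in a few lines. This is only a toy sketch under stated assumptions, not HBase code and not the HBASE-20978 test machinery: `CrashWindowSketch`, `Persisted`, `finishChild`, and `wouldLeakLock` are hypothetical names, and the procedure store is reduced to a single in-memory record written twice (once for the final state, once for the lock release).

```java
public class CrashWindowSketch {
  /** Hypothetical persisted view of a procedure: its state plus a lock flag. */
  static final class Persisted {
    final String state;
    final boolean locked;

    Persisted(String state, boolean locked) {
      this.state = state;
      this.locked = locked;
    }
  }

  /**
   * Models the tail of executing a finished child procedure as two store
   * writes. If crashBeforeRelease is true, the master "dies" between them
   * and the first write is what a restarted master would load.
   */
  static Persisted finishChild(boolean crashBeforeRelease) {
    // Write 1 (stands in for the state update): SUCCESS, lock still held.
    Persisted onDisk = new Persisted("SUCCESS", true);
    if (crashBeforeRelease) {
      return onDisk; // simulated master kill inside the window
    }
    // Write 2 (stands in for logging the lock release).
    onDisk = new Persisted("SUCCESS", false);
    return onDisk;
  }

  /**
   * True when an unguarded reload would restore a lock for a procedure
   * that is finished and therefore never runs again to release it.
   */
  static boolean wouldLeakLock(Persisted p) {
    return p.locked && p.state.equals("SUCCESS");
  }

  public static void main(String[] args) {
    System.out.println("crash in window leaks: " + wouldLeakLock(finishChild(true)));
    System.out.println("clean exit leaks: " + wouldLeakLock(finishChild(false)));
  }
}
```

A test along these lines would kill the executor after the first write and then assert, after reload, that the region lock is free.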



[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-14 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580569#comment-16580569
 ] 

Duo Zhang commented on HBASE-21050:
---

+1. Let's add a UT?



[jira] [Commented] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580031#comment-16580031
 ] 

Hadoop QA commented on HBASE-21050:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 38s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 21s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  8s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 25s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 13s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 10s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 15s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 41s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m  8s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21050 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935568/HBASE-21050.branch-2.0.001.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c908623e0d76 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / 8435f2bc72 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14039/testReport/ |
| Max. process+thread count | 263 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output |