[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657809#comment-16657809
 ] 

Hudson commented on HBASE-21291:


Results for branch branch-2.1
[build #496 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/496/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/496//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/496//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/496//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
>   if (!procedure.isFailed()) {
>     if (subprocs != null) {
>       if (subprocs.length == 1 && subprocs[0] == procedure) {
>         // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>         // i.e. we go around this loop again rather than go back out on the scheduler queue.
>         subprocs = null;
>         reExecute = true;
>         LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>       } else {
>         // Yield the current procedure, and make the subprocedure runnable
>         // subprocs may come back 'null'.
>         subprocs = initializeChildren(procStack, procedure, subprocs);
>         LOG.info("Initialized subprocedures=" +
>           (subprocs == null? null:
>             Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
>             collect(Collectors.toList()).toString()));
>       }
>     } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>       LOG.debug("Added to timeoutExecutor {}", procedure);
>       timeoutExecutor.add(procedure);
>     } else if (!suspended) {
>       // No subtask, so we are done
>       procedure.setState(ProcedureState.SUCCESS);
>     }
>   }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true
> for state-machine-like procedures. If such a procedure is stuck in one
> state, it will loop forever.
> {code}
>   IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
>   try {
>     executeProcedure(proc);
>   } catch (AssertionError e) {
>     LOG.info("ASSERT pid=" + proc.getProcId(), e);
>     throw e;
>   } finally {
>     procExecutionLock.releaseLockEntry(lockEntry);
>   }
> {code}
> Since a procedure acquires the IdLock and releases it only after execution
> is done, a state-machine procedure never releases the IdLock until it has
> finished. bypassProcedure then doesn't work, because it first tries to grab
> the IdLock.
> {code}
>   IdLock.Entry lockEntry =
>     procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-20 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657801#comment-16657801
 ] 

Hudson commented on HBASE-21291:


Results for branch branch-2.0
[build #979 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/979/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/979//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/979//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/979//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.




[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656211#comment-16656211
 ] 

Hudson commented on HBASE-21291:


Results for branch branch-2.1
[build #486 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/486/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/486//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/486//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/486//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656181#comment-16656181
 ] 

Hudson commented on HBASE-21291:


Results for branch branch-2.0
[build #971 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/971/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/971//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/971//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/971//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.




[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655396#comment-16655396
 ] 

stack commented on HBASE-21291:
---

bq. Thus if the procedure is stuck at a state (like stuck in a while loop), 
bypassing will not work until we restart the master and submit the procedure 
again.

How do we let the operator know they need to restart the Master? How do we let
the operator know the Procedure is stuck? On completion of the Procedure, why
don't we release the lock? Let me read the code again. Thanks.

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-18 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655381#comment-16655381
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack]
I think whether we need to restart the master depends on whether the procedure
can be executed again. Bypassing actually just sets a flag on the procedure;
when the PE next executes this procedure, it returns null immediately to finish
the procedure. Thus if the procedure is stuck at a state (like stuck in a while
loop), bypassing will not work until we restart the master and submit the
procedure again.
So if we cannot grab the lock for a procedure, it is most likely already
stuck at a state. Restarting the master will be necessary.
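
To make the flag mechanism concrete, here is a minimal sketch. It is not the ProcedureExecutor code; `BypassFlagSketch`, `Proc`, `execute` and `run` are hypothetical names made up for illustration. It assumes only what is described above: bypass is a flag, and the flag is consulted when the procedure is executed again.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BypassFlagSketch {
  /** Hypothetical stand-in for a state-machine procedure stuck in one state. */
  public static final class Proc {
    public final AtomicBoolean bypass = new AtomicBoolean(false);
    public int steps = 0;

    /** Returns true when the procedure asks to be re-executed (it "returned itself"). */
    public boolean execute() {
      if (bypass.get()) {
        return false; // bypass flag seen: finish immediately
      }
      steps++;
      return true; // stuck: always asks for another round
    }
  }

  /**
   * Hypothetical executor loop. The operator's bypass is simulated by setting
   * the flag once the procedure has run bypassAfter steps.
   */
  public static int run(Proc p, int bypassAfter) {
    boolean reExecute;
    do {
      if (p.steps == bypassAfter) {
        p.bypass.set(true); // simulated operator bypass
      }
      reExecute = p.execute();
    } while (reExecute);
    return p.steps;
  }

  public static void main(String[] args) {
    Proc p = new Proc();
    System.out.println("steps before bypass took effect: " + run(p, 3));
  }
}
```

The flag only takes effect at the top of `execute()`. A step that blocks inside its own body never reaches that check again, which is the case where a restart is needed.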



> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655218#comment-16655218
 ] 

stack commented on HBASE-21291:
---

[~allan163] HBASE-21335 is to make a change in another repo, the hbck2 tool 
repo.

[~tianjingyun]
bq. ...Thus restarting master is needed to resolve the problem.

Because the bypass finishes the procedure but the lock is still held? On 
restart, we will find the finished procedure and will not reinstitute its lock?

Restarting the Master twice last night 'fixed' my stuck lock problem.

Do we have to restart Master to undo the locks? My impression last night was 
that the Procedure would re-run after it had been bypassed, the PE would notice 
it 'finished' and then it would release its locks. How is this not the case? 
Thanks.



> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-18 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655140#comment-16655140
 ] 

Allan Yang commented on HBASE-21291:


Is HBASE-21335 filed to resolve the problem here?

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-18 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654846#comment-16654846
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack] I added the condition check to all procedures, not only
state-machine procedures.
{quote}
So, if override is set, make the waitTime some nominal amount – say 10ms? This 
way we wait on the lock for a little while but will proceed after 10ms even if 
we don't get the lock?
{quote}
Yes, it will wait 10ms trying to get the lock. If we don't get the lock but
override is set, the bypass is processed anyway. But the lock is released
only when the stuck procedure finishes.
{quote}
finally {
  if (lockEntry != null) {
    procExecutionLock.releaseLockEntry(lockEntry);
  }
}
{quote}
{quote}
Thus restarting master is needed to resolve the problem.
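
The behaviour described above can be sketched as follows. This is an illustration under assumptions, not the patch: a `java.util.concurrent.locks.ReentrantLock` stands in for the IdLock, and `BypassOverrideSketch`, `bypass` and `demo` are hypothetical names. The sketch waits a short time for the lock, proceeds without it only when override is set, and releases the lock only if it was actually acquired.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class BypassOverrideSketch {
  // Stand-in for the per-procedure execution lock (an assumption, not IdLock itself).
  static final ReentrantLock EXEC_LOCK = new ReentrantLock();
  static final AtomicBoolean bypassed = new AtomicBoolean(false);

  /** Tries the lock for lockWaitMs; without the lock, proceeds only if override is set. */
  public static boolean bypass(long lockWaitMs, boolean override) throws InterruptedException {
    boolean locked = EXEC_LOCK.tryLock(lockWaitMs, TimeUnit.MILLISECONDS);
    if (!locked && !override) {
      return false; // lock not obtained and no override: give up
    }
    try {
      bypassed.set(true); // mark the procedure as bypassed
      return true;
    } finally {
      if (locked) {
        EXEC_LOCK.unlock(); // release only what we actually acquired
      }
    }
  }

  /** Returns {bypass result without override, bypass result with override, lock still held}. */
  public static boolean[] demo() {
    bypassed.set(false);
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    // Simulates a stuck procedure: takes the lock and keeps it until told to stop.
    Thread stuck = new Thread(() -> {
      EXEC_LOCK.lock();
      try {
        held.countDown();
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        EXEC_LOCK.unlock();
      }
    });
    stuck.start();
    try {
      held.await();
      boolean withoutOverride = bypass(10, false);
      boolean withOverride = bypass(10, true);
      boolean stillHeld = EXEC_LOCK.isLocked();
      return new boolean[] {withoutOverride, withOverride, stillHeld};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown(); // let the simulated stuck procedure finish
    }
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println("without override: " + r[0]
        + ", with override: " + r[1] + ", lock still held: " + r[2]);
  }
}
```

In this sketch the override path sets the bypass flag, yet the lock stays with the stuck holder until that holder exits. This matches the point that the lock is released only when the stuck procedure finishes.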

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654628#comment-16654628
 ] 

stack commented on HBASE-21291:
---

So, if override is set, make the waitTime some nominal amount -- say 10ms? This 
way we wait on the lock for a little while but will proceed after 10ms even if 
we don't get the lock?

So, doing bypass on an AssignProcedure, I see that it reports this:

2424243 2341740 RUNNABLE(Bypass)

... but it still has exclusive lock held.

{code}
REGION: 2651cb48574979f2dccc64e3c02ad5e0

Lock type: EXCLUSIVE

Owner procedure: { ID => '2424243', PARENT_ID => '2341740', STATE => 
'RUNNABLE', OWNER => 'hbase', TYPE => 'AssignProcedure 
table=IntegrationTestBigLinkedList_20180815104044, 
region=2651cb48574979f2dccc64e3c02ad5e0', START_TIME => 'Wed Oct 17 12:41:37 
PDT 2018', LAST_UPDATE => 'Wed Oct 17 15:29:53 PDT 2018', PARAMETERS => [ { 
transitionState => 'REGION_TRANSITION_DISPATCH', regionInfo => { regionId => 
'1534357375831', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 
'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdF8yMDE4MDgxNTEwNDA0NA==' }, startKey => 
'Np+lMnWysqQ=', endKey => 'NqGeiPHP0F8=', offline => 'false', split => 'false', 
replicaId => '0' } } ] }
{code}

Maybe the lock is held because we are not running this Procedure... PE 
is jammed up?



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654621#comment-16654621
 ] 

stack commented on HBASE-21291:
---

Also interesting is that I must pass a non-zero waitTime even when trying to 
bypass an AssignProcedure, even though it is not a state-machine procedure.



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654619#comment-16654619
 ] 

stack commented on HBASE-21291:
---

Is it possible that bypass no longer works? IIRC, I could bypass a stuck 
Assign... now it runs the bypass but the procedure stays stuck. It says it is 
bypassed but the lock is still held:

{code}
2018-10-17 21:12:21,051 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Begin bypass pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0 with lockWait=1000, override=true, recursive=true
2018-10-17 21:12:21,051 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0
2018-10-17 21:12:21,260 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2341740, state=WAITING:SERVER_CRASH_HANDLE_RIT2, bypass=LOG-REDACTED ServerCrashProcedure server=vb1406.halxg.cloudera.com,22101,1539750561781, splitWal=true, meta=false
2018-10-17 21:12:21,386 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0 and its ancestors successfully, adding to queue
{code}



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654608#comment-16654608
 ] 

stack commented on HBASE-21291:
---

0 means wait forever I suppose? But I don't want to wait at all (especially if 
I have 10k regions that need bypassing). What should we do in this case?

I also notice that hbck2 calls this param waitTime but exception says lockWait. 
I need to make them match.

Thanks.



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654561#comment-16654561
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack] hbase-operator-tools sets lockWait to 0 if we don't pass a value 
for it. Shall I change the default lockWait value in hbase-operator-tools?
{code}
long waitTime = 0;
if (commandLine.hasOption(wait.getOpt())) {
  waitTime = Integer.valueOf(commandLine.getOptionValue(wait.getOpt()));
  waitTime *= 1000; // Because time is in seconds.
}
{code}
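
As a rough illustration of what a non-zero default could look like on the tool 
side (a sketch only, not the hbase-operator-tools code): the option value is 
read as seconds, falls back to an assumed default of one second when absent, 
and is rejected up front if it is not positive. LockWaitOptionSketch, 
toLockWaitMillis and DEFAULT_WAIT_SECONDS are hypothetical names, and the 
String parameter stands in for whatever the command-line parser returns.

```java
import java.util.concurrent.TimeUnit;

// Illustrative only; the default of 1 second is an assumption.
final class LockWaitOptionSketch {
  static final long DEFAULT_WAIT_SECONDS = 1;

  /**
   * Converts the raw option value (seconds, or null when the option was not
   * given) into the millisecond lockWait that would be sent to the master.
   */
  static long toLockWaitMillis(String optionValue) {
    long seconds = (optionValue == null)
        ? DEFAULT_WAIT_SECONDS
        : Long.parseLong(optionValue.trim());
    if (seconds <= 0) {
      // Fail on the client instead of waiting for a server-side rejection.
      throw new IllegalArgumentException(
          "wait time must be positive, got " + seconds);
    }
    return TimeUnit.SECONDS.toMillis(seconds);
  }

  public static void main(String[] args) {
    System.out.println(toLockWaitMillis(null));
    System.out.println(toLockWaitMillis("5"));
  }
}
```

Parsing into a long directly also avoids the int-to-long widening in the 
snippet above.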



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654551#comment-16654551
 ] 

stack commented on HBASE-21291:
---

i.e. previously, I did not have to pass a lockWait value...



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654548#comment-16654548
 ] 

stack commented on HBASE-21291:
---

[~tianjingyun] With this patch applied, when I now do a bypass I get the 
below:

{code}
18/10/17 19:36:20 ERROR client.HBaseHbck: 2441732
org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: lockWait should be positive
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:472)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: java.lang.IllegalArgumentException: lockWait should be positive
    at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1050)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1043)
    at org.apache.hadoop.hbase.master.MasterRpcServices.bypassProcedure(MasterRpcServices.java:2421)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$2.callBlockingMethod(MasterProtos.java)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    ... 3 more

    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:95)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.bypassProcedure(MasterProtos.java)
    at org.apache.hadoop.hbase.client.HBaseHbck$1.call(HBaseHbck.java:145)
    at org.apache.hadoop.hbase.client.HBaseHbck$1.call(HBaseHbck.java:141)
    at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.call(ProtobufUtil.java:2945)
    at org.apache.hadoop.hbase.client.HBaseHbck.bypassProcedure(HBaseHbck.java:140)
    at org.apache.hbase.HBCK2.bypass(HBCK2.java:183)
    at org.apache.hbase.HBCK2.run(HBCK2.java:342)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hbase.HBCK2.main(HBCK2.java:389)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: lockWait should be positive
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:472)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: java.lang.IllegalArgumentException: lockWait should be positive
    at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1050)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1043)
    at org.apache.hadoop.hbase.master.MasterRpcServices.bypassProcedure(MasterRpcServices.java:2421)
    ...
{code}

That what you expect sir?


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653509#comment-16653509
 ] 

Hudson commented on HBASE-21291:


Results for branch master
[build #551 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/551/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/551//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/551//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/551//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-16 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652720#comment-16652720
 ] 

stack commented on HBASE-21291:
---

Let me test and see if we should pull it back further, into 2.0 and 2.1.



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652661#comment-16652661
 ] 

Hudson commented on HBASE-21291:


Results for branch branch-2
[build #1399 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1399/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1399//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1399//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1399//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Jingyun Tian
> Assignee: Jingyun Tian
> Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch, HBASE-21291.master.005.patch
>
>
> {code}
> if (!procedure.isFailed()) {
>   if (subprocs != null) {
>     if (subprocs.length == 1 && subprocs[0] == procedure) {
>       // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>       // i.e. we go around this loop again rather than go back out on the scheduler queue.
>       subprocs = null;
>       reExecute = true;
>       LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>     } else {
>       // Yield the current procedure, and make the subprocedure runnable
>       // subprocs may come back 'null'.
>       subprocs = initializeChildren(procStack, procedure, subprocs);
>       LOG.info("Initialized subprocedures=" +
>         (subprocs == null? null:
>           Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
>           collect(Collectors.toList()).toString()));
>     }
>   } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>     LOG.debug("Added to timeoutExecutor {}", procedure);
>     timeoutExecutor.add(procedure);
>   } else if (!suspended) {
>     // No subtask, so we are done
>     procedure.setState(ProcedureState.SUCCESS);
>   }
> }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true 
> for state-machine-like procedures. If such a procedure is stuck in one 
> state, it loops forever.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
> try {
>   executeProcedure(proc);
> } catch (AssertionError e) {
>   LOG.info("ASSERT pid=" + proc.getProcId(), e);
>   throw e;
> } finally {
>   procExecutionLock.releaseLockEntry(lockEntry);
> }
> {code}
> Because a procedure acquires the IdLock before execution and releases it 
> only after execution is done, a state-machine procedure never releases the 
> IdLock until it finishes.
> bypassProcedure therefore doesn't work, because it first tries to grab the 
> same IdLock.
> {code}
> IdLock.Entry lockEntry = 
>   procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}
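
The interaction described above can be reproduced in miniature. The following is only a sketch under stated assumptions: `java.util.concurrent.locks.ReentrantLock` stands in for the per-procedure IdLock entry, and the class and method names (`StuckProcedureSketch`, `bypassWhileLooping`) are hypothetical and do not exist in HBase. A worker thread takes the lock and keeps "re-executing" in place, while a bypass caller makes a timed attempt on the same lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; ReentrantLock stands in for the IdLock entry.
final class StuckProcedureSketch {

  /** Returns true if the bypass caller obtained the lock within waitMillis. */
  static boolean bypassWhileLooping(long waitMillis) {
    ReentrantLock procLock = new ReentrantLock();
    CountDownLatch lockHeld = new CountDownLatch(1);
    CountDownLatch stop = new CountDownLatch(1);

    Thread worker = new Thread(() -> {
      procLock.lock();
      try {
        lockHeld.countDown();
        // Emulates reExecute == true: the stuck step keeps running in place,
        // so control never reaches the finally block that releases the lock.
        while (!stop.await(10, TimeUnit.MILLISECONDS)) {
          // still "executing" the same state
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        procLock.unlock();
      }
    });
    worker.start();

    boolean acquired = false;
    try {
      lockHeld.await();
      // The bypass path: a timed attempt on the lock the worker still holds.
      acquired = procLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
      if (acquired) {
        procLock.unlock();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      stop.countDown(); // let the sketch terminate; a truly stuck procedure never would
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return acquired;
  }

  public static void main(String[] args) {
    System.out.println("bypass acquired lock: " + bypassWhileLooping(200));
  }
}
```

In this sketch the timed attempt fails because the worker only releases the lock after its loop ends, which mirrors why a bypass request cannot make progress against a procedure that keeps short-circuiting to its next step.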



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-16 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651776#comment-16651776
 ] 

Allan Yang commented on HBASE-21291:


Sure. Can you give me your e-mail address so I can include your name and e-mail in the 
commit message?



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-16 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651476#comment-16651476
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack] Could you help commit this patch?



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-14 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649695#comment-16649695
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~allan163] Could you help commit this patch?



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648902#comment-16648902
 ] 

Hadoop QA commented on HBASE-21291:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 32s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 25s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 51s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  2s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 10s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m  5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943764/HBASE-21291.master.005.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 702b3cfe2db5 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 7464e2ef9d |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14688/testReport/ |
| Max. process+thread count | 283 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14688/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add a test 

[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648834#comment-16648834
 ] 

Hadoop QA commented on HBASE-21291:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 18s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 48s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 13 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  0m 10s{color} | {color:red} patch has 7 errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 29s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 46s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m  2s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943746/HBASE-21291.master.005.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 4045ce499342 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build 

[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648753#comment-16648753
 ] 

stack commented on HBASE-21291:
---

OK. Let me try testing it on a cluster. Thanks.



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648751#comment-16648751
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack] Yes, I think passing one second or some other fixed wait should be good. 
Otherwise HBCK will get stuck, which will confuse users.



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648750#comment-16648750
 ] 

stack commented on HBASE-21291:
---

Sorry for the dumb questions. So the problem is that we've been passing '0', so we 
wait forever?

The idea is that, when bypassing, we'd instead pass one second or something?

Thanks [~tianjingyun]
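
To make the question concrete, here is a minimal sketch of the convention being asked about, where a wait of 0 means "no deadline" and a positive wait gives up once it expires. It is an assumption-laden illustration, not HBase's IdLock: `TimedLockSketch` and its methods are hypothetical, and the only library behavior relied on is that Java's `Object.wait()` blocks until notified.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, non-reentrant lock used only to illustrate the wait convention.
final class TimedLockSketch {
  private boolean locked;

  /**
   * waitMillis == 0: block until the lock is free (no deadline).
   * waitMillis  > 0: give up and return false once the deadline passes.
   */
  synchronized boolean tryLock(long waitMillis) {
    if (waitMillis < 0) {
      throw new IllegalArgumentException("waitMillis must be >= 0");
    }
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillis);
    try {
      while (locked) {
        if (waitMillis == 0) {
          wait(); // unbounded: returns only after unlock() notifies
        } else {
          long remainingNanos = deadline - System.nanoTime();
          if (remainingNanos <= 0) {
            return false; // bounded wait expired, report failure to the caller
          }
          wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    locked = true;
    return true;
  }

  synchronized void unlock() {
    locked = false;
    notifyAll();
  }

  public static void main(String[] args) {
    TimedLockSketch lock = new TimedLockSketch();
    System.out.println(lock.tryLock(0));    // lock is free, so this returns at once
    System.out.println(lock.tryLock(1000)); // already held: gives up after about one second
    lock.unlock();
  }
}
```

Under this convention a caller that passes 0 against a lock that is never released would block indefinitely, whereas a caller that passes a positive wait gets a `false` back and can report the failure.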



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648745#comment-16648745
 ] 

Hadoop QA commented on HBASE-21291:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m  9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 37s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 14s{color} | {color:red} hbase-procedure: The patch generated 13 new + 13 unchanged - 0 fixed = 26 total (was 13) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 30s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 59s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 12s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m  9s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m 33s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943742/HBASE-21291.master.004.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 16a568ae067b 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 7464e2ef9d |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14682/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14682/testReport/ |
| Max. process+thread count | 268 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| 

[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648739#comment-16648739
 ] 

Jingyun Tian commented on HBASE-21291:
--

{code}
IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
if (lockEntry == null && !force) {
  LOG.debug("Waited {} ms, but {} is still running, skipping bypass with force={}",
    lockWait, procedure, force);
  return false;
} else if (lockEntry == null) {
  LOG.debug("Waited {} ms, but {} is still running, begin bypass with force={}",
    lockWait, procedure, force);
}
{code}
[~stack] For a stuck procedure, setting the force flag to true skips grabbing 
the lock. But lockWait must be > 0; otherwise tryLockEntry will wait for the 
lock forever. So my simple fix is to add a condition check that lockWait is 
positive.
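
The decision described above can be sketched in isolation. This is only an illustration under stated assumptions: a java.util.concurrent.Semaphore stands in for the per-procedure IdLock, and the class and method names (BypassSketch, simulateStuckProcedure, bypass) are hypothetical, not HBase APIs.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the bypass path; not the HBase ProcedureExecutor. */
final class BypassSketch {
  // One permit models the per-procedure execution lock (IdLock in the real code).
  private final Semaphore execLock = new Semaphore(1);
  private boolean bypassed = false;

  /** Simulates a stuck state-machine procedure that holds the lock and never releases it. */
  void simulateStuckProcedure() {
    execLock.acquireUninterruptibly();
  }

  boolean isBypassed() {
    return bypassed;
  }

  /** Returns true if the procedure was marked bypassed, false if the bypass was skipped. */
  boolean bypass(long lockWaitMs, boolean force) {
    // The proposed condition check: reject non-positive waits up front.
    if (lockWaitMs <= 0) {
      throw new IllegalArgumentException("lockWait must be positive: " + lockWaitMs);
    }
    boolean locked;
    try {
      locked = execLock.tryAcquire(lockWaitMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      locked = false;
    }
    if (!locked && !force) {
      return false; // still running and not forced: skip the bypass
    }
    try {
      bypassed = true; // lock obtained, or forced without the lock
      return true;
    } finally {
      if (locked) {
        execLock.release();
      }
    }
  }
}
```

With the lock held by the simulated stuck procedure, bypass(10, false) gives up after the wait, while bypass(10, true) proceeds without the lock.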

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Attachments: HBASE-21291.master.001.patch, 
> HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, 
> HBASE-21291.master.004.patch
>
>
> {code}
>   if (!procedure.isFailed()) {
>     if (subprocs != null) {
>       if (subprocs.length == 1 && subprocs[0] == procedure) {
>         // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>         // i.e. we go around this loop again rather than go back out on the scheduler queue.
>         subprocs = null;
>         reExecute = true;
>         LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>       } else {
>         // Yield the current procedure, and make the subprocedure runnable
>         // subprocs may come back 'null'.
>         subprocs = initializeChildren(procStack, procedure, subprocs);
>         LOG.info("Initialized subprocedures=" +
>           (subprocs == null? null:
>             Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
>             collect(Collectors.toList()).toString()));
>       }
>     } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>       LOG.debug("Added to timeoutExecutor {}", procedure);
>       timeoutExecutor.add(procedure);
>     } else if (!suspended) {
>       // No subtask, so we are done
>       procedure.setState(ProcedureState.SUCCESS);
>     }
>   }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true for 
> state-machine-like procedures. If such a procedure is stuck in one state, 
> it will loop forever.
> {code}
>   IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
>   try {
>     executeProcedure(proc);
>   } catch (AssertionError e) {
>     LOG.info("ASSERT pid=" + proc.getProcId(), e);
>     throw e;
>   } finally {
>     procExecutionLock.releaseLockEntry(lockEntry);
> {code}
> Since a procedure acquires the IdLock and releases it only after execution 
> is done, a state machine procedure will never release the IdLock until it is 
> finished. As a result, bypassProcedure doesn't work, because it tries to grab 
> the IdLock first.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648735#comment-16648735
 ] 

stack commented on HBASE-21291:
---

Is there a fix here [~tianjingyun]? If bypass is called what happens? Thanks.



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648730#comment-16648730
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack] Just uploaded the fixed one. Sorry for the delay.



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648216#comment-16648216
 ] 

stack commented on HBASE-21291:
---

Is the fix in the latest patch? If so, I don't see it (Pardon my blindness).



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647751#comment-16647751
 ] 

Allan Yang commented on HBASE-21291:



{quote}
+Preconditions.checkArgument(lockWait > 0 || (lockWait == 0 && force == false),
+  "if force is true, lockWait must be greater than 0, or lockWait >= 0");
{quote}
Keep it simple: checking lockWait > 0 is enough.
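
To make the difference between the two conditions concrete, here is a small sketch that compares them over a few inputs. The class and method names (LockWaitChecks, patchCheck, simpleCheck) are hypothetical; only the two boolean expressions come from this thread.

```java
/** Hypothetical comparison of the two argument checks discussed in review. */
final class LockWaitChecks {
  /** The condition from the quoted patch line. */
  static boolean patchCheck(long lockWait, boolean force) {
    return lockWait > 0 || (lockWait == 0 && force == false);
  }

  /** The simpler condition suggested in review. */
  static boolean simpleCheck(long lockWait) {
    return lockWait > 0;
  }

  public static void main(String[] args) {
    long[] waits = {-1, 0, 1};
    boolean[] forces = {false, true};
    for (long w : waits) {
      for (boolean f : forces) {
        System.out.printf("lockWait=%d force=%b patch=%b simple=%b%n",
            w, f, patchCheck(w, f), simpleCheck(w));
      }
    }
  }
}
```

The two checks disagree only for lockWait == 0 with force == false, which the patch condition accepts and the simpler one rejects.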



[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647700#comment-16647700
 ] 

Hadoop QA commented on HBASE-21291:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
14s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
14s{color} | {color:red} hbase-procedure: The patch generated 13 new + 13 
unchanged - 0 fixed = 26 total (was 13) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
12s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 48s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
5s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21291 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943611/HBASE-21291.master.003.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 76214b97faf1 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 9e9a1e0f0d |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14669/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14669/testReport/ |
| Max. process+thread count | 316 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console 

[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-12 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647579#comment-16647579
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack] We can skip getting the lock by setting force to true. But the lockWait 
time needs to be positive; otherwise it will wait forever.
[~allan163] I've formatted the code, and I added a condition check before we do 
the bypass:
{code}
Preconditions.checkArgument(lockWait > 0 || (lockWait == 0 && force == false));
{code}
Because if lockWait is 0 and force is true, it will still wait for the lock 
forever, which would confuse people.
Please check this out.
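
One plausible way a zero wait turns into an unbounded one, offered as an assumption rather than a reading of IdLock's source: in Java, Object.wait(0) means "wait until notified", with no timeout. The sketch below is a hypothetical timed lock (TimedLockSketch is not an HBase class) that rejects non-positive waits and never passes 0 to wait().

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical timed lock built on Object.wait; not HBase's IdLock. */
final class TimedLockSketch {
  private boolean held = false;

  /** Blocking acquire, used here to model a procedure that is still executing. */
  synchronized void lock() {
    boolean interrupted = false;
    while (held) {
      try {
        wait();
      } catch (InterruptedException e) {
        interrupted = true;
      }
    }
    held = true;
    if (interrupted) {
      Thread.currentThread().interrupt();
    }
  }

  synchronized void unlock() {
    held = false;
    notifyAll();
  }

  /** Timed acquire; returns false if the lock is still held after waitMs. */
  synchronized boolean tryLock(long waitMs) {
    if (waitMs <= 0) {
      throw new IllegalArgumentException("waitMs must be positive: " + waitMs);
    }
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMs);
    while (held) {
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs <= 0) {
        return false; // timed out; never call wait(0), which would block indefinitely
      }
      try {
        wait(remainingMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    held = true;
    return true;
  }
}
```

A caller that validates the wait up front, as in the check above, gets a clear error instead of a thread that hangs.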




[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647484#comment-16647484
 ] 

Hadoop QA commented on HBASE-21291:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
 2s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
17s{color} | {color:red} hbase-procedure: The patch generated 3 new + 3 
unchanged - 0 fixed = 6 total (was 3) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
10s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
16m 19s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
31s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21291 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943590/HBASE-21291.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 05fdeb462b03 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 9e9a1e0f0d |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14664/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14664/testReport/ |
| Max. process+thread count | 287 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output 

[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-11 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647466#comment-16647466
 ] 

stack commented on HBASE-21291:
---

[~tianjingyun] Great. What [~allan163] said.

How do we fix this so StateMachineProcedures are bypassable? (I think I've seen 
this issue in the wild -- but not sure.. unable to bypass a DisableTable...). 
What if bypass with override skips getting the lock? (I've not done the study...).

> Add a test for bypassing stuck state-machine procedures
> ---
>
> Key: HBASE-21291
> URL: https://issues.apache.org/jira/browse/HBASE-21291
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Major
> Attachments: HBASE-21291.master.001.patch
>
>
> {code}
>   if (!procedure.isFailed()) {
> if (subprocs != null) {
>   if (subprocs.length == 1 && subprocs[0] == procedure) {
> // Procedure returned itself. Quick-shortcut for a state 
> machine-like procedure;
> // i.e. we go around this loop again rather than go back out on 
> the scheduler queue.
> subprocs = null;
> reExecute = true;
> LOG.trace("Short-circuit to next step on pid={}", 
> procedure.getProcId());
>   } else {
> // Yield the current procedure, and make the subprocedure runnable
> // subprocs may come back 'null'.
> subprocs = initializeChildren(procStack, procedure, subprocs);
> LOG.info("Initialized subprocedures=" +
>   (subprocs == null? null:
> Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
> collect(Collectors.toList()).toString()));
>   }
> } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>   LOG.debug("Added to timeoutExecutor {}", procedure);
>   timeoutExecutor.add(procedure);
> } else if (!suspended) {
>   // No subtask, so we are done
>   procedure.setState(ProcedureState.SUCCESS);
> }
>   }
> {code}
> Currently implementation of ProcedureExecutor will set the reExcecute to true 
> for state machine like procedure. Then if this procedure is stuck at one 
> certain state, it will loop forever.
> {code}
>   IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
>   try {
>     executeProcedure(proc);
>   } catch (AssertionError e) {
>     LOG.info("ASSERT pid=" + proc.getProcId(), e);
>     throw e;
>   } finally {
>     procExecutionLock.releaseLockEntry(lockEntry);
> {code}
> Since a procedure takes the IdLock and releases it only after execution is 
> done, a state-machine procedure never releases the IdLock until it finishes.
> bypassProcedure then doesn't work, because it first tries to grab the same 
> IdLock:
> {code}
>   IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}
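
A minimal sketch of the locking problem described above, under some loud assumptions: a plain java.util.concurrent ReentrantLock stands in for HBase's IdLock, and the names StuckProcedureSketch, tryBypass and runScenario are hypothetical, not part of HBase. A worker takes the lock and keeps re-executing a "stuck" step without releasing it, so a bypass attempt that does a timed try-lock gives up; once the worker leaves the loop and releases the lock, the same attempt succeeds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the executor/bypass interaction; not HBase code.
final class StuckProcedureSketch {

  // Mimics a bypass call: wait up to lockWaitMs for the per-procedure lock.
  // Returns true only if the lock could be taken within the wait.
  static boolean tryBypass(ReentrantLock lock, long lockWaitMs) {
    try {
      if (!lock.tryLock(lockWaitMs, TimeUnit.MILLISECONDS)) {
        return false; // lock still held by the executing worker
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    try {
      return true; // a real bypass would mark the procedure here
    } finally {
      lock.unlock();
    }
  }

  // Returns {bypass result while the procedure is stuck, result after release}.
  static boolean[] runScenario() {
    ReentrantLock lock = new ReentrantLock();
    AtomicBoolean stuck = new AtomicBoolean(true);
    CountDownLatch lockHeld = new CountDownLatch(1);

    Thread worker = new Thread(() -> {
      lock.lock(); // taken once, before execution starts
      try {
        lockHeld.countDown();
        boolean reExecute;
        do {
          // The step "returns itself" while stuck, so we loop again
          // without ever going back out and releasing the lock.
          reExecute = stuck.get();
          Thread.yield();
        } while (reExecute);
      } finally {
        lock.unlock(); // only reached once the loop ends
      }
    });
    worker.start();

    try {
      lockHeld.await();
      boolean whileStuck = tryBypass(lock, 100);
      stuck.set(false); // let the procedure finish
      worker.join();
      boolean afterRelease = tryBypass(lock, 100);
      return new boolean[] {whileStuck, afterRelease};
    } catch (InterruptedException e) {
      stuck.set(false);
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] r = runScenario();
    System.out.println("bypass while stuck: " + r[0]);
    System.out.println("bypass after release: " + r[1]);
  }
}
```

In this sketch the first attempt fails and the second succeeds, which is the behaviour the description attributes to bypassProcedure when the target procedure never leaves the reExecute loop.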



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-11 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647450#comment-16647450
 ] 

Allan Yang commented on HBASE-21291:


{code}
+ProcedureTestingUtility.restart(procExecutor);
{code}
For this case, the restart is not needed.
Some lines are way too long; you can fix them in your patch after the 
checkstyle results come out.






[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures

2018-10-11 Thread Jingyun Tian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647345#comment-16647345
 ] 

Jingyun Tian commented on HBASE-21291:
--

[~stack] I've uploaded a patch that contains a stuck state-machine procedure. 
Please check it out.



