[jira] [Commented] (HBASE-21291) Bypass doesn't work for state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646891#comment-16646891 ]

stack commented on HBASE-21291:
-------------------------------

Thanks [~tianjingyun]. Can you manufacture the state in a test? Thank you.

> Bypass doesn't work for state-machine procedures
> ------------------------------------------------
>
>                 Key: HBASE-21291
>                 URL: https://issues.apache.org/jira/browse/HBASE-21291
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Jingyun Tian
>            Assignee: Jingyun Tian
>            Priority: Major
>
> {code}
> if (!procedure.isFailed()) {
>   if (subprocs != null) {
>     if (subprocs.length == 1 && subprocs[0] == procedure) {
>       // Procedure returned itself. Quick-shortcut for a state machine-like procedure;
>       // i.e. we go around this loop again rather than go back out on the scheduler queue.
>       subprocs = null;
>       reExecute = true;
>       LOG.trace("Short-circuit to next step on pid={}", procedure.getProcId());
>     } else {
>       // Yield the current procedure, and make the subprocedure runnable
>       // subprocs may come back 'null'.
>       subprocs = initializeChildren(procStack, procedure, subprocs);
>       LOG.info("Initialized subprocedures=" +
>         (subprocs == null? null:
>           Stream.of(subprocs).map(e -> "{" + e.toString() + "}").
>           collect(Collectors.toList()).toString()));
>     }
>   } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) {
>     LOG.debug("Added to timeoutExecutor {}", procedure);
>     timeoutExecutor.add(procedure);
>   } else if (!suspended) {
>     // No subtask, so we are done
>     procedure.setState(ProcedureState.SUCCESS);
>   }
> }
> {code}
> The current implementation of ProcedureExecutor sets reExecute to true
> for state-machine-like procedures. If such a procedure is stuck in one
> state, it loops forever.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.getLockEntry(proc.getProcId());
> try {
>   executeProcedure(proc);
> } catch (AssertionError e) {
>   LOG.info("ASSERT pid=" + proc.getProcId(), e);
>   throw e;
> } finally {
>   procExecutionLock.releaseLockEntry(lockEntry);
> }
> {code}
> Because a procedure acquires the IdLock before executing and releases it
> only after execution is done, a state-machine procedure never releases the
> IdLock until it finishes. bypassProcedure then fails, because it first
> tries to acquire the same IdLock.
> {code}
> IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait);
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
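The lock interaction described in the issue can be reproduced outside HBase. The following is only a minimal sketch under stated assumptions: a `java.util.concurrent.locks.ReentrantLock` stands in for the per-procedure IdLock entry, a spinning thread stands in for the reExecute loop, and the class and method names are hypothetical, not HBase APIs. It illustrates why a timed lock attempt from the bypass side is expected to time out while the worker keeps looping.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; it does not use any HBase code.
final class StuckProcedureLockSketch {

  /** Returns true if the bypass-style timed lock attempt succeeded while the worker was looping. */
  static boolean tryBypassWhileWorkerLoops(long lockWaitMillis) {
    ReentrantLock procLock = new ReentrantLock();   // stand-in for the per-pid IdLock entry
    AtomicBoolean stop = new AtomicBoolean(false);  // exists only so the demo can terminate
    CountDownLatch lockHeld = new CountDownLatch(1);

    Thread worker = new Thread(() -> {
      procLock.lock();
      try {
        lockHeld.countDown();
        // Stand-in for the reExecute loop: the step keeps "returning itself",
        // so the lock is never released in between iterations.
        while (!stop.get()) {
          Thread.yield();
        }
      } finally {
        procLock.unlock();
      }
    });
    worker.start();

    boolean acquired = false;
    try {
      lockHeld.await();
      // Stand-in for the timed lock attempt made on the bypass path.
      acquired = procLock.tryLock(lockWaitMillis, TimeUnit.MILLISECONDS);
      if (acquired) {
        procLock.unlock();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      stop.set(true); // let the worker leave its loop so the demo ends
    }
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return acquired;
  }

  public static void main(String[] args) {
    System.out.println("bypass acquired lock: " + tryBypassWhileWorkerLoops(100));
  }
}
```

In this sketch the only way the worker ever releases the lock is the test-only `stop` flag, which mirrors the point of the report: without some exit from the loop, a waiting bypass call cannot make progress.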
[jira] [Commented] (HBASE-21291) Bypass doesn't work for state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646433#comment-16646433 ]

Jingyun Tian commented on HBASE-21291:
--------------------------------------

I got your point. Setting force to true can solve the problem. But the bypass field cannot be set to true, since the IdLock is never released during the while loop.
[jira] [Commented] (HBASE-21291) Bypass doesn't work for state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646332#comment-16646332 ]

Allan Yang commented on HBASE-21291:
------------------------------------

{quote}
 * @param force if force set to true, we will bypass the procedure even if it is executing.
 *   This is for procedures which can't break out during executing(due to bug, mostly)
 *   In this case, bypassing the procedure is not enough, since it is already stuck
 *   there. We need to restart the master after bypassing, and letting the problematic
 *   procedure to execute wth bypass=true, so in that condition, the procedure can be
 *   successfully bypassed.
{quote}
For procedures that can't break out during execution, a master restart is needed, as the comment quoted above says. As for your problem, I think we can check the bypass field before setting reExecute to true.
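The suggestion to check the bypass field before setting reExecute can be sketched in isolation. This is a simplified illustration, not the ProcedureExecutor code: the class, the method, and the `flipAfter` parameter (which simulates an operator setting the flag partway through) are all hypothetical, and the "stuck" step is modeled as a step that always returns itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; names are illustrative and not taken from HBase.
final class ReExecuteBypassCheckSketch {

  /**
   * Runs a step that always "returns itself" and counts loop iterations.
   * The bypass flag is flipped once the iteration count reaches flipAfter,
   * simulating an external bypass request arriving while the loop runs.
   */
  static int runStuckProcedure(AtomicBoolean bypass, int flipAfter) {
    int iterations = 0;
    boolean reExecute;
    do {
      iterations++;
      if (iterations == flipAfter) {
        bypass.set(true);
      }
      boolean returnedItself = true; // the stuck step never advances
      // Only short-circuit to the next step if nobody asked for a bypass.
      reExecute = returnedItself && !bypass.get();
    } while (reExecute);
    return iterations;
  }

  public static void main(String[] args) {
    System.out.println("iterations before exit: "
        + runStuckProcedure(new AtomicBoolean(false), 3));
  }
}
```

The check only helps if something can set the flag while the loop is running, which is the concern raised elsewhere in this thread about the lock being held for the whole loop.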
[jira] [Commented] (HBASE-21291) Bypass doesn't work for state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646323#comment-16646323 ]

Allan Yang commented on HBASE-21291:
------------------------------------

{quote}
The problem is bypass need to grab a lock that state machine procedure will never release when it gets stuck. Let me try to add a UT for this.
{quote}
That's why bypass has a parameter named 'force': if force=true, it will bypass the procedure even if it can't grab the lock. Yes, you can write a UT; that will make it clearer.
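The role of a 'force' parameter can be illustrated with a small sketch. It assumes a plain `ReentrantLock` in place of the IdLock and an `AtomicBoolean` in place of the procedure's bypass field; the `bypass` method below is a hypothetical stand-in and does not reproduce the signature or behavior of HBase's bypassProcedure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; it does not use any HBase code.
final class ForceBypassSketch {

  /** Sets the bypass flag if the lock is acquired in time, or unconditionally when force is true. */
  static boolean bypass(ReentrantLock procLock, AtomicBoolean bypassFlag,
      long lockWaitMillis, boolean force) throws InterruptedException {
    boolean locked = procLock.tryLock(lockWaitMillis, TimeUnit.MILLISECONDS);
    if (!locked && !force) {
      return false; // could not get the lock and the caller did not force
    }
    try {
      bypassFlag.set(true);
      return true;
    } finally {
      if (locked) {
        procLock.unlock();
      }
    }
  }

  /** Holds the lock on another thread, then reports whether the bypass flag was set. */
  static boolean bypassWhileLockHeld(boolean force) {
    ReentrantLock procLock = new ReentrantLock();
    AtomicBoolean bypassFlag = new AtomicBoolean(false);
    CountDownLatch held = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    Thread holder = new Thread(() -> {
      procLock.lock(); // stand-in for an executing procedure holding its lock
      try {
        held.countDown();
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        procLock.unlock();
      }
    });
    holder.start();

    try {
      held.await();
      return bypass(procLock, bypassFlag, 20, force) && bypassFlag.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      release.countDown(); // let the holder thread finish
    }
  }

  public static void main(String[] args) {
    System.out.println("force=false: " + bypassWhileLockHeld(false));
    System.out.println("force=true:  " + bypassWhileLockHeld(true));
  }
}
```

Setting the flag without the lock is only useful if the executing side eventually reads it, so this sketch pairs naturally with a check of the flag inside the execution loop.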
[jira] [Commented] (HBASE-21291) Bypass doesn't work for state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646311#comment-16646311 ]

Jingyun Tian commented on HBASE-21291:
--------------------------------------

[~allan163] The problem is that bypass needs to grab a lock that a state-machine procedure will never release when it gets stuck. Let me try to add a UT for this.
[jira] [Commented] (HBASE-21291) Bypass doesn't work for state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646297#comment-16646297 ]

Allan Yang commented on HBASE-21291:
------------------------------------

When bypassing, execution returns null for any procedure, so subprocs will be null and control goes to the last branch:
{code}
else if (!suspended) {
  // No subtask, so we are done
  procedure.setState(ProcedureState.SUCCESS);
}
{code}
You can try some cases in TestProcedureBypass.
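The branch selection described in this comment can be traced with a reduced model. The sketch below is an assumption-laden simplification: the enum, the method, and the way the non-null branches are collapsed are hypothetical, and only the ordering of the three conditions follows the snippet quoted in the issue.

```java
// Hypothetical demo class; a reduced model of the branch order quoted in the issue.
final class BypassBranchSketch {

  enum State { RUNNABLE, WAITING_TIMEOUT, SUCCESS }

  /** Returns the state after one step, given what the step returned. */
  static State stateAfterStep(Object[] subprocs, State current, boolean suspended) {
    if (subprocs != null) {
      // Re-execute or child-initialization branches; not modeled here.
      return current;
    } else if (current == State.WAITING_TIMEOUT) {
      // Would be handed to the timeout executor; state is unchanged.
      return current;
    } else if (!suspended) {
      // No subtask, so the procedure is done.
      return State.SUCCESS;
    }
    return current;
  }

  public static void main(String[] args) {
    // A bypassed step is modeled as returning null subprocedures.
    System.out.println(stateAfterStep(null, State.RUNNABLE, false));
  }
}
```

Under these assumptions, a step that returns null while the procedure is neither waiting on a timeout nor suspended ends in SUCCESS, which is the path the comment describes for a bypassed procedure.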
[jira] [Commented] (HBASE-21291) Bypass doesn't work for state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646272#comment-16646272 ]

Jingyun Tian commented on HBASE-21291:
--------------------------------------

[~stack] [~Apache9] Maybe we should just remove this feature and let the procedure be scheduled again?

> Bypass doesn't work for state-machine procedures
> ------------------------------------------------
>
>                 Key: HBASE-21291
>                 URL: https://issues.apache.org/jira/browse/HBASE-21291
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Jingyun Tian
>            Priority: Major