[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657809#comment-16657809 ] Hudson commented on HBASE-21291: Results for branch branch-2.1 [build #496 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/496/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/496//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/496//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/496//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657801#comment-16657801 ] Hudson commented on HBASE-21291: Results for branch branch-2.0 [build #979 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/979/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/979//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/979//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/979//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656211#comment-16656211 ] Hudson commented on HBASE-21291: Results for branch branch-2.1 [build #486 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/486/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/486//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/486//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/486//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656181#comment-16656181 ] Hudson commented on HBASE-21291: Results for branch branch-2.0 [build #971 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/971/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/971//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/971//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/971//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655396#comment-16655396 ] stack commented on HBASE-21291: --- bq. Thus if the procedure is stuck at a state (like stuck in a while loop), bypassing will not work until we restart the master and submit the procedure again. How we let operator know they need to restart Master? How we let operator know the Procedure is stuck? On complete of Procedure, why we don't release the lock? Let me read code again. Thanks. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655381#comment-16655381 ] Jingyun Tian commented on HBASE-21291: -- [~stack] I think if we need restart master depends on if the procedure could be executed again. Since bypassing is actually setting a flag of the procedure and then when PE execute this procedure, it will return null immediately to finish the procedure. Thus if the procedure is stuck at a state (like stuck in a while loop), bypassing will not work until we restart the master and submit the procedure again. Thus if we cannot grab a lock for a procedure, it is mostly like already stuck at a state. Restarting master will be necessary. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655218#comment-16655218 ] stack commented on HBASE-21291: --- [~allan163] HBASE-21335 is to make a change in another repo, the hbck2 tool repo. [~tianjingyun] bq. ...Thus restarting master is needed to resolve the problem. Because the bypass finishes the procedure but the lock is still held? On restart, we will find the finished procedure and will not reinstitute its lock? Restarting the Master twice last night 'fixed' my stuck lock problem. Do we have to restart Master to undo the locks? My impression last night was that the Procedure would re-run after it had been bypassed, the PE would notice it 'finished' and then it would release its locks. How is this not the case? Thanks. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655140#comment-16655140 ] Allan Yang commented on HBASE-21291: Is HBASE-21335 filed to resolve the problem here? > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654846#comment-16654846 ] Jingyun Tian commented on HBASE-21291: -- [~stack] I add the condition check to all procedures, not only the state machine procedure. {quote} So, if override is set, make the waitTime some nominal amount – say 10ms? This way we wait on the lock for a little while but will proceed after 10ms even if we don't get the lock? {quote} Yes, it will wait 10ms to try to get lock. Then if we didn't get the lock but override is set, the bypass will be processed however. But the lock is released only when the stuck procedure finished. {quote} finally { if (lockEntry != null) { procExecutionLock.releaseLockEntry(lockEntry); } } {quote} Thus restarting master is needed to resolve the problem. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654628#comment-16654628 ] stack commented on HBASE-21291: --- So, if override is set, make the waitTime some nominal amount -- say 10ms? This way we wait on the lock for a little while but will proceed after 10ms even if we don't get the lock? So, doing bypass on an AssignProcedure, I see that it reports this: 2424243 2341740 RUNNABLE(Bypass) ... but it still has exclusive lock held. {code} REGION: 2651cb48574979f2dccc64e3c02ad5e0 Lock type: EXCLUSIVE Owner procedure: { ID => '2424243', PARENT_ID => '2341740', STATE => 'RUNNABLE', OWNER => 'hbase', TYPE => 'AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0', START_TIME => 'Wed Oct 17 12:41:37 PDT 2018', LAST_UPDATE => 'Wed Oct 17 15:29:53 PDT 2018', PARAMETERS => [ { transitionState => 'REGION_TRANSITION_DISPATCH', regionInfo => { regionId => '1534357375831', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdF8yMDE4MDgxNTEwNDA0NA==' }, startKey => 'Np+lMnWysqQ=', endKey => 'NqGeiPHP0F8=', offline => 'false', split => 'false', replicaId => '0' } } ] } {code} Maybe the lock is held because we are not running running this Procedure... PE is jammed up? > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654621#comment-16654621 ] stack commented on HBASE-21291: --- Also interesting is that I must pass waitTime of non-zero even when trying to bypass an Assign Procedure even though it is not a state machine procedure. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654619#comment-16654619 ] stack commented on HBASE-21291: --- Is it possible that bypass no longer works. IIRC, I could bypass a stuck Assign... now it does this but it stays stuck. Says it is bypassed but lock is still held: {code} 2018-10-17 21:12:21,051 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Begin bypass pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0 with lockWait=1000, override=true, recursive=true 2018-10-17 21:12:21,051 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0 2018-10-17 21:12:21,260 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2341740, state=WAITING:SERVER_CRASH_HANDLE_RIT2, bypass=LOG-REDACTED ServerCrashProcedure server=vb1406.halxg.cloudera.com,22101,1539750561781, splitWal=true, meta=false 2018-10-17 21:12:21,386 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Bypassing pid=2424243, ppid=2341740, state=RUNNABLE:REGION_TRANSITION_DISPATCH, bypass=LOG-REDACTED AssignProcedure table=IntegrationTestBigLinkedList_20180815104044, region=2651cb48574979f2dccc64e3c02ad5e0 and its ancestors successfully, adding to queue {code} > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654608#comment-16654608 ] stack commented on HBASE-21291: --- 0 means wait forever I suppose? But I don't want to wait at all (especially if I have 10k regions that need bypassing). What should we do in this case? I also notice that hbck2 calls this param waitTime but exception says lockWait. I need to make them match. Thanks. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654561#comment-16654561 ] Jingyun Tian commented on HBASE-21291: -- [~stack] hbase-operator-tools set the lockwait to 0 if we don't set any value for it. Let me change the default value for lockWait of hbase-operator-tools? {code} long waitTime = 0; if (commandLine.hasOption(wait.getOpt())) { waitTime = Integer.valueOf(commandLine.getOptionValue(wait.getOpt())); waitTime *= 1000; // Because time is in seconds. } {code} > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654551#comment-16654551 ] stack commented on HBASE-21291: --- i.e. previously, I did not have to pass a lockWait value... > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654548#comment-16654548 ] stack commented on HBASE-21291: --- [~tianjingyun] With this patch applied, now when I do bypass it does the below {code} 18/10/17 19:36:20 ERROR client.HBaseHbck: 2441732 org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: lockWait should be positive at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:472) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) Caused by: java.lang.IllegalArgumentException: lockWait should be positive at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1050) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1043) at org.apache.hadoop.hbase.master.MasterRpcServices.bypassProcedure(MasterRpcServices.java:2421) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) ... 3 more at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:336) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:95) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:571) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.bypassProcedure(MasterProtos.java) at org.apache.hadoop.hbase.client.HBaseHbck$1.call(HBaseHbck.java:145) at org.apache.hadoop.hbase.client.HBaseHbck$1.call(HBaseHbck.java:141) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.call(ProtobufUtil.java:2945) at org.apache.hadoop.hbase.client.HBaseHbck.bypassProcedure(HBaseHbck.java:140) at org.apache.hbase.HBCK2.bypass(HBCK2.java:183) at org.apache.hbase.HBCK2.run(HBCK2.java:342) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hbase.HBCK2.main(HBCK2.java:389) Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: lockWait should be positive at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:472) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) Caused by: java.lang.IllegalArgumentException: lockWait should be positive at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1050) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.bypassProcedure(ProcedureExecutor.java:1043) at org.apache.hadoop.hbase.master.MasterRpcServices.bypassProcedure(MasterRpcServices.java:2421) ... {code} That what you expect sir? > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable >
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653509#comment-16653509 ] Hudson commented on HBASE-21291: Results for branch master [build #551 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/551/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/551//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/551//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/551//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652720#comment-16652720 ] stack commented on HBASE-21291: --- Let me test and see if we should pull it back further, into 2.0 and 2.1. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652661#comment-16652661 ] Hudson commented on HBASE-21291: Results for branch branch-2 [build #1399 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1399/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1399//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1399//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1399//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651776#comment-16651776 ] Allan Yang commented on HBASE-21291: Sure, can you give me your E-mail, so I can include you name and E-mail in the commit message > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651476#comment-16651476 ] Jingyun Tian commented on HBASE-21291: -- [~stack] Could you help commit this patch? > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649695#comment-16649695 ] Jingyun Tian commented on HBASE-21291: -- [~allan163] Could you help commit this patch? > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648902#comment-16648902 ] Hadoop QA commented on HBASE-21291: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 32s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 25s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 51s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 2s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 10s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 37m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21291 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943764/HBASE-21291.master.005.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 702b3cfe2db5 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 7464e2ef9d | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14688/testReport/ | | Max. process+thread count | 283 (vs. ulimit of 1) | | modules | C: hbase-procedure U: hbase-procedure | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14688/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add a test
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648834#comment-16648834 ] Hadoop QA commented on HBASE-21291: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 6s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 6s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 11s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 18s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 24s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 48s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 13 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:red}-1{color} | {color:red} shadedjars {color} | {color:red} 0m 10s{color} | {color:red} patch has 7 errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 41s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 29s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 46s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}180m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21291 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943746/HBASE-21291.master.005.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 4045ce499342 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648753#comment-16648753 ] stack commented on HBASE-21291: --- Ok. Let me try testing it on cluster. Thanks. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch, HBASE-21291.master.005.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648751#comment-16648751 ] Jingyun Tian commented on HBASE-21291: -- [~stack] Yes, I think pass one second or a certain number should be good. Otherwise HBCK will get stuck, that will make user confusing. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648750#comment-16648750 ] stack commented on HBASE-21291: --- Sorry for the dumb questions, so problem is that we've been passing '0' so we wait for ever? Idea is that instead, when bypassing, we'd pass one second or something? Thanks [~tianjingyun] > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648745#comment-16648745 ] Hadoop QA commented on HBASE-21291: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 9s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 37s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s{color} | {color:red} hbase-procedure: The patch generated 13 new + 13 unchanged - 0 fixed = 26 total (was 13) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 30s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 59s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 12s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 38m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21291 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943742/HBASE-21291.master.004.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 16a568ae067b 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 7464e2ef9d | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14682/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14682/testReport/ | | Max. process+thread count | 268 (vs. ulimit of 1) | | modules | C: hbase-procedure U: hbase-procedure | |
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648739#comment-16648739 ] Jingyun Tian commented on HBASE-21291: -- {code} IdLock.Entry lockEntry = procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); if (lockEntry == null && !force) { LOG.debug("Waited {} ms, but {} is still running, skipping bypass with force={}", lockWait, procedure, force); return false; } else if (lockEntry == null) { LOG.debug("Waited {} ms, but {} is still running, begin bypass with force={}", lockWait, procedure, force); } {code} [~stack] For a stuck procedure, set the force flag to true will skip grabbing the lock. But lockWait must be > 0 otherwise tryLockEntry will wait for the lock forever. Thus my simple fix is add a condition check that lockWait should be positive. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648735#comment-16648735 ] stack commented on HBASE-21291: --- Is there a fix here [~tianjingyun]? If bypass is called what happens? Thanks. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648730#comment-16648730 ] Jingyun Tian commented on HBASE-21291: -- [~stack] Just uploaded the fixed one. Sorry for the late. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch, > HBASE-21291.master.004.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648216#comment-16648216 ] stack commented on HBASE-21291: --- Is the fix in the latest patch? If so, I don't see it (Pardon my blindness). > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647751#comment-16647751 ] Allan Yang commented on HBASE-21291: {quote} +Preconditions.checkArgument(lockWait > 0 || (lockWait == 0 && force == false), + "if force is true, lockWait must be greater than 0, or lockWait >= 0"); {quote} Just be simple, check lockWait > 0 is enough. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647700#comment-16647700 ] Hadoop QA commented on HBASE-21291: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 1s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 14s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s{color} | {color:red} hbase-procedure: The patch generated 13 new + 13 unchanged - 0 fixed = 26 total (was 13) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 12s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 48s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 5s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 39m 50s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21291 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943611/HBASE-21291.master.003.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 76214b97faf1 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 9e9a1e0f0d | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14669/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14669/testReport/ | | Max. process+thread count | 316 (vs. ulimit of 1) | | modules | C: hbase-procedure U: hbase-procedure | | Console
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647579#comment-16647579 ] Jingyun Tian commented on HBASE-21291: -- [~stack] We can skip getting lock by set force to true. But the lockWait time need to be positive, otherwise it will wait forever. [~allan163] I've got code formatted. And I add a condition check before we do bypass: {code} Preconditions.checkArgument(lockWait > 0 || (lockWait == 0 && force == false)); {code} Because if lockWait is 0 then force is true, it will still wait for the lock forever, which will make people confusing. Please check this out. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch, > HBASE-21291.master.002.patch, HBASE-21291.master.003.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647484#comment-16647484 ] Hadoop QA commented on HBASE-21291: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 48s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 2s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s{color} | {color:red} hbase-procedure: The patch generated 3 new + 3 unchanged - 0 fixed = 6 total (was 3) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 10s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 16m 19s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 31s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 53m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21291 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943590/HBASE-21291.master.001.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 05fdeb462b03 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 9e9a1e0f0d | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14664/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14664/testReport/ | | Max. process+thread count | 287 (vs. ulimit of 1) | | modules | C: hbase-procedure U: hbase-procedure | | Console output
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647466#comment-16647466 ] stack commented on HBASE-21291: --- [~tianjingyun] Great. What [~allan163] said. How we fix this so StateMachineProcedures are bypassable (I think I've seen this issue in the wild -- but not sure.. unable to bypass a DisableTable...). If bypass and override skip getting lock? (I've not done the study...). > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647450#comment-16647450 ] Allan Yang commented on HBASE-21291: {code} +ProcedureTestingUtility.restart(procExecutor); {code} For this case, restart is not needed. Some line is way too long, you can modify your patch after checkstyle results come out. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21291) Add a test for bypassing stuck state-machine procedures
[ https://issues.apache.org/jira/browse/HBASE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647345#comment-16647345 ] Jingyun Tian commented on HBASE-21291: -- [~stack] A patch contains stuck state machine procedure is uploaded. pls check it out. > Add a test for bypassing stuck state-machine procedures > --- > > Key: HBASE-21291 > URL: https://issues.apache.org/jira/browse/HBASE-21291 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Jingyun Tian >Assignee: Jingyun Tian >Priority: Major > Attachments: HBASE-21291.master.001.patch > > > {code} > if (!procedure.isFailed()) { > if (subprocs != null) { > if (subprocs.length == 1 && subprocs[0] == procedure) { > // Procedure returned itself. Quick-shortcut for a state > machine-like procedure; > // i.e. we go around this loop again rather than go back out on > the scheduler queue. > subprocs = null; > reExecute = true; > LOG.trace("Short-circuit to next step on pid={}", > procedure.getProcId()); > } else { > // Yield the current procedure, and make the subprocedure runnable > // subprocs may come back 'null'. > subprocs = initializeChildren(procStack, procedure, subprocs); > LOG.info("Initialized subprocedures=" + > (subprocs == null? null: > Stream.of(subprocs).map(e -> "{" + e.toString() + "}"). > collect(Collectors.toList()).toString())); > } > } else if (procedure.getState() == ProcedureState.WAITING_TIMEOUT) { > LOG.debug("Added to timeoutExecutor {}", procedure); > timeoutExecutor.add(procedure); > } else if (!suspended) { > // No subtask, so we are done > procedure.setState(ProcedureState.SUCCESS); > } > } > {code} > Currently implementation of ProcedureExecutor will set the reExcecute to true > for state machine like procedure. Then if this procedure is stuck at one > certain state, it will loop forever. > {code} > IdLock.Entry lockEntry = > procExecutionLock.getLockEntry(proc.getProcId()); > try { > executeProcedure(proc); > } catch (AssertionError e) { > LOG.info("ASSERT pid=" + proc.getProcId(), e); > throw e; > } finally { > procExecutionLock.releaseLockEntry(lockEntry); > {code} > Since procedure will get the IdLock and release it after execution done, > state machine procedure will never release IdLock until it is finished. > Then bypassProcedure doesn't work because is will try to grab the IdLock at > first. > {code} > IdLock.Entry lockEntry = > procExecutionLock.tryLockEntry(procedure.getProcId(), lockWait); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)