[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842253#comment-17842253
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-2
[build #1045 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1045/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1045/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1045/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1045/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1045/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-29 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842248#comment-17842248
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-2.6
[build #108 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/108/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/108/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/108/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/108/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/108/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-29 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842246#comment-17842246
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-2.4
[build #728 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/]:
 (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-29 Thread Daniel Roudnitsky (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842144#comment-17842144
 ] 

Daniel Roudnitsky commented on HBASE-28405:
---

HBASE-28533 provides a partially similar patch for split procedure rollback to 
correct the in memory RegionState in HMaster to reflect that the region is 
still online on the RS after the failed split. HBASE-19126 seems relevant to 
both issues. 

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; 
> OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
> server=rs-210,60020,1707596461539}]_
> _2024-02-11 10:53:59,074 WARN 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-29 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842147#comment-17842147
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-2.5
[build #520 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/520/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/520/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/520/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/520/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/520/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841841#comment-17841841
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-2
[build #1044 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1044/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1044/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1044/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1044/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1044/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841837#comment-17841837
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-2.4
[build #727 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/727/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/727/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/727/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/727/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/727/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841819#comment-17841819
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-2.6
[build #107 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/107/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/107/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/107/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/107/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/107/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841772#comment-17841772
 ] 

Hudson commented on HBASE-28405:


Results for branch master
[build #1063 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1063/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1063/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1063/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1063/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841771#comment-17841771
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-3
[build #195 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/195/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/195/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/195/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/195/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841694#comment-17841694
 ] 

Hudson commented on HBASE-28405:


Results for branch branch-2.5
[build #519 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/519/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/519/General_20Nightly_20Build_20Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/519/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/519/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/519/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 2.4.17, 2.5.8
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9
>
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834715#comment-17834715
 ] 

Viraj Jasani commented on HBASE-28405:
--

Btw in this whole investigation, we know that we do have real RIT because the 
region assign as part of the "region merge rollback" could not be completed, 
and this definitely needs to be fixed.

However, from HBase client perspective, read/write should not be affected on 
the merging region right? Because the region state is OPEN even in meta, only 
master's in-memory image has the state as MERGING. This doesn't change the fact 
that RIT needs to be fixed, it's definitely a bug, triggers alerts, requires 
manual hbck intervention which we need to minimize as much as possible, but I 
hope that at least clients should be fine in this whole situation.

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>  Labels: pull-request-available
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Aman Poonia (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834661#comment-17834661
 ] 

Aman Poonia commented on HBASE-28405:
-

This is the fix i am testing currently
{code:java}
// code placeholder
diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
index 813caa47d3..84f45a59a3 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
@@ -639,8 +639,27 @@ public class MergeTableRegionsProcedure
    * Rollback close regions
    **/
   private void rollbackCloseRegionsForMerge(MasterProcedureEnv env) throws 
IOException {
-    AssignmentManagerUtil.reopenRegionsForRollback(env, 
Arrays.asList(regionsToMerge),
-      getRegionReplication(env), getServerName(env));
+    // At this point we should check if region was actually closed. If it was 
not closed then we
+    // don't need to repoen the region and we can just change the regionNode 
state to OPEN.
+    // if it is alredy closed then we need to do a reopen of region
+    ServerName serverName = getServerName(env);
+    List regionsOnServer = 
env.getAssignmentManager().getRegionsOnServer(serverName);
+    List toAssign = new ArrayList<>();
+    for (RegionInfo rinfo : regionsToMerge) {
+      if (!regionsOnServer.contains(rinfo)) {
+        toAssign.add(rinfo);
+      } else {
+        // Change the region state to OPEN from MERGING
+        boolean success = 
env.getAssignmentManager().getRegionStates().getRegionStateNode(rinfo)
+          .setState(State.OPEN, State.MERGING);
+        if (!success) {
+          LOG.warn("Region {} was not in expected state MERGING while rolling 
back",
+            rinfo.getEncodedName());
+        }
+      }
+    }
+    AssignmentManagerUtil.reopenRegionsForRollback(env, toAssign, 
getRegionReplication(env),
+      getServerName(env));
   }   private TransitRegionStateProcedure[] 
createUnassignProcedures(MasterProcedureEnv env) {code}
The idea is that before we create TRSP we should check in rollback step if it 
is needed for that particular region.

[~zhangduo] what do you think about this fix. here i am specifically fixing the 
rollback part of failed merge and not trying to touch any other unnecessary 
code. 

[~vjasani] FYI

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834626#comment-17834626
 ] 

Duo Zhang commented on HBASE-28405:
---

The root cause here is we should not issue a TRSP to bring the region online if 
the region is not closed.

I could provide a PR to show how to fix this problem.

There are other pending problems, like whether we should always ignore the 
redundant open request at rs side, how to deal with MERGING state in 
checkOnlineRegions, is it the correct way to set region to SPLITTING/MERGING 
before making it offline. We can file other issues for these problems.

Thanks.

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; 
> OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
> 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834623#comment-17834623
 ] 

Viraj Jasani commented on HBASE-28405:
--

checkOnlineRegionsReport() is only called by reportOnlineRegions(), which is 
only called by regionServerReport().

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; 
> OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
> server=rs-210,60020,1707596461539}]_
> _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
> handler.AssignRegionHandler - Received OPEN for 
> table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already 
> online_



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834622#comment-17834622
 ] 

Duo Zhang commented on HBASE-28405:
---

This is not the same problem, please do not mess up different problems…

The ‘does not match’ log is for checkOnlineRegions, not report region 
transition…

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; 
> OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
> server=rs-210,60020,1707596461539}]_
> _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
> handler.AssignRegionHandler - Received OPEN for 
> table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already 
> online_



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834621#comment-17834621
 ] 

Viraj Jasani commented on HBASE-28405:
--

[~zhangduo] the problem here is that master will receive region report that 
will not match the state:
{code:java}
2024-04-05 18:56:55,397 WARN  
[iority.RWQ.Fifo.write.handler=0,queue=0,port=61000] 
assignment.AssignmentManager - Reporting xyz-12,61020,1712280779333 state does 
not match state=MERGING, location=xyz-12,61020,1712280779333, table=T1, 
region=fc6965af717fcd35182f85f0f192965e (time since last update=42224ms)

2024-04-05 18:56:56,370 INFO  [PEWorker-55] 
assignment.TransitRegionStateProcedure - Starting pid=36794357, 
state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; 
TransitRegionStateProcedure table=T1, region=fc6965af717fcd35182f85f0f192965e, 
ASSIGN; state=MERGING, location=xyz-12,61020,1712280779333; forceNewPlan=false, 
retain=false {code}

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834615#comment-17834615
 ] 

Duo Zhang commented on HBASE-28405:
---

Seems the intention here is to prevent concurrent split/merge, and after the 
TRSP executes, it will change the state according to its logic.

The problem here is that, after TRSP finishes, the region will be in CLOSED 
state, and we should have another way to prevent issuing a merge/split 
procedure at this time. Then why we do not use the same way at the beginning?



> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; 
> OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
> server=rs-210,60020,1707596461539}]_
> _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
> 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834613#comment-17834613
 ] 

Duo Zhang commented on HBASE-28405:
---

Checked the code, we do set region state to merging in memory, without changing 
the state in meta.

Let me dig more on why this is necessary.

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; 
> OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
> server=rs-210,60020,1707596461539}]_
> _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
> handler.AssignRegionHandler - Received OPEN for 
> table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already 
> online_



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-07 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834611#comment-17834611
 ] 

Duo Zhang commented on HBASE-28405:
---

As I explained above, the current logic depends on both master and rs side.

While a region is in transition, master is free to issue redundant open region 
requests many times to the same region server, the region server will ignore 
the redundant requests and make sure it reports the result to master at least 
once.

But if a region is not in transition state at all, master should not issue any 
requests to region server, as I also explained above, it will be fine if we 
issue the open region request to the same rs and we just report to master that 
it is already online, but what if we issue the open region request to other rs? 
It will cause the region to be online on two region servers…

So the key point here, is that while rolling back the 
MergeTableRegionsProcedure, we need to figure out whether we have actually 
issued a TRSP to bring the regions offline, if not, we should not issue the  a 
TRSP to bring the region online…

And it is a bit strange that, I think we should only change the region’s state 
to MERGING after we offline it? Why do we set it to MERGING while it is still 
online?


> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-06 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834600#comment-17834600
 ] 

Viraj Jasani commented on HBASE-28405:
--

I believe this should fix the issue:
{code:java}
diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java
 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java
index a9ab6f502a..6beb0fcab7 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java
@@ -99,9 +99,16 @@ public class AssignRegionHandler extends EventHandler {
     String encodedName = regionInfo.getEncodedName();
     byte[] encodedNameBytes = regionInfo.getEncodedNameAsBytes();
     String regionName = regionInfo.getRegionNameAsString();
-    Region onlineRegion = rs.getRegion(encodedName);
+    HRegion onlineRegion = rs.getRegion(encodedName);
     if (onlineRegion != null) {
       LOG.warn("Received OPEN for {} which is already online", regionName);
+      if (!rs.reportRegionStateTransition(
+        new RegionStateTransitionContext(TransitionCode.OPENED, 
onlineRegion.getOpenSeqNum(),
+          openProcId, masterSystemTime, onlineRegion.getRegionInfo( {
+        throw new IOException(
+          "Failed to report opened region to master: " + 
onlineRegion.getRegionInfo()
+            .getRegionNameAsString());
+      }
       // Just follow the old behavior, do we need to call 
reportRegionStateTransition? Maybe not?
       // For normal case, it could happen that the rpc call to schedule this 
handler is succeeded,
       // but before returning to master the connection is broken. And when 
master tries again, we {code}
This would make assign region an idempotent operation.

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-04-06 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834599#comment-17834599
 ] 

Viraj Jasani commented on HBASE-28405:
--

{quote}Also, there is a related and interesting finding.
Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.
{quote}
Yes, assign should have been idempotent, but it's not so we need to fix this. 
Came across 3 similar incidents this week, out of which 2 were similar to this 
Jira i.e. merge transition is rolled back due to one of the parent regions was 
in transition. However, the rollback does not get successfully completed 
because assign is not somehow treated idempotent.

 
{quote}It would be ideal if the master does not make redundant requests, but if 
it does make one, the RS should handle the request and return success to the 
master because the request to open an already open region on the same server is 
idempotent with the earlier request that caused the region to be opened there 
in the first place.
{quote}
+1, if assign was idempotent, we would not have run into merge rollback getting 
stuck indefinitely.

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821361#comment-17821361
 ] 

Andrew Kyle Purtell commented on HBASE-28405:
-

{quote}Maybe since region is online just changing the state in region state 
node (meta) from MERGING to OPEN would have sifficed in such cases.
{quote}
+1, I think this is what Duo was getting at. The solution is fixing the state 
in the master to reflect the region is still online on the RS after the failed 
merge.

Here:
{noformat}
2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
handler.AssignRegionHandler -
Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. 
which is already online
{noformat}
One option here is the RS can tell the master the assign succeeded, because the 
OPEN request is idempotent when the region is already open on the RS.


[~zhangduo] I don't understand this part:

bq. So the problem is that, we should not issue a TRSP if the region is already 
online, when rollbacking the MergeTableRegionsProcedure. If we assign the 
region to the same RS, it will hang the rollback

It is unnecessary to make a new TRSP to assign a region already online on the 
regionserver, but if we do make one, the RS should handle the request and 
return success to the master because the request to open an already open region 
on the same server is idempotent with the earlier request that caused the 
region to be opened there in the first place. So why would this hang the 
rollback? Maybe because today the RS won't ack the new request. So we can 
change the RS code to do that if so.

Although it would be good to optimize the master so it isn't making redundant 
requests. 

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Aman Poonia (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821350#comment-17821350
 ] 

Aman Poonia commented on HBASE-28405:
-

[~zhangduo] Thanks for the insight.
{noformat}
The logic at RS side is that, at the end of assign a region, it will retry 
forever on reporting this to master. So if we find out that the region is 
already online, we should just ignore it, as we can make sure that there is 
someone else will finally report it to master, to avoid double report and cause 
issues.
{noformat}
When i checked the log of OpenRegionProcedure on RS there were no logs for that 
rpocedure. Similarly when we look at the master logs there were no logs about 
the parent procedure and its progress. SO we were stuck in this state infinitely

One another though we had to execute TRSP but maybe not the assign because the 
state of region was merging in region state node.

This is the difference. when we check for state in region we use 
regionstatenode and when we check if region is online on RS we use the 
onlineregions map of RS to see if region is online. So basically we are looking 
at two different places in same flow.

Maybe since region is online we just change the state in region state node 
(meta) from MERGING to OPEN

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821232#comment-17821232
 ] 

Duo Zhang commented on HBASE-28405:
---

The logic at RS side is that, at the end of assign a region, it will retry 
forever on reporting this to master. So if we find out that the region is 
already online, we should just ignore it, as we can make sure that there is 
someone else will finally report it to master, to avoid double report and cause 
issues.

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; 
> OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
> server=rs-210,60020,1707596461539}]_
> _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
> handler.AssignRegionHandler - Received OPEN for 
> 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821231#comment-17821231
 ] 

Duo Zhang commented on HBASE-28405:
---

So the problem is that, we should not issue a TRSP if the region is already 
online, when rollbacking the MergeTableRegionsProcedure. If we assign the 
region to the same RS, it will hang the rollback, a worse scenario is we try to 
assign it to another region server, then it will lead to double assign and 
cause data loss...

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This 
> handlers on execution suggests that region is already online*
> Sequence of events are as follow
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - 
> pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, 
> regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - 
> Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; 
> OpenRegionProcedure a92008b76ccae47d55c590930b837036, 
> server=rs-210,60020,1707596461539}]_
> _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] 
> handler.AssignRegionHandler - Received OPEN for 
> 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Aman Poonia (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821227#comment-17821227
 ] 

Aman Poonia commented on HBASE-28405:
-

If we had executed postDeploy here the region would have come out of RIT.
{code:java}
// code placeholder
// From here on out, this is PONR. We can not revert back. The only way to 
address an
// exception from here on out is to abort the region server.
rs.postOpenDeployTasks(new PostOpenDeployContext(region, openProcId, 
masterSystemTime));
rs.addRegion(region);
LOG.info("Opened {}", regionName);
// Cache the open region procedure id after report region transition succeed.
rs.finishRegionProcedure(openProcId);
Boolean current = 
rs.getRegionsInTransitionInRS().remove(regionInfo.getEncodedNameAsBytes());
if (current == null) {
  // Should NEVER happen, but let's be paranoid.
  LOG.error("Bad state: we've just opened {} which was NOT in transition", 
regionName);
} else if (!current) {
  // Should NEVER happen, but let's be paranoid.
  LOG.error("Bad state: we've just opened {} which was closing", regionName);
} {code}
I don't see any harm in executing the above piece of code in this scenario.  
Even if we do this multiple times this piece of code seems idempotent.

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of failed merge operation we see a issue where 
> region is in state opened until the RS holding it stopped.*
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. 
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure. When we run TRSP 
> it sends a 

[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

2024-02-27 Thread Aman Poonia (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821214#comment-17821214
 ] 

Aman Poonia commented on HBASE-28405:
-

Looking at the code comments it says that it might be a mistake and master is 
checking on RS again and last response has not reached to HMaster yet. But our 
current issue seems to be little different. Here first call itself is returned 
without any reportRegionStateTransition
{code:java}
// code placeholder
String regionName = regionInfo.getRegionNameAsString();
Region onlineRegion = rs.getRegion(encodedName);
if (onlineRegion != null) {
  LOG.warn("Received OPEN for {} which is already online", regionName);
  // Just follow the old behavior, do we need to call 
reportRegionStateTransition? Maybe not?
  // For normal case, it could happen that the rpc call to schedule this 
handler is succeeded,
  // but before returning to master the connection is broken. And when master 
tries again, we
  // have already finished the opening. For this case we do not need to call
  // reportRegionStateTransition any more.
  return;
}
Boolean previous = 
rs.getRegionsInTransitionInRS().putIfAbsent(encodedNameBytes, Boolean.TRUE); 
{code}
[~apurtell] [~vjasani] FYI

> Region open procedure silently returns without notifying the parent proc
> 
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 2.5.7
>Reporter: Aman Poonia
>Assignee: Aman Poonia
>Priority: Major
>
> We had a scenario in production where a merge operation had failed as below
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] 
> assignment.MergeTableRegionsProcedure - Error trying to merge 
> [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in 
> table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, 
> location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> Now when we do rollback of failed merge operation we see a issue where region 
> is in state opened until the RS holding it stopped.
> Rollback create a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - 
> Stored [pid=26674602, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1, 
> region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> and rollback finished successfully
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - 
> Rolled back pid=26673594, state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.HBaseIOException via 
> master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent 
> region state=MERGING, location=rs-229,60020,1707587658182, table=table1, 
> region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; 
> MergeTableRegionsProcedure table=table1, 
> regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], 
> force=false exec-time=1.4820 sec_
> We create a procedure to open the region a92008b76ccae47d55c590930b837036
> Intrestingly we didnt close the region as creation of procedure to close 
> regions had thrown exception and not execution of procedure.
> Now when we run TRSP it sends a OpenRegionProcedure which