[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834661#comment-17834661 ]

Aman Poonia commented on HBASE-28405:

This is the fix I am currently testing:

{code:java}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
index 813caa47d3..84f45a59a3 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java
@@ -639,8 +639,27 @@ public class MergeTableRegionsProcedure
    * Rollback close regions
    **/
   private void rollbackCloseRegionsForMerge(MasterProcedureEnv env) throws IOException {
-    AssignmentManagerUtil.reopenRegionsForRollback(env, Arrays.asList(regionsToMerge),
-      getRegionReplication(env), getServerName(env));
+    // At this point we should check whether the region was actually closed. If it was not
+    // closed, we don't need to reopen it and can just change the regionNode state to OPEN.
+    // If it is already closed, we need to reopen the region.
+    ServerName serverName = getServerName(env);
+    List<RegionInfo> regionsOnServer = env.getAssignmentManager().getRegionsOnServer(serverName);
+    List<RegionInfo> toAssign = new ArrayList<>();
+    for (RegionInfo rinfo : regionsToMerge) {
+      if (!regionsOnServer.contains(rinfo)) {
+        toAssign.add(rinfo);
+      } else {
+        // Change the region state to OPEN from MERGING
+        boolean success = env.getAssignmentManager().getRegionStates().getRegionStateNode(rinfo)
+          .setState(State.OPEN, State.MERGING);
+        if (!success) {
+          LOG.warn("Region {} was not in expected state MERGING while rolling back",
+            rinfo.getEncodedName());
+        }
+      }
+    }
+    AssignmentManagerUtil.reopenRegionsForRollback(env, toAssign, getRegionReplication(env),
+      getServerName(env));
   }
 
   private TransitRegionStateProcedure[] createUnassignProcedures(MasterProcedureEnv env)
{code}

The idea is that, in the rollback step, we check whether a TRSP (TransitRegionStateProcedure) is actually needed for each region before creating one.

[~zhangduo] What do you think about this fix? Here I am specifically fixing the rollback path of a failed merge and not touching any unrelated code.
[~vjasani] FYI > Region open procedure silently returns without notifying the parent proc > > > Key: HBASE-28405 > URL: https://issues.apache.org/jira/browse/HBASE-28405 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Affects Versions: 2.5.7 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Major > > *We had a scenario in production where a merge operation had failed as below* > _2024-02-11 10:53:57,715 ERROR [PEWorker-31] > assignment.MergeTableRegionsProcedure - Error trying to merge > [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in > table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_ > _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, > location=rs-229,60020,1707587658182, table=table1, > region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_ > _at > org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_ > _at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_ > _at > org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_ > _at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_ > _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_ > *Now when we do rollback
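The rollback check proposed in the patch above can be tried in isolation. The following is a minimal sketch, assuming a simplified model: `MergeRollbackSketch`, `RegionNode`, and the string region names are hypothetical stand-ins and not HBase classes. It only illustrates the decision of flipping MERGING back to OPEN for regions that are still hosted, and collecting the rest for reassignment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-alone model of the rollback check; none of these types are HBase classes. */
final class MergeRollbackSketch {

  enum State { OPEN, MERGING, CLOSED }

  /** Minimal stand-in for a region state node with a compare-and-set style transition. */
  static final class RegionNode {
    private final AtomicReference<State> state;

    RegionNode(State initial) {
      this.state = new AtomicReference<>(initial);
    }

    /** Moves to {@code update} only if the current state equals {@code expected}. */
    boolean setState(State update, State expected) {
      return state.compareAndSet(expected, update);
    }

    State getState() {
      return state.get();
    }
  }

  /**
   * Regions still hosted on the server are flipped from MERGING back to OPEN.
   * Regions no longer hosted there are returned so the caller can reassign them.
   */
  static List<String> rollback(Map<String, RegionNode> regionsToMerge, Set<String> regionsOnServer) {
    List<String> toAssign = new ArrayList<>();
    for (Map.Entry<String, RegionNode> entry : regionsToMerge.entrySet()) {
      if (!regionsOnServer.contains(entry.getKey())) {
        toAssign.add(entry.getKey());
      } else if (!entry.getValue().setState(State.OPEN, State.MERGING)) {
        System.out.println("Region " + entry.getKey() + " was not in expected state MERGING");
      }
    }
    return toAssign;
  }

  public static void main(String[] args) {
    Map<String, RegionNode> regions = new LinkedHashMap<>();
    regions.put("regionA", new RegionNode(State.MERGING)); // never closed, still online
    regions.put("regionB", new RegionNode(State.CLOSED));  // assumed closed before the failure
    List<String> toAssign = rollback(regions, Collections.singleton("regionA"));
    System.out.println("toAssign=" + toAssign + ", regionA=" + regions.get("regionA").getState());
  }
}
```

The compare-and-set transition mirrors the `setState(State.OPEN, State.MERGING)` call in the patch: the state only changes when the region is in the expected state, so a concurrent transition is reported rather than overwritten.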
[jira] [Commented] (HBASE-28420) Aborting Active HMaster is not rejecting remote Procedure Reports
[ https://issues.apache.org/jira/browse/HBASE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830899#comment-17830899 ]

Aman Poonia commented on HBASE-28420:

Summarising the offline discussion with [~umesh9414]:

Most procedures are stored in the proc store. If we start storing this procedure in the store as well, then when we get a report from the RS we will try to update its state to success in the store. Since the active master has changed, the old master will not be able to update the store. This ensures that the old master doesn't process the request and instead throws an exception.

[~zhangduo] Are you suggesting a similar approach?
[~umesh9414] Do you want to create a draft PR to give an idea of what we are planning?

> Aborting Active HMaster is not rejecting remote Procedure Reports
>
> Key: HBASE-28420
> URL: https://issues.apache.org/jira/browse/HBASE-28420
> Project: HBase
> Issue Type: Bug
> Components: master, proc-v2
> Affects Versions: 2.5.7
> Reporter: Umesh Kumar Kumawat
> Assignee: Umesh Kumar Kumawat
> Priority: Critical
>
> When the active HMaster is in the process of aborting and another HMaster is becoming the active HMaster, any region server that reports the completion of a remote procedure at that moment generally sends the report to the old active HMaster because of the cached value of rssStub -> [code|https://github.com/apache/hbase/blob/branch-2.5/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L2829] ([caller method|https://github.com/apache/hbase/blob/branch-2.5/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L3941]).
> On the master side ([code|https://github.com/apache/hbase/blob/branch-2.5/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L2381]), it does check whether the service is started, but that check returns true while the master is in the process of aborting (I didn't see where we set this flag to false during abort).
> This issue becomes *critical* when *ServerCrash of meta hosting RS and master > failover* happens at the same time and hbase:meta got stuck in the offline > state. > Logs for abortion start of HMaster > {noformat} > 2024-02-02 07:33:11,581 ERROR [PEWorker-6] master.HMaster - * ABORTING > master server4-1xxx,61000,1705169084562: > FAILED persisting region=52d36581218e00a2668776cfea897132 state=CLOSING > *{noformat} > {noformat} > 2024-02-02 07:33:40,999 INFO [master/server4-1xxx:61000] > regionserver.HRegionServer - Exiting; > stopping=hbase2b-mnds4-1-ia2.ops.sfdc.net,61000,1705169084562; zookeeper > connection closed.{noformat} > it took almost 30 seconds to abort the HMaster. > > Logs of starting SCP for meta carrying host. (This SCP is started by the new > active HMaster) > {noformat} > 2024-02-02 07:33:32,622 INFO [aster/server3-1xxx61000:becomeActiveMaster] > assignment.AssignmentManager - Scheduled > ServerCrashProcedure pid=3305546 for server5-1xxx61020,1706857451955 > (carryingMeta=true) server5-1- > xxx61020,1706857451955/CRASHED/regionCount=1/lock=java.util.concurrent.locks.ReentrantReadWriteLock@1b0a5293[Write > > locks = 1, Read locks = 0], oldState=ONLINE.{noformat} > initialization of remote procedure > {noformat} > 2024-02-02 07:33:33,178 INFO [PEWorker-4] procedure2.ProcedureExecutor - > Initialized subprocedures=[{pid=3305548, > ppid=3305547, state=RUNNABLE; SplitWALRemoteProcedure server5-1- > t%2C61020%2C1706857451955.meta.1706858156058.meta, > worker=server4-1-,61020,1705169180881}]{noformat} > Logs of remote procedure handling on Old Active Hmaster(server4-1xxx,61000) > (in the process of abortion) > {noformat} > 2024-02-02 07:33:37,990 DEBUG > [r.default.FPBQ.Fifo.handler=243,queue=9,port=61000] master.HMaster - Remote > procedure > done, pid=3305548{noformat} > This should be handled by the new active HMaster so that it can wake up the > suspended Procedure on the new Active Hmaster. 
As the new ActiveHMaster was > not able to wake that up, SCP procedure got stuck thus meta stayed OFFLINE. > > Logs of Hmaster trying to becomeActivehmaster but stuck- > {noformat} > 2024-02-02 07:33:43,159 WARN [aster/server3-1-ia2:61000:becomeActiveMaster] > master.HMaster - hbase:meta,,1.1588230740 > is NOT online; state={1588230740 state=OPEN, ts=1706859212481, > server=server5-1-xxx,61020,1706857451955}; > ServerCrashProcedures=true. Master startup cannot progress, in > holding-pattern until region onlined.{noformat} > After this master was stuck till we did hmaster failover to come out of this > situation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
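The fencing idea from the comment above (an old master cannot update the store once another master is active, so it must reject the report) can be illustrated with a minimal sketch. It assumes a hypothetical epoch counter as the fencing mechanism; `FencedProcStoreSketch`, `ProcStore`, `Master`, and `StaleMasterException` are illustrative names and do not describe how the HBase procedure store is implemented.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of epoch-based fencing; none of these types are HBase classes. */
final class FencedProcStoreSketch {

  /** Thrown when a master that is no longer active tries to write to the store. */
  static final class StaleMasterException extends RuntimeException {
    StaleMasterException(String message) {
      super(message);
    }
  }

  enum ProcState { RUNNABLE, SUCCESS }

  /** In-memory stand-in for a procedure store that tracks which master may write. */
  static final class ProcStore {
    private long activeEpoch = 0;
    private final Map<Long, ProcState> procs = new HashMap<>();

    /** Called by a master when it becomes active; fences every earlier epoch. */
    synchronized long becomeActive() {
      return ++activeEpoch;
    }

    synchronized void insert(long writerEpoch, long pid) {
      checkWriter(writerEpoch);
      procs.put(pid, ProcState.RUNNABLE);
    }

    synchronized void markSuccess(long writerEpoch, long pid) {
      checkWriter(writerEpoch);
      if (!procs.containsKey(pid)) {
        throw new IllegalArgumentException("Unknown pid " + pid);
      }
      procs.put(pid, ProcState.SUCCESS);
    }

    synchronized ProcState get(long pid) {
      return procs.get(pid);
    }

    private void checkWriter(long writerEpoch) {
      if (writerEpoch != activeEpoch) {
        throw new StaleMasterException(
          "epoch " + writerEpoch + " is fenced; active epoch is " + activeEpoch);
      }
    }
  }

  /** A master remembers the epoch it was granted and uses it for every store update. */
  static final class Master {
    private final ProcStore store;
    private final long epoch;

    Master(ProcStore store) {
      this.store = store;
      this.epoch = store.becomeActive();
    }

    void schedule(long pid) {
      store.insert(epoch, pid);
    }

    /** Handles a completion report from a region server by persisting the result first. */
    void reportProcedureDone(long pid) {
      store.markSuccess(epoch, pid);
    }
  }

  public static void main(String[] args) {
    ProcStore store = new ProcStore();
    Master oldMaster = new Master(store);
    Master newMaster = new Master(store); // failover: the old master is now fenced
    newMaster.schedule(3305548L);
    try {
      oldMaster.reportProcedureDone(3305548L);
    } catch (StaleMasterException e) {
      System.out.println("old master rejected the report: " + e.getMessage());
    }
    newMaster.reportProcedureDone(3305548L);
    System.out.println("state on new master: " + store.get(3305548L));
  }
}
```

In this model the region server would see the exception from the old master and could retry against the new one. A further failover simply bumps the epoch again, so every earlier master stays fenced.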
[jira] [Commented] (HBASE-28420) Aborting Active HMaster is not rejecting remote Procedure Reports
[ https://issues.apache.org/jira/browse/HBASE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830863#comment-17830863 ] Aman Poonia commented on HBASE-28420: - [~umesh9414] Thank you for the clarification. I was wondering what happens if the new master also restarts in between and another host becomes master or is in process of becoming active master?
[jira] [Commented] (HBASE-28420) Aborting Active HMaster is not rejecting remote Procedure Reports
[ https://issues.apache.org/jira/browse/HBASE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830850#comment-17830850 ]

Aman Poonia commented on HBASE-28420:

[~umesh9414] I think the issue here is that the new master doesn't know that a procedure is scheduled, because it is not stored in the procedure store. A simple check might not fix it. A proper solution is to store this type of procedure (SplitWALRemoteProcedure) in the proc store, so that when a new master comes up it reads the proc store for ongoing procedures, learns that a remote procedure is scheduled, and knows it needs to check on its progress. In the above case the new master doesn't even know that a procedure was scheduled by the old master.
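The recovery step described in the comment above (a new master reads the proc store on startup and so learns about in-flight remote procedures) can be sketched as follows. This is a simplified model under stated assumptions: `ProcRecoverySketch`, `StoredProc`, and `Master` are hypothetical names, and the list passed to `recover` stands in for whatever the real store would return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: a new master rebuilds its view of in-flight remote procedures. */
final class ProcRecoverySketch {

  /** Record persisted when a remote procedure is scheduled (fields are illustrative). */
  static final class StoredProc {
    final long pid;
    final long parentPid;
    final String type;

    StoredProc(long pid, long parentPid, String type) {
      this.pid = pid;
      this.parentPid = parentPid;
      this.type = type;
    }
  }

  /** Stand-in for the master-side bookkeeping of suspended parent procedures. */
  static final class Master {
    private final Map<Long, StoredProc> inFlight = new HashMap<>();
    private final List<Long> wokenParents = new ArrayList<>();

    /** On startup, load every unfinished procedure so later reports can be matched. */
    void recover(List<StoredProc> persisted) {
      for (StoredProc proc : persisted) {
        inFlight.put(proc.pid, proc);
      }
    }

    /** Returns true if the report matched a known procedure and its parent was woken. */
    boolean remoteProcedureDone(long pid) {
      StoredProc proc = inFlight.remove(pid);
      if (proc == null) {
        return false; // this master never learned about the procedure
      }
      wokenParents.add(proc.parentPid);
      return true;
    }

    List<Long> wokenParents() {
      return wokenParents;
    }
  }

  public static void main(String[] args) {
    List<StoredProc> store = new ArrayList<>();
    store.add(new StoredProc(3305548L, 3305547L, "SplitWALRemoteProcedure"));

    Master recovered = new Master();
    recovered.recover(store);
    Master unaware = new Master(); // models a master that did not load the store

    System.out.println("recovered master handled report: " + recovered.remoteProcedureDone(3305548L));
    System.out.println("unaware master handled report: " + unaware.remoteProcedureDone(3305548L));
  }
}
```

The pid values are taken from the log excerpt in the issue description; everything else is illustrative. The point of the sketch is only that a report can wake the suspended parent when the receiving master has a persisted record to match it against.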
[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821350#comment-17821350 ]

Aman Poonia edited comment on HBASE-28405 at 2/27/24 5:23 PM:

[~zhangduo] Thanks for the insight.

{noformat}
The logic at RS side is that, at the end of assign a region, it will retry forever on reporting this to master. So if we find out that the region is already online, we should just ignore it, as we can make sure that there is someone else will finally report it to master, to avoid double report and cause issues.
{noformat}

When I checked the logs for the OpenRegionProcedure on the RS, there were no logs for that procedure. Similarly, the master logs had nothing about the parent procedure and its progress, so we were stuck in this state indefinitely.

One more thought: we had to execute the TRSP, but maybe not the assign, because the state of the region in the region state node was MERGING.

The difference: when we check the state of the region we use the region state node, and when we check whether the region is online on the RS we use the online regions map of the RS. So basically we are looking at two different places in the same flow. Since the region is online, maybe just changing the state in the region state node (meta) from MERGING to OPEN would have sufficed in such cases.

was (Author: mnpoonia):

[~zhangduo] Thanks for the insight.

{noformat}
The logic at RS side is that, at the end of assign a region, it will retry forever on reporting this to master. So if we find out that the region is already online, we should just ignore it, as we can make sure that there is someone else will finally report it to master, to avoid double report and cause issues.
{noformat}

When I checked the logs for the OpenRegionProcedure on the RS, there were no logs for that procedure. Similarly, the master logs had nothing about the parent procedure and its progress, so we were stuck in this state indefinitely.

One more thought: we had to execute the TRSP, but maybe not the assign, because the state of the region in the region state node was MERGING.

The difference: when we check the state of the region we use the region state node, and when we check whether the region is online on the RS we use the online regions map of the RS. So basically we are looking at two different places in the same flow. Since the region is online, maybe we just change the state in the region state node (meta) from MERGING to OPEN.
[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821350#comment-17821350 ]

Aman Poonia edited comment on HBASE-28405 at 2/27/24 5:08 PM:

[~zhangduo] Thanks for the insight.

{noformat}
The logic at RS side is that, at the end of assign a region, it will retry forever on reporting this to master. So if we find out that the region is already online, we should just ignore it, as we can make sure that there is someone else will finally report it to master, to avoid double report and cause issues.
{noformat}

When I checked the logs for the OpenRegionProcedure on the RS, there were no logs for that procedure. Similarly, the master logs had nothing about the parent procedure and its progress, so we were stuck in this state indefinitely.

One more thought: we had to execute the TRSP, but maybe not the assign, because the state of the region in the region state node was MERGING.

The difference: when we check the state of the region we use the region state node, and when we check whether the region is online on the RS we use the online regions map of the RS. So basically we are looking at two different places in the same flow. Since the region is online, maybe we just change the state in the region state node (meta) from MERGING to OPEN.

was (Author: mnpoonia):

[~zhangduo] Thanks for the insight.

{noformat}
The logic at RS side is that, at the end of assign a region, it will retry forever on reporting this to master. So if we find out that the region is already online, we should just ignore it, as we can make sure that there is someone else will finally report it to master, to avoid double report and cause issues.
{noformat}

When I checked the logs for the OpenRegionProcedure on the RS, there were no logs for that procedure. Similarly, the master logs had nothing about the parent procedure and its progress, so we were stuck in this state indefinitely.

One more thought: we had to execute the TRSP, but maybe not the assign, because the state of the region in the region state node was MERGING.

This is the difference: when we check the state of the region we use the region state node, and when we check whether the region is online on the RS we use the online regions map of the RS. So basically we are looking at two different places in the same flow. Since the region is online, maybe we just change the state in the region state node (meta) from MERGING to OPEN.
[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821350#comment-17821350 ]

Aman Poonia commented on HBASE-28405:

[~zhangduo] Thanks for the insight.

{noformat}
The logic at RS side is that, at the end of assign a region, it will retry forever on reporting this to master. So if we find out that the region is already online, we should just ignore it, as we can make sure that there is someone else will finally report it to master, to avoid double report and cause issues.
{noformat}

When I checked the logs for the OpenRegionProcedure on the RS, there were no logs for that procedure. Similarly, the master logs had nothing about the parent procedure and its progress, so we were stuck in this state indefinitely.

One more thought: we had to execute the TRSP, but maybe not the assign, because the state of the region in the region state node was MERGING.

This is the difference: when we check the state of the region we use the region state node, and when we check whether the region is online on the RS we use the online regions map of the RS. So basically we are looking at two different places in the same flow. Since the region is online, maybe we just change the state in the region state node (meta) from MERGING to OPEN.

> Region open procedure silently returns without notifying the parent proc
>
> Key: HBASE-28405
> URL: https://issues.apache.org/jira/browse/HBASE-28405
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Affects Versions: 2.5.7
> Reporter: Aman Poonia
> Assignee: Aman Poonia
> Priority: Major
>
> *We had a scenario in production where a merge operation had failed as below*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] assignment.MergeTableRegionsProcedure - Error trying to merge [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now when we do rollback of the failed merge operation we see an issue where the region is in state opened until the RS holding it stopped.*
> Rollback creates a TRSP as below
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - Stored [pid=26674602, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=table1, region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and rollback finished successfully*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - Rolled back pid=26673594, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.HBaseIOException via master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; MergeTableRegionsProcedure table=table1, regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. Interestingly, we didn't close the region, as it was the creation of the procedure to close the regions that had thrown the exception, not its execution. When we
[jira] [Comment Edited] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821227#comment-17821227 ]

Aman Poonia edited comment on HBASE-28405 at 2/27/24 1:14 PM:

If we had executed postDeploy here, the region would have come out of RIT.
{code:java}
// From here on out, this is PONR. We can not revert back. The only way to address an
// exception from here on out is to abort the region server.
rs.postOpenDeployTasks(new PostOpenDeployContext(region, openProcId, masterSystemTime));
rs.addRegion(region);
LOG.info("Opened {}", regionName);
// Cache the open region procedure id after report region transition succeed.
rs.finishRegionProcedure(openProcId);
Boolean current = rs.getRegionsInTransitionInRS().remove(regionInfo.getEncodedNameAsBytes());
if (current == null) {
  // Should NEVER happen, but let's be paranoid.
  LOG.error("Bad state: we've just opened {} which was NOT in transition", regionName);
} else if (!current) {
  // Should NEVER happen, but let's be paranoid.
  LOG.error("Bad state: we've just opened {} which was closing", regionName);
}
{code}
I don't see any harm in using the above piece of code in this scenario. Even if we do this multiple times, this piece of code seems idempotent.

was (Author: mnpoonia):
If we had executed postDeploy here, the region would have come out of RIT.
{code:java}
// From here on out, this is PONR. We can not revert back. The only way to address an
// exception from here on out is to abort the region server.
rs.postOpenDeployTasks(new PostOpenDeployContext(region, openProcId, masterSystemTime));
rs.addRegion(region);
LOG.info("Opened {}", regionName);
// Cache the open region procedure id after report region transition succeed.
rs.finishRegionProcedure(openProcId);
Boolean current = rs.getRegionsInTransitionInRS().remove(regionInfo.getEncodedNameAsBytes());
if (current == null) {
  // Should NEVER happen, but let's be paranoid.
  LOG.error("Bad state: we've just opened {} which was NOT in transition", regionName);
} else if (!current) {
  // Should NEVER happen, but let's be paranoid.
  LOG.error("Bad state: we've just opened {} which was closing", regionName);
}
{code}
I don't see any harm in executing the above piece of code in this scenario. Even if we do this multiple times, this piece of code seems idempotent.
[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821227#comment-17821227 ]

Aman Poonia commented on HBASE-28405:

If we had executed postDeploy here, the region would have come out of RIT.
{code:java}
// From here on out, this is PONR. We can not revert back. The only way to address an
// exception from here on out is to abort the region server.
rs.postOpenDeployTasks(new PostOpenDeployContext(region, openProcId, masterSystemTime));
rs.addRegion(region);
LOG.info("Opened {}", regionName);
// Cache the open region procedure id after report region transition succeed.
rs.finishRegionProcedure(openProcId);
Boolean current = rs.getRegionsInTransitionInRS().remove(regionInfo.getEncodedNameAsBytes());
if (current == null) {
  // Should NEVER happen, but let's be paranoid.
  LOG.error("Bad state: we've just opened {} which was NOT in transition", regionName);
} else if (!current) {
  // Should NEVER happen, but let's be paranoid.
  LOG.error("Bad state: we've just opened {} which was closing", regionName);
}
{code}
I don't see any harm in executing the above piece of code in this scenario. Even if we do this multiple times, this piece of code seems idempotent.
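To make the idempotency claim concrete, here is a small toy model of the post-open steps. It is only a sketch under stated assumptions: `IdempotentOpenSketch`, `finishOpen`, and all fields are hypothetical stand-ins for the region server state, and it assumes the master tolerates receiving the same OPENED report more than once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentOpenSketch {
  // Hypothetical stand-ins for region server state; not HBase classes.
  final Set<String> onlineRegions = new HashSet<>();
  final Map<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();
  final Set<Long> finishedProcIds = new HashSet<>();
  final List<String> reportsToMaster = new ArrayList<>();

  /** Toy version of the post-open steps from the snippet quoted above. */
  void finishOpen(String encodedName, long openProcId) {
    // Report first, as the quoted code does with postOpenDeployTasks.
    reportsToMaster.add("OPENED " + encodedName + " pid=" + openProcId);
    // Adding to a set is a no-op when the region is already online.
    onlineRegions.add(encodedName);
    finishedProcIds.add(openProcId);
    // Removing an absent key is a no-op as well.
    regionsInTransition.remove(encodedName);
  }
}
```

Calling `finishOpen` twice for the same region leaves the local state unchanged after the first call; only the report is repeated. Note that in the real snippet a repeated call would probably take the `current == null` branch and log the "Bad state" error, since the region would not be in the in-transition map, so a fix along these lines might want to skip that check for the already-online case.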
[jira] [Updated] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Poonia updated HBASE-28405:

Description:
*We had a scenario in production where a merge operation had failed as below*

_2024-02-11 10:53:57,715 ERROR [PEWorker-31] assignment.MergeTableRegionsProcedure - Error trying to merge [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
_org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
_at org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
_at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
_at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
_at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_

*Now, when we roll back the failed merge operation, we see an issue where the region remains in state opened until the RS holding it is stopped.*

Rollback creates a TRSP as below

_2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - Stored [pid=26674602, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=table1, region=a92008b76ccae47d55c590930b837036, ASSIGN]_

*and rollback finished successfully*

_2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - Rolled back pid=26673594, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.HBaseIOException via master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; MergeTableRegionsProcedure table=table1, regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], force=false exec-time=1.4820 sec_

*We create a procedure to open the region a92008b76ccae47d55c590930b837036. Interestingly, we didn't close the region, because it was the creation of the procedure to close the regions that threw the exception, not its execution. When we run the TRSP, it sends an OpenRegionProcedure, which is handled by AssignRegionHandler. On execution, this handler reports that the region is already online.*

The sequence of events is as follows

_2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
_2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - Initialized subprocedures=[\{pid=26675798, ppid=26674602, state=RUNNABLE; OpenRegionProcedure a92008b76ccae47d55c590930b837036, server=rs-210,60020,1707596461539}]_
_2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] handler.AssignRegionHandler - Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already online_

was:
We had a scenario in production where a merge operation had failed as below

_2024-02-11 10:53:57,715 ERROR [PEWorker-31] assignment.MergeTableRegionsProcedure - Error trying to merge [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
_org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
_at org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
_at
[jira] [Commented] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821214#comment-17821214 ]

Aman Poonia commented on HBASE-28405:

The code comment says this situation can arise when the master asks the RS again because the previous response has not reached HMaster yet. Our current issue seems to be a little different: here the very first call returns without any reportRegionStateTransition.
{code:java}
String regionName = regionInfo.getRegionNameAsString();
Region onlineRegion = rs.getRegion(encodedName);
if (onlineRegion != null) {
  LOG.warn("Received OPEN for {} which is already online", regionName);
  // Just follow the old behavior, do we need to call reportRegionStateTransition? Maybe not?
  // For normal case, it could happen that the rpc call to schedule this handler is succeeded,
  // but before returning to master the connection is broken. And when master tries again, we
  // have already finished the opening. For this case we do not need to call
  // reportRegionStateTransition any more.
  return;
}
Boolean previous = rs.getRegionsInTransitionInRS().putIfAbsent(encodedNameBytes, Boolean.TRUE);
{code}
[~apurtell] [~vjasani] FYI
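The difference between returning silently and reporting back can be shown with a toy model in which the master side waits on a future. This is a sketch only: `OpenReportSketch`, `ToyRegionServer`, and both handler methods are hypothetical, and the `CompletableFuture` merely stands in for the parent procedure waiting on a region state transition report.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

public class OpenReportSketch {
  /** Toy region server that only tracks which regions are online. */
  static final class ToyRegionServer {
    final Set<String> online = new HashSet<>();

    /** Mirrors the early return discussed above: the caller is told nothing. */
    void handleOpenSilently(String region, CompletableFuture<String> reportToMaster) {
      if (online.contains(region)) {
        return; // already online: the master never hears back
      }
      online.add(region);
      reportToMaster.complete("OPENED");
    }

    /** Alternative: always report the outcome, even for an already-online region. */
    void handleOpenAndReport(String region, CompletableFuture<String> reportToMaster) {
      online.add(region); // no-op if already online
      reportToMaster.complete("OPENED");
    }
  }
}
```

If the region is already online and this is the first OPEN request the server has seen for it, the silent variant leaves the future incomplete, which corresponds to the parent procedure staying in OPENING. The reporting variant completes it, at the cost of a possibly redundant report in the retry case that the original code comment describes.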
[jira] [Updated] (HBASE-28405) Region open procedure silently returns without notifying the parent proc
[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Poonia updated HBASE-28405:

Summary: Region open procedure silently returns without notifying the parent proc (was: Region open procedure silently does nothing without notifying the parent proc)

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28405) Region open procedure silently does nothing without notifying the parent proc
Aman Poonia created HBASE-28405:

Summary: Region open procedure silently does nothing without notifying the parent proc
Key: HBASE-28405
URL: https://issues.apache.org/jira/browse/HBASE-28405
Project: HBase
Issue Type: Bug
Components: proc-v2
Affects Versions: 2.5.7
Reporter: Aman Poonia
Assignee: Aman Poonia

We had a scenario in production where a merge operation had failed as below

_2024-02-11 10:53:57,715 ERROR [PEWorker-31] assignment.MergeTableRegionsProcedure - Error trying to merge [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
_org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
_at org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
_at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
_at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
_at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
_at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
_at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_

Now, when we roll back the failed merge operation, we see an issue where the region remains in state opened until the RS holding it is stopped.

Rollback creates a TRSP as below

_2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - Stored [pid=26674602, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=table1, region=a92008b76ccae47d55c590930b837036, ASSIGN]_

and rollback finished successfully

_2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - Rolled back pid=26673594, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.HBaseIOException via master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; MergeTableRegionsProcedure table=table1, regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], force=false exec-time=1.4820 sec_

We create a procedure to open the region a92008b76ccae47d55c590930b837036.
Interestingly, we didn't close the region, because it was the creation of the procedure to close the regions that threw the exception, not its execution.
Now, when we run the TRSP, it sends an OpenRegionProcedure, which is handled by AssignRegionHandler.
On execution, this handler reports that the region is already online.

The sequence of events is as follows

_2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
_2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - Initialized subprocedures=[\{pid=26675798, ppid=26674602, state=RUNNABLE; OpenRegionProcedure a92008b76ccae47d55c590930b837036, server=rs-210,60020,1707596461539}]_
_2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] handler.AssignRegionHandler - Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already online_
[jira] [Commented] (HBASE-27960) Broken build because of cycloneDX
[ https://issues.apache.org/jira/browse/HBASE-27960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17739610#comment-17739610 ]

Aman Poonia commented on HBASE-27960:

Sorry for the noise. This is already fixed in the latest 2.4 branch and above; I was using an old branch.

> Broken build because of cycloneDX
>
> Key: HBASE-27960
> URL: https://issues.apache.org/jira/browse/HBASE-27960
> Project: HBase
> Issue Type: Bug
> Environment: macos 13.4
> openjdk version "1.8.0_362"
> OpenJDK Runtime Environment (Zulu 8.68.0.20-SA-macosx) (build 1.8.0_362-b08)
> OpenJDK 64-Bit Server VM (Zulu 8.68.0.20-SA-macosx) (build 25.362-b08, mixed mode)
>
> Apache Maven 3.9.3 (21122926829f1ead511c958d89bd2f672198ae9f)
> Reporter: Aman Poonia
> Assignee: Aman Poonia
> Priority: Minor
>
> [INFO] CycloneDX: Resolving Dependencies
> [ERROR] An error occurred attempting to read POM
> org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen version="1.0" encoding="ISO-8859-1"... @1:42)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion (MXParser.java:3439)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl (MXParser.java:3361)
> at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3213)
> at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog (MXParser.java:1828)
> at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl (MXParser.java:1757)
> at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:627)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:654)
> at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read (MavenXpp3Reader.java:669)
> at org.cyclonedx.maven.BaseCycloneDxMojo.readPom (BaseCycloneDxMojo.java:759)
> at org.cyclonedx.maven.BaseCycloneDxMojo.readPom (BaseCycloneDxMojo.java:746)
> at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject (BaseCycloneDxMojo.java:694)
> at org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata (BaseCycloneDxMojo.java:524)
> at org.cyclonedx.maven.BaseCycloneDxMojo.convert (BaseCycloneDxMojo.java:481)
> at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70)
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
> at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
> at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
> at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
> at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
> at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
> at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
> at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
> at
[jira] [Resolved] (HBASE-27960) Broken build because of cycloneDX
[ https://issues.apache.org/jira/browse/HBASE-27960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia resolved HBASE-27960. - Resolution: Duplicate > Broken build because of cycloneDX > - > > Key: HBASE-27960 > URL: https://issues.apache.org/jira/browse/HBASE-27960 > Project: HBase > Issue Type: Bug > Environment: macos 13.4 > openjdk version "1.8.0_362" > OpenJDK Runtime Environment (Zulu 8.68.0.20-SA-macosx) (build 1.8.0_362-b08) > OpenJDK 64-Bit Server VM (Zulu 8.68.0.20-SA-macosx) (build 25.362-b08, mixed > mode) > > Apache Maven 3.9.3 (21122926829f1ead511c958d89bd2f672198ae9f) >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > [INFO] CycloneDX: Resolving Dependencies > [ERROR] An error occurred attempting to read POM > org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml > decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen version="1.0" encoding="ISO-8859-1"... @1:42) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDeclWithVersion > (MXParser.java:3439) > at org.codehaus.plexus.util.xml.pull.MXParser.parseXmlDecl > (MXParser.java:3361) > at org.codehaus.plexus.util.xml.pull.MXParser.parsePI (MXParser.java:3213) > at org.codehaus.plexus.util.xml.pull.MXParser.parseProlog > (MXParser.java:1828) > at org.codehaus.plexus.util.xml.pull.MXParser.nextImpl > (MXParser.java:1757) > at org.codehaus.plexus.util.xml.pull.MXParser.next (MXParser.java:1375) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:627) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:654) > at org.apache.maven.model.io.xpp3.MavenXpp3Reader.read > (MavenXpp3Reader.java:669) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:759) > at org.cyclonedx.maven.BaseCycloneDxMojo.readPom > (BaseCycloneDxMojo.java:746) > at org.cyclonedx.maven.BaseCycloneDxMojo.retrieveParentProject > (BaseCycloneDxMojo.java:694) > at 
org.cyclonedx.maven.BaseCycloneDxMojo.getClosestMetadata > (BaseCycloneDxMojo.java:524) > at org.cyclonedx.maven.BaseCycloneDxMojo.convert > (BaseCycloneDxMojo.java:481) > at org.cyclonedx.maven.CycloneDxMojo.execute (CycloneDxMojo.java:70) > at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo > (DefaultBuildPluginManager.java:126) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 > (MojoExecutor.java:328) > at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute > (MojoExecutor.java:316) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:212) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:174) > at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 > (MojoExecutor.java:75) > at org.apache.maven.lifecycle.internal.MojoExecutor$1.run > (MojoExecutor.java:162) > at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute > (DefaultMojosExecutionStrategy.java:39) > at org.apache.maven.lifecycle.internal.MojoExecutor.execute > (MojoExecutor.java:159) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:105) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject > (LifecycleModuleBuilder.java:73) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build > (SingleThreadedBuilder.java:53) > at org.apache.maven.lifecycle.internal.LifecycleStarter.execute > (LifecycleStarter.java:118) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261) > at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173) > at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101) > at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906) > at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283) > at org.apache.maven.cli.MavenCli.main (MavenCli.java:206) > at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) > at 
sun.reflect.NativeMethodAccessorImpl.invoke > (NativeMethodAccessorImpl.java:62) > at sun.reflect.DelegatingMethodAccessorImpl.invoke > (DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke (Method.java:498) > at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced > (Launcher.java:283) > at org.codehaus.plexus.classworlds.launcher.Launcher.launch > (Launcher.java:226) > at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode > (Launcher.java:407) > at org.codehaus.plexus.classworlds.launcher.Launcher.main > (Launcher.java:348) -- This message was sent by Atlassian Jira
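The parser error quoted above says the POM being read starts with a UTF-8 byte order mark while its XML declaration names ISO-8859-1. A minimal sketch for checking whether a given file begins with that mark (bytes EF BB BF) follows. The class name `BomCheck` and its methods are hypothetical helpers for illustration, not part of Maven, CycloneDX, or HBase, and the report does not say which POM was affected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: reports whether a file starts with the UTF-8 BOM (EF BB BF). */
class BomCheck {

  /** Returns true if the first three bytes of {@code head} are the UTF-8 BOM. */
  static boolean hasUtf8Bom(byte[] head) {
    return head.length >= 3
      && (head[0] & 0xFF) == 0xEF
      && (head[1] & 0xFF) == 0xBB
      && (head[2] & 0xFF) == 0xBF;
  }

  /** Reads at most three bytes from {@code file} and checks them for the BOM. */
  static boolean hasUtf8Bom(Path file) throws IOException {
    try (InputStream in = Files.newInputStream(file)) {
      byte[] head = new byte[3];
      int n = 0;
      while (n < 3) {
        int r = in.read(head, n, 3 - n);
        if (r < 0) {
          break; // file is shorter than three bytes
        }
        n += r;
      }
      return n == 3 && hasUtf8Bom(head);
    }
  }

  /** Prints one line per path given on the command line. */
  public static void main(String[] args) throws IOException {
    for (String arg : args) {
      System.out.println(arg + ": " + hasUtf8Bom(Paths.get(arg)));
    }
  }
}
```

Running it over the POM files that the plugin resolves (the project's own and any parent POMs in the local repository) would be one way to narrow down which file triggers the exception.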
[jira] [Created] (HBASE-27960) Broken build because of cycloneDX
Aman Poonia created HBASE-27960: --- Summary: Broken build because of cycloneDX Key: HBASE-27960 URL: https://issues.apache.org/jira/browse/HBASE-27960 Project: HBase Issue Type: Bug Environment: macos 13.4 openjdk version "1.8.0_362" OpenJDK Runtime Environment (Zulu 8.68.0.20-SA-macosx) (build 1.8.0_362-b08) OpenJDK 64-Bit Server VM (Zulu 8.68.0.20-SA-macosx) (build 25.362-b08, mixed mode) Apache Maven 3.9.3 (21122926829f1ead511c958d89bd2f672198ae9f) Reporter: Aman Poonia Assignee: Aman Poonia [INFO] CycloneDX: Resolving Dependencies [ERROR] An error occurred attempting to read POM org.codehaus.plexus.util.xml.pull.XmlPullParserException: UTF-8 BOM plus xml decl of ISO-8859-1 is incompatible (position: START_DOCUMENT seen
[jira] [Resolved] (HBASE-24969) Back-port "HBASE-20289 Fix comparator for NormalizationPlan" to branch-1
[ https://issues.apache.org/jira/browse/HBASE-24969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia resolved HBASE-24969. - Resolution: Won't Fix > Back-port "HBASE-20289 Fix comparator for NormalizationPlan" to branch-1 > > > Key: HBASE-24969 > URL: https://issues.apache.org/jira/browse/HBASE-24969 > Project: HBase > Issue Type: Bug >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27384) Backport HBASE-27064 to branch 2.4
[ https://issues.apache.org/jira/browse/HBASE-27384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-27384: Summary: Backport HBASE-27064 to branch 2.4 (was: Concurrent modification in RegionNormalizerWorkQueue) > Backport HBASE-27064 to branch 2.4 > --- > > Key: HBASE-27384 > URL: https://issues.apache.org/jira/browse/HBASE-27384 > Project: HBase > Issue Type: Bug > Components: Normalizer >Affects Versions: 2.4.14 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > {*}Error: > java.util.ConcurrentModificationException{*}{{{}java.util.concurrent.ExecutionException: > java.util.ConcurrentModificationException at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.testTake(TestRegionNormalizerWorkQueue.java:211) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > 
at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:39) at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:750) Caused by: > java.util.ConcurrentModificationException at > java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) > at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) at > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerWorkQueue.take(RegionNormalizerWorkQueue.java:192) > at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.lambda$testTake$3(TestRegionNormalizerWorkQueue.java:192) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) > at > java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632) > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175){}}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27384) Backport HBASE-27064 to branch 2.4
[ https://issues.apache.org/jira/browse/HBASE-27384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-27384: Parent: HBASE-27064 Issue Type: Sub-task (was: Bug) > Backport HBASE-27064 to branch 2.4 > --- > > Key: HBASE-27384 > URL: https://issues.apache.org/jira/browse/HBASE-27384 > Project: HBase > Issue Type: Sub-task > Components: Normalizer >Affects Versions: 2.4.14 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > {*}Error: > java.util.ConcurrentModificationException{*}{{{}java.util.concurrent.ExecutionException: > java.util.ConcurrentModificationException at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.testTake(TestRegionNormalizerWorkQueue.java:211) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:39) at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:750) Caused by: > java.util.ConcurrentModificationException at > java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) > at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) at > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerWorkQueue.take(RegionNormalizerWorkQueue.java:192) > at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.lambda$testTake$3(TestRegionNormalizerWorkQueue.java:192) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) > at > java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632) > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175){}}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HBASE-27384) Concurrent modification in RegionNormalizerWorkQueue
[ https://issues.apache.org/jira/browse/HBASE-27384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608138#comment-17608138 ] Aman Poonia commented on HBASE-27384: - Looks like this is already fixed in branch-2.5 in HBASE-27064. We can backport it to 2.4 > Concurrent modification in RegionNormalizerWorkQueue > > > Key: HBASE-27384 > URL: https://issues.apache.org/jira/browse/HBASE-27384 > Project: HBase > Issue Type: Bug > Components: Normalizer >Affects Versions: 2.4.14 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > {*}Error: > java.util.ConcurrentModificationException{*}{{{}java.util.concurrent.ExecutionException: > java.util.ConcurrentModificationException at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.testTake(TestRegionNormalizerWorkQueue.java:211) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:39) at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:750) Caused by: > java.util.ConcurrentModificationException at > java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) > at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) at > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerWorkQueue.take(RegionNormalizerWorkQueue.java:192) > at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.lambda$testTake$3(TestRegionNormalizerWorkQueue.java:192) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) > at > java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632) > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175){}}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HBASE-27384) Concurrent modification in RegionNormalizerWorkQueue
[ https://issues.apache.org/jira/browse/HBASE-27384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608018#comment-17608018 ] Aman Poonia edited comment on HBASE-27384 at 9/22/22 1:03 AM: -- The issue here is that we use a LinkedHashSet, whose iterator throws ConcurrentModificationException if the Set is modified after the iterator is created. [https://github.com/apache/hbase/blob/branch-2.4/hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/RegionNormalizerWorkQueue.java#L184]
{code:java}
public E take() throws InterruptedException {
  E x;
  takeLock.lockInterruptibly();
  try {
    while (delegate.isEmpty()) {
      notEmpty.await();
    }
    final Iterator<E> iter = delegate.iterator();
    x = iter.next();
    iter.remove();
    if (!delegate.isEmpty()) {
      notEmpty.signal();
    }
  } finally {
    takeLock.unlock();
  }
  return x;
}
{code}
[LinkedHashSet javadoc|https://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashSet.html] As we can see in the code above, while reading the set we take only takeLock and not putLock, which leaves the Set open to concurrent modification.
was (Author: mnpoonia): The issue here is that we use a LinkedHashSet, whose iterator throws ConcurrentModificationException if the Set is modified after the iterator is created. [https://github.com/apache/hbase/blob/branch-2.4/hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/RegionNormalizerWorkQueue.java#L184]
{code:java}
public E take() throws InterruptedException {
  E x;
  takeLock.lockInterruptibly();
  try {
    while (delegate.isEmpty()) {
      notEmpty.await();
    }
    final Iterator<E> iter = delegate.iterator();
    x = iter.next();
    iter.remove();
    if (!delegate.isEmpty()) {
      notEmpty.signal();
    }
  } finally {
    takeLock.unlock();
  }
  return x;
}
{code}
As we can see in the code above, while reading the set we take only takeLock and not putLock, which leaves the Set open to concurrent modification.
> Concurrent modification in RegionNormalizerWorkQueue > > > Key: HBASE-27384 > URL: https://issues.apache.org/jira/browse/HBASE-27384 > Project: HBase > Issue Type: Bug > Components: Normalizer >Affects Versions: 2.4.14 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > {*}Error: > java.util.ConcurrentModificationException{*}{{{}java.util.concurrent.ExecutionException: > java.util.ConcurrentModificationException at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.testTake(TestRegionNormalizerWorkQueue.java:211) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:39) at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:750) Caused by: > java.util.ConcurrentModificationException at > java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) > at
[jira] [Comment Edited] (HBASE-27384) Concurrent modification in RegionNormalizerWorkQueue
[ https://issues.apache.org/jira/browse/HBASE-27384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608018#comment-17608018 ] Aman Poonia edited comment on HBASE-27384 at 9/22/22 1:00 AM: -- The issue here is that we use a LinkedHashSet, whose iterator throws ConcurrentModificationException if the Set is modified after the iterator is created. [https://github.com/apache/hbase/blob/branch-2.4/hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/RegionNormalizerWorkQueue.java#L184]
{code:java}
public E take() throws InterruptedException {
  E x;
  takeLock.lockInterruptibly();
  try {
    while (delegate.isEmpty()) {
      notEmpty.await();
    }
    final Iterator<E> iter = delegate.iterator();
    x = iter.next();
    iter.remove();
    if (!delegate.isEmpty()) {
      notEmpty.signal();
    }
  } finally {
    takeLock.unlock();
  }
  return x;
}
{code}
As we can see in the code above, while reading the set we take only takeLock and not putLock, which leaves the Set open to concurrent modification.
was (Author: mnpoonia): The issue here is that we use a LinkedHashSet, whose iterator throws ConcurrentModificationException if the Set is modified after the iterator is created. https://github.com/apache/hbase/blob/branch-2.4/hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/RegionNormalizerWorkQueue.java#L184
{code:java}
public E take() throws InterruptedException {
  E x;
  takeLock.lockInterruptibly();
  try {
    while (delegate.isEmpty()) {
      notEmpty.await();
    }
    final Iterator<E> iter = delegate.iterator();
    x = iter.next();
    iter.remove();
    if (!delegate.isEmpty()) {
      notEmpty.signal();
    }
  } finally {
    takeLock.unlock();
  }
  return x;
}
{code}
As we can clearly see in the code above, while reading the set we take only takeLock and not putLock, which leaves the Set open to concurrent modification.
> Concurrent modification in RegionNormalizerWorkQueue > > > Key: HBASE-27384 > URL: https://issues.apache.org/jira/browse/HBASE-27384 > Project: HBase > Issue Type: Bug > Components: Normalizer >Affects Versions: 2.4.14 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > {*}Error: > java.util.ConcurrentModificationException{*}{{{}java.util.concurrent.ExecutionException: > java.util.ConcurrentModificationException at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.testTake(TestRegionNormalizerWorkQueue.java:211) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:39) at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:750) Caused by: > java.util.ConcurrentModificationException at > java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) > at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) at >
[jira] [Commented] (HBASE-27384) Concurrent modification in RegionNormalizerWorkQueue
[ https://issues.apache.org/jira/browse/HBASE-27384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608018#comment-17608018 ] Aman Poonia commented on HBASE-27384: - The issue here is that we use a LinkedHashSet, whose iterator throws ConcurrentModificationException if the Set is modified after the iterator is created. https://github.com/apache/hbase/blob/branch-2.4/hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/RegionNormalizerWorkQueue.java#L184
{code:java}
public E take() throws InterruptedException {
  E x;
  takeLock.lockInterruptibly();
  try {
    while (delegate.isEmpty()) {
      notEmpty.await();
    }
    final Iterator<E> iter = delegate.iterator();
    x = iter.next();
    iter.remove();
    if (!delegate.isEmpty()) {
      notEmpty.signal();
    }
  } finally {
    takeLock.unlock();
  }
  return x;
}
{code}
As we can clearly see in the code above, while reading the set we take only takeLock and not putLock, which leaves the Set open to concurrent modification.
> Concurrent modification in RegionNormalizerWorkQueue > > > Key: HBASE-27384 > URL: https://issues.apache.org/jira/browse/HBASE-27384 > Project: HBase > Issue Type: Bug > Components: Normalizer >Affects Versions: 2.4.14 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > {*}Error: > java.util.ConcurrentModificationException{*}{{{}java.util.concurrent.ExecutionException: > java.util.ConcurrentModificationException at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at > org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.testTake(TestRegionNormalizerWorkQueue.java:211) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:39) at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:750) Caused by: > java.util.ConcurrentModificationException at > java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) > at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) at > org.apache.hadoop.hbase.master.normalizer.RegionNormalizerWorkQueue.take(RegionNormalizerWorkQueue.java:192) > at > 
org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.lambda$testTake$3(TestRegionNormalizerWorkQueue.java:192) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) > at > java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632) > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175){}}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
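The comment above attributes the exception to take() iterating the LinkedHashSet under takeLock alone while writers hold a different lock. One way to rule that out is to guard every access to the set with a single lock. The following is a minimal sketch under that assumption; `SingleLockWorkQueue` is a hypothetical class written for illustration, it is not the HBase RegionNormalizerWorkQueue, and it is not claimed to be the change made in HBASE-27064.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical de-duplicating FIFO queue in which one lock guards the backing set. */
class SingleLockWorkQueue<E> {
  private final Set<E> delegate = new LinkedHashSet<>();
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition notEmpty = lock.newCondition();

  /** Adds an element if absent and wakes one waiting taker. */
  public void put(E e) {
    lock.lock();
    try {
      if (delegate.add(e)) {
        notEmpty.signal();
      }
    } finally {
      lock.unlock();
    }
  }

  /** Removes and returns the oldest element, blocking while the queue is empty. */
  public E take() throws InterruptedException {
    lock.lockInterruptibly();
    try {
      while (delegate.isEmpty()) {
        notEmpty.await();
      }
      // Writers need the same lock, so the set cannot change between
      // iterator() and remove() and the fail-fast check cannot trip.
      final Iterator<E> iter = delegate.iterator();
      final E x = iter.next();
      iter.remove();
      return x;
    } finally {
      lock.unlock();
    }
  }

  /** Returns the number of queued elements. */
  public int size() {
    lock.lock();
    try {
      return delegate.size();
    } finally {
      lock.unlock();
    }
  }
}
```

The trade-off is that producers and consumers now contend on the same lock, which the two-lock design was presumably meant to avoid; for a low-volume queue that may be acceptable.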
[jira] [Assigned] (HBASE-27384) Concurrent modification in RegionNormalizerWorkQueue
[ https://issues.apache.org/jira/browse/HBASE-27384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia reassigned HBASE-27384: --- Assignee: Aman Poonia
> Concurrent modification in RegionNormalizerWorkQueue
>
> Key: HBASE-27384
> URL: https://issues.apache.org/jira/browse/HBASE-27384
> Project: HBase
> Issue Type: Bug
> Components: Normalizer
> Affects Versions: 2.4.14
> Reporter: Aman Poonia
> Assignee: Aman Poonia
> Priority: Minor
>
> Error: java.util.ConcurrentModificationException
>
> java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException
>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>     at org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.testTake(TestRegionNormalizerWorkQueue.java:211)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:39)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.util.ConcurrentModificationException
>     at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
>     at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
>     at org.apache.hadoop.hbase.master.normalizer.RegionNormalizerWorkQueue.take(RegionNormalizerWorkQueue.java:192)
>     at org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.lambda$testTake$3(TestRegionNormalizerWorkQueue.java:192)
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
>     at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632)
>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27384) Concurrent modification in RegionNormalizerWorkQueue
Aman Poonia created HBASE-27384: --- Summary: Concurrent modification in RegionNormalizerWorkQueue Key: HBASE-27384 URL: https://issues.apache.org/jira/browse/HBASE-27384 Project: HBase Issue Type: Bug Components: Normalizer Affects Versions: 2.4.14 Reporter: Aman Poonia

Error: java.util.ConcurrentModificationException

java.util.concurrent.ExecutionException: java.util.ConcurrentModificationException
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
    at org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.testTake(TestRegionNormalizerWorkQueue.java:211)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:39)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.ConcurrentModificationException
    at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
    at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
    at org.apache.hadoop.hbase.master.normalizer.RegionNormalizerWorkQueue.take(RegionNormalizerWorkQueue.java:192)
    at org.apache.hadoop.hbase.master.normalizer.TestRegionNormalizerWorkQueue.lambda$testTake$3(TestRegionNormalizerWorkQueue.java:192)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
    at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

-- This message was sent by Atlassian Jira (v8.20.10#820010)
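The "Caused by" frames in the trace point at a LinkedHashMap key iterator failing inside take(). The following is a minimal sketch of that failure mode, not the RegionNormalizerWorkQueue code itself: it assumes the queue is backed by an insertion-ordered set, and the class and method names (WorkQueueSketch, put, poll, unguardedIterationFails) are made up for illustration. It shows that a structural change between creating an iterator and calling next() triggers the exception, and that doing the whole iterate-and-remove step under the same lock as writers avoids it.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a de-duplicating FIFO work queue.
final class WorkQueueSketch<E> {
  private final Set<E> delegate = new LinkedHashSet<>();
  private final ReentrantLock lock = new ReentrantLock();

  /** Adds an element; duplicates are ignored and keep their original position. */
  void put(E e) {
    lock.lock();
    try {
      delegate.add(e);
    } finally {
      lock.unlock();
    }
  }

  /** Removes and returns the oldest element, or null when the queue is empty. */
  E poll() {
    lock.lock();
    try {
      // Iterator creation, next() and remove() all happen while holding the lock,
      // so no writer can modify the set in between.
      Iterator<E> it = delegate.iterator();
      if (!it.hasNext()) {
        return null;
      }
      E head = it.next();
      it.remove();
      return head;
    } finally {
      lock.unlock();
    }
  }

  /** Reproduces the failure on a single thread: modify the set after the iterator exists. */
  static boolean unguardedIterationFails() {
    Set<String> set = new LinkedHashSet<>();
    set.add("a");
    Iterator<String> it = set.iterator();
    set.add("b"); // structural modification invalidates the fail-fast iterator
    try {
      it.next();
      return false;
    } catch (ConcurrentModificationException expected) {
      return true;
    }
  }
}
```

Whether the real queue needs a lock, a condition variable, or a concurrent collection depends on its blocking semantics, which the issue text does not describe.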
[jira] [Created] (HBASE-26779) Introduce a config to enable setting normalizer target region count through hbase site.
Aman Poonia created HBASE-26779: --- Summary: Introduce a config to enable setting normalizer target region count through hbase site. Key: HBASE-26779 URL: https://issues.apache.org/jira/browse/HBASE-26779 Project: HBase Issue Type: New Feature Components: Normalizer Affects Versions: 2.4.10, 1.7.1 Reporter: Aman Poonia Assignee: Aman Poonia

Currently, NORMALIZER_TARGET_REGION_COUNT can only be defined in table descriptors. I am thinking of introducing a global property in hbase-site that acts as the default value, so that it does not have to be set on every table. To override the default for selected tables, we can still use the table descriptor.

Priority of configs (highest first):
1. NORMALIZER_TARGET_REGION_COUNT (table descriptor)
2. hbase.normalizer.target_region_count (hbase-site)

-- This message was sent by Atlassian Jira (v8.20.1#820001)
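The priority order proposed above can be sketched as a simple lookup. This is only an illustration under assumptions: plain maps stand in for the table descriptor values and for the hbase-site configuration, and the class name TargetRegionCountResolver and the UNSET sentinel are made up for the example. Only the two key names come from the issue text.

```java
import java.util.Map;

// Illustrative only: resolves the target region count with table-level priority.
final class TargetRegionCountResolver {
  static final String TABLE_KEY = "NORMALIZER_TARGET_REGION_COUNT";
  static final String SITE_KEY = "hbase.normalizer.target_region_count";
  static final int UNSET = -1; // hypothetical sentinel for "no target configured"

  static int resolve(Map<String, String> tableValues, Map<String, String> siteConf) {
    // 1. A value on the table descriptor wins.
    String value = tableValues.get(TABLE_KEY);
    // 2. Otherwise fall back to the cluster-wide default from hbase-site.
    if (value == null) {
      value = siteConf.get(SITE_KEY);
    }
    return value == null ? UNSET : Integer.parseInt(value.trim());
  }
}
```

With a site default of 20 and a table-level value of 5, the sketch resolves to 5 for that table and to 20 for every table without an override.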
[jira] [Comment Edited] (HBASE-26073) Avoid merging regions if tables has not reached the state where regions are not split because of number of regions
[ https://issues.apache.org/jira/browse/HBASE-26073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17491804#comment-17491804 ] Aman Poonia edited comment on HBASE-26073 at 2/14/22, 5:45 AM: ---

[~ndimiduk] This is no longer an issue for us, because we now use the normalizer in a way that avoids it. To elaborate, we set the minimum number of regions at which the normalizer triggers to a value higher than the number of region servers. In a few cases that value is as large as twice the number of region servers in the cluster.

was (Author: mnpoonia): [~ndimiduk] This is not an issue for us now as we use normalizer in a way that we will not encounter this anymore.

> Avoid merging regions if tables has not reached the state where regions are > not split because of number of regions > -- > > Key: HBASE-26073 > URL: https://issues.apache.org/jira/browse/HBASE-26073 > Project: HBase > Issue Type: Improvement > Components: Normalizer >Affects Versions: 1.7.0, 2.4.4 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > we have a table on a cluster with 100 regions with default split policy > (SteppingSplitPolicy). This is a small table and will not get loaded with too > much data. Now if region size of table is smaller than the normalizer target > region size than there are chances that normalizer will consider the regions > for merges. But since the number of regions are small split policy will > trigger split on next flush. This is a continuous loop and our cluster will > be busy in these two actions. > We plan to consider number of regions and number of regionservers in creating > plans for normalizer. -- This message was sent by Atlassian Jira (v8.20.1#820001)
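The workaround described in the comment, keeping the normalizer away from tables until they have more regions than a multiple of the region server count, can be sketched as a single gate. This is a hypothetical helper, not normalizer code: the class name, method name and the factor parameter are assumptions made for illustration, and the issue does not say how such a check would be wired into plan creation.

```java
// Hypothetical gate: only consider merge plans once a table has clearly more
// regions than the cluster has region servers.
final class MergeGateSketch {
  /**
   * @param regionCount       current number of regions in the table
   * @param regionServerCount number of live region servers
   * @param factor            assumed multiplier, e.g. 2.0 for "twice the number of RS"
   */
  static boolean shouldConsiderMerges(int regionCount, int regionServerCount, double factor) {
    if (regionServerCount <= 0) {
      return false; // no usable cluster information, so stay conservative
    }
    return regionCount > regionServerCount * factor;
  }
}
```

For example, with 60 region servers and a factor of 2.0, the 100-region table from the description would not be considered for merges, which would break the merge/split loop for that table.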
[jira] [Commented] (HBASE-26073) Avoid merging regions if tables has not reached the state where regions are not split because of number of regions
[ https://issues.apache.org/jira/browse/HBASE-26073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17491804#comment-17491804 ] Aman Poonia commented on HBASE-26073: - [~ndimiduk] This is not an issue for us now as we use normalizer in a way that we will not encounter this anymore. > Avoid merging regions if tables has not reached the state where regions are > not split because of number of regions > -- > > Key: HBASE-26073 > URL: https://issues.apache.org/jira/browse/HBASE-26073 > Project: HBase > Issue Type: Improvement > Components: Normalizer >Affects Versions: 1.7.0, 2.4.4 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > we have a table on a cluster with 100 regions with default split policy > (SteppingSplitPolicy). This is a small table and will not get loaded with too > much data. Now if region size of table is smaller than the normalizer target > region size than there are chances that normalizer will consider the regions > for merges. But since the number of regions are small split policy will > trigger split on next flush. This is a continuous loop and our cluster will > be busy in these two actions. > We plan to consider number of regions and number of regionservers in creating > plans for normalizer. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HBASE-26073) Avoid merging regions if tables has not reached the state where regions are not split because of number of regions
[ https://issues.apache.org/jira/browse/HBASE-26073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-26073: Description: we have a table on a cluster with 100 regions with default split policy (SteppingSplitPolicy). This is a small table and will not get loaded with too much data. Now if region size of table is smaller than the normalizer target region size than there are chances that normalizer will consider the regions for merges. But since the number of regions are small split policy will trigger split on next flush. This is a continuous loop and our cluster will be busy in these two actions. We plan to consider number of regions and number of regionservers in creating plans for normalizer. was: we have a table on a cluster with 100 regions with default split policy (SteppingSplitPolicy). This is a small table and will not get loaded with too much data. Now if region size of table is smaller than the normalizer target region size than there are chances that normalizer will consider the regions for merges. But since the number of regions are small split policy will trigger split on next flush. This is a continuous loop and our cluster will be busy in these two actions. We plan to consider number of regions and number of regions in creating plans for normalizer. > Avoid merging regions if tables has not reached the state where regions are > not split because of number of regions > -- > > Key: HBASE-26073 > URL: https://issues.apache.org/jira/browse/HBASE-26073 > Project: HBase > Issue Type: Improvement > Components: Normalizer >Affects Versions: 1.7.0, 2.4.4 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > we have a table on a cluster with 100 regions with default split policy > (SteppingSplitPolicy). This is a small table and will not get loaded with too > much data. 
Now if region size of table is smaller than the normalizer target > region size than there are chances that normalizer will consider the regions > for merges. But since the number of regions are small split policy will > trigger split on next flush. This is a continuous loop and our cluster will > be busy in these two actions. > We plan to consider number of regions and number of regionservers in creating > plans for normalizer. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26752) Fix flappy test TestSimpleRegionNormalizerOnCluster.java
Aman Poonia created HBASE-26752: --- Summary: Fix flappy test TestSimpleRegionNormalizerOnCluster.java Key: HBASE-26752 URL: https://issues.apache.org/jira/browse/HBASE-26752 Project: HBase Issue Type: Bug Components: Normalizer Affects Versions: 1.7.1 Reporter: Aman Poonia Assignee: Aman Poonia

TestSimpleRegionNormalizerOnCluster.java can hang after HBASE-26744. The test assumes that the HTable list is sorted, which is wrong, so depending on that order can cause the test to hang or give inaccurate results.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26744) Normalizer exits without normalizing all the tables
Aman Poonia created HBASE-26744: --- Summary: Normalizer exits without normalizing all the tables Key: HBASE-26744 URL: https://issues.apache.org/jira/browse/HBASE-26744 Project: HBase Issue Type: Bug Components: Normalizer Affects Versions: 1.7.1 Reporter: Aman Poonia Assignee: Aman Poonia

Currently, if there are multiple tables to normalize, the normalizer exits before iterating over all of them as soon as it finds a table that does not require normalization.

Here is the offending code: [https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1736]

{code:java}
List<NormalizationPlan> plans = this.normalizer.computePlansForTable(table);
if (plans == null || plans.isEmpty()) {
  return true;
}
{code}

This runs inside the loop over tables, so the return ends the whole normalization run.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
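A minimal, self-contained sketch of the control-flow problem described above. It does not use HMaster or the normalizer classes: tables and plans are plain strings, and NormalizeLoopSketch and normalizeAll are hypothetical names. The point is only that a table without plans should be skipped with continue so that the remaining tables are still visited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative loop over tables; plan computation is passed in as a function.
final class NormalizeLoopSketch {
  static List<String> normalizeAll(List<String> tables,
      Function<String, List<String>> computePlansForTable) {
    List<String> executed = new ArrayList<>();
    for (String table : tables) {
      List<String> plans = computePlansForTable.apply(table);
      if (plans == null || plans.isEmpty()) {
        // Skip only this table. A 'return' here would abandon every table
        // that comes later in the list.
        continue;
      }
      executed.addAll(plans);
    }
    return executed;
  }
}
```

Whether the actual fix uses continue or restructures the loop differently is not stated in the issue; this only demonstrates the behaviour the report asks for.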
[jira] [Updated] (HBASE-26073) Avoid merging regions if tables has not reached the state where regions are not split because of number of regions
[ https://issues.apache.org/jira/browse/HBASE-26073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-26073: Summary: Avoid merging regions if tables has not reached the state where regions are not split because of number of regions (was: Avoid merging regions if tables has not reached the state where numbers of regions are not split because of number of regions) > Avoid merging regions if tables has not reached the state where regions are > not split because of number of regions > -- > > Key: HBASE-26073 > URL: https://issues.apache.org/jira/browse/HBASE-26073 > Project: HBase > Issue Type: Improvement > Components: Normalizer >Affects Versions: 1.7.0, 2.4.4 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > we have a table on a cluster with 100 regions with default split policy > (SteppingSplitPolicy). This is a small table and will not get loaded with too > much data. Now if region size of table is smaller than the normalizer target > region size than there are chances that normalizer will consider the regions > for merges. But since the number of regions are small split policy will > trigger split on next flush. This is a continuous loop and our cluster will > be busy in these two actions. > We plan to consider number of regions and number of regions in creating plans > for normalizer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26073) Avoid merging regions if tables has not reached the state where numbers of regions are not split because of number of regions
Aman Poonia created HBASE-26073: --- Summary: Avoid merging regions if tables has not reached the state where numbers of regions are not split because of number of regions Key: HBASE-26073 URL: https://issues.apache.org/jira/browse/HBASE-26073 Project: HBase Issue Type: Improvement Components: Normalizer Affects Versions: 2.4.4, 1.7.0 Reporter: Aman Poonia Assignee: Aman Poonia we have a table on a cluster with 100 regions with default split policy (SteppingSplitPolicy). This is a small table and will not get loaded with too much data. Now if region size of table is smaller than the normalizer target region size than there are chances that normalizer will consider the regions for merges. But since the number of regions are small split policy will trigger split on next flush. This is a continuous loop and our cluster will be busy in these two actions. We plan to consider number of regions and number of regions in creating plans for normalizer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25986) Expose the NORMALIZARION_ENABLED table descriptor through a property in hbase-site
Aman Poonia created HBASE-25986: --- Summary: Expose the NORMALIZARION_ENABLED table descriptor through a property in hbase-site Key: HBASE-25986 URL: https://issues.apache.org/jira/browse/HBASE-25986 Project: HBase Issue Type: Improvement Components: Normalizer Reporter: Aman Poonia Assignee: Aman Poonia

Today, if we want to enable the region normalizer on a table, we have to add the table descriptor "NORMALIZATION_ENABLED" to that table. If we have a lot of tables and want the normalizer to be enabled by default for every table unless it is explicitly disabled for a table, we cannot achieve that.

[https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/TableDescriptorBuilder.java#L164]

The intention here is to make the default settable through a property in hbase-site.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
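The intended behaviour, a cluster-wide default that a table can still override, might look roughly like the sketch below. It is an assumption-laden illustration: maps stand in for the table descriptor and hbase-site, and the site property name used here is a placeholder invented for the example, since the issue does not name the property.

```java
import java.util.Map;

// Illustrative only: table-level NORMALIZATION_ENABLED overrides a site-wide default.
final class NormalizationEnabledSketch {
  static final String TABLE_KEY = "NORMALIZATION_ENABLED";
  // Placeholder key, NOT a real HBase property name.
  static final String SITE_DEFAULT_KEY = "example.normalization.enabled.default";

  static boolean isEnabled(Map<String, String> tableValues, Map<String, String> siteConf) {
    String explicit = tableValues.get(TABLE_KEY);
    if (explicit != null) {
      // The table says so explicitly, either way.
      return Boolean.parseBoolean(explicit);
    }
    // No table-level value: use the site default (false when the key is absent).
    return Boolean.parseBoolean(siteConf.get(SITE_DEFAULT_KEY));
  }
}
```

The important design point is that "not set on the table" and "set to false on the table" are treated differently, which is what allows a table to opt out when the default is true.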
[jira] [Created] (HBASE-25593) Backport changes from HBASE-24418 to branch-1
Aman Poonia created HBASE-25593: --- Summary: Backport changes from HBASE-24418 to branch-1 Key: HBASE-25593 URL: https://issues.apache.org/jira/browse/HBASE-25593 Project: HBase Issue Type: Improvement Components: Normalizer Reporter: Aman Poonia Assignee: Aman Poonia Fix For: 1.7.0 Backport _"HBASE-24418 Consolidate Normalizer implementations"_ to branch-1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25592) Improve normalizer code in line with HBASE-23932
Aman Poonia created HBASE-25592: --- Summary: Improve normalizer code in line with HBASE-23932 Key: HBASE-25592 URL: https://issues.apache.org/jira/browse/HBASE-25592 Project: HBase Issue Type: Improvement Components: Normalizer Reporter: Aman Poonia Assignee: Aman Poonia Fix For: 1.7.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25569) Adding region count and average region size in a table at regionserver level
Aman Poonia created HBASE-25569: --- Summary: Adding region count and average region size in a table at regionserver level Key: HBASE-25569 URL: https://issues.apache.org/jira/browse/HBASE-25569 Project: HBase Issue Type: New Feature Components: metrics Affects Versions: 1.6.0 Reporter: Aman Poonia Assignee: Aman Poonia Fix For: 1.7.0 Currently we have these two metrics - regionCount and avgRegionSize in branch-2+. Adding them to branch-1. These will give better insight into whether we should enable region normalizer on a table or not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23966) Backport HBASE-22285 (MergeToNormalize) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-23966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia resolved HBASE-23966. - Resolution: Won't Do > Backport HBASE-22285 (MergeToNormalize) to branch-1 > --- > > Key: HBASE-23966 > URL: https://issues.apache.org/jira/browse/HBASE-23966 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.7.0 >Reporter: Viraj Jasani >Assignee: Aman Poonia >Priority: Major > Fix For: 1.7.0 > > > A normalizer which merges very small size regions with adjacent regions > (MergeToNormalize) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25523) Region normalizer chore thread is getting killed
[ https://issues.apache.org/jira/browse/HBASE-25523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-25523: Affects Version/s: 3.0.0-alpha-1 2.4.1 > Region normalizer chore thread is getting killed > > > Key: HBASE-25523 > URL: https://issues.apache.org/jira/browse/HBASE-25523 > Project: HBase > Issue Type: Bug > Components: Normalizer >Affects Versions: 3.0.0-alpha-1, 1.6.0, 2.4.1 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Major > > Region normalizer chore thread is getting killed when the region is not found > on any server. > As per the method > {code:java} > // code placeholder > /** > * @param serverName > * @return ServerLoad if serverName is known else null > */ > public ServerLoad getLoad(final ServerName serverName) { > return this.onlineServers.get(serverName); > } > {code} > So ideally we should check for the returned null in SimpleRegionNormalizer > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25523) Region normalizer chore thread is getting killed
Aman Poonia created HBASE-25523: --- Summary: Region normalizer chore thread is getting killed Key: HBASE-25523 URL: https://issues.apache.org/jira/browse/HBASE-25523 Project: HBase Issue Type: Bug Components: Normalizer Affects Versions: 1.6.0 Reporter: Aman Poonia Assignee: Aman Poonia

Region normalizer chore thread is getting killed when the region is not found on any server. As per the method

{code:java}
/**
 * @param serverName
 * @return ServerLoad if serverName is known else null
 */
public ServerLoad getLoad(final ServerName serverName) {
  return this.onlineServers.get(serverName);
}
{code}

So ideally we should check for the returned null in SimpleRegionNormalizer

-- This message was sent by Atlassian Jira (v8.3.4#803005)
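A sketch of the null check the issue asks for, written against plain maps rather than the ServerManager and ServerLoad classes. The names RegionSizeLookupSketch, regionSizeMb and the UNKNOWN sentinel are assumptions for illustration; how SimpleRegionNormalizer should treat a region with unknown size is not specified in the issue.

```java
import java.util.Map;

// Illustrative lookup: server name -> (region name -> size in MB).
final class RegionSizeLookupSketch {
  static final long UNKNOWN = -1L; // hypothetical sentinel for "size not available"

  static long regionSizeMb(Map<String, Map<String, Long>> onlineServers,
      String serverName, String regionName) {
    // The region may not be assigned to any server, and the server may be unknown.
    Map<String, Long> load = serverName == null ? null : onlineServers.get(serverName);
    if (load == null) {
      return UNKNOWN; // avoid the NullPointerException that would kill the calling thread
    }
    Long size = load.get(regionName);
    return size == null ? UNKNOWN : size;
  }
}
```

A caller can then skip regions whose size is UNKNOWN instead of letting an unchecked exception propagate out of the chore.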
[jira] [Resolved] (HBASE-25105) Fix log line in SimpleRegionNormalizer
[ https://issues.apache.org/jira/browse/HBASE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia resolved HBASE-25105. - Resolution: Fixed > Fix log line in SimpleRegionNormalizer > -- > > Key: HBASE-25105 > URL: https://issues.apache.org/jira/browse/HBASE-25105 > Project: HBase > Issue Type: Bug >Affects Versions: 1.7.0 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Fix For: 1.7.0 > > > Currently it is logging the string targetRegionSize instead of value > LOG.debug("Table " + table + ": target region count is " + targetRegionCount+ > ", target region size is targetRegionSize"); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25105) Fix log line in SimpleRegionNormalizer
[ https://issues.apache.org/jira/browse/HBASE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-25105: Fix Version/s: 1.7.0 > Fix log line in SimpleRegionNormalizer > -- > > Key: HBASE-25105 > URL: https://issues.apache.org/jira/browse/HBASE-25105 > Project: HBase > Issue Type: Bug >Affects Versions: 1.7.0 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Fix For: 1.7.0 > > > Currently it is logging the string targetRegionSize instead of value > LOG.debug("Table " + table + ": target region count is " + targetRegionCount+ > ", target region size is targetRegionSize"); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25105) Fix log line in SimpleRegionNormalizer
[ https://issues.apache.org/jira/browse/HBASE-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-25105: Affects Version/s: (was: 1.6.0) 1.7.0 > Fix log line in SimpleRegionNormalizer > -- > > Key: HBASE-25105 > URL: https://issues.apache.org/jira/browse/HBASE-25105 > Project: HBase > Issue Type: Bug >Affects Versions: 1.7.0 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > Currently it is logging the string targetRegionSize instead of value > LOG.debug("Table " + table + ": target region count is " + targetRegionCount+ > ", target region size is targetRegionSize"); -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25105) Fix log line in SimpleRegionNormalizer
Aman Poonia created HBASE-25105: --- Summary: Fix log line in SimpleRegionNormalizer Key: HBASE-25105 URL: https://issues.apache.org/jira/browse/HBASE-25105 Project: HBase Issue Type: Bug Affects Versions: 1.6.0 Reporter: Aman Poonia Assignee: Aman Poonia

Currently it logs the literal string "targetRegionSize" instead of the variable's value:

LOG.debug("Table " + table + ": target region count is " + targetRegionCount+ ", target region size is targetRegionSize");

-- This message was sent by Atlassian Jira (v8.3.4#803005)
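For illustration, the message with the variable moved outside the string literal. This is a standalone sketch that only builds the text; the class and method names are hypothetical, and the parameter types are assumptions, since the issue shows only the log statement.

```java
// Builds the debug message with the value of targetRegionSize concatenated in,
// rather than the literal text "targetRegionSize".
final class NormalizerLogSketch {
  static String message(String table, int targetRegionCount, long targetRegionSize) {
    return "Table " + table + ": target region count is " + targetRegionCount
        + ", target region size is " + targetRegionSize;
  }
}
```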
[jira] [Created] (HBASE-24988) Donot merge regions if they are non adjacent in MergeNormalizationPlan.execute
Aman Poonia created HBASE-24988: --- Summary: Donot merge regions if they are non adjacent in MergeNormalizationPlan.execute Key: HBASE-24988 URL: https://issues.apache.org/jira/browse/HBASE-24988 Project: HBase Issue Type: Bug Affects Versions: 1.6.0 Reporter: Aman Poonia Assignee: Aman Poonia

Currently, when we have a MergeNormalizationPlan, we do a forced merge in the execute method:

{code:java}
admin.mergeRegions(firstRegion.getEncodedNameAsBytes(), secondRegion.getEncodedNameAsBytes(), true);
{code}

Since we never expect these regions to be non-adjacent, it is better not to force the merge and stay on the safe side.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
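What "adjacent" means can be sketched independently of the Admin API: two regions are neighbours when one ends exactly where the other starts. The KeyRange and AdjacencySketch types below are hypothetical stand-ins for region boundaries, and the treatment of an empty end key as "end of table" is an assumption stated in the comments.

```java
import java.util.Arrays;

// Hypothetical holder for a region's key boundaries.
final class KeyRange {
  final byte[] startKey;
  final byte[] endKey;

  KeyRange(byte[] startKey, byte[] endKey) {
    this.startKey = startKey;
    this.endKey = endKey;
  }
}

final class AdjacencySketch {
  /** True when 'right' starts exactly where 'left' ends. */
  private static boolean follows(KeyRange left, KeyRange right) {
    // Assumption: an empty end key marks the last region of the table,
    // which has no successor, so it must not match an empty start key.
    return left.endKey.length > 0 && Arrays.equals(left.endKey, right.startKey);
  }

  /** True when the two ranges share a boundary in either order. */
  static boolean areAdjacent(KeyRange a, KeyRange b) {
    return follows(a, b) || follows(b, a);
  }
}
```

Per the issue text, the last argument of the quoted mergeRegions call is what forces the merge; not forcing it leaves any adjacency validation to the merge itself, so a client-side check like this one would only be an extra safeguard.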
[jira] [Commented] (HBASE-24970) Backport HBASE-20985 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-24970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187581#comment-17187581 ] Aman Poonia commented on HBASE-24970: - [~sandeep.guggilam] [~vjasani] FYI > Backport HBASE-20985 to branch-1 > > > Key: HBASE-24970 > URL: https://issues.apache.org/jira/browse/HBASE-24970 > Project: HBase > Issue Type: Improvement >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24970) Backport HBASE-20985 to branch-1
Aman Poonia created HBASE-24970: --- Summary: Backport HBASE-20985 to branch-1 Key: HBASE-24970 URL: https://issues.apache.org/jira/browse/HBASE-24970 Project: HBase Issue Type: Improvement Reporter: Aman Poonia Assignee: Aman Poonia -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24969) Back-port "HBASE-20289 Fix comparator for NormalizationPlan" to branch-1
Aman Poonia created HBASE-24969: --- Summary: Back-port "HBASE-20289 Fix comparator for NormalizationPlan" to branch-1 Key: HBASE-24969 URL: https://issues.apache.org/jira/browse/HBASE-24969 Project: HBase Issue Type: Bug Reporter: Aman Poonia Assignee: Aman Poonia -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
[ https://issues.apache.org/jira/browse/HBASE-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111790#comment-17111790 ] Aman Poonia commented on HBASE-22285: -

[~ndimiduk] Yes sir, your review is correct. I introduced a new normalizer because I wanted to keep things simple. SimpleRegionNormalizer does multiple things: it splits and merges regions depending on the average region size, and there was no configuration to do only merges or only splits. What I wanted to achieve was to merge regions only, so that we don't have zero-byte or very small regions (maybe less than a MB or GB). Adding configuration achieves the same thing, but then we would have too many configurations for one normalizer, and predicting its behaviour becomes a bit difficult. So for the sake of simplicity I created a new normalizer and abstracted the common functionality out. If you think adding configuration is a better way than adding a new class, I am fine with that as well. I have no reservations about that approach other than the reasons mentioned above.

> A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
>
> Key: HBASE-22285
> URL: https://issues.apache.org/jira/browse/HBASE-22285
> Project: HBase
> Issue Type: New Feature
> Components: master
> Reporter: Aman Poonia
> Assignee: Aman Poonia
> Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> There are scenarios where we have seen around 5% of total regions with a size of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of regions increase with time considering we have TTL over the rows.
> After exploring the option of RegionNormalizer and doing some quick runs we found that that is not suitable considering it also splits the regions and merges to normalize. What we really want is to split as per Split policy and merge very small regions with adjacent regions to make sure we reduce 0-byte regions.
> We can plugin this normalizer using the property "hbase.master.normalizer.class"

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
[ https://issues.apache.org/jira/browse/HBASE-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014016#comment-17014016 ] Aman Poonia commented on HBASE-22285: - [~stack]- Sir, Planning to create PR's for branch-1 and branch-2 both. There is already a PR for branch-1 will update it to be in sync with master. [~apurtell]- Agreed. Planning to update the PR to include those changes. Or should i backport the changes in different JIRA. What do you suggest? > A normalizer which merges very small size regions with adjacent > regions.(MergeToNormalize) > -- > > Key: HBASE-22285 > URL: https://issues.apache.org/jira/browse/HBASE-22285 > Project: HBase > Issue Type: New Feature > Components: regionserver >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Fix For: 3.0.0 > > > There are scenarios where we have seen around 5% of total regions with a size > of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of > regions increase with time considering we have TTL over the rows. > After exploring the option of RegionNormalizer and doing some quick runs we > found that that is not suitable considering it also splits the regions and > merges to normalize. What we really want is to split as per Split policy and > merge very small regions with adjacent regions to make sure we reduce 0-byte > regions. > We can plugin this normalizer using the property > "hbase.master.normalizer.class" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
[ https://issues.apache.org/jira/browse/HBASE-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22285: Status: Patch Available (was: Open) > A normalizer which merges very small size regions with adjacent > regions.(MergeToNormalize) > -- > > Key: HBASE-22285 > URL: https://issues.apache.org/jira/browse/HBASE-22285 > Project: HBase > Issue Type: New Feature > Components: regionserver >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > There are scenarios where we have seen around 5% of total regions with a size > of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of > regions increase with time considering we have TTL over the rows. > After exploring the option of RegionNormalizer and doing some quick runs we > found that that is not suitable considering it also splits the regions and > merges to normalize. What we really want is to split as per Split policy and > merge very small regions with adjacent regions to make sure we reduce 0-byte > regions. > We can plugin this normalizer using the property > "hbase.master.normalizer.class" -- This message was sent by Atlassian Jira (v8.3.4#803005)
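The merge-only idea in the description, pairing very small regions with a neighbour and never splitting, can be sketched as a single pass over region sizes in key order. This is not the MergeToNormalize implementation: the class name, the size array input and the threshold parameter are assumptions made for illustration, and the pairing rule (each region takes part in at most one merge per pass) is one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative merge-only planner over region sizes listed in key order.
final class SmallRegionMergeSketch {
  /**
   * Returns index pairs {i, i + 1} of neighbouring regions to merge.
   * A pair is planned when at least one of the two regions is at or below
   * the threshold; each region is used in at most one pair.
   */
  static List<int[]> planMerges(long[] sizesInKeyOrder, long smallThreshold) {
    List<int[]> plans = new ArrayList<>();
    int i = 0;
    while (i + 1 < sizesInKeyOrder.length) {
      boolean leftSmall = sizesInKeyOrder[i] <= smallThreshold;
      boolean rightSmall = sizesInKeyOrder[i + 1] <= smallThreshold;
      if (leftSmall || rightSmall) {
        plans.add(new int[] {i, i + 1});
        i += 2; // both regions are consumed by this merge
      } else {
        i++;
      }
    }
    return plans;
  }
}
```

Because indices are neighbours in key order, every planned pair is adjacent by construction, and no split plans are ever produced.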
[jira] [Commented] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916831#comment-16916831 ] Aman Poonia commented on HBASE-22872: - Looks like something got messed up in the master branch patch. Uploaded the latest patch [^HBASE-22872.master.001.patch] > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Fix For: 1.5.0, 2.2.1, 1.3.6, 1.4.11, 2.1.7 > > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch, > HBASE-22872.branch-1.4.004.patch, HBASE-22872.branch-1.4.005.patch, > HBASE-22872.branch-2.patch, HBASE-22872.master.001.patch, > HBASE-22872.master.v01.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Attachment: HBASE-22872.master.001.patch > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Fix For: 1.5.0, 2.2.1, 1.3.6, 1.4.11, 2.1.7 > > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch, > HBASE-22872.branch-1.4.004.patch, HBASE-22872.branch-1.4.005.patch, > HBASE-22872.branch-2.patch, HBASE-22872.master.001.patch, > HBASE-22872.master.v01.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915753#comment-16915753 ] Aman Poonia commented on HBASE-22872: - Added for master and branch-2. Thanks for the patience. [~reidchan] > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch, > HBASE-22872.branch-1.4.004.patch, HBASE-22872.branch-1.4.005.patch, > HBASE-22872.branch-2.patch, HBASE-22872.master.v01.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Attachment: HBASE-22872.branch-2.patch > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch, > HBASE-22872.branch-1.4.004.patch, HBASE-22872.branch-1.4.005.patch, > HBASE-22872.branch-2.patch, HBASE-22872.master.v01.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Attachment: HBASE-22872.master.v01.patch > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch, > HBASE-22872.branch-1.4.004.patch, HBASE-22872.branch-1.4.005.patch, > HBASE-22872.master.v01.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913087#comment-16913087 ] Aman Poonia commented on HBASE-22872: - Updated patch accommodating review comments for formatting. > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch, > HBASE-22872.branch-1.4.004.patch, HBASE-22872.branch-1.4.005.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Attachment: HBASE-22872.branch-1.4.005.patch > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch, > HBASE-22872.branch-1.4.004.patch, HBASE-22872.branch-1.4.005.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Attachment: HBASE-22872.branch-1.4.004.patch > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch, > HBASE-22872.branch-1.4.004.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912172#comment-16912172 ] Aman Poonia commented on HBASE-22872: - Thanks for the review, [~reidchan]. Fixed the space issue. Not sure about the line separation: somehow hbase_eclipse_formatter.xml applies this formatting when I format the code. Do you want me to not use the formatter in this case, since its output is different from what we expect? > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Attachment: HBASE-22872.branch-1.4.003.patch > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch, HBASE-22872.branch-1.4.003.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912123#comment-16912123 ] Aman Poonia commented on HBASE-22872: - Tests are passing. Can someone help review this? > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Attachment: HBASE-22872.branch-1.4.002.patch > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch, > HBASE-22872.branch-1.4.002.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Status: Patch Available (was: Open) > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
[ https://issues.apache.org/jira/browse/HBASE-22872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22872: Attachment: HBASE-22872.branch-1.4.001.patch > Don't create normalization plan unnecesarily when split and merge both are > disabled > --- > > Key: HBASE-22872 > URL: https://issues.apache.org/jira/browse/HBASE-22872 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-22872.branch-1.4.001.patch > > > We should not proceed futher in normalization plan creation if split and > merge both are disabled on a table. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HBASE-22872) Don't create normalization plan unnecesarily when split and merge both are disabled
Aman Poonia created HBASE-22872: --- Summary: Don't create normalization plan unnecesarily when split and merge both are disabled Key: HBASE-22872 URL: https://issues.apache.org/jira/browse/HBASE-22872 Project: HBase Issue Type: Improvement Affects Versions: 1.4.10 Reporter: Aman Poonia Assignee: Aman Poonia We should not proceed further with normalization plan creation if both split and merge are disabled on a table. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
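The idea in this issue can be illustrated with a minimal, self-contained sketch. It is not the actual HBase normalizer code: the class `NormalizerGuardSketch`, the method `computePlans`, and the string-based "plans" are hypothetical stand-ins chosen only to show the early return when both switches are off.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NormalizerGuardSketch {

    /**
     * Hypothetical plan computation. splitEnabled and mergeEnabled are assumed
     * to be the per-table switch values; regions is a list of region names.
     */
    static List<String> computePlans(boolean splitEnabled, boolean mergeEnabled,
            List<String> regions) {
        // If neither action may run, no plan could ever be executed,
        // so return before doing any per-region work.
        if (!splitEnabled && !mergeEnabled) {
            return Collections.emptyList();
        }
        List<String> plans = new ArrayList<>();
        for (String region : regions) {
            if (splitEnabled) {
                plans.add("consider split: " + region);
            }
            if (mergeEnabled) {
                plans.add("consider merge: " + region);
            }
        }
        return plans;
    }

    public static void main(String[] args) {
        List<String> regions = List.of("r1", "r2");
        System.out.println(computePlans(false, false, regions).size());
        System.out.println(computePlans(false, true, regions).size());
    }
}
```

The guard sits at the top of the method so that the cost of walking the regions is only paid when at least one kind of plan could be acted on.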
[jira] [Commented] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
[ https://issues.apache.org/jira/browse/HBASE-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860663#comment-16860663 ] Aman Poonia commented on HBASE-22285: - Okay. The timestamp approach will not work. If a table has been pre-split and we don't want the normalizer to do anything, ideally we should not enable the normalizer on that table. By default the normalizer is disabled on all tables. Only when we enable the NORMALIZATION_ENABLED property on a table will the normalizer pick it up. This keeps things simple. If a user feels that a table is not normalized (after it was pre-split) and needs to be, the table property can be changed and the normalizer will do its job for that table. > A normalizer which merges very small size regions with adjacent > regions.(MergeToNormalize) > -- > > Key: HBASE-22285 > URL: https://issues.apache.org/jira/browse/HBASE-22285 > Project: HBase > Issue Type: New Feature > Components: regionserver >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > There are scenarios where we have seen around 5% of total regions with a size > of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of > regions increase with time considering we have TTL over the rows. > After exploring the option of RegionNormalizer and doing some quick runs we > found that that is not suitable considering it also splits the regions and > merges to normalize. What we really want is to split as per Split policy and > merge very small regions with adjacent regions to make sure we reduce 0-byte > regions. > We can plugin this normalizer using the property > "hbase.master.normalizer.class" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
[ https://issues.apache.org/jira/browse/HBASE-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827535#comment-16827535 ] Aman Poonia commented on HBASE-22285: - A few thoughts: # If we have pre-split the table, then the creation times of the table and its regions will be very close. This way we can roughly say that this is a pre-split region. # If the store’s earliest flush time and the region creation time match (approximately), then it is a pre-split scenario (and/or it is a new region). I think the 2nd is the cleaner approach here. > A normalizer which merges very small size regions with adjacent > regions.(MergeToNormalize) > -- > > Key: HBASE-22285 > URL: https://issues.apache.org/jira/browse/HBASE-22285 > Project: HBase > Issue Type: New Feature > Components: regionserver >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > There are scenarios where we have seen around 5% of total regions with a size > of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of > regions increase with time considering we have TTL over the rows. > After exploring the option of RegionNormalizer and doing some quick runs we > found that that is not suitable considering it also splits the regions and > merges to normalize. What we really want is to split as per Split policy and > merge very small regions with adjacent regions to make sure we reduce 0-byte > regions. > We can plugin this normalizer using the property > "hbase.master.normalizer.class" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
[ https://issues.apache.org/jira/browse/HBASE-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-22285: Description: There are scenarios where we have seen around 5% of total regions with a size of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of regions increase with time considering we have TTL over the rows. After exploring the option of RegionNormalizer and doing some quick runs we found that that is not suitable considering it also splits the regions and merges to normalize. What we really want is to split as per Split policy and merge very small regions with adjacent regions to make sure we reduce 0-byte regions. We can plugin this normalizer using the property "hbase.master.normalizer.class" was: There are scenarios where we have seen around 5% of total regions with a size of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of regions increase with time considering we have TTL over the rows. After exploring the option of RegionNormalizer and doing some quick runs we found that that is not suitable considering it also splits the regions and merges to normalize. What we really want is to split as per Split policy and merge very small regions with adjacent regions to make sure we reduce 0-byte regions. We can plugin the normalizer using the property "hbase.master.normalizer.class" > A normalizer which merges very small size regions with adjacent > regions.(MergeToNormalize) > -- > > Key: HBASE-22285 > URL: https://issues.apache.org/jira/browse/HBASE-22285 > Project: HBase > Issue Type: New Feature > Components: regionserver >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > There are scenarios where we have seen around 5% of total regions with a size > of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of > regions increase with time considering we have TTL over the rows. 
> After exploring the option of RegionNormalizer and doing some quick runs we > found that that is not suitable considering it also splits the regions and > merges to normalize. What we really want is to split as per Split policy and > merge very small regions with adjacent regions to make sure we reduce 0-byte > regions. > We can plugin this normalizer using the property > "hbase.master.normalizer.class" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
[ https://issues.apache.org/jira/browse/HBASE-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823070#comment-16823070 ] Aman Poonia commented on HBASE-22285: - [~lhofhansl] - FYI > A normalizer which merges very small size regions with adjacent > regions.(MergeToNormalize) > -- > > Key: HBASE-22285 > URL: https://issues.apache.org/jira/browse/HBASE-22285 > Project: HBase > Issue Type: New Feature > Components: regionserver >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > There are scenarios where we have seen around 5% of total regions with a size > of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of > regions increase with time considering we have TTL over the rows. > After exploring the option of RegionNormalizer and doing some quick runs we > found that that is not suitable considering it also splits the regions and > merges to normalize. What we really want is to split as per Split policy and > merge very small regions with adjacent regions to make sure we reduce 0-byte > regions. > We can plugin this normalizer using the property > "hbase.master.normalizer.class" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
Aman Poonia created HBASE-22285: --- Summary: A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize) Key: HBASE-22285 URL: https://issues.apache.org/jira/browse/HBASE-22285 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Aman Poonia There are scenarios where we have seen around 5% of total regions with a size of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of regions increase with time considering we have TTL over the rows. After exploring the option of RegionNormalizer and doing some quick runs we found that that is not suitable considering it also splits the regions and merges to normalize. What we really want is to split as per Split policy and merge very small regions with adjacent regions to make sure we reduce 0-byte regions. We can plugin the normalizer using the property "hbase.master.normalizer.class" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
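The merge-only behaviour described in this issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed normalizer itself: `MergeSmallRegionsSketch`, `planMerges`, and the byte threshold are hypothetical names, and regions are represented only by their sizes in key order.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSmallRegionsSketch {

    /**
     * Given region sizes in key order, returns index pairs {i, i + 1} of
     * adjacent regions to merge. A pair is chosen when at least one of the
     * two regions is at or below smallThresholdBytes. No split is ever planned.
     */
    static List<int[]> planMerges(long[] sizesInBytes, long smallThresholdBytes) {
        List<int[]> plans = new ArrayList<>();
        int i = 0;
        while (i < sizesInBytes.length - 1) {
            boolean leftSmall = sizesInBytes[i] <= smallThresholdBytes;
            boolean rightSmall = sizesInBytes[i + 1] <= smallThresholdBytes;
            if (leftSmall || rightSmall) {
                plans.add(new int[] {i, i + 1});
                // Skip both regions so each takes part in at most one merge per run.
                i += 2;
            } else {
                i++;
            }
        }
        return plans;
    }

    public static void main(String[] args) {
        long[] sizes = {0L, 500L, 1000L, 5L, 2000L};
        for (int[] pair : planMerges(sizes, 10L)) {
            System.out.println("merge regions " + pair[0] + " and " + pair[1]);
        }
    }
}
```

Leaving splits to the configured split policy is the design choice stated in the description; the sketch reflects that by producing merge pairs only.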
[jira] [Assigned] (HBASE-22285) A normalizer which merges very small size regions with adjacent regions.(MergeToNormalize)
[ https://issues.apache.org/jira/browse/HBASE-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia reassigned HBASE-22285: --- Assignee: Aman Poonia > A normalizer which merges very small size regions with adjacent > regions.(MergeToNormalize) > -- > > Key: HBASE-22285 > URL: https://issues.apache.org/jira/browse/HBASE-22285 > Project: HBase > Issue Type: New Feature > Components: regionserver >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > > There are scenarios where we have seen around 5% of total regions with a size > of 0 bytes and another 5-6 % regions with size in a few bytes. These kinds of > regions increase with time considering we have TTL over the rows. > After exploring the option of RegionNormalizer and doing some quick runs we > found that that is not suitable considering it also splits the regions and > merges to normalize. What we really want is to split as per Split policy and > merge very small regions with adjacent regions to make sure we reduce 0-byte > regions. > We can plugin the normalizer using the property > "hbase.master.normalizer.class" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-14190) Assign system tables ahead of user region assignment
[ https://issues.apache.org/jira/browse/HBASE-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813329#comment-16813329 ] Aman Poonia commented on HBASE-14190: - [~abhishek.chouhan] [~apurtell] - FYI > Assign system tables ahead of user region assignment > > > Key: HBASE-14190 > URL: https://issues.apache.org/jira/browse/HBASE-14190 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Critical > Attachments: 14190-system-wal-v1.txt, 14190-v12.4.txt, 14190-v12.txt > > > Currently the namespace table region is assigned like user regions. > I spent several hours working with a customer where master couldn't finish > initialization. > Even though master was restarted quite a few times, it went down with the > following: > {code} > 2015-08-05 17:16:57,530 FATAL [hdpmaster1:6.activeMasterManager] > master.HMaster: Master server abort: loaded coprocessors are: [] > 2015-08-05 17:16:57,530 FATAL [hdpmaster1:6.activeMasterManager] > master.HMaster: Unhandled exception. Starting shutdown. > java.io.IOException: Timedout 30ms waiting for namespace table to be > assigned > at > org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104) > at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:985) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:779) > at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182) > at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646) > at java.lang.Thread.run(Thread.java:744) > {code} > During previous run(s), namespace table was created, hence leaving an entry > in hbase:meta. 
> The following if block in TableNamespaceManager#start() was skipped: > {code} > if (!MetaTableAccessor.tableExists(masterServices.getConnection(), > TableName.NAMESPACE_TABLE_NAME)) { > {code} > TableNamespaceManager#start() spins, waiting for namespace region to be > assigned. > There was issue in master assigning user regions. > We tried issuing 'assign' command from hbase shell which didn't work because > of the following check in MasterRpcServices#assignRegion(): > {code} > master.checkInitialized(); > {code} > This scenario can be avoided if we assign hbase:namespace table after > hbase:meta is assigned but before user table region assignment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21130) NullPointerException in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-21130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596247#comment-16596247 ] Aman Poonia commented on HBASE-21130: - Mostly a duplicate of HBASE-21069. > NullPointerException in StoreScanner > > > Key: HBASE-21130 > URL: https://issues.apache.org/jira/browse/HBASE-21130 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 1.3.1 >Reporter: Chandra Sekhar >Priority: Critical > > I've created a script that continuously puts one record (size 2.5KB) > and flushes immediately -- in the middle I am doing compaction at regular intervals. > The rate of flushes is around 20 flushes/sec. After some time, my RS aborted and > never came back up > with the following error > {code:java} > 2018-08-29 11:34:34,183 DEBUG [flush-table-TestTable_client_1258196-thread-1] > regionserver.RSRpcServices: Closing region operation on > TestTable_client_1,32816,1535513244999.762a3e633b03e5f847f357aca28768d0. > 2018-08-29 11:34:34,183 INFO > [RpcServer.FifoWFPBQ.default.handler=49,queue=4,port=16040] > regionserver.RSRpcServices: flush table task succeed 1, failed 10. 
> 2018-08-29 11:34:34,280 INFO [MemStoreFlusher.0] > regionserver.DefaultStoreFlusher: Flushed, sequenceid=230, memsize=4.2 K, > hasBloomFilter=false, into tmp file > hdfs://hacluster/hbase/data/hbase/meta/1588230740/.tmp/1cf1deee293848b0bea08940696dbd2a > 2018-08-29 11:34:34,290 INFO [MemStoreFlusher.0] > regionserver.StoreFile$Reader: Loaded Delete Family Bloom > (CompoundBloomFilter) metadata for 1cf1deee293848b0bea08940696dbd2a > 2018-08-29 11:34:34,291 DEBUG [MemStoreFlusher.0] > regionserver.HRegionFileSystem: Committing store file > hdfs://hacluster/hbase/data/hbase/meta/1588230740/.tmp/1cf1deee293848b0bea08940696dbd2a > as > hdfs://hacluster/hbase/data/hbase/meta/1588230740/info/1cf1deee293848b0bea08940696dbd2a > 2018-08-29 11:34:34,304 INFO [MemStoreFlusher.0] > regionserver.StoreFile$Reader: Loaded Delete Family Bloom > (CompoundBloomFilter) metadata for 1cf1deee293848b0bea08940696dbd2a > 2018-08-29 11:34:34,304 INFO [MemStoreFlusher.0] regionserver.HStore: Added > hdfs://hacluster/hbase/data/hbase/meta/1588230740/info/1cf1deee293848b0bea08940696dbd2a, > entries=13, sequenceid=230, filesize=6.6 K > 2018-08-29 11:34:34,307 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: > ABORTING region server host-,16040,1535454741321: Replay of WAL required. 
> Forcing server shutdown > org.apache.hadoop.hbase.DroppedSnapshotException: region: hbase:meta,,1 > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2578) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2108) > at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2034) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:505) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:475) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:263) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at java.util.ArrayList.<init>(ArrayList.java:177) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:826) > at > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1117) > at > org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1090) > at > org.apache.hadoop.hbase.regionserver.HStore.access$700(HStore.java:120) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2450) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2533) > ... 9 more > 2018-08-29 11:34:34,307 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: > RegionServer abort: loaded coprocessors are: > [org.apache.hadoop.hbase.security.access.AccessController, > org.apache.hadoop.hbase. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Poonia updated HBASE-19835: Status: Patch Available (was: In Progress) > Make explicit casting of atleast one operand to final type > -- > > Key: HBASE-19835 > URL: https://issues.apache.org/jira/browse/HBASE-19835 > Project: HBase > Issue Type: Bug > Components: hbase >Affects Versions: 3.0.0 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Minor > Attachments: HBASE-19835.master.01.patch, HBASE-19835.master.02.patch > > > We have used > _long = int + int_ > at many places mostly wherever ClassSize.java variables are used for > calculation. > Need to cast explicitly at-least one operand to final type(i.e. type the > result is intended to be casted). > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
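Why the explicit cast matters can be shown with a small standalone Java sketch. The class and method names below are hypothetical and are not taken from the patch; the two int arguments merely stand in for int-valued size constants such as those used with ClassSize.java.

```java
public class LongCastSketch {

    /** int + int is computed in 32-bit arithmetic and only then widened to long. */
    static long sumWithoutCast(int a, int b) {
        return a + b;
    }

    /** Casting one operand first makes the whole addition run in 64-bit arithmetic. */
    static long sumWithCast(int a, int b) {
        return (long) a + b;
    }

    public static void main(String[] args) {
        // The first call overflows before widening; the second does not.
        System.out.println(sumWithoutCast(Integer.MAX_VALUE, 1)); // prints -2147483648
        System.out.println(sumWithCast(Integer.MAX_VALUE, 1));    // prints 2147483648
    }
}
```

For small operands the two forms give the same result, which is why the pattern is easy to miss; the cast only changes the outcome once the int sum would exceed Integer.MAX_VALUE.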
[jira] [Commented] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356579#comment-16356579 ]
Aman Poonia commented on HBASE-19835:
Rebased. [~mdrob] - Was analyzing FindBugs report.
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Assignee: Aman Poonia
> Priority: Minor
> Attachments: HBASE-19835.master.01.patch, HBASE-19835.master.02.patch
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Status: In Progress (was: Patch Available)
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Assignee: Aman Poonia
> Priority: Minor
> Attachments: HBASE-19835.master.01.patch, HBASE-19835.master.02.patch
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Attachment: HBASE-19835.master.02.patch
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Assignee: Aman Poonia
> Priority: Minor
> Attachments: HBASE-19835.master.01.patch, HBASE-19835.master.02.patch
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Status: Patch Available (was: Open)
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Priority: Minor
> Attachments: HBASE-19835.master.01.patch
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Attachment: (was: master.v02.patch)
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Priority: Minor
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Attachment: HBASE-19835.master.01.patch
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Priority: Minor
> Attachments: HBASE-19835.master.01.patch
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Attachment: (was: HBASE-19835.patch)
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Priority: Minor
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Status: Open (was: Patch Available)
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Priority: Minor
> Attachments: HBASE-19835.patch
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Status: Patch Available (was: Open)
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Priority: Minor
> Attachments: HBASE-19835.patch
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Attachment: HBASE-19835.patch
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Priority: Minor
> Attachments: HBASE-19835.patch
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Updated] (HBASE-19835) Make explicit casting of atleast one operand to final type
[ https://issues.apache.org/jira/browse/HBASE-19835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-19835:
    Attachment: master.v02.patch
> Make explicit casting of atleast one operand to final type
> --
>
> Key: HBASE-19835
> URL: https://issues.apache.org/jira/browse/HBASE-19835
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 3.0.0
> Reporter: Aman Poonia
> Priority: Minor
>
> We have used
> _long = int + int_
> in many places, mostly wherever ClassSize.java variables are used for calculation.
> We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Created] (HBASE-19835) Make explicit casting of atleast one operand to final type
Aman Poonia created HBASE-19835:
---
    Summary: Make explicit casting of atleast one operand to final type
    Key: HBASE-19835
    URL: https://issues.apache.org/jira/browse/HBASE-19835
    Project: HBase
    Issue Type: Bug
    Components: hbase
    Affects Versions: 3.0.0
    Reporter: Aman Poonia

We have used
_long = int + int_
in many places, mostly wherever ClassSize.java variables are used for calculation.
We need to explicitly cast at least one operand to the final type (i.e., the type the result is intended to have).
[jira] [Commented] (HBASE-17304) Avoid draining region servers in draining mode while moving the regions at client side i.e. in region_mover.rb
[ https://issues.apache.org/jira/browse/HBASE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745443#comment-15745443 ]
Aman Poonia commented on HBASE-17304:
[~abhishek.chouhan]
> Avoid draining region servers in draining mode while moving the regions at client side i.e. in region_mover.rb
> --
>
> Key: HBASE-17304
> URL: https://issues.apache.org/jira/browse/HBASE-17304
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Reporter: Aman Poonia
>
> While using region_mover.rb in our testing, we found that region_mover was taking a long time to unload regions.
> For instance, take 10 region servers and create draining znodes for 8 of them. While moving the regions, region_mover.rb does not take the draining znodes into consideration and tries to move each region to any of the available region servers. The move fails on the server side, the script then sleeps for some seconds, retries 5 times for each region, and iterates over all the region servers. This makes the process really slow in extreme cases like the one above, where a lot of nodes in a cluster are in draining mode.
> We need to get the list of draining region servers so that we can avoid them while moving regions.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
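The last point of the description, skipping draining servers when choosing a move target, could look roughly like the sketch below. It is only an illustration under stated assumptions: region_mover.rb itself is a Ruby script, the class and method names here are hypothetical, servers are represented as plain strings, and how the set of draining servers is obtained (for example, from the draining znodes) is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DrainingAwareTargets {
    // Returns the servers that are still eligible as move targets, preserving the input order.
    // Both parameters are hypothetical inputs; nothing here calls the HBase API.
    static List<String> eligibleTargets(List<String> allServers, Set<String> drainingServers) {
        List<String> targets = new ArrayList<>();
        for (String server : allServers) {
            if (!drainingServers.contains(server)) {
                targets.add(server);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("rs1", "rs2", "rs3", "rs4");
        Set<String> draining = new HashSet<>(Arrays.asList("rs2", "rs4"));
        System.out.println(eligibleTargets(all, draining)); // prints [rs1, rs3]
    }
}
```

Filtering once up front means no move is attempted against a server that would reject it, which is where the sleep-and-retry cost described above comes from.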
[jira] [Created] (HBASE-17304) Avoid draining region servers in draining mode while moving the regions at client side i.e. in region_mover.rb
Aman Poonia created HBASE-17304:
---
    Summary: Avoid draining region servers in draining mode while moving the regions at client side i.e. in region_mover.rb
    Key: HBASE-17304
    URL: https://issues.apache.org/jira/browse/HBASE-17304
    Project: HBase
    Issue Type: Improvement
    Components: Client
    Reporter: Aman Poonia

While using region_mover.rb in our testing, we found that region_mover was taking a long time to unload regions.
For instance, take 10 region servers and create draining znodes for 8 of them. While moving the regions, region_mover.rb does not take the draining znodes into consideration and tries to move each region to any of the available region servers. The move fails on the server side, the script then sleeps for some seconds, retries 5 times for each region, and iterates over all the region servers. This makes the process really slow in extreme cases like the one above, where a lot of nodes in a cluster are in draining mode.
We need to get the list of draining region servers so that we can avoid them while moving region.
[jira] [Updated] (HBASE-17304) Avoid draining region servers in draining mode while moving the regions at client side i.e. in region_mover.rb
[ https://issues.apache.org/jira/browse/HBASE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-17304:
    Description:
While using region_mover.rb in our testing, we found that region_mover was taking a long time to unload regions.
For instance, take 10 region servers and create draining znodes for 8 of them. While moving the regions, region_mover.rb does not take the draining znodes into consideration and tries to move each region to any of the available region servers. The move fails on the server side, the script then sleeps for some seconds, retries 5 times for each region, and iterates over all the region servers. This makes the process really slow in extreme cases like the one above, where a lot of nodes in a cluster are in draining mode.
We need to get the list of draining region servers so that we can avoid them while moving regions.
    was:
While using region_mover.rb in our testing, we found that region_mover was taking a long time to unload regions.
For instance, take 10 region servers and create draining znodes for 8 of them. While moving the regions, region_mover.rb does not take the draining znodes into consideration and tries to move each region to any of the available region servers. The move fails on the server side, the script then sleeps for some seconds, retries 5 times for each region, and iterates over all the region servers. This makes the process really slow in extreme cases like the one above, where a lot of nodes in a cluster are in draining mode.
We need to get the list of draining region servers so that we can avoid them while moving region.
> Avoid draining region servers in draining mode while moving the regions at client side i.e. in region_mover.rb
> --
>
> Key: HBASE-17304
> URL: https://issues.apache.org/jira/browse/HBASE-17304
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Reporter: Aman Poonia
>
> While using region_mover.rb in our testing, we found that region_mover was taking a long time to unload regions.
> For instance, take 10 region servers and create draining znodes for 8 of them. While moving the regions, region_mover.rb does not take the draining znodes into consideration and tries to move each region to any of the available region servers. The move fails on the server side, the script then sleeps for some seconds, retries 5 times for each region, and iterates over all the region servers. This makes the process really slow in extreme cases like the one above, where a lot of nodes in a cluster are in draining mode.
> We need to get the list of draining region servers so that we can avoid them while moving regions.