subject:"\[GitHub\] \[hbase\] ndimiduk commented on a change in pull request #1786\: HBASE\-24418 Consolidate Normalizer implementations"

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-06-02 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r434090537



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.
+ * 
+ * 
+ * The following parameters are configurable:
+ * 
+ *   Whether to split a region as part of normalization. Configuration:
+ * {@value SPLIT_ENABLED_KEY}, default: {@value 
DEFAULT_SPLIT_ENABLED}.
+ *   Whether to merge a region as part of normalization. Configuration:
+ * {@value MERGE_ENABLED_KEY}, default: {@value 
DEFAULT_MERGE_ENABLED}.
+ *   The minimum number of regions in a table to consider it for 
normalization. Configuration:
+ * {@value MIN_REGION_COUNT_KEY}, default: {@value 
DEFAULT_MIN_REGION_COUNT}.
+ *   The minimum age for a region to be considered for a merge, in days. 
Configuration:
+ * {@value MERGE_MIN_REGION_AGE_DAYS_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_AGE_DAYS}.
+ *   The minimum size for a region to be considered for a merge, in whole 
MBs. Configuration:
+ * {@value MERGE_MIN_REGION_SIZE_MB_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_SIZE_MB}.
  * 
  * 
- * Region sizes are coarse and approximate on the order of megabytes. 
Additionally, "empty" regions
- * (less than 1MB, with the previous note) are not merged away. This is by 
design to prevent
- * normalization from undoing the pre-splitting of a table.
+ * To see detailed logging of the application of these configuration values, 
set the log level for
+ * this class to `TRACE`.
  */
-@InterfaceAudience.Private
-public class SimpleRegionNormalizer extends AbstractRegionNormalizer {
-
+@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.CONFIG)
+public class SimpleRegionNormalizer implements RegionNormalizer {

Review comment:
   Yes, I have kept the interface for now, and posted a question to the dev 
list. If there's no responses there, I'll file a new issue to deprecate and 
remove it. I think the soonest we can remove a configuration point is for 3.0 
though, so it would merely be deprecated for branch-2 onward.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-06-02 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r434088560



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,474 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
-import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.

Review comment:
   Rather than slop, my recommendation is that we add the concept of a 
"minimum region size" for deciding when to split. I filed 
[HBASE-24464](https://issues.apache.org/jira/browse/HBASE-24464) for that 
discussion.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-06-01 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r433523464



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.
+ * 
+ * 
+ * The following parameters are configurable:
+ * 
+ *   Whether to split a region as part of normalization. Configuration:
+ * {@value SPLIT_ENABLED_KEY}, default: {@value 
DEFAULT_SPLIT_ENABLED}.
+ *   Whether to merge a region as part of normalization. Configuration:
+ * {@value MERGE_ENABLED_KEY}, default: {@value 
DEFAULT_MERGE_ENABLED}.
+ *   The minimum number of regions in a table to consider it for 
normalization. Configuration:
+ * {@value MIN_REGION_COUNT_KEY}, default: {@value 
DEFAULT_MIN_REGION_COUNT}.
+ *   The minimum age for a region to be considered for a merge, in days. 
Configuration:
+ * {@value MERGE_MIN_REGION_AGE_DAYS_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_AGE_DAYS}.
+ *   The minimum size for a region to be considered for a merge, in whole 
MBs. Configuration:
+ * {@value MERGE_MIN_REGION_SIZE_MB_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_SIZE_MB}.
  * 
  * 
- * Region sizes are coarse and approximate on the order of megabytes. 
Additionally, "empty" regions
- * (less than 1MB, with the previous note) are not merged away. This is by 
design to prevent
- * normalization from undoing the pre-splitting of a table.
+ * To see detailed logging of the application of these configuration values, 
set the log level for
+ * this class to `TRACE`.
  */
-@InterfaceAudience.Private
-public class SimpleRegionNormalizer extends AbstractRegionNormalizer {
-
+@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.CONFIG)
+public class SimpleRegionNormalizer implements RegionNormalizer {
   private static final Logger LOG = 
LoggerFactory.getLogger(SimpleRegionNormalizer.class);
-  private static long[] skippedCount = new 
long[NormalizationPlan.PlanType.values().length];
+
+  static final String SPLIT_ENABLED_KEY = "hbase.normalizer.split.enabled";
+  static final boolean DEFAULT_SPLIT_ENABLED = true;
+  static final String MERGE_ENABLED_KEY = "hbase.normalizer.merge.enabled";
+  static final boolean DEFAULT_MERGE_ENABLED = true;
+  // TODO: after HBASE-24416, `min.region.count` only applies to merge plans; 
should
+  //  deprecate/rename the configuration key.
+  static final String MIN_REGION_COUNT_KEY = 
"hbase.normalizer.min.region.count";
+  static final int DEFAULT_MIN_REGION_COUNT = 3;
+  static final String MERGE_MIN_REGION_AGE_DAYS_KEY = 
"hbase.normalizer.merge.min_region_age.days";
+  static final int DEFAULT_MERGE_MIN_REGION_AGE_DAYS = 3;
+  static final String

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-28 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r432193207



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SplitPlanFirstComparator.java
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.master.normalizer;
+
+import java.util.Comparator;
+
+/**
+ * Comparator class that gives higher priority to {@link 
SplitNormalizationPlan}.
+ */
+class SplitPlanFirstComparator implements Comparator {

Review comment:
   Reading through the old patches, i think this is an appendix we can 
safely discard.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-28 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r432182388



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.
+ * 
+ * 
+ * The following parameters are configurable:
+ * 
+ *   Whether to split a region as part of normalization. Configuration:
+ * {@value SPLIT_ENABLED_KEY}, default: {@value 
DEFAULT_SPLIT_ENABLED}.
+ *   Whether to merge a region as part of normalization. Configuration:
+ * {@value MERGE_ENABLED_KEY}, default: {@value 
DEFAULT_MERGE_ENABLED}.
+ *   The minimum number of regions in a table to consider it for 
normalization. Configuration:
+ * {@value MIN_REGION_COUNT_KEY}, default: {@value 
DEFAULT_MIN_REGION_COUNT}.
+ *   The minimum age for a region to be considered for a merge, in days. 
Configuration:
+ * {@value MERGE_MIN_REGION_AGE_DAYS_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_AGE_DAYS}.
+ *   The minimum size for a region to be considered for a merge, in whole 
MBs. Configuration:
+ * {@value MERGE_MIN_REGION_SIZE_MB_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_SIZE_MB}.
  * 
  * 
- * Region sizes are coarse and approximate on the order of megabytes. 
Additionally, "empty" regions
- * (less than 1MB, with the previous note) are not merged away. This is by 
design to prevent
- * normalization from undoing the pre-splitting of a table.
+ * To see detailed logging of the application of these configuration values, 
set the log level for
+ * this class to `TRACE`.
  */
-@InterfaceAudience.Private
-public class SimpleRegionNormalizer extends AbstractRegionNormalizer {
-
+@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.CONFIG)
+public class SimpleRegionNormalizer implements RegionNormalizer {

Review comment:
   Sent a question over on the dev list.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-28 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r432176087



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.

Review comment:
   https://issues.apache.org/jira/browse/HBASE-24465





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-28 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r432174796



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.

Review comment:
   https://issues.apache.org/jira/browse/HBASE-24464





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-28 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r432171658



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##
@@ -1911,43 +1912,51 @@ public boolean normalizeRegions() throws IOException {
   return false;
 }
 
-synchronized (this.normalizer) {
+if (!normalizationInProgressLock.tryLock()) {
   // Don't run the normalizer concurrently
+  LOG.info("Normalization already in progress. Skipping request.");
+} else {
+  try {
+List allEnabledTables = new ArrayList<>(
+  tableStateManager.getTablesInStates(TableState.State.ENABLED));
+Collections.shuffle(allEnabledTables);
 
-  List allEnabledTables = new ArrayList<>(
-this.tableStateManager.getTablesInStates(TableState.State.ENABLED));
-
-  Collections.shuffle(allEnabledTables);
-
-  for (TableName table : allEnabledTables) {
-TableDescriptor tblDesc = getTableDescriptors().get(table);
-if (table.isSystemTable() || (tblDesc != null &&
-!tblDesc.isNormalizationEnabled())) {
-  LOG.trace("Skipping normalization for {}, as it's either system"
-  + " table or doesn't have auto normalization turned on", table);
-  continue;
-}
+for (TableName table : allEnabledTables) {
+  if (table.isSystemTable()) {
+continue;
+  }
+  final TableDescriptor tblDesc = getTableDescriptors().get(table);
+  if (tblDesc != null && !tblDesc.isNormalizationEnabled()) {
+LOG.debug(
+  "Skipping {} because normalization is disabled in its table 
properties.", table);
+continue;
+  }
 
-// make one last check that the cluster isn't shutting down before 
proceeding.
-if (skipRegionManagementAction("region normalizer")) {
-  return false;
-}
+  // make one last check that the cluster isn't shutting down before 
proceeding.
+  if (skipRegionManagementAction("region normalizer")) {
+return false;
+  }
 
-final List plans = 
this.normalizer.computePlanForTable(table);
-if (CollectionUtils.isEmpty(plans)) {
-  return true;
-}
+  final List plans = 
normalizer.computePlansForTable(table);
+  if (CollectionUtils.isEmpty(plans)) {
+return true;
+  }
 
-try (final Admin admin = 
asyncClusterConnection.toConnection().getAdmin()) {
-  for (NormalizationPlan plan : plans) {
-plan.execute(admin);
-if (plan.getType() == PlanType.SPLIT) {
-  splitPlanCount++;
-} else if (plan.getType() == PlanType.MERGE) {
-  mergePlanCount++;
+  try (final Admin admin = 
asyncClusterConnection.toConnection().getAdmin()) {
+// as of this writing, `plan.execute()` is non-blocking, so 
there's no artificial rate-
+// limiting of merge requests due to this serial loop.

Review comment:
   https://issues.apache.org/jira/browse/HBASE-24463





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-28 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r432168513



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.
+ * 
+ * 
+ * The following parameters are configurable:
+ * 
+ *   Whether to split a region as part of normalization. Configuration:
+ * {@value SPLIT_ENABLED_KEY}, default: {@value 
DEFAULT_SPLIT_ENABLED}.
+ *   Whether to merge a region as part of normalization. Configuration:
+ * {@value MERGE_ENABLED_KEY}, default: {@value 
DEFAULT_MERGE_ENABLED}.
+ *   The minimum number of regions in a table to consider it for 
normalization. Configuration:
+ * {@value MIN_REGION_COUNT_KEY}, default: {@value 
DEFAULT_MIN_REGION_COUNT}.
+ *   The minimum age for a region to be considered for a merge, in days. 
Configuration:
+ * {@value MERGE_MIN_REGION_AGE_DAYS_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_AGE_DAYS}.
+ *   The minimum size for a region to be considered for a merge, in whole 
MBs. Configuration:
+ * {@value MERGE_MIN_REGION_SIZE_MB_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_SIZE_MB}.
  * 
  * 
- * Region sizes are coarse and approximate on the order of megabytes. 
Additionally, "empty" regions
- * (less than 1MB, with the previous note) are not merged away. This is by 
design to prevent
- * normalization from undoing the pre-splitting of a table.
+ * To see detailed logging of the application of these configuration values, 
set the log level for
+ * this class to `TRACE`.
  */
-@InterfaceAudience.Private
-public class SimpleRegionNormalizer extends AbstractRegionNormalizer {
-
+@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.CONFIG)
+public class SimpleRegionNormalizer implements RegionNormalizer {
   private static final Logger LOG = 
LoggerFactory.getLogger(SimpleRegionNormalizer.class);
-  private static long[] skippedCount = new 
long[NormalizationPlan.PlanType.values().length];
+
+  static final String SPLIT_ENABLED_KEY = "hbase.normalizer.split.enabled";
+  static final boolean DEFAULT_SPLIT_ENABLED = true;
+  static final String MERGE_ENABLED_KEY = "hbase.normalizer.merge.enabled";
+  static final boolean DEFAULT_MERGE_ENABLED = true;
+  // TODO: after HBASE-24416, `min.region.count` only applies to merge plans; 
should
+  //  deprecate/rename the configuration key.
+  static final String MIN_REGION_COUNT_KEY = 
"hbase.normalizer.min.region.count";
+  static final int DEFAULT_MIN_REGION_COUNT = 3;
+  static final String MERGE_MIN_REGION_AGE_DAYS_KEY = 
"hbase.normalizer.merge.min_region_age.days";
+  static final int DEFAULT_MERGE_MIN_REGION_AGE_DAYS = 3;
+  static final String

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-28 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r432148644



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.
+ * 
+ * 
+ * The following parameters are configurable:
+ * 
+ *   Whether to split a region as part of normalization. Configuration:
+ * {@value SPLIT_ENABLED_KEY}, default: {@value 
DEFAULT_SPLIT_ENABLED}.
+ *   Whether to merge a region as part of normalization. Configuration:
+ * {@value MERGE_ENABLED_KEY}, default: {@value 
DEFAULT_MERGE_ENABLED}.
+ *   The minimum number of regions in a table to consider it for 
normalization. Configuration:
+ * {@value MIN_REGION_COUNT_KEY}, default: {@value 
DEFAULT_MIN_REGION_COUNT}.
+ *   The minimum age for a region to be considered for a merge, in days. 
Configuration:
+ * {@value MERGE_MIN_REGION_AGE_DAYS_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_AGE_DAYS}.
+ *   The minimum size for a region to be considered for a merge, in whole 
MBs. Configuration:
+ * {@value MERGE_MIN_REGION_SIZE_MB_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_SIZE_MB}.
  * 
  * 
- * Region sizes are coarse and approximate on the order of megabytes. 
Additionally, "empty" regions
- * (less than 1MB, with the previous note) are not merged away. This is by 
design to prevent
- * normalization from undoing the pre-splitting of a table.
+ * To see detailed logging of the application of these configuration values, 
set the log level for
+ * this class to `TRACE`.
  */
-@InterfaceAudience.Private
-public class SimpleRegionNormalizer extends AbstractRegionNormalizer {
-
+@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.CONFIG)
+public class SimpleRegionNormalizer implements RegionNormalizer {
   private static final Logger LOG = 
LoggerFactory.getLogger(SimpleRegionNormalizer.class);
-  private static long[] skippedCount = new 
long[NormalizationPlan.PlanType.values().length];
+
+  static final String SPLIT_ENABLED_KEY = "hbase.normalizer.split.enabled";
+  static final boolean DEFAULT_SPLIT_ENABLED = true;
+  static final String MERGE_ENABLED_KEY = "hbase.normalizer.merge.enabled";
+  static final boolean DEFAULT_MERGE_ENABLED = true;
+  // TODO: after HBASE-24416, `min.region.count` only applies to merge plans; 
should
+  //  deprecate/rename the configuration key.
+  static final String MIN_REGION_COUNT_KEY = 
"hbase.normalizer.min.region.count";
+  static final int DEFAULT_MIN_REGION_COUNT = 3;
+  static final String MERGE_MIN_REGION_AGE_DAYS_KEY = 
"hbase.normalizer.merge.min_region_age.days";
+  static final int DEFAULT_MERGE_MIN_REGION_AGE_DAYS = 3;
+  static final String

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-28 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r432134487



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.
+ * 
+ * 
+ * The following parameters are configurable:
+ * 
+ *   Whether to split a region as part of normalization. Configuration:
+ * {@value SPLIT_ENABLED_KEY}, default: {@value 
DEFAULT_SPLIT_ENABLED}.
+ *   Whether to merge a region as part of normalization. Configuration:
+ * {@value MERGE_ENABLED_KEY}, default: {@value 
DEFAULT_MERGE_ENABLED}.
+ *   The minimum number of regions in a table to consider it for 
normalization. Configuration:
+ * {@value MIN_REGION_COUNT_KEY}, default: {@value 
DEFAULT_MIN_REGION_COUNT}.
+ *   The minimum age for a region to be considered for a merge, in days. 
Configuration:
+ * {@value MERGE_MIN_REGION_AGE_DAYS_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_AGE_DAYS}.
+ *   The minimum size for a region to be considered for a merge, in whole 
MBs. Configuration:
+ * {@value MERGE_MIN_REGION_SIZE_MB_KEY}, default:
+ * {@value DEFAULT_MERGE_MIN_REGION_SIZE_MB}.
  * 
  * 
- * Region sizes are coarse and approximate on the order of megabytes. 
Additionally, "empty" regions
- * (less than 1MB, with the previous note) are not merged away. This is by 
design to prevent
- * normalization from undoing the pre-splitting of a table.
+ * To see detailed logging of the application of these configuration values, 
set the log level for
+ * this class to `TRACE`.
  */
-@InterfaceAudience.Private
-public class SimpleRegionNormalizer extends AbstractRegionNormalizer {
-
+@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.CONFIG)
+public class SimpleRegionNormalizer implements RegionNormalizer {
   private static final Logger LOG = 
LoggerFactory.getLogger(SimpleRegionNormalizer.class);
-  private static long[] skippedCount = new 
long[NormalizationPlan.PlanType.values().length];
+
+  static final String SPLIT_ENABLED_KEY = "hbase.normalizer.split.enabled";
+  static final boolean DEFAULT_SPLIT_ENABLED = true;
+  static final String MERGE_ENABLED_KEY = "hbase.normalizer.merge.enabled";
+  static final boolean DEFAULT_MERGE_ENABLED = true;
+  // TODO: after HBASE-24416, `min.region.count` only applies to merge plans; 
should
+  //  deprecate/rename the configuration key.
+  static final String MIN_REGION_COUNT_KEY = 
"hbase.normalizer.min.region.count";
+  static final int DEFAULT_MIN_REGION_COUNT = 3;
+  static final String MERGE_MIN_REGION_AGE_DAYS_KEY = 
"hbase.normalizer.merge.min_region_age.days";
+  static final int DEFAULT_MERGE_MIN_REGION_AGE_DAYS = 3;
+  static final String

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-27 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r431473726



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.
+ *   Otherwise, for the next region in the chain R1, if R0 + R1 is smaller 
then S, R0 and R1
+ * are kindly requested to merge.
+ * 
+ * 
+ * The following parameters are configurable:

Review comment:
   I've done some word-smithing here. However, the bulk of the 
documentation on this feature appears to have been copy-pasted from a post to 
Hortonworks forums. Rather than edit this work, I would prefer we start from 
scratch with our own documentation, so as to not mis-represent the work of that 
individual.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-27 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r431442449



##
File path: 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/normalizer/TestSimpleRegionNormalizer.java
##
@@ -69,517 +78,347 @@
   public static final HBaseClassTestRule CLASS_RULE =
   HBaseClassTestRule.forClass(TestSimpleRegionNormalizer.class);
 
-  private static final Logger LOG = 
LoggerFactory.getLogger(TestSimpleRegionNormalizer.class);
-
-  private RegionNormalizer normalizer;
+  private Configuration conf;
+  private SimpleRegionNormalizer normalizer;
   private MasterServices masterServices;
 
   @Rule
   public TestName name = new TestName();
 
-  @Test
-  public void testPlanComparator() {
-Comparator comparator = new 
SimpleRegionNormalizer.PlanComparator();
-NormalizationPlan splitPlan1 = new SplitNormalizationPlan(null, null);
-NormalizationPlan splitPlan2 = new SplitNormalizationPlan(null, null);
-NormalizationPlan mergePlan1 = new MergeNormalizationPlan(null, null);
-NormalizationPlan mergePlan2 = new MergeNormalizationPlan(null, null);
-
-assertEquals(0, comparator.compare(splitPlan1, splitPlan2));
-assertEquals(0, comparator.compare(splitPlan2, splitPlan1));
-assertEquals(0, comparator.compare(mergePlan1, mergePlan2));
-assertEquals(0, comparator.compare(mergePlan2, mergePlan1));
-assertTrue(comparator.compare(splitPlan1, mergePlan1) < 0);
-assertTrue(comparator.compare(mergePlan1, splitPlan1) > 0);
+  @Before
+  public void before() {
+conf = HBaseConfiguration.create();
   }
 
   @Test
-  public void testNoNormalizationForMetaTable() throws HBaseIOException {
+  public void testNoNormalizationForMetaTable() {
 TableName testTable = TableName.META_TABLE_NAME;
 List RegionInfo = new ArrayList<>();
 Map regionSizes = new HashMap<>();
 
 setupMocksForNormalizer(regionSizes, RegionInfo);
-List plans = normalizer.computePlanForTable(testTable);
-assertNull(plans);
+List plans = normalizer.computePlansForTable(testTable);
+assertThat(plans, empty());
   }
 
   @Test
-  public void testNoNormalizationIfTooFewRegions() throws HBaseIOException {
+  public void testNoNormalizationIfTooFewRegions() {
 final TableName tableName = TableName.valueOf(name.getMethodName());
-List RegionInfo = new ArrayList<>();
-Map regionSizes = new HashMap<>();
-RegionInfo hri1 = RegionInfoBuilder.newBuilder(tableName)
-.setStartKey(Bytes.toBytes("aaa"))
-.setEndKey(Bytes.toBytes("bbb"))
-.build();
-RegionInfo.add(hri1);
-regionSizes.put(hri1.getRegionName(), 10);
-
-RegionInfo hri2 = RegionInfoBuilder.newBuilder(tableName)
-.setStartKey(Bytes.toBytes("bbb"))
-.setEndKey(Bytes.toBytes("ccc"))
-.build();
-RegionInfo.add(hri2);
-regionSizes.put(hri2.getRegionName(), 15);
+final List regionInfos = createRegionInfos(tableName, 2);
+final Map regionSizes = createRegionSizesMap(regionInfos, 
10, 15);
+setupMocksForNormalizer(regionSizes, regionInfos);
 
-setupMocksForNormalizer(regionSizes, RegionInfo);
-List plans = normalizer.computePlanForTable(tableName);
-assertNull(plans);
+List plans = normalizer.computePlansForTable(tableName);
+assertThat(plans, empty());
   }
 
   @Test
-  public void testNoNormalizationOnNormalizedCluster() throws HBaseIOException 
{
+  public void testNoNormalizationOnNormalizedCluster() {
 final TableName tableName = TableName.valueOf(name.getMethodName());
-List RegionInfo = new ArrayList<>();
-Map regionSizes = new HashMap<>();
-
-RegionInfo hri1 = RegionInfoBuilder.newBuilder(tableName)
-.setStartKey(Bytes.toBytes("aaa"))
-.setEndKey(Bytes.toBytes("bbb"))
-.build();
-RegionInfo.add(hri1);
-regionSizes.put(hri1.getRegionName(), 10);
-
-RegionInfo hri2 = RegionInfoBuilder.newBuilder(tableName)
-.setStartKey(Bytes.toBytes("bbb"))
-.setEndKey(Bytes.toBytes("ccc"))
-.build();
-RegionInfo.add(hri2);
-regionSizes.put(hri2.getRegionName(), 15);
-
-RegionInfo hri3 = RegionInfoBuilder.newBuilder(tableName)
-.setStartKey(Bytes.toBytes("ccc"))
-.setEndKey(Bytes.toBytes("ddd"))
-.build();
-RegionInfo.add(hri3);
-regionSizes.put(hri3.getRegionName(), 8);
-
-RegionInfo hri4 = RegionInfoBuilder.newBuilder(tableName)
-.setStartKey(Bytes.toBytes("ddd"))
-.setEndKey(Bytes.toBytes("eee"))
-.build();
-regionSizes.put(hri4.getRegionName(), 10);
+final List regionInfos = createRegionInfos(tableName, 4);
+final Map regionSizes = createRegionSizesMap(regionInfos, 
10, 15, 8, 10);
+setupMocksForNormalizer(regionSizes, regionInfos);
 
-setupMocksForNormalizer(regionSizes, RegionInfo);
-List plans = normalizer.computePlanForTable(tableName);
-assertNull(plans);
+List plans =

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-27 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r431437782



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##
@@ -1911,43 +1912,51 @@ public boolean normalizeRegions() throws IOException {
   return false;
 }
 
-synchronized (this.normalizer) {
+if (!normalizationInProgressLock.tryLock()) {

Review comment:
   It hangs out here on the master because, sadly, a bunch of logic is 
implemented in `HMaster`. All the stuff about deciding which tables to 
normalize and actual execution of the generated plans happens in 
`Master#normalizeRegions` (under the same lock we were just discussing). I 
don't love this design, but I think it does make sense that a normalizer 
algorithm exist mostly outside of the context of the master context in which it 
runs. If I were to make the master more of a composition, I would implement a 
class called something like `NormalizationService`, which was responsible for 
instantiating the instance of the `Normalizer` algorithm, registering the 
`Chore` and being the delegation target for invocations coming from RPC. The 
lock would be internal to this `NormalizationService`.

##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##
@@ -400,6 +401,8 @@ public void run() {
   private final LockManager lockManager = new LockManager(this);
 
   private RSGroupBasedLoadBalancer balancer;
+  // a lock to prevent concurrent normalization actions.
+  private final ReentrantLock normalizationInProgressLock = new 
ReentrantLock();

Review comment:
   There is a `RegionNormalizerChore`, yes. It invokes 
`master.normalizeRegions()`. However, the same method can also be called from 
`MasterRpcServices`, so this lock (previously a `synchronized(normalizer) 
block`) prevents the chore and an operator invocation from colliding.

##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##
@@ -797,7 +800,6 @@ protected void initializeZKBasedSystemTrackers()
 this.balancer.setConf(conf);
 this.normalizer = RegionNormalizerFactory.getRegionNormalizer(conf);
 this.normalizer.setMasterServices(this);
-this.normalizer.setMasterRpcServices((MasterRpcServices)rpcServices);

Review comment:
   It was using `masterRpcServices` to reach out to check the split and 
merge switches. I've replaced that with direct calls using `masterServices`.

##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##
@@ -1911,43 +1912,51 @@ public boolean normalizeRegions() throws IOException {
   return false;
 }
 
-synchronized (this.normalizer) {
+if (!normalizationInProgressLock.tryLock()) {
   // Don't run the normalizer concurrently
+  LOG.info("Normalization already in progress. Skipping request.");
+} else {
+  try {
+List allEnabledTables = new ArrayList<>(
+  tableStateManager.getTablesInStates(TableState.State.ENABLED));
+Collections.shuffle(allEnabledTables);
 
-  List allEnabledTables = new ArrayList<>(
-this.tableStateManager.getTablesInStates(TableState.State.ENABLED));
-
-  Collections.shuffle(allEnabledTables);
-
-  for (TableName table : allEnabledTables) {
-TableDescriptor tblDesc = getTableDescriptors().get(table);
-if (table.isSystemTable() || (tblDesc != null &&
-!tblDesc.isNormalizationEnabled())) {
-  LOG.trace("Skipping normalization for {}, as it's either system"
-  + " table or doesn't have auto normalization turned on", table);
-  continue;
-}
+for (TableName table : allEnabledTables) {
+  if (table.isSystemTable()) {
+continue;
+  }
+  final TableDescriptor tblDesc = getTableDescriptors().get(table);
+  if (tblDesc != null && !tblDesc.isNormalizationEnabled()) {
+LOG.debug(
+  "Skipping {} because normalization is disabled in its table 
properties.", table);
+continue;
+  }
 
-// make one last check that the cluster isn't shutting down before 
proceeding.
-if (skipRegionManagementAction("region normalizer")) {
-  return false;
-}
+  // make one last check that the cluster isn't shutting down before 
proceeding.
+  if (skipRegionManagementAction("region normalizer")) {
+return false;
+  }
 
-final List plans = 
this.normalizer.computePlanForTable(table);
-if (CollectionUtils.isEmpty(plans)) {
-  return true;
-}
+  final List plans = 
normalizer.computePlansForTable(table);
+  if (CollectionUtils.isEmpty(plans)) {
+return true;

Review comment:
   Yikes! good point.

##
File path:

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

2020-05-27 Thread GitBox



ndimiduk commented on a change in pull request #1786:
URL: https://github.com/apache/hbase/pull/1786#discussion_r431409692



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
##
@@ -1911,43 +1912,51 @@ public boolean normalizeRegions() throws IOException {
   return false;
 }
 
-synchronized (this.normalizer) {
+if (!normalizationInProgressLock.tryLock()) {

Review comment:
   I wrote the check this way because I prefer, whenever possible, to 
short-circuit a logic path -- the less context a reader needs to have in their 
mind, the better. In this case, the inverted path is shorter, so i put it up 
front, so the reader can see what happens and they that path out of their mind.
   
   Actually, looks like I can short-circuit with a `return` statement, so it 
can be simplified. Thanks for pointing it out.

##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.yetus.audience.InterfaceAudience;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import 
org.apache.hbase.thirdparty.org.apache.commons.collections4.CollectionUtils;
 
 /**
  * Simple implementation of region normalizer. Logic in use:
  * 
- * Get all regions of a given table
- * Get avg size S of each region (by total size of store files reported in 
RegionMetrics)
- * Seek every single region one by one. If a region R0 is bigger than S * 
2, it is kindly
- * requested to split. Thereon evaluate the next region R1
- * Otherwise, if R0 + R1 is smaller than S, R0 and R1 are kindly requested 
to merge. Thereon
- * evaluate the next region R2
- * Otherwise, R1 is evaluated
+ *   Get all regions of a given table
+ *   Get avg size S of the regions in the table (by total size of store 
files reported in
+ * RegionMetrics)
+ *   For each region R0, if R0 is bigger than S * 2, it is kindly 
requested to split.

Review comment:
   As I see it, the `Normalizer` becomes the process that ensure the 
"weight" of regions is event, so that it can be well-balanced by the 
`Balancer`. The region server's ability to split a region at a configured point 
is more of a safety-valve, a way for the region server to protect itself.
   
   Maybe we want an analogue to `hbase.hregion.max.filesize`, a way for 
operators to specify a minimum region size, under this size the `Normalizer` 
won't split it.
   
   I don't know if this subject has been discussed elsewhere.

##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -18,126 +17,436 @@
  */
 package org.apache.hadoop.hbase.master.normalizer;
 
+import java.io.IOException;
+import java.time.Instant;
+import java.time.Period;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
-
-import org.apache.hadoop.hbase.HBaseIOException;
+import java.util.Objects;
+import java.util.function.BooleanSupplier;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseInterfaceAudience;
+import org.apache.hadoop.hbase.RegionMetrics;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.Size;
 import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.MasterSwitchType;
 import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.master.MasterServices;
+import org.apache.hadoop.hbase.master.RegionState;
+import org.apache.hadoop.hbase.master.assignment.RegionStates;
 import org.apache.hadoop.hbase.master.normalizer.NormalizationPlan.PlanType;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

[GitHub] [hbase] ndimiduk commented on a change in pull request #1786: HBASE-24418 Consolidate Normalizer implementations

15 matches

Site Navigation

Mail list logo

Footer information