[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-08-03 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r464645151



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates 
regionStates, final RegionInfo r
   }
   final long currentSizeMb = getRegionSizeMB(current);
   final long nextSizeMb = getRegionSizeMB(next);
-  if (currentSizeMb + nextSizeMb < avgRegionSizeMb) {
+  // always merge away empty regions when they present themselves.
+  if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb 
< avgRegionSizeMb) {

Review comment:
   Replying to my earlier comment,
   
   > So let me propose this: rather than basing this decision on a scaling 
factor, how about we decide on some arbitrary value to use as meaning 
"effectively 0" in size.
   
   Per my above, and @mnpoonia 's, we've already decided that "effectively 0" 
means "less than 1mb according to reported metrics." Thus I think we should 
keep this "special logic".





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-07-02 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r449304240



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates 
regionStates, final RegionInfo r
   }
   final long currentSizeMb = getRegionSizeMB(current);
   final long nextSizeMb = getRegionSizeMB(next);
-  if (currentSizeMb + nextSizeMb < avgRegionSizeMb) {
+  // always merge away empty regions when they present themselves.
+  if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb 
< avgRegionSizeMb) {

Review comment:
   There hasn't been much feedback on the DISCUSS thread.
   
   I'm of the mind that we can resolve this specific bug using the special-case 
(`if == 0`) code I have in this patch. I think we should return with another 
change that allows the `min_*` values to be unset, and probably remove the 
defaults for everything except `min_region_age.days`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-26 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r446418314



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates 
regionStates, final RegionInfo r
   }
   final long currentSizeMb = getRegionSizeMB(current);
   final long nextSizeMb = getRegionSizeMB(next);
-  if (currentSizeMb + nextSizeMb < avgRegionSizeMb) {
+  // always merge away empty regions when they present themselves.
+  if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb 
< avgRegionSizeMb) {

Review comment:
   Oh, and `hbase.normalizer.min.region.count` as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-26 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r446413487



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates 
regionStates, final RegionInfo r
   }
   final long currentSizeMb = getRegionSizeMB(current);
   final long nextSizeMb = getRegionSizeMB(next);
-  if (currentSizeMb + nextSizeMb < avgRegionSizeMb) {
+  // always merge away empty regions when they present themselves.
+  if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb 
< avgRegionSizeMb) {

Review comment:
   I think what we need is to unset the default values of 
`hbase.normalizer.merge.min_region_age.days` and 
`hbase.normalizer.merge.min_region_size.mb`. If someone wants the features, 
make them opt-in.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-26 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r446387518



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates 
regionStates, final RegionInfo r
   }
   final long currentSizeMb = getRegionSizeMB(current);
   final long nextSizeMb = getRegionSizeMB(next);
-  if (currentSizeMb + nextSizeMb < avgRegionSizeMb) {
+  // always merge away empty regions when they present themselves.
+  if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb 
< avgRegionSizeMb) {

Review comment:
   Argh. This idea is in direct contradiction to the feature implemented by 
"The minimum size for a region to be considered for a merge, in whole MBs."
   
   ```
 static final String MERGE_MIN_REGION_SIZE_MB_KEY = 
"hbase.normalizer.merge.min_region_size.mb";
 static final int DEFAULT_MERGE_MIN_REGION_SIZE_MB = 1;
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-26 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r446324754



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates 
regionStates, final RegionInfo r
   }
   final long currentSizeMb = getRegionSizeMB(current);
   final long nextSizeMb = getRegionSizeMB(next);
-  if (currentSizeMb + nextSizeMb < avgRegionSizeMb) {
+  // always merge away empty regions when they present themselves.
+  if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb 
< avgRegionSizeMb) {

Review comment:
   @huaxiangsun I've been thinking about this and haven't come to a formula 
that I like. So let me propose this: rather than basing this decision on a 
scaling factor, how about we decide on some arbitrary value to use as meaning 
"effectively 0" in size. Say, any region <= 10mb will be considered as 
"effectively 0-sized" and we'll merge it into it's neighbor when possible. I 
supposed we can make this configurable as well...
   
   What do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-26 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r446323076



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java
##
@@ -78,4 +82,30 @@ public void execute(Admin admin) {
   LOG.error("Error during region merge: ", ex);
 }
   }
+
+  @Override
+  public boolean equals(Object o) {
+if (this == o) {
+  return true;
+}
+
+if (o == null || getClass() != o.getClass()) {
+  return false;
+}
+
+MergeNormalizationPlan that = (MergeNormalizationPlan) o;
+
+return new EqualsBuilder()
+  .append(firstRegion, that.firstRegion)
+  .append(secondRegion, that.secondRegion)
+  .isEquals();
+  }
+
+  @Override
+  public int hashCode() {
+return new HashCodeBuilder(17, 37)

Review comment:
   From 
[javadoc](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html#equals(java.lang.Object))
 on `Object#equals(Object)`
   
   > Note that it is generally necessary to override the `hashCode` method 
whenever this method is overridden, so as to maintain the general contract for 
the `hashCode` method, which states that equal objects must have equal hash 
codes.
   
   And besides since I want to treat these objects like POJOs, it is useful to 
be able to use them in maps and sets.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-26 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r446319764



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java
##
@@ -78,4 +82,30 @@ public void execute(Admin admin) {
   LOG.error("Error during region merge: ", ex);
 }
   }
+
+  @Override
+  public boolean equals(Object o) {
+if (this == o) {
+  return true;
+}
+
+if (o == null || getClass() != o.getClass()) {
+  return false;
+}
+
+MergeNormalizationPlan that = (MergeNormalizationPlan) o;
+
+return new EqualsBuilder()
+  .append(firstRegion, that.firstRegion)
+  .append(secondRegion, that.secondRegion)
+  .isEquals();
+  }
+
+  @Override
+  public int hashCode() {

Review comment:
   Actually, I think i'd like to keep them. I'm finding they're handy for 
some of the experimental changes I've been trying. I'd prefer keep them for now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-23 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r444268447



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java
##
@@ -78,4 +82,30 @@ public void execute(Admin admin) {
   LOG.error("Error during region merge: ", ex);
 }
   }
+
+  @Override
+  public boolean equals(Object o) {
+if (this == o) {
+  return true;
+}
+
+if (o == null || getClass() != o.getClass()) {
+  return false;
+}
+
+MergeNormalizationPlan that = (MergeNormalizationPlan) o;
+
+return new EqualsBuilder()
+  .append(firstRegion, that.firstRegion)
+  .append(secondRegion, that.secondRegion)
+  .isEquals();
+  }
+
+  @Override
+  public int hashCode() {
+return new HashCodeBuilder(17, 37)

Review comment:
   So what I really wanted was `equals`, so that I could treat these 
objects as POJOs and do simple comparisons in unit tests. In java, it's a bad 
idea to implement a custom `equals` without also implementing `hashCode`, so I 
did both. Well, I generated both from IntelliJ without much thought about it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-22 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r443876595



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java
##
@@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates 
regionStates, final RegionInfo r
   }
   final long currentSizeMb = getRegionSizeMB(current);
   final long nextSizeMb = getRegionSizeMB(next);
-  if (currentSizeMb + nextSizeMb < avgRegionSizeMb) {
+  // always merge away empty regions when they present themselves.
+  if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb 
< avgRegionSizeMb) {

Review comment:
   I like this idea of a fuzzy threshold idea. What if we merge a little 
more aggressively, expressed relative to `avgRegionSizeMb`? Something like
   
   ```
   if (currentSizeMb + nextSizeMb < avgRegionSizeMb * 0.4) {...}
   ```
   
   This gives us a strong preference toward larger regions, with a threshold 
based on the average size.
   
   I guess next you'll same "make it configurable" :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-22 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r443873027



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java
##
@@ -78,4 +82,30 @@ public void execute(Admin admin) {
   LOG.error("Error during region merge: ", ex);
 }
   }
+
+  @Override
+  public boolean equals(Object o) {
+if (this == o) {
+  return true;
+}
+
+if (o == null || getClass() != o.getClass()) {
+  return false;
+}
+
+MergeNormalizationPlan that = (MergeNormalizationPlan) o;
+
+return new EqualsBuilder()
+  .append(firstRegion, that.firstRegion)
+  .append(secondRegion, that.secondRegion)
+  .isEquals();
+  }
+
+  @Override
+  public int hashCode() {

Review comment:
   Yeah, seems they're unused. As is 
`SplitNormalizationPlan#getRegionInfo()`. Seems a little strange to have a POJO 
without public accessors, to include these members in `toString` but not have 
accessors, but yeah, less data visibility is better data visibility.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size

2020-06-22 Thread GitBox


ndimiduk commented on a change in pull request #1922:
URL: https://github.com/apache/hbase/pull/1922#discussion_r443871798



##
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java
##
@@ -78,4 +82,30 @@ public void execute(Admin admin) {
   LOG.error("Error during region merge: ", ex);
 }
   }
+
+  @Override
+  public boolean equals(Object o) {
+if (this == o) {
+  return true;
+}
+
+if (o == null || getClass() != o.getClass()) {
+  return false;
+}
+
+MergeNormalizationPlan that = (MergeNormalizationPlan) o;
+
+return new EqualsBuilder()
+  .append(firstRegion, that.firstRegion)
+  .append(secondRegion, that.secondRegion)
+  .isEquals();
+  }
+
+  @Override
+  public int hashCode() {
+return new HashCodeBuilder(17, 37)

Review comment:
   And make the object instances that much larger? Is that really helpful? 
I think of this as a premature optimization, something the JIT can handle for 
me if it thinks so.
   
   If you feel strongly about it, I suppose...





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org