[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r464645151 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java ## @@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates regionStates, final RegionInfo r } final long currentSizeMb = getRegionSizeMB(current); final long nextSizeMb = getRegionSizeMB(next); - if (currentSizeMb + nextSizeMb < avgRegionSizeMb) { + // always merge away empty regions when they present themselves. + if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb < avgRegionSizeMb) { Review comment: Replying to my earlier comment, > So let me propose this: rather than basing this decision on a scaling factor, how about we decide on some arbitrary value to use as meaning "effectively 0" in size. Per my above, and @mnpoonia 's, we've already decided that "effectively 0" means "less than 1mb according to reported metrics." Thus I think we should keep this "special logic". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r449304240 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java ## @@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates regionStates, final RegionInfo r } final long currentSizeMb = getRegionSizeMB(current); final long nextSizeMb = getRegionSizeMB(next); - if (currentSizeMb + nextSizeMb < avgRegionSizeMb) { + // always merge away empty regions when they present themselves. + if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb < avgRegionSizeMb) { Review comment: There hasn't been much feedback on the DISCUSS thread. I'm of the mind that we can resolve this specific bug using the special-case (`if == 0`) code I have in this patch. I think we should return with another change that allows the `min_*` values to be unset, and probably remove the defaults for everything except `min_region_age.days`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r446418314 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java ## @@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates regionStates, final RegionInfo r } final long currentSizeMb = getRegionSizeMB(current); final long nextSizeMb = getRegionSizeMB(next); - if (currentSizeMb + nextSizeMb < avgRegionSizeMb) { + // always merge away empty regions when they present themselves. + if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb < avgRegionSizeMb) { Review comment: Oh, and `hbase.normalizer.min.region.count` as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r446413487 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java ## @@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates regionStates, final RegionInfo r } final long currentSizeMb = getRegionSizeMB(current); final long nextSizeMb = getRegionSizeMB(next); - if (currentSizeMb + nextSizeMb < avgRegionSizeMb) { + // always merge away empty regions when they present themselves. + if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb < avgRegionSizeMb) { Review comment: I think what we need is to unset the default values of `hbase.normalizer.merge.min_region_age.days` and `hbase.normalizer.merge.min_region_size.mb`. If someone wants the features, make them opt-in. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r446387518 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java ## @@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates regionStates, final RegionInfo r } final long currentSizeMb = getRegionSizeMB(current); final long nextSizeMb = getRegionSizeMB(next); - if (currentSizeMb + nextSizeMb < avgRegionSizeMb) { + // always merge away empty regions when they present themselves. + if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb < avgRegionSizeMb) { Review comment: Argh. This idea is in direct contradiction to the feature implemented by "The minimum size for a region to be considered for a merge, in whole MBs." ``` static final String MERGE_MIN_REGION_SIZE_MB_KEY = "hbase.normalizer.merge.min_region_size.mb"; static final int DEFAULT_MERGE_MIN_REGION_SIZE_MB = 1; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r446324754 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java ## @@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates regionStates, final RegionInfo r } final long currentSizeMb = getRegionSizeMB(current); final long nextSizeMb = getRegionSizeMB(next); - if (currentSizeMb + nextSizeMb < avgRegionSizeMb) { + // always merge away empty regions when they present themselves. + if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb < avgRegionSizeMb) { Review comment: @huaxiangsun I've been thinking about this and haven't come to a formula that I like. So let me propose this: rather than basing this decision on a scaling factor, how about we decide on some arbitrary value to use as meaning "effectively 0" in size. Say, any region <= 10mb will be considered as "effectively 0-sized" and we'll merge it into it's neighbor when possible. I supposed we can make this configurable as well... What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r446323076 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java ## @@ -78,4 +82,30 @@ public void execute(Admin admin) { LOG.error("Error during region merge: ", ex); } } + + @Override + public boolean equals(Object o) { +if (this == o) { + return true; +} + +if (o == null || getClass() != o.getClass()) { + return false; +} + +MergeNormalizationPlan that = (MergeNormalizationPlan) o; + +return new EqualsBuilder() + .append(firstRegion, that.firstRegion) + .append(secondRegion, that.secondRegion) + .isEquals(); + } + + @Override + public int hashCode() { +return new HashCodeBuilder(17, 37) Review comment: From [javadoc](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html#equals(java.lang.Object)) on `Object#equals(Object)` > Note that it is generally necessary to override the `hashCode` method whenever this method is overridden, so as to maintain the general contract for the `hashCode` method, which states that equal objects must have equal hash codes. And besides since I want to treat these objects like POJOs, it is useful to be able to use them in maps and sets. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r446319764 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java ## @@ -78,4 +82,30 @@ public void execute(Admin admin) { LOG.error("Error during region merge: ", ex); } } + + @Override + public boolean equals(Object o) { +if (this == o) { + return true; +} + +if (o == null || getClass() != o.getClass()) { + return false; +} + +MergeNormalizationPlan that = (MergeNormalizationPlan) o; + +return new EqualsBuilder() + .append(firstRegion, that.firstRegion) + .append(secondRegion, that.secondRegion) + .isEquals(); + } + + @Override + public int hashCode() { Review comment: Actually, I think i'd like to keep them. I'm finding they're handy for some of the experimental changes I've been trying. I'd prefer keep them for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r444268447 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java ## @@ -78,4 +82,30 @@ public void execute(Admin admin) { LOG.error("Error during region merge: ", ex); } } + + @Override + public boolean equals(Object o) { +if (this == o) { + return true; +} + +if (o == null || getClass() != o.getClass()) { + return false; +} + +MergeNormalizationPlan that = (MergeNormalizationPlan) o; + +return new EqualsBuilder() + .append(firstRegion, that.firstRegion) + .append(secondRegion, that.secondRegion) + .isEquals(); + } + + @Override + public int hashCode() { +return new HashCodeBuilder(17, 37) Review comment: So what I really wanted was `equals`, so that I could treat these objects as POJOs and do simple comparisons in unit tests. In java, it's a bad idea to implement a custom `equals` without also implementing `hashCode`, so I did both. Well, I generated both from IntelliJ without much thought about it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r443876595 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/SimpleRegionNormalizer.java ## @@ -369,7 +369,8 @@ private boolean skipForMerge(final RegionStates regionStates, final RegionInfo r } final long currentSizeMb = getRegionSizeMB(current); final long nextSizeMb = getRegionSizeMB(next); - if (currentSizeMb + nextSizeMb < avgRegionSizeMb) { + // always merge away empty regions when they present themselves. + if (currentSizeMb == 0 || nextSizeMb == 0 || currentSizeMb + nextSizeMb < avgRegionSizeMb) { Review comment: I like this idea of a fuzzy threshold idea. What if we merge a little more aggressively, expressed relative to `avgRegionSizeMb`? Something like ``` if (currentSizeMb + nextSizeMb < avgRegionSizeMb * 0.4) {...} ``` This gives us a strong preference toward larger regions, with a threshold based on the average size. I guess next you'll same "make it configurable" :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r443873027 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java ## @@ -78,4 +82,30 @@ public void execute(Admin admin) { LOG.error("Error during region merge: ", ex); } } + + @Override + public boolean equals(Object o) { +if (this == o) { + return true; +} + +if (o == null || getClass() != o.getClass()) { + return false; +} + +MergeNormalizationPlan that = (MergeNormalizationPlan) o; + +return new EqualsBuilder() + .append(firstRegion, that.firstRegion) + .append(secondRegion, that.secondRegion) + .isEquals(); + } + + @Override + public int hashCode() { Review comment: Yeah, seems they're unused. As is `SplitNormalizationPlan#getRegionInfo()`. Seems a little strange to have a POJO without public accessors, to include these members in `toString` but not have accessors, but yeah, less data visibility is better data visibility. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hbase] ndimiduk commented on a change in pull request #1922: HBASE-24583 Normalizer can't actually merge empty regions when neighbor is larger than average size
ndimiduk commented on a change in pull request #1922: URL: https://github.com/apache/hbase/pull/1922#discussion_r443871798 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/normalizer/MergeNormalizationPlan.java ## @@ -78,4 +82,30 @@ public void execute(Admin admin) { LOG.error("Error during region merge: ", ex); } } + + @Override + public boolean equals(Object o) { +if (this == o) { + return true; +} + +if (o == null || getClass() != o.getClass()) { + return false; +} + +MergeNormalizationPlan that = (MergeNormalizationPlan) o; + +return new EqualsBuilder() + .append(firstRegion, that.firstRegion) + .append(secondRegion, that.secondRegion) + .isEquals(); + } + + @Override + public int hashCode() { +return new HashCodeBuilder(17, 37) Review comment: And make the object instances that much larger? Is that really helpful? I think of this as a premature optimization, something the JIT can handle for me if it thinks so. If you feel strongly about it, I suppose... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org