[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table
    [ https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013177#comment-15013177 ]

Mikhail Antonov commented on HBASE-14838:
-----------------------------------------

correction - "didn't think" in first sentence above

> SimpleRegionNormalizer does not merge empty region of a table
> -------------------------------------------------------------
>
>                 Key: HBASE-14838
>                 URL: https://issues.apache.org/jira/browse/HBASE-14838
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.1.2
>            Reporter: Romil Choksi
>
> SimpleRegionNormalizer does not merge empty regions of a table
> Steps to repro:
> - Create an empty table with a few regions, say 5-6, without any data in any of them
> - Check the hbase:meta table or the HMaster UI to verify the regions for the table
> - Enable the normalizer switch and normalization for this table
> - Run the normalizer with the 'normalize' command from the hbase shell
> - Verify the regions for the table by scanning the hbase:meta table or checking the HMaster web UI
> The empty regions are not merged on running the region normalizer. This seems
> to be an edge case with completely empty regions, since the Normalizer checks for:
> smallestRegion (in this case 0 size) + smallestNeighborOfSmallestRegion (in this case 0 size) > avg region size (in this case 0 size)
> thanks to [~elserj] for verifying this from the source code side

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
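For readers following the edge case in the issue description, here is a minimal sketch of why a table of all-empty regions never qualifies for a merge. It is not the SimpleRegionNormalizer source: the class and method names are hypothetical, and the merge rule is assumed to be a strict comparison of the two candidates' combined size against the average. Whichever direction that strict comparison takes, 0 + 0 against an average of 0 evaluates to false.

```java
// Hypothetical sketch of the edge case, NOT the actual HBase code.
// Sizes are whole megabytes, as discussed in the thread.
final class EmptyRegionMergeCheck {

    /** Average region size in MB; returns 0 for a table with no regions. */
    static double averageSizeMb(long[] sizesMb) {
        if (sizesMb.length == 0) {
            return 0.0;
        }
        long total = 0;
        for (long size : sizesMb) {
            total += size;
        }
        return (double) total / sizesMb.length;
    }

    /**
     * Assumed merge rule: merge the smallest region with its smallest
     * neighbor only if their combined size is strictly below the average.
     */
    static boolean wouldMerge(long smallestMb, long neighborMb, double avgMb) {
        return smallestMb + neighborMb < avgMb;
    }

    public static void main(String[] args) {
        // Six empty regions: average is 0, so 0 + 0 < 0 is false.
        long[] allEmpty = {0, 0, 0, 0, 0, 0};
        System.out.println(wouldMerge(0, 0, averageSizeMb(allEmpty)));

        // Two empty regions next to four 10 MB regions: average is above 0,
        // so the two empty neighbors do qualify.
        long[] skewed = {0, 0, 10, 10, 10, 10};
        System.out.println(wouldMerge(0, 0, averageSizeMb(skewed)));
    }
}
```

Under these assumptions the first call prints `false` and the second prints `true`, which matches the behavior reported in the issue: nothing happens until at least some regions report a non-zero size.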
[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table
    [ https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013175#comment-15013175 ]

Mikhail Antonov commented on HBASE-14838:
-----------------------------------------

Interesting, to be honest I did think about this scenario :)

I think the original incentive was to provide a tool to:
1) automatically fix skews in data distribution across regions (the result of a suboptimal choice of pre-chosen split points or something)
2) merge up small regions (either the ones which shrank after major compaction, or old small regions after migration to a new version with a bigger "standard" region size)

If you have 5-6 empty regions and no data in there, do you want the normalizer to merge them together? I would assume (if I see this scenario) that someone has just pre-split the table, and it should be left as is until some data comes in and skews in distribution start to show up, at which point the normalizer would kick in?

Am I missing something?
[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table
    [ https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014035#comment-15014035 ]

Josh Elser commented on HBASE-14838:
------------------------------------

bq. If you have 5-6 empty regions and no data in there, do you want normalizer to merge them together?

I think that's the ultimate question to answer. Given how the code read, it just looked like this was something that wasn't considered. Specifically, in working with Romil, we were trying to investigate why the normalizer wasn't doing anything with these regions as we expected it to (in a contrived environment).

bq. If you have 5-6 empty regions and no data in there, do you want normalizer to merge them together?

In this contrived case, we had only written a small amount of data to one of the regions (<1MB). I've yet to investigate why the greater-than-zero amount of data in one region was ultimately treated as no data (confirmed via a remote debugger attached to the master). Because of this, the average size was reported as zero (even for a table with a small amount of data in it). Purely background information at this point -- I need to look into this again.

bq. As Mikhail Antonov says, if there's no data, we have no way to guess at a reasonable distribution of split points.

At first glance, my reaction was that the "reasonable distribution of split points" for no data in a table is having no split points. Same goes for small amounts of data. I hadn't considered the side effect of the normalizer undoing a pre-split table (dev goes to get lunch before starting ingest), which would be a confusing story to tell.

Perhaps some comments on the class would be sufficient to record that "hey, this won't do anything to empty tables". What do you guys think?

> SimpleRegionNormalizer does not merge empty region of a table
> -------------------------------------------------------------
>
>                 Key: HBASE-14838
>                 URL: https://issues.apache.org/jira/browse/HBASE-14838
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Romil Choksi
>
> SimpleRegionNormalizer does not merge empty regions of a table
> Steps to repro:
> - Create an empty table with a few regions, say 5-6, without any data in any of them
> - Check the hbase:meta table or the HMaster UI to verify the regions for the table
> - Enable the normalizer switch and normalization for this table
> - Run the normalizer with the 'normalize' command from the hbase shell
> - Verify the regions for the table by scanning the hbase:meta table or checking the HMaster web UI
> The empty regions are not merged on running the region normalizer. This seems
> to be an edge case with completely empty regions, since the Normalizer checks for:
> smallestRegion (in this case 0 size) + smallestNeighborOfSmallestRegion (in this case 0 size) > avg region size (in this case 0 size)
> thanks to [~elserj] for verifying this from the source code side
[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table
    [ https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014829#comment-15014829 ]

Mikhail Antonov commented on HBASE-14838:
-----------------------------------------

bq. In this contrived case, we had only written a small amount of data to one of the regions (<1MB). I've yet to investigate why the greater-than-zero amount of data in one region was ultimately treated as no data (confirmed via a remote debugger attached to the master).

Because the code which calculates region size in the region normalizer uses metrics (ServerLoad/RegionLoad based), where region size (aggregated store file size) is represented in MB and is floored (truncated) down. If you've got, say, 80 KB worth of data, the normalizer thinks it's zero. That's the reason why the minicluster tests for this feature generate more than 1 MB of data per region.

I remember looking for some convenient method which would report the exact size (like, hm, Region#size()), but haven't found anything suitable.
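The truncation described above can be illustrated with a short sketch. This is only an illustration of integer flooring to whole megabytes, not the ServerLoad/RegionLoad code itself; the class name, the method name, and the 1024 * 1024 byte definition of a megabyte are assumptions made for the example.

```java
// Hypothetical sketch of MB flooring, NOT the actual HBase metrics code.
final class RegionSizeFlooring {

    // Assumption for this example: 1 MB = 1024 * 1024 bytes.
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Integer division truncates, so any size under 1 MB reports as 0 MB. */
    static long flooredSizeMb(long storeFileBytes) {
        return storeFileBytes / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long eightyKb = 80L * 1024L;                 // the 80 KB case from the comment
        long justUnderOneMb = BYTES_PER_MB - 1;      // still reported as empty
        long twoAndAHalfMb = 5L * BYTES_PER_MB / 2L; // fractional part is dropped

        System.out.println(flooredSizeMb(eightyKb));       // 0
        System.out.println(flooredSizeMb(justUnderOneMb)); // 0
        System.out.println(flooredSizeMb(twoAndAHalfMb));  // 2
    }
}
```

With sizes reported this way, a region holding less than 1 MB is indistinguishable from an empty one, which is consistent with the zero average seen in the debugger.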
[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table
    [ https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014971#comment-15014971 ]

Josh Elser commented on HBASE-14838:
------------------------------------

bq. Because code which calculates region size in region normalizer uses metrics (ServerLoad/RegionLoad based), where region size (aggregated store file size) is represented in MB and is floored (truncated) down

You're the best. Saved me some digging :)

bq. So how should we proceed here on this jira? Add a javadoc comment to specify that pre-split tables are not touched if they are empty?

I think that would be a good addition. I can add something to the class-level javadocs.
[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table
    [ https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014774#comment-15014774 ]

Enis Soztutar commented on HBASE-14838:
---------------------------------------

bq. At first glance, my reaction was that the "reasonable distribution of split points" for no data in a table is having no split points. Same goes for small amounts of data. I hadn't considered the side-effect of the normalizer undo-ing a pre-split table

This is a good point. Undoing pre-split tables will be very bad.
[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table
    [ https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014834#comment-15014834 ]

Mikhail Antonov commented on HBASE-14838:
-----------------------------------------

Yeah, merging together regions in a pre-split table sounds like a bad idea to me too.

So how should we proceed here on this jira? Add a javadoc comment to specify that pre-split tables are not touched if they are empty?
[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table
    [ https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013891#comment-15013891 ]

Nick Dimiduk commented on HBASE-14838:
--------------------------------------

Not sure how this applies to 1.1.2; normalizer is only on 1.2+.

I don't think we've considered a target steady state for an empty table. As [~mantonov] says, if there's no data, we have no way to guess at a reasonable distribution of split points. Ideally we'd work toward some N * the number of region servers (or region server group, once HBASE-6721 lands) so that subsequent data load is as parallel as possible, but without an idea of distribution, it's only a guess.