[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table

2015-11-19 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013177#comment-15013177
 ] 

Mikhail Antonov commented on HBASE-14838:
-

Correction: that should read "didn't think" in the first sentence of my previous comment.

> SimpleRegionNormalizer does not merge empty region of a table
> -
>
> Key: HBASE-14838
> URL: https://issues.apache.org/jira/browse/HBASE-14838
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.1.2
> Reporter: Romil Choksi
>
> SimpleRegionNormalizer does not merge empty regions of a table.
> Steps to repro:
> - Create an empty table with a few regions, say 5-6, without any data in any of 
> them
> - Scan the hbase:meta table or check the HMaster UI to verify the regions for 
> the table
> - Enable the normalizer switch and normalization for this table
> - Run the normalizer with the 'normalize' command from the hbase shell
> - Verify the regions for the table by scanning the hbase:meta table or checking 
> the HMaster web UI
> The empty regions are not merged on running the region normalizer. This seems 
> to be an edge case with completely empty regions since the Normalizer checks 
> for: smallestRegion (in this case 0 size) + smallestNeighborOfSmallestRegion 
> (in this case 0 size) > avg region size (in this case 0 size)
> thanks to [~elserj] for verifying this from the source code side
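
The edge case in the description can be reproduced with plain arithmetic. The sketch below is not the HBase class: `EmptyRegionMergeCheck`, `averageSizeMb` and `wouldMerge` are hypothetical names, and the merge rule is an assumption (merge only when the two smallest neighbours together are strictly smaller than the average). The description writes the comparison with `>`; whichever direction the real check uses, a strict comparison between zero and zero is false, which is why nothing happens for a table whose regions all report size 0.

```java
import java.util.Arrays;

/** Illustrative stand-in for the size check discussed in this issue; not HBase code. */
final class EmptyRegionMergeCheck {

  /** Average region size, with each size given in whole megabytes. */
  static double averageSizeMb(long[] regionSizesMb) {
    return Arrays.stream(regionSizesMb).average().orElse(0.0);
  }

  /**
   * Assumed merge rule: merge only when the smallest region plus its smallest
   * neighbour is strictly smaller than the average region size.
   */
  static boolean wouldMerge(long smallestMb, long smallestNeighborMb, double avgMb) {
    return smallestMb + smallestNeighborMb < avgMb;
  }

  public static void main(String[] args) {
    // Six empty regions: the average is 0, so the strict comparison cannot hold.
    long[] emptyTable = {0, 0, 0, 0, 0, 0};
    double avgEmpty = averageSizeMb(emptyTable);
    System.out.println("empty table:  merge = " + wouldMerge(0, 0, avgEmpty));

    // Two empty regions next to four 10 MB regions: the average is above 0,
    // so the same check now selects the two empty regions for a merge.
    long[] skewedTable = {0, 0, 10, 10, 10, 10};
    double avgSkewed = averageSizeMb(skewedTable);
    System.out.println("skewed table: merge = " + wouldMerge(0, 0, avgSkewed));
  }
}
```

Under this assumed rule an all-empty table is left alone, while the same empty regions become merge candidates as soon as other regions in the table hold data.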



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table

2015-11-19 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013175#comment-15013175
 ] 

Mikhail Antonov commented on HBASE-14838:
-

Interesting, to be honest I did think about this scenario :)

I think the original incentive was to provide a tool to 1) automatically fix skews 
in data distribution across regions (the result of a suboptimal choice of 
pre-chosen split points or something) and 2) merge small regions (either the 
ones which shrank after major compaction, or old small regions after migration 
to a new version with a bigger "standard" region size).

If you have 5-6 empty regions and no data in there, do you want normalizer to 
merge them together? I would assume (if I see this scenario) that someone has 
just pre-split the table, and it should be left as is, until some data comes in 
and skews in distribution start to show up, at which point normalizer would 
kick in? Am I missing something?



[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table

2015-11-19 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014035#comment-15014035
 ] 

Josh Elser commented on HBASE-14838:


bq. If you have 5-6 empty regions and no data in there, do you want normalizer 
to merge them together?

I think that's the ultimate question to answer. Given how the code read, it 
just looked like this was something that wasn't considered. Specifically, in 
working with Romil, we were trying to investigate why the normalizer wasn't 
doing anything with these regions as we expected it to (in a contrived 
environment).

bq. If you have 5-6 empty regions and no data in there, do you want normalizer 
to merge them together?

In this contrived case, we had only written a small amount of data to one of 
the regions (<1MB). I've yet to investigate why the greater-than-zero amount of 
data in one region was ultimately treated as no data (confirmed via a remote 
debugger attached to the master). Because of this, the average size was 
reported as zero (even for a table with a small amount of data in it). Purely 
background information at this point -- I need to look into this again.

bq. As Mikhail Antonov says, if there's no data, we have no way to guess at a 
reasonable distribution of split points.

At first glance, my reaction was that the "reasonable distribution of split 
points" for no data in a table is having no split points. Same goes for small 
amounts of data. I hadn't considered the side-effect of the normalizer undo-ing 
a pre-split table (dev goes to get lunch before starting ingest) which would be 
a confusing story to tell. Perhaps some comments on the class would be 
sufficient to record that "hey, this won't do anything to empty tables". What 
do you guys think?

> SimpleRegionNormalizer does not merge empty region of a table
> -
>
> Key: HBASE-14838
> URL: https://issues.apache.org/jira/browse/HBASE-14838
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Romil Choksi


[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table

2015-11-19 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014829#comment-15014829
 ] 

Mikhail Antonov commented on HBASE-14838:
-

bq. In this contrived case, we had only written a small amount of data to one 
of the regions (<1MB). I've yet to investigate why the greater-than-zero amount 
of data in one region was ultimately treated as no data (confirmed via a remote 
debugger attached to the master). 

Because the code which calculates region size in the region normalizer uses metrics 
(ServerLoad/RegionLoad based), where region size (aggregated store file size) 
is represented in MB and is floored (truncated) down. If you have, say, 80 KB 
worth of data, the normalizer thinks it's zero. That's the reason why the 
minicluster tests for this feature generate more than 1 MB of data per region. 
I remember looking for some convenient method which would report the exact size 
(like, hm, Region#size()), but haven't found anything suitable.
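
To make the truncation concrete, here is a minimal sketch of what flooring a byte count to whole megabytes does. It does not call the HBase metrics classes; `RegionSizeFlooring` and `toWholeMegabytes` are hypothetical names, and the integer division is only an assumed stand-in for however the MB figure is actually produced.

```java
/** Illustrative only: shows how sub-megabyte sizes collapse to zero when floored to MB. */
final class RegionSizeFlooring {

  private static final long BYTES_PER_MB = 1024L * 1024L;

  /** Converts a size in bytes to whole megabytes, discarding the fraction. */
  static long toWholeMegabytes(long sizeBytes) {
    return sizeBytes / BYTES_PER_MB; // integer division truncates toward zero
  }

  public static void main(String[] args) {
    long eightyKb = 80L * 1024L;
    long justUnderOneMb = BYTES_PER_MB - 1;
    long justOverOneMb = BYTES_PER_MB + 1;

    System.out.println("80 KB     -> " + toWholeMegabytes(eightyKb) + " MB");
    System.out.println("1 MB - 1B -> " + toWholeMegabytes(justUnderOneMb) + " MB");
    System.out.println("1 MB + 1B -> " + toWholeMegabytes(justOverOneMb) + " MB");
  }
}
```

Anything below one full megabyte is indistinguishable from an empty region at this granularity, which matches the observation that tests need more than 1 MB per region before the normalizer sees a non-zero size.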



[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table

2015-11-19 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014971#comment-15014971
 ] 

Josh Elser commented on HBASE-14838:


bq. Because the code which calculates region size in the region normalizer uses 
metrics (ServerLoad/RegionLoad based), where region size (aggregated store file 
size) is represented in MB and is floored (truncated) down

You're the best. Saved me some digging :)

bq. So how should we proceed here on this jira? Add a javadoc comment to 
specify that pre-split tables are not touched if they are empty?

I think that would be a good addition. I can add something to the class-level 
javadocs.



[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table

2015-11-19 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014774#comment-15014774
 ] 

Enis Soztutar commented on HBASE-14838:
---

bq. At first glance, my reaction was that the "reasonable distribution of split 
points" for no data in a table is having no split points. Same goes for small 
amounts of data. I hadn't considered the side-effect of the normalizer undo-ing 
a pre-split table

This is a good point. Undoing pre-split tables will be very bad. 



[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table

2015-11-19 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014834#comment-15014834
 ] 

Mikhail Antonov commented on HBASE-14838:
-

Yeah, merging together regions in a pre-split table sounds like a bad idea to 
me too. 

So how should we proceed here on this jira? Add a javadoc comment to specify 
that pre-split tables are not touched if they are empty?



[jira] [Commented] (HBASE-14838) SimpleRegionNormalizer does not merge empty region of a table

2015-11-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013891#comment-15013891
 ] 

Nick Dimiduk commented on HBASE-14838:
--

Not sure how this applies to 1.1.2; the normalizer is only in 1.2+.

I don't think we've considered a target steady state for an empty table. As 
[~mantonov] says, if there's no data, we have no way to guess at a reasonable 
distribution of split points. Ideally we'd work toward some N * the number of 
region servers (or region server group, once HBASE-6721 lands) so that 
subsequent data load is as parallel as possible, but without an idea of 
distribution, it's only a guess.
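
As a rough illustration of the "N * the number of region servers" target, the arithmetic might look like the sketch below. `PreSplitTarget`, `targetRegionCount` and `splitPointsNeeded` are hypothetical names, and the multiplier N is a free parameter the thread does not pin down. The sketch only counts regions and split points; it does not say where the split points should fall, which is exactly the part that needs knowledge of the key distribution.

```java
/** Illustrative arithmetic for a pre-split target; not part of HBase. */
final class PreSplitTarget {

  /** Target region count as N regions per region server. */
  static int targetRegionCount(int regionsPerServer, int regionServers) {
    if (regionsPerServer < 1 || regionServers < 1) {
      throw new IllegalArgumentException("both arguments must be at least 1");
    }
    return Math.multiplyExact(regionsPerServer, regionServers);
  }

  /** A table with k regions is defined by k - 1 split points. */
  static int splitPointsNeeded(int regionCount) {
    if (regionCount < 1) {
      throw new IllegalArgumentException("a table has at least one region");
    }
    return regionCount - 1;
  }

  public static void main(String[] args) {
    int regions = targetRegionCount(2, 3); // N = 2, three region servers
    System.out.println("target regions: " + regions);
    System.out.println("split points:   " + splitPointsNeeded(regions));
  }
}
```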
