[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623934#comment-15623934 ]

Jean-Marc Spaggiari commented on HBASE-16765:
---------------------------------------------

LGTM. Thanks Lars!

> Improve IncreasingToUpperBoundRegionSplitPolicy
> -----------------------------------------------
>
>                 Key: HBASE-16765
>                 URL: https://issues.apache.org/jira/browse/HBASE-16765
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 16765-0.98.txt
>
>
> We just did some experiments on some larger clusters and found that while
> using IncreasingToUpperBoundRegionSplitPolicy generally works well and is
> very convenient, it does tend to produce too many regions.
> Since the logic is - by design - local, checking the number of regions of the
> table in question on the local server only, we end up with more regions than
> necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623931#comment-15623931 ]

Jean-Marc Spaggiari commented on HBASE-16765:
---------------------------------------------

I don't have the exact number, but they have some small tables that got split way too much. I agree that 2 regions per table per RS is a good limit (then after that it's 10GB per region...)
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623572#comment-15623572 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

OK... So I'll check the class into every branch, so that it could optionally be configured as SplitPolicy.
For 2.0 I'm going to make this the default. Everybody cool with that?
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623570#comment-15623570 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

Heh... Yes. Although the old tribal knowledge is also not necessarily correct anymore.
How many regions did HBase create? I think if we can cap it to 2/table/server that'd be a good improvement, and workable, I'd think.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620509#comment-15620509 ]

Jean-Marc Spaggiari commented on HBASE-16765:
---------------------------------------------

Just came back from a customer site last week... I was smiling when I saw all tables configured with ConstantSizeRegionSplitPolicy... The cluster was about 300 HBase servers. They complained about HBase creating way too many regions for small tables, so they just configured each table to use ConstantSizeRegionSplitPolicy. So we should definitely do something on that side...
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615861#comment-15615861 ]

stack commented on HBASE-16765:
-------------------------------

Go for it [~lhofhansl]. Put notes up in release notes. Flag it incompatible and 2.0.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615823#comment-15615823 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

Lost sight of this. I think this should be the default.

It can only go into a major release (or perhaps a minor release). The risk is that after running with this for a while and then rolling back the upgrade, one might run into a split-storm, as the old policy will split more aggressively. Not sure if that is enough to keep it out of the next minor release.

Lemme commit this, after explaining more in the comments.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570029#comment-15570029 ]

stack commented on HBASE-16765:
-------------------------------

Needs a release note.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570026#comment-15570026 ]

stack commented on HBASE-16765:
-------------------------------

+1 on patch. On commit, add more to the class comment on how this differs from the default. Should we make this the default since it is less aggressive? Can be a new issue.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1541#comment-1541 ]

Jean-Marc Spaggiari commented on HBASE-16765:
---------------------------------------------

Oh! I never figured it was cube and not square. I always just looked at the comment... Interesting. Well, this patch is then totally required, to get the two aligned...
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1423#comment-1423 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

SteppingSplitPolicy is the fix :)
I noticed the comment in IncreasingToUpperBoundRegionSplitPolicy did not reflect the code so I fixed it.

Could change IncreasingToUpperBoundRegionSplitPolicy itself of course.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549910#comment-15549910 ]

Jean-Marc Spaggiari commented on HBASE-16765:
---------------------------------------------

Is the last patch correct? What is SteppingSplitPolicy? And I see only modifications to the comments for IncreasingToUpperBoundRegionSplitPolicy.

I think until we get HBASE-12451 in, this might still help to reduce the damage...
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549869#comment-15549869 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

Any comments on the simple patch? Not worth it?
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547075#comment-15547075 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

And of course HBASE-12451 is far more elaborate.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547059#comment-15547059 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

Oh... Yeah. Missing the new file... Arrgghh :)
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546483#comment-15546483 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

Yeah... No tests, no copyright, etc, etc. Was just getting a feeling for what people think.
Do we need to be more aggressive in preventing splitting in large clusters? In a 1000 node cluster even a small table pretty quickly grows to 2000 regions.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546455#comment-15546455 ]

Jean-Marc Spaggiari commented on HBASE-16765:
---------------------------------------------

Big +1. When I go onsite to customers I usually recommend disabling this split policy and going with a size-based policy... Because, as you figured, it creates way too many regions.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546470#comment-15546470 ]

stack commented on HBASE-16765:
-------------------------------

Sure. I think the patch is missing stuff though.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546468#comment-15546468 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

This should be the default, I think.
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546464#comment-15546464 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

Note that the optimum we can achieve is 2 regions per table per server, unless we revert to some scheme with a global view of the number of servers and the current number of regions.
But 2 is twice as good as 4! :)
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546417#comment-15546417 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

In comparison, the default IncreasingToUpperBoundRegionSplitPolicy would need 4 regions per table and server to reach the maximum split size.
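The "4 regions" figure can be checked with a little arithmetic. The following is a minimal sketch, not the HBase source: it assumes the split threshold grows as the cube of the local region count (the "cube and not square" point raised elsewhere in this thread), starts at twice the memstore flush size, and is capped at MAX_FILESIZE. The class name, method names, and the 128 MB / 10 GB values are illustrative assumptions.

```java
public final class CubicSplitSketch {
    // Assumed values for illustration only: 128 MB flush size, 10 GB max file size.
    static final long FLUSH_SIZE = 128L * 1024 * 1024;
    static final long MAX_FILE_SIZE = 10L * 1024 * 1024 * 1024;

    /** Split threshold for a table with the given number of regions on this server. */
    static long sizeToCheck(int regions, long flushSize, long maxFileSize) {
        long initialSize = 2 * flushSize;                  // assumed starting threshold
        long cubic = initialSize * regions * regions * regions;
        return Math.min(maxFileSize, cubic);               // never exceed the cap
    }

    /** Smallest local region count at which the threshold reaches the cap. */
    static int regionsToReachMax(long flushSize, long maxFileSize) {
        int n = 1;
        while (sizeToCheck(n, flushSize, maxFileSize) < maxFileSize) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            long mb = sizeToCheck(n, FLUSH_SIZE, MAX_FILE_SIZE) / (1024 * 1024);
            System.out.println(n + " region(s): split at " + mb + " MB");
        }
        System.out.println("regions to reach max: "
            + regionsToReachMax(FLUSH_SIZE, MAX_FILE_SIZE));
    }
}
```

Under these assumed values the thresholds are 256 MB, 2048 MB, 6912 MB, and then the 10 GB cap, so the cap is first reached with 4 regions of the table on one server.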
[jira] [Commented] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546402#comment-15546402 ]

Lars Hofhansl commented on HBASE-16765:
---------------------------------------

I think ideally we want the following axioms:
# quick splitting and spreading of regions while the table is small
# ideally not more than one region of a table per server (MAX_FILESIZE permitting, of course)

#2 is where IncreasingToUpperBoundRegionSplitPolicy falls short.

I'd propose a step function instead: split at 2xflushsize when only one region of the table is seen, and stop splitting early (i.e. fall back to the constant size split policy) when more than 1 region is seen.

This should be as close to ideal as is possible with local knowledge only, usually not leading to more than 2 regions per server (unless we need to split more due to MAX_FILESIZE).

[~stack]
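The proposed step function can be sketched as follows. This is an illustration of the rule as described in the comment, not the attached patch: the class name, method name, and the 128 MB / 10 GB values are assumptions, and the `Math.min` guard for the single-region case is an added safety choice rather than something stated in the proposal.

```java
public final class SteppingSplitSketch {
    // Assumed values for illustration only: 128 MB flush size, 10 GB max file size.
    static final long FLUSH_SIZE = 128L * 1024 * 1024;
    static final long MAX_FILE_SIZE = 10L * 1024 * 1024 * 1024;

    /**
     * Split threshold given how many regions of the table this server hosts.
     * Exactly one region: split early, at 2 x flush size.
     * Any other count: behave like a constant-size policy (split at maxFileSize).
     */
    static long sizeToCheck(int regionsOfTableOnThisServer, long flushSize, long maxFileSize) {
        if (regionsOfTableOnThisServer == 1) {
            // Guard (assumption): never exceed maxFileSize even with a huge flush size.
            return Math.min(maxFileSize, 2 * flushSize);
        }
        return maxFileSize;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            long mb = sizeToCheck(n, FLUSH_SIZE, MAX_FILE_SIZE) / (1024 * 1024);
            System.out.println(n + " region(s): split at " + mb + " MB");
        }
    }
}
```

Because the decision uses only the local region count, a server stops early splitting as soon as it sees a second region of the table, which is why the outcome is usually at most 2 regions per table per server.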