[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-10-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Release Note: 
Changes the default splitting policy from ConstantSizeRegionSplitPolicy to 
IncreasingToUpperBoundRegionSplitPolicy. It splits quickly at first, then more 
slowly as the number of regions climbs.

Split size is the number of regions on this server that belong to the same 
table, squared, times the region flush size, OR the maximum region split size, 
whichever is smaller. For example, if the flush size is 128M, we will split on 
the first flush, making two regions that will each split when their size 
reaches 2 * 2 * 128M = 512M. If one of these regions splits, there are three 
regions and the split size becomes 3 * 3 * 128M = 1152M, and so on until we 
reach the configured maximum filesize; from there on out, we'll use that.
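The rule above can be sketched in a few lines of Java. This is a minimal, hypothetical illustration of the formula as stated in the release note (split size = min(regionCount^2 * flushSize, maxFileSize)). It is not the actual IncreasingToUpperBoundRegionSplitPolicy source. The class and method names are made up, and the 128M flush size and 10G cap are assumptions taken from this note.

```java
// Hypothetical sketch of the release-note formula, not HBase's real code:
// split size = min(regionCount^2 * flushSize, maxFileSize).
class SquaredSplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // regionCount: regions of the same table currently on this regionserver.
    static long splitSize(int regionCount, long flushSize, long maxFileSize) {
        long squared = (long) regionCount * regionCount * flushSize;
        return Math.min(squared, maxFileSize);
    }

    public static void main(String[] args) {
        long flush = 128 * MB;         // assumed flush size, as in the example
        long max = 10L * 1024 * MB;    // assumed 10G hbase.hregion.max.filesize
        for (int n = 1; n <= 3; n++) {
            // Prints 128M, 512M and 1152M, matching the worked example.
            System.out.println(n + " region(s): split at "
                    + splitSize(n, flush, max) / MB + "M");
        }
    }
}
```

With one region, the threshold equals the flush size, which is why the first flush triggers a split.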

Be warned: this new default could bring on lots of splits if you have many 
tables on your cluster. Either go back to the old split policy 
(ConstantSizeRegionSplitPolicy) or raise the lower bound configuration.

This patch changes the default split size from 64M to 128M. It also makes the 
region's eventual split size, hbase.hregion.max.filesize, 10G (it was 1G).
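To see where the 10G cap takes over, the hypothetical sketch below walks the squared thresholds for one table on one server. It assumes a 128M flush size and the new 10G hbase.hregion.max.filesize default, and uses binary units (10G = 10240M). The class and method names are illustrative only.

```java
// Hypothetical sketch: the split thresholds for one table on one server,
// assuming a 128M flush size and a 10G (10240M) max filesize.
class SplitThresholdTable {
    static final long MB = 1024L * 1024L;

    // Smallest region count whose squared threshold reaches the cap.
    static int firstCappedCount(long flushSize, long maxFileSize) {
        int n = 1;
        while ((long) n * n * flushSize < maxFileSize) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long flush = 128 * MB;
        long max = 10L * 1024 * MB;
        int capped = firstCappedCount(flush, max);
        for (int n = 1; n <= capped; n++) {
            long size = Math.min((long) n * n * flush, max);
            System.out.println(n + " -> " + size / MB + "M");
        }
    }
}
```

Under these assumptions, eight regions still split at 8 * 8 * 128M = 8192M. At nine regions, 9 * 9 * 128M = 10368M exceeds 10240M, so from nine regions onward the table splits at 10G.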

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Critical
  Labels: usability
 Fix For: 0.94.0

 Attachments: 4365.txt, 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 
 4365-v5.txt


 A few of us were brainstorming this morning about what the default region 
 size should be. There were a few general points made:
 - in some ways it's better to be too large than too small, since you can 
 always split a table further, but you can't merge regions currently
 - with HFile v2 and multithreaded compactions there are fewer reasons to 
 avoid very-large regions (10GB+)
 - for small tables you may want a small region size just so you can 
 distribute load better across a cluster
 - for big tables, multi-GB is probably best

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Attachment: 4365-v3.txt

This version makes the new split policy the default and raises the max file 
size from 1G to 10G. This is what I'll commit unless there are objections. It 
uses the square of the number of regions times the flush size.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Priority: Critical
  Labels: usability
 Attachments: 4365-v2.txt, 4365-v3.txt, 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Status: Patch Available  (was: Open)

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Priority: Critical
  Labels: usability
 Attachments: 4365-v2.txt, 4365-v3.txt, 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Attachment: 4365-v4.txt

Your wish is my command, Lars.

Also addressed Ted's earlier comments that I'd forgotten to fix.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Priority: Critical
  Labels: usability
 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Status: Patch Available  (was: Open)

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Priority: Critical
  Labels: usability
 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365-v5.txt, 
 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Attachment: 4365-v5.txt

Fix test (it wasn't accommodating the square of the number of regions) and 
address Ted's comment.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Priority: Critical
  Labels: usability
 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365-v5.txt, 
 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Status: Open  (was: Patch Available)

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Priority: Critical
  Labels: usability
 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365-v5.txt, 
 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-23 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

   Resolution: Fixed
Fix Version/s: 0.94.0
 Assignee: stack
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the reviews, lads, and for the testing, j-d.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Critical
  Labels: usability
 Fix For: 0.94.0

 Attachments: 4365-v2.txt, 4365-v3.txt, 4365-v4.txt, 4365-v5.txt, 
 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-22 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Attachment: 4365-v2.txt

Make it the square of the count of regions.

Also addresses a problem found by j-d where I was getting the region size from 
conf instead of from the HTD.
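The fix j-d prompted amounts to "prefer the per-table setting, fall back to the configuration." The sketch below is hypothetical and shows only that lookup order. OptionalLong stands in for whatever the table descriptor (HTD) reports, and a plain Map stands in for the cluster configuration. Neither type is an HBase class, and the method name is made up.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: resolve the max region size by preferring the table's
// own setting (a stand-in for the HTD value) and falling back to the cluster
// configuration (a stand-in Map) only when the table does not set one.
class MaxFileSizeLookup {
    static long resolve(OptionalLong tableSetting, Map<String, Long> conf,
                        long defaultValue) {
        if (tableSetting.isPresent()) {
            return tableSetting.getAsLong();   // per-table value wins
        }
        // No per-table value: use the configured one, else the default.
        return conf.getOrDefault("hbase.hregion.max.filesize", defaultValue);
    }
}
```

Checking the table first means a table created with its own max file size keeps it, even if the cluster-wide value differs.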

This patch works on trunk only.  Will need to do a version for 0.92.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1, 0.94.0
Reporter: Todd Lipcon
Priority: Critical
  Labels: usability
 Attachments: 4365-v2.txt, 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-18 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

 Priority: Critical  (was: Major)
Affects Version/s: 0.92.1
   Labels: usability  (was: )

Making it so we consider this for 0.92.1 -- it's a usability issue. Will try it 
on a cluster with the square of the number of regions.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0, 0.92.1
Reporter: Todd Lipcon
Priority: Critical
  Labels: usability
 Attachments: 4365.txt


[jira] [Updated] (HBASE-4365) Add a decent heuristic for region size

2012-02-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4365:
-

Attachment: 4365.txt

Here is a first cut.

It does not look up a table's regions across the cluster, nor does it query zk 
to find out how many nodes are in the mix. It's kinda hard to do this from a 
RegionSplitPolicy context.

Instead, we count the number of regions that belong to a table on the current 
regionserver. We then multiply the flushsize by this number, and that's when 
we'll split. If the multiplication produces a number greater than the max 
filesize for a region, we'll take maxfilesize.

If there is 1 region for a given table on a regionserver, we'll split on the 
first flush.

If there are 5 regions from the same table on a regionserver, we'll split at 
5 * 128M, and so on.
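The first cut described above can be sketched as follows. This is a minimal, hypothetical illustration, not the attached 4365.txt patch. It counts the table's regions among those online on the local regionserver only, with no cluster-wide lookup and no ZooKeeper query. It then multiplies that count by the flush size and caps the result at the max file size. The Region type and all method names are stand-ins invented for this sketch.

```java
import java.util.List;

// Hypothetical sketch of the linear first cut:
// split size = min(localRegionCount * flushSize, maxFileSize).
class LinearSplitSizeSketch {
    // Stand-in for an online region; only the owning table's name matters.
    static final class Region {
        final String tableName;
        Region(String tableName) { this.tableName = tableName; }
    }

    // Counts regions of tableName among those online on this server only.
    static int countTableRegions(List<Region> onlineRegions, String tableName) {
        int count = 0;
        for (Region r : onlineRegions) {
            if (r.tableName.equals(tableName)) {
                count++;
            }
        }
        return count;
    }

    // Linear growth, capped at the max file size.
    static long splitSize(int regionCount, long flushSize, long maxFileSize) {
        return Math.min((long) regionCount * flushSize, maxFileSize);
    }
}
```

With a 128M flush size, 1 local region gives 128M (split on the first flush), and 5 local regions give 640M.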

We could have the size grow more aggressively by squaring the count of regions; 
that might make sense if the cluster has lots of small tables -- in fact it 
might be better altogether. What do you all think?
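To compare the two growth rules side by side, here is a hypothetical sketch that assumes a 128M flush size and a 10G cap. Both rules are capped the same way, and the names are illustrative only.

```java
// Hypothetical comparison of linear vs squared split-size growth,
// assuming a 128M flush size and a 10G (10240M) cap.
class GrowthComparison {
    static final long MB = 1024L * 1024L;

    static long linear(int n, long flush, long max) {
        return Math.min((long) n * flush, max);
    }

    static long squared(int n, long flush, long max) {
        return Math.min((long) n * n * flush, max);
    }

    public static void main(String[] args) {
        long flush = 128 * MB;
        long max = 10L * 1024 * MB;
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d regions: linear %dM, squared %dM%n",
                    n, linear(n, flush, max) / MB, squared(n, flush, max) / MB);
        }
    }
}
```

Both rules start at 128M for a single region. By 5 regions, the linear rule splits at 640M while the squared rule waits until 3200M. The squared rule therefore raises each table's threshold much faster, so a table stops producing many small regions sooner.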

If agreeable, I should make a new patch that makes this the default splitting 
policy.

 Add a decent heuristic for region size
 --

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Todd Lipcon
 Attachments: 4365.txt

