Mark,

Key design depends on the expected access patterns and use cases. In theory, what you describe will distribute writes, but if you want to read a small range of ids you'll need to fan out your reads and can't take advantage of short scans.
Amandeep

On Nov 22, 2011, at 4:55 PM, Mark <static.void....@gmail.com> wrote:

> I just thought of something.
>
> In cases where the id is sequential couldn't one simply reverse the id to get
> more of a uniform distribution?
>
> 510911 => 119015
> 510912 => 219015
> 510913 => 319015
> 510914 => 419015
>
> That seems like a reasonable alternative that doesn't require prefixing each
> row key with an extra 16 bytes. Am I wrong in thinking this could work?
>
>
> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>> If you increase the region size to 2GB, then all regions (current and new)
>> will avoid a split until their aggregate StoreFile size reaches that
>> limit. Reorganizing the regions for a uniform growth pattern is really a
>> schema design problem. There is the capability to merge two adjacent
>> regions if you know that your data growth pattern is non-uniform.
>> StumbleUpon & other companies have more experience with those utilities
>> than I do.
>>
>> Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
>> lean towards increasing the region size. HFile scalability code is more
>> mature/stable than the region splitting code. Plus, automatic region
>> splitting is harder to optimize & debug when failures occur.
>>
>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>> <srikanth_shreeni...@mindtree.com> wrote:
>>
>>> Thanks Nicolas for the clarification. I had a follow-up query.
>>>
>>> What will happen if we increased the region size, say from the current
>>> value of 256 MB to a new value of 2GB?
>>> Will existing regions continue to use only 256 MB space?
>>>
>>> Is there a way to reorganize the regions so that each region grows to
>>> 2GB size?
>>>
>>> Thanks,
>>> Srikanth
>>>
>>> -----Original Message-----
>>> From: Nicolas Spiegelberg [mailto:nspiegelb...@fb.com]
>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: Region Splits
>>>
>>> No. The purpose of major compactions is to merge & dedupe within a region
>>> boundary. Compactions will not alter region boundaries, except in the
>>> case of splits where a compaction is necessary to filter out any rows from
>>> the parent region that are no longer applicable to the daughter region.
>>>
>>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>>> <srikanth_shreeni...@mindtree.com> wrote:
>>>
>>>> Will major compactions take care of merging "older" regions or adding
>>>> more key/values to them as the number of regions grows?
>>>>
>>>> Regards,
>>>> Srikanth
>>>>
>>>> -----Original Message-----
>>>> From: Amandeep Khurana [mailto:ama...@gmail.com]
>>>> Sent: Monday, November 21, 2011 7:25 AM
>>>> To: user@hbase.apache.org
>>>> Subject: Re: Region Splits
>>>>
>>>> Mark,
>>>>
>>>> Yes, your understanding is correct. If your keys are sequential
>>>> (timestamps etc), you will always be writing to the end of the table and
>>>> "older" regions will not get any writes. This is one of the arguments
>>>> against using sequential keys.
>>>>
>>>> -ak
>>>>
>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark <static.void....@gmail.com> wrote:
>>>>
>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>>>> split it will split at the halfway mark so there will be two regions as
>>>>> follows:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-END]
>>>>>
>>>>> So now at this point all inserts will be writing to Region2 only,
>>>>> correct?
>>>>> Now at some point Region2 will need to split and it will look like the
>>>>> following before the split:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-150]
>>>>>
>>>>> After the split it will look like:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-99]
>>>>> Region3 [100-END]
>>>>>
>>>>> And this pattern will continue, correct? My question is when there is a
>>>>> use case that has sequential keys how would any of the older regions
>>>>> ever receive any more writes? It seems like they would always be stuck
>>>>> at MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>>
>>>>> Thanks
>>>>>
>>>> ________________________________
>>>>
>>>> http://www.mindtree.com/email/disclaimer.html
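
For anyone reading this thread later, here is a rough Python sketch of the reversed-id idea and the scan trade-off discussed above. It is only an illustration, not HBase code: the function names, the fixed width of 6 digits, and the use of the leading digit as a stand-in for a pre-split region are all assumptions made for the example.

```python
from collections import Counter

WIDTH = 6  # assumed fixed id width; the thread does not specify one


def reverse_key(row_id: int, width: int = WIDTH) -> str:
    """Zero-pad the id to a fixed width, then reverse its digits."""
    return str(row_id).zfill(width)[::-1]


def original_id(key: str) -> int:
    """Undo reverse_key (valid because every key has the same width)."""
    return int(key[::-1])


def leading_digit_counts(ids, width: int = WIDTH) -> Counter:
    """Count keys per leading character, a rough stand-in for regions
    pre-split on the first digit (an assumption for illustration)."""
    return Counter(reverse_key(i, width)[0] for i in ids)


if __name__ == "__main__":
    # The four ids from the thread.
    for i in range(510911, 510915):
        print(i, "=>", reverse_key(i))

    # 100 consecutive ids, counted by the leading digit of the reversed key.
    print(sorted(leading_digit_counts(range(510900, 511000)).items()))

    # Positions of four consecutive ids in sorted key order. They are not
    # neighbours, so a range read over ids 510911..510914 cannot be served
    # by one short scan and has to fan out.
    table = sorted(reverse_key(i) for i in range(510900, 511000))
    print([table.index(reverse_key(i)) for i in range(510911, 510915)])
```

The zero-padding is there so that ids with different digit counts still reverse to keys of the same length. Whether the write distribution is worth losing short range scans depends on the access pattern, as noted at the top of this reply.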