Mark: you are correct about the old_key suffix. I'm assuming you're worried about this because of keyspace size, correct? The default algorithm for pre-splitting assumes a 32-bit (4-byte) hash prefix, which should scale to all use cases for the foreseeable future. Really, you could get away with an 8-bit hash prefix if your cluster is small & you plan to auto-split after a certain size. This is available if you use UniformSplit, but it will require a little power-user investigation. I don't think anybody deviates from the default, mainly because current use cases aren't that finicky about the extra overhead.
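To make the pre-splitting idea concrete, here is a minimal sketch of how uniform split points over a hash-prefix keyspace can be computed (Python purely for illustration; this is the idea behind UniformSplit, not HBase's actual RegionSplitter API, and the function name is my own):

```python
# Illustrative sketch: compute uniform region boundaries over an n-bit
# hash-prefix keyspace, the idea behind pre-splitting with UniformSplit.
def uniform_split_points(num_regions, prefix_bits=8):
    """Return the start boundaries of regions 2..N as integers in
    [0, 2**prefix_bits); the first region implicitly starts at the
    empty key."""
    keyspace = 2 ** prefix_bits
    return [i * keyspace // num_regions for i in range(1, num_regions)]

# Pre-splitting into 4 regions over an 8-bit prefix puts boundaries at
# 64, 128, 192 -- i.e. first key bytes 0x40, 0x80, 0xC0.
print(uniform_split_points(4))  # [64, 128, 192]
```

With a hash prefix on every key, writes land uniformly across these boundaries instead of always hitting the last region.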
For the medium term, note that HBASE-4218 will also introduce key compression & further reduce overhead. This won't be available until 0.94 or so, but you probably won't be worried about an extra 4 bytes until then. We currently use the HexStringSplit algorithm in production, which is 8 bytes but human-readable. Based on preliminary investigation, we predict 80%+ compression of our key size (currently ~80 bytes) with HBASE-4218.

On 11/21/11 9:55 AM, "Mark" <static.void....@gmail.com> wrote:

>Damn, I was hoping my understanding was flawed.
>
>In your example I am guessing the addition of the old_key suffix is to
>prevent any possible collision. Is that correct?
>
>On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote:
>> Sequential writes are also an argument for pre-splitting and using hash
>> prefixing. In other words, pre-split your table into N regions instead
>> of the default of 1 & transform your keys into:
>>
>> new_key = md5(old_key) + old_key
>>
>> Using this method, your sequential writes under the old_key are now
>> spread evenly across all regions. There are some limitations to hash
>> prefixing, such as non-sequential scans across row boundaries. However,
>> it's a tradeoff between even distribution & advanced query options.
>>
>> On 11/20/11 7:54 PM, "Amandeep Khurana" <ama...@gmail.com> wrote:
>>
>>> Mark,
>>>
>>> Yes, your understanding is correct. If your keys are sequential
>>> (timestamps etc.), you will always be writing to the end of the table
>>> and "older" regions will not get any writes. This is one of the
>>> arguments against using sequential keys.
>>>
>>> -ak
>>>
>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark <static.void....@gmail.com>
>>> wrote:
>>>
>>>> Say we have a use case that has sequential row keys and we have rows
>>>> 0-100. Let's assume that 100 rows = the split size.
>>>> Now when there is a split, it will split at the halfway mark, so
>>>> there will be two regions as follows:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-END]
>>>>
>>>> So at this point all inserts will be writing to Region2 only,
>>>> correct? Now at some point Region2 will need to split, and it will
>>>> look like the following before the split:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-150]
>>>>
>>>> After the split it will look like:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-100]
>>>> Region3 [101-END]
>>>>
>>>> And this pattern will continue, correct? My question is: when a use
>>>> case has sequential keys, how would any of the older regions ever
>>>> receive any more writes? It seems like they would always be stuck at
>>>> MaxRegionSize/2. Can someone please confirm or clarify this?
>>>>
>>>> Thanks
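PS: the new_key = md5(old_key) + old_key transform Nicolas describes above is easy to sketch. Python here purely for illustration (HBase clients would do this in Java), and prefix_len is my own parameter name, matching the default 4-byte prefix discussed earlier:

```python
import hashlib

def new_key(old_key: bytes, prefix_len: int = 4) -> bytes:
    # Prefix each key with the first bytes of its MD5 hash so that
    # sequential old_keys scatter uniformly across pre-split regions.
    # Keeping old_key as a suffix guarantees uniqueness even if two
    # distinct keys ever shared a truncated hash prefix.
    return hashlib.md5(old_key).digest()[:prefix_len] + old_key

k = new_key(b"row-00000042")
# The original key is recoverable by stripping the fixed-length prefix.
assert k[4:] == b"row-00000042"
```

Note that range scans over old_key order no longer work after this transform, which is the "non-sequential scans across row boundaries" limitation mentioned above.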