Damn, I was hoping my understanding was flawed.

In your example, I am guessing the old_key suffix is added to guard against any possible collision. Is that correct?

On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote:
Sequential writes are also an argument for pre-splitting and using hash
prefixing.  In other words, presplit your table into N regions instead of
the default of 1 and transform your keys into:

new_key = md5(old_key) + old_key

Using this method your sequential writes under the old_key are now spread
evenly across all regions.  There are some limitations to hash prefixing,
such as non-sequential scans across row boundaries.  However, it's a
tradeoff between even distribution and advanced query options.
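
A minimal sketch of the transformation above, in Python rather than the HBase client API. The helpers split_points and region_for are hypothetical names used only for illustration, and the one-byte split keys are an assumption about how a table might be pre-split into N regions:

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def salted_key(old_key):
    """Return md5(old_key) + old_key, as in the formula above."""
    return hashlib.md5(old_key).digest() + old_key


def split_points(n_regions):
    """Hypothetical helper: n_regions - 1 evenly spaced one-byte split keys."""
    return [bytes([i * 256 // n_regions]) for i in range(1, n_regions)]


def region_for(key, splits):
    """Index of the region whose [start, end) key range contains key."""
    return bisect_right(splits, key)


# Write 10,000 sequential keys into a table pre-split into 8 regions.
splits = split_points(8)
counts = Counter(
    region_for(salted_key(str(i).encode()), splits) for i in range(10_000)
)
```

Here counts maps each region index to the number of rows it received. Because the md5 prefix decides the region, sequential old keys land in all 8 regions in roughly equal numbers. Keeping old_key as a suffix means the original key can still be read back from the row key.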

On 11/20/11 7:54 PM, "Amandeep Khurana" <ama...@gmail.com> wrote:

Mark,

Yes, your understanding is correct. If your keys are sequential (timestamps,
etc.), you will always be writing to the end of the table, and "older"
regions will not get any writes. This is one of the arguments against using
sequential keys.
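
The effect can be reproduced with a small simulation. This is a simplified model, not HBase's actual split logic: it assumes integer row keys, a split threshold counted in rows (as in the example below), and a split exactly at the midpoint. The function name simulate is hypothetical:

```python
def simulate(n_rows, max_region_rows):
    """Insert keys 0..n_rows-1 in order and return [start_key, row_count] per region."""
    regions = [[0, 0]]  # a single region to start with, as in a non-presplit table
    for key in range(n_rows):
        last = regions[-1]  # a sequential key always sorts into the last region
        last[1] += 1
        if last[1] >= max_region_rows:
            half = last[1] // 2
            last[1] = half  # the lower half stays in the old region
            regions.append([last[0] + half, max_region_rows - half])
    return regions


regions = simulate(1000, 100)
```

In this model every region except the last one stops growing as soon as it has been split, so it stays at half the split size while all new writes go to the newest region.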

-ak

On Sun, Nov 20, 2011 at 11:33 AM, Mark <static.void....@gmail.com> wrote:

Say we have a use case that has sequential row keys and we have rows
0-100. Let's assume that 100 rows = the split size. Now when there is a
split it will split at the halfway mark so there will be two regions as
follows:

Region1 [START-49]
Region2 [50-END]

So at this point all inserts will be writing to Region2 only, correct?
Now at some point Region2 will need to split, and it will look like the
following before the split:

Region1 [START-49]
Region2 [50-150]

After the split it will look like:

Region1 [START-49]
Region2 [50-99]
Region3 [100-END]

And this pattern will continue, correct? My question is: when a use case
has sequential keys, how would any of the older regions ever receive any
more writes? It seems like they would always be stuck at MaxRegionSize/2.
Can someone please confirm or clarify this issue?

Thanks




