Sequential writes are also an argument for pre-splitting and using hash prefixing. In other words, pre-split your table into N regions instead of the default of 1 and transform your keys into:
new_key = md5(old_key) + old_key

Using this method your sequential writes under the old_key are now spread
evenly across all regions. There are some limitations to hash prefixing,
such as losing the ability to do sequential scans across row boundaries.
However, it's a tradeoff between even distribution and advanced query
options.

On 11/20/11 7:54 PM, "Amandeep Khurana" <[email protected]> wrote:

>Mark,
>
>Yes, your understanding is correct. If your keys are sequential
>(timestamps etc.), you will always be writing to the end of the table and
>"older" regions will not get any writes. This is one of the arguments
>against using sequential keys.
>
>-ak
>
>On Sun, Nov 20, 2011 at 11:33 AM, Mark <[email protected]> wrote:
>
>> Say we have a use case that has sequential row keys and we have rows
>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>> split, it will split at the halfway mark, so there will be two regions
>> as follows:
>>
>> Region1 [START-49]
>> Region2 [50-END]
>>
>> So now at this point all inserts will be writing to Region2 only,
>> correct? Now at some point Region2 will need to split, and it will look
>> like the following before the split:
>>
>> Region1 [START-49]
>> Region2 [50-150]
>>
>> After the split it will look like:
>>
>> Region1 [START-49]
>> Region2 [50-100]
>> Region3 [101-END]
>>
>> And this pattern will continue, correct? My question is: when there is
>> a use case that has sequential keys, how would any of the older regions
>> ever receive any more writes? It seems like they would always be stuck
>> at MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>
>> Thanks
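For what it's worth, the new_key = md5(old_key) + old_key transformation above can be sketched in a few lines of Python (the function name is mine, and I use the hex digest as the prefix; a real deployment might truncate it to a few bytes to keep keys short):

```python
import hashlib

def salted_key(old_key: bytes) -> bytes:
    """Return md5(old_key) + old_key, per the scheme described above.

    Using the full 32-char hex digest as the prefix is illustrative;
    truncating it would shorten keys at the cost of coarser spreading."""
    return hashlib.md5(old_key).hexdigest().encode("ascii") + old_key

# Sequential keys get effectively random prefixes, so new writes land
# across all regions instead of always hitting the last one:
for i in range(3):
    print(salted_key(b"%010d" % i))
```

Note the prefix is deterministic, so a reader who knows old_key can still compute the full row key for point gets; only range scans over the original ordering are lost.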
