Re: md5 hash key and splits

2012-08-31 Thread Stack
On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote: In general isn't it better to split the regions so that the load can be spread across the cluster to avoid hotspots? Time series data is a particular case [1] and the sematextians have tools to help w/ that

Re: md5 hash key and splits

2012-08-31 Thread Doug Meil
Stack, re: Where did you read that?, I think he might also be referring to this... http://hbase.apache.org/book.html#important_configurations On 8/30/12 8:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote: In general isn't it better to split the regions so that the load can be spread

Re: md5 hash key and splits

2012-08-31 Thread Mohit Anchlia
On Thu, Aug 30, 2012 at 11:52 PM, Stack st...@duboce.net wrote: On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote: In general isn't it better to split the regions so that the load can be spread across the cluster to avoid hotspots? Time series data is a

Re: md5 hash key and splits

2012-08-31 Thread Stack
On Fri, Aug 31, 2012 at 6:09 AM, Doug Meil doug.m...@explorysmedical.com wrote: Stack, re: Where did you read that?, I think he might also be referring to this... http://hbase.apache.org/book.html#important_configurations I'd say we need to revisit that paragraph. It gives a 'wrong'

Re: md5 hash key and splits

2012-08-31 Thread Stack
On Fri, Aug 31, 2012 at 7:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote: My data is timeseries and to get random distribution and still have the keys in the same region for a user I am thinking of using md5(userid)+reversetimestamp as a row key. But with this type of key how can one do
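
For readers following the thread, the md5(userid)+reversetimestamp key described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name make_row_key, the 8-byte big-endian timestamp encoding, and the use of 2**63 - 1 as the "reverse" base (mirroring Java's Long.MAX_VALUE) are all assumptions.

```python
import hashlib
import struct

# Assumption: timestamps are milliseconds since the epoch and are reversed
# against the largest signed 64-bit value (Java's Long.MAX_VALUE).
LONG_MAX = 2**63 - 1


def make_row_key(user_id: str, timestamp_ms: int) -> bytes:
    """Build a hypothetical row key: md5(user_id) + reversed timestamp."""
    # The 16-byte MD5 digest spreads users evenly over the key space,
    # while every row of one user still shares the same prefix.
    prefix = hashlib.md5(user_id.encode("utf-8")).digest()
    # Subtracting from LONG_MAX makes newer timestamps sort first.
    reversed_ts = LONG_MAX - timestamp_ms
    # Big-endian encoding keeps byte order consistent with numeric order
    # for non-negative values.
    return prefix + struct.pack(">q", reversed_ts)


older = make_row_key("user-42", 1346400000000)
newer = make_row_key("user-42", 1346400060000)
```

With keys built this way, all rows for one user are contiguous (same 16-byte prefix), and within that prefix the most recent row sorts first, so a scan that starts at the user's prefix sees the newest data first.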

Re: md5 hash key and splits

2012-08-30 Thread Mohit Anchlia
On Wed, Aug 29, 2012 at 10:50 PM, Stack st...@duboce.net wrote: On Wed, Aug 29, 2012 at 9:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Wed, Aug 29, 2012 at 9:19 PM, Stack st...@duboce.net wrote: On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

Re: md5 hash key and splits

2012-08-30 Thread Stack
On Thu, Aug 30, 2012 at 7:35 AM, Mohit Anchlia mohitanch...@gmail.com wrote: From what I've read it's advisable to do manual splits since you are able to spread the load in a more predictable way. If I am missing something please let me know. Where did you read that? St.Ack

Re: md5 hash key and splits

2012-08-30 Thread Ian Varley
The Facebook devs have mentioned in public talks that they pre-split their tables and don't use automated region splitting. But as far as I remember, the reason for that isn't predictability of spreading load, so much as predictability of uptime latency (they don't want an automated split to

Re: md5 hash key and splits

2012-08-30 Thread Amandeep Khurana
Also, you might have read that an initial loading of data can be better distributed across the cluster if the table is pre-split rather than starting with a single region and splitting (possibly aggressively, depending on the throughput) as the data loads in. Once you are in a stable state with
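
As a rough illustration of the pre-splitting idea above: when row keys start with an MD5 digest, they are close to uniformly distributed, so split points can be chosen by dividing the key space evenly. The sketch below is an assumption-laden example, not something from the thread; the helper names uniform_split_points and region_for are hypothetical, and the resulting byte strings would still have to be passed to whatever table-creation call is in use.

```python
import bisect
from typing import List


def uniform_split_points(num_regions: int, key_bytes: int = 16) -> List[bytes]:
    """Return num_regions - 1 evenly spaced split keys over the key space."""
    if num_regions < 2:
        return []  # a single region needs no split points
    space = 1 << (8 * key_bytes)  # number of distinct key prefixes
    return [
        (space * i // num_regions).to_bytes(key_bytes, "big")
        for i in range(1, num_regions)
    ]


def region_for(row_key: bytes, split_points: List[bytes]) -> int:
    """Index of the region whose [start, end) range contains row_key."""
    # Region i starts at split_points[i - 1]; region 0 starts at the empty key.
    return bisect.bisect_right(split_points, row_key)


points = uniform_split_points(4)
```

Here four regions give three split keys, beginning with bytes 0x40, 0x80, and 0xc0, so an initial bulk load of hashed keys would be spread over all four regions instead of landing in one.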

Re: md5 hash key and splits

2012-08-30 Thread Mohit Anchlia
In general isn't it better to split the regions so that the load can be spread across the cluster to avoid hotspots? I read about pre-splitting here: http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ On Thu, Aug 30, 2012 at

Re: md5 hash key and splits

2012-08-29 Thread Stack
On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I use an md5 hash + timestamp rowkey, would HBase automatically detect the difference in ranges and perform a split? How does split work in such cases, or is it still advisable to manually split the regions? Yes. On how

Re: md5 hash key and splits

2012-08-29 Thread Mohit Anchlia
On Wed, Aug 29, 2012 at 9:19 PM, Stack st...@duboce.net wrote: On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I use an md5 hash + timestamp rowkey, would HBase automatically detect the difference in ranges and perform a split? How does split work in such cases

Re: md5 hash key and splits

2012-08-29 Thread Stack
On Wed, Aug 29, 2012 at 9:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Wed, Aug 29, 2012 at 9:19 PM, Stack st...@duboce.net wrote: On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I use md5 hash + timestamp rowkey would hbase automatically detect the