On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
In general isn't it better to split the regions so that the load can be
spread across the cluster to avoid hotspots?
Time series data is a particular case [1] and the sematextians have
tools to help w/ that
Stack, re: Where did you read that?, I think he might also be referring
to this...
http://hbase.apache.org/book.html#important_configurations
On 8/30/12 8:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
In general isn't it better to split the regions so that the load can be
spread
On Thu, Aug 30, 2012 at 11:52 PM, Stack st...@duboce.net wrote:
On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
In general isn't it better to split the regions so that the load can be
spread across the cluster to avoid hotspots?
Time series data is a
On Fri, Aug 31, 2012 at 6:09 AM, Doug Meil
doug.m...@explorysmedical.com wrote:
Stack, re: Where did you read that?, I think he might also be referring
to this...
http://hbase.apache.org/book.html#important_configurations
I'd say we need to revisit that paragraph. It gives a 'wrong'
On Fri, Aug 31, 2012 at 7:55 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
My data is timeseries and to get random distribution and still have the
keys in the same region for a user I am thinking of using
md5(userid)+reversetimestamp as a row key. But with this type of key how
can one do
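For reference, a minimal sketch of the md5(userid)+reversetimestamp key layout described above. The function name `make_row_key`, the 8-byte big-endian encoding, and the choice of Long.MAX_VALUE minus the timestamp as the "reverse timestamp" are assumptions made for illustration, not something the poster specified:

```python
import hashlib
import struct

# Assumption: "reverse timestamp" means Java's Long.MAX_VALUE minus the
# timestamp in milliseconds, so newer rows sort before older ones.
LONG_MAX = 2**63 - 1


def make_row_key(user_id: str, ts_millis: int) -> bytes:
    """Build a 24-byte key: 16-byte md5(user_id) + 8-byte reverse timestamp."""
    # The hash prefix spreads different users across the key space while
    # keeping every row of one user under the same 16-byte prefix.
    prefix = hashlib.md5(user_id.encode("utf-8")).digest()
    # Big-endian signed 64-bit keeps byte order equal to numeric order
    # for the non-negative values produced here.
    reverse_ts = LONG_MAX - ts_millis
    return prefix + struct.pack(">q", reverse_ts)


if __name__ == "__main__":
    older = make_row_key("user-42", 1346300000000)
    newer = make_row_key("user-42", 1346400000000)
    print(older.hex())
    print(newer.hex())
```

With this layout all rows for one user are contiguous, and within a user the most recent row sorts first.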
On Wed, Aug 29, 2012 at 10:50 PM, Stack st...@duboce.net wrote:
On Wed, Aug 29, 2012 at 9:38 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
On Wed, Aug 29, 2012 at 9:19 PM, Stack st...@duboce.net wrote:
On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
On Thu, Aug 30, 2012 at 7:35 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
From what I've read it's advisable to do manual splits since you are able
to spread the load in a more predictable way. If I am missing something
please let me know.
Where did you read that?
St.Ack
The Facebook devs have mentioned in public talks that they pre-split their
tables and don't use automated region splitting. But as far as I remember, the
reason for that isn't predictability of spreading load, so much as
predictability of uptime latency (they don't want an automated split to
Also, you might have read that an initial loading of data can be better
distributed across the cluster if the table is pre-split rather than
starting with a single region and splitting (possibly aggressively,
depending on the throughput) as the data loads in. Once you are in a stable
state with
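As a rough illustration of the pre-splitting idea above: if row keys begin with uniformly distributed hash bytes (an assumption, e.g. an md5 prefix), the split points for N regions can be computed by dividing the prefix space evenly. The helper name `even_split_points` and the two-byte prefix width are hypothetical choices for this sketch; the resulting byte strings would be handed to whatever table-creation call accepts split keys.

```python
from typing import List


def even_split_points(num_regions: int, prefix_bytes: int = 2) -> List[bytes]:
    """Return num_regions - 1 split keys that evenly divide the prefix space.

    Assumes row keys start with uniformly distributed bytes (hypothetical
    setup for this sketch), so equal key ranges receive roughly equal load.
    """
    if num_regions < 2:
        return []  # a single region needs no split keys
    space = 256 ** prefix_bytes
    if num_regions > space:
        raise ValueError("more regions than distinct prefixes")
    # The i-th boundary sits at i/num_regions of the way through the space.
    return [
        (i * space // num_regions).to_bytes(prefix_bytes, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    for key in even_split_points(4, prefix_bytes=1):
        print(key.hex())
```

For sequential (non-hashed) keys this even division does not help, since all new writes still land in the last region.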
In general isn't it better to split the regions so that the load can be
spread across the cluster to avoid hotspots?
I read about pre-splitting here:
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
On Thu, Aug 30, 2012 at
On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
If I use an md5 hash + timestamp rowkey, would HBase automatically detect the
difference in ranges and perform a split? How does splitting work in such cases,
or is it still advisable to manually split the regions?
Yes.
On how
On Wed, Aug 29, 2012 at 9:19 PM, Stack st...@duboce.net wrote:
On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
If I use an md5 hash + timestamp rowkey, would HBase automatically detect the
difference in ranges and perform a split? How does splitting work in such cases
On Wed, Aug 29, 2012 at 9:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
On Wed, Aug 29, 2012 at 9:19 PM, Stack st...@duboce.net wrote:
On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
If I use an md5 hash + timestamp rowkey, would HBase automatically detect the