Re: Of hbase key distribution and query scalability, again.

2012-05-26 Thread Ian Varley
Mike, I gather that Dmitriy is asking whether there are any smarts in the region balancer based on heavy *read* traffic (i.e. if it turns out that your read load is heavily skewed towards a small subset of regions). Which there aren't, but could be if someone wanted to write the infrastructure

Re: Of hbase key distribution and query scalability, again.

2012-05-26 Thread Michael Segel
Ian, understood. Dmitriy, could you show a use case where you see this happening? If you have records that are read that frequently, they would be cached in memory. I think you could use some concept of a system table and then, using coprocessors, update the table with the

Re: Of hbase key distribution and query scalability, again.

2012-05-26 Thread highpointe
Here is my SS: 259 71 2451 On May 26, 2012, at 9:25 AM, Michael Segel michael_se...@hotmail.com wrote: Hi, Jumping in on this late... To cut a long story, is the region size the only current HBase technique to balance load, esp. w.r.t query load? Or perhaps there are some more advanced

Of hbase key distribution and query scalability, again.

2012-05-25 Thread Dmitriy Lyubimov
Hello, I'd like to collect opinions from HBase experts on query uniformity and whether any advanced technique currently exists in HBase to cope with the problems of query uniformity beyond just maintaining a uniform key distribution. I know we start with the statement that in order

Re: Of hbase key distribution and query scalability, again.

2012-05-25 Thread Ian Varley
Dmitriy, If I understand you right, what you're asking about might be called Read Hotspotting. For an obvious example, if I distribute my data nicely over the cluster but then say: for (int x = 0; x < 100; x++) { htable.get(new Get(Bytes.toBytes(row1))); } Then naturally I'm only
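
The read-hotspotting loop above can be illustrated without a cluster. The following is a minimal sketch, not HBase code: it assumes a hypothetical layout of four regions defined by sorted start keys, and the names `ReadHotspotDemo`, `regions`, and `readsPerRegion` are invented for illustration. It only counts which simulated region each requested row key would fall into.

```java
import java.util.Arrays;
import java.util.TreeMap;

public class ReadHotspotDemo {
    // Hypothetical region layout (an assumption, not the HBase API):
    // maps each region's start key to a region index. Start keys are
    // expected in sorted order, beginning with the empty string.
    static TreeMap<String, Integer> regions(String... startKeys) {
        TreeMap<String, Integer> layout = new TreeMap<>();
        for (int i = 0; i < startKeys.length; i++) {
            layout.put(startKeys[i], i);
        }
        return layout;
    }

    // Counts how many of the requested row keys land in each region.
    // A key belongs to the region with the greatest start key <= key.
    static int[] readsPerRegion(TreeMap<String, Integer> layout, String[] requestedKeys) {
        int[] counts = new int[layout.size()];
        for (String key : requestedKeys) {
            counts[layout.floorEntry(key).getValue()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> layout = regions("", "g", "n", "t");

        // Same access pattern as the loop in the message: 100 reads of one row.
        String[] reads = new String[100];
        Arrays.fill(reads, "row1");

        System.out.println(Arrays.toString(readsPerRegion(layout, reads)));
    }
}
```

With this assumed layout, every read of `row1` is counted against the single region whose start key is `n`, however evenly the stored rows are spread across the four regions.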

Re: Of hbase key distribution and query scalability, again.

2012-05-25 Thread Dmitriy Lyubimov
Thanks, Ian. I am talking about the situation where, even when we have uniform keys, the query distribution over them is still non-uniform and impossible to predict without sampling query skewness, and the skewness is surprisingly great. (as in, the least active and most active users may differ in activity by 100 times

Re: Of hbase key distribution and query scalability, again.

2012-05-25 Thread Ian Varley
Yeah, I think you're right Dmitriy; there's nothing like that in HBase today as far as I know. If it'd be useful for you, maybe it would be for others, too; work up a rough patch and see what people think on the dev list. Ian On May 25, 2012, at 1:02 PM, Dmitriy Lyubimov wrote: Thanks, Ian.