Thanks for the responses!

> why don't you use a scan

I'll try that and compare it.
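For the comparison, here is a minimal sketch of the two access patterns, assuming a sorted in-memory `TreeMap` as a stand-in for a table ordered by row key. It is not HBase client code: the class `ScanVsMultiGet`, the `<group>|<item>` key layout and the sample values are hypothetical and only illustrate the idea of many point lookups versus one range read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in for a table sorted by row key; this is NOT the HBase client API.
class ScanVsMultiGet {

    // "Multi get": one point lookup per requested key; missing keys are skipped.
    static List<String> multiGet(NavigableMap<String, String> table, List<String> keys) {
        List<String> values = new ArrayList<>();
        for (String key : keys) {
            String value = table.get(key);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    // "Scan": one sequential pass over [startKey, stopKey), start inclusive, stop exclusive.
    static List<String> scan(NavigableMap<String, String> table, String startKey, String stopKey) {
        return new ArrayList<>(table.subMap(startKey, true, stopKey, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        // Hypothetical key layout "<group>|<item>": keys of one group sort next to each other.
        for (int i = 0; i < 5; i++) {
            table.put("groupA|" + i, "a" + i);
            table.put("groupB|" + i, "b" + i);
        }
        List<String> keys = Arrays.asList("groupA|0", "groupA|1", "groupA|2", "groupA|3", "groupA|4");
        System.out.println(multiGet(table, keys));
        // '}' is the character right after '|', so "groupA}" bounds every "groupA|..." key.
        System.out.println(scan(table, "groupA|", "groupA}"));
    }
}
```

Both calls return the same five values here. The scan only pays off when few unwanted keys fall between the start and stop key, which is the trade-off discussed in the quoted replies below.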
> How much memory do you have for your region servers? Have you enabled
> block caching? Is your CPU spiking on your region servers?

Block caching is enabled. CPU and memory don't seem to be a problem.

We think we are saturating a region because of the number of keys requested.
In that case, my question is: is asking for 500+ keys per request a normal
scenario?

Cheers,

On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina <[email protected]> wrote:

> The scan can be an option if the cost of scanning undesired cells and
> discarding them through filters is lower than the cost of accessing those
> keys individually. I would say that as the number of 'undesired' cells
> decreases, the overall performance/efficiency of the scan increases. It all
> depends on how the keys are designed to be grouped together.
>
> 2013/7/30 Ted Yu <[email protected]>
>
> > Please also go over http://hbase.apache.org/book.html#perf.reading
> >
> > Cheers
> >
> > On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah <[email protected]> wrote:
> >
> > > If all your keys are grouped together, why don't you use a scan with
> > > start/end key specified? A sequential scan can theoretically be faster
> > > than MultiGet lookups (assuming your grouping is tight, you can also use
> > > filters with the scan to give better performance).
> > >
> > > How much memory do you have for your region servers? Have you enabled
> > > block caching? Is your CPU spiking on your region servers?
> > >
> > > If you are saturating the resources on your *hot* region server, then yes,
> > > having more region servers will help.
> > > If not, then something else is the bottleneck and you probably need to
> > > dig further.
> > >
> > > Regards,
> > > Dhaval
> > >
> > > ________________________________
> > > From: Demian Berjman <[email protected]>
> > > To: [email protected]
> > > Sent: Tuesday, 30 July 2013 4:37 PM
> > > Subject: help on key design
> > >
> > > Hi,
> > >
> > > I would like to explain our HBase use case, the row key design and the
> > > problems we are having, in case anyone can help us:
> > >
> > > The first thing we noticed is that our data set is small compared to
> > > other cases we have read about on the list and in forums. We have a table
> > > containing 20 million keys, split automatically by HBase into 4 regions
> > > and balanced across 3 region servers. We designed our key to keep
> > > together the set of keys requested by our app. That is, when we request
> > > a set of keys, we expect them to be grouped together to improve data
> > > locality and block cache efficiency.
> > >
> > > The second thing we noticed, compared to other cases, is that we retrieve
> > > a bunch of keys per request (approx. 500). Thus, during our peaks (3k
> > > requests per minute), we have a lot of requests going to a particular
> > > region server and asking for a lot of keys. That results in poor response
> > > times (on the order of seconds). Currently we are using multi gets.
> > >
> > > We think an improvement would be to spread the keys (by introducing a
> > > randomized component into them) across more region servers, so each RS
> > > has to handle fewer keys and probably fewer requests. That way, the
> > > multi gets would be spread over the region servers.
> > >
> > > Our questions:
> > >
> > > 1. Is this design of asking for so many keys on each request correct (if
> > > you need high performance)?
> > > 2. What about splitting across more region servers? Is that a good idea?
> > > How could we accomplish this?
> > > We thought of applying some hashing...
> > >
> > > Thanks in advance!
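To make the hashing idea from the original message concrete, here is a minimal sketch of key salting under stated assumptions: a fixed number of buckets (`BUCKETS = 8`), a CRC32 hash of the row key, and a two-digit bucket prefix. The class `SaltedKeys`, the bucket count and the `NN|key` layout are all hypothetical; nothing in the thread fixes them, and no HBase API is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Sketch of salting row keys so that one request's keys land in several buckets.
class SaltedKeys {
    // Hypothetical bucket count; in practice it would be tuned to the cluster.
    static final int BUCKETS = 8;

    // Deterministic bucket for a key: the same key always maps to the same bucket.
    static int bucketOf(String rowKey) {
        CRC32 crc = new CRC32();
        crc.update(rowKey.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % BUCKETS);
    }

    // Salted key: zero-padded bucket prefix followed by the original key.
    static String salt(String rowKey) {
        return String.format(Locale.ROOT, "%02d|%s", bucketOf(rowKey), rowKey);
    }

    // Group the salted keys of one request by bucket, e.g. to issue one batch per bucket.
    static Map<Integer, List<String>> groupByBucket(List<String> rowKeys) {
        Map<Integer, List<String>> batches = new TreeMap<>();
        for (String rowKey : rowKeys) {
            batches.computeIfAbsent(bucketOf(rowKey), b -> new ArrayList<>()).add(salt(rowKey));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> request = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            request.add("groupA|" + i); // hypothetical key layout
        }
        // Print how many of the 500 requested keys fall into each bucket.
        groupByBucket(request).forEach((bucket, keys) ->
                System.out.println("bucket " + bucket + ": " + keys.size() + " keys"));
    }
}
```

The design trade-off is that salting on the full key gives up the grouping described above: keys of one request are no longer adjacent, so a single start/stop-key scan no longer covers them, and the reader has to compute the salted key (or query every bucket) on each read. The prefix only spreads load across servers if the table's regions are split along the bucket prefixes.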
