Was in a meeting ...
In 0.94, if you look at HConnectionManager#processBatchCallback(), you
would see:

    MultiAction<R> actions = actionsByServer.get(loc);
    if (actions == null) {
      actions = new MultiAction<R>();
      actionsByServer.put(loc, actions);
    }

where:

    Map<HRegionLocation, MultiAction<R>> actionsByServer =
        new HashMap<HRegionLocation, MultiAction<R>>();

And HRegionLocation#hashCode() is defined as:

    public int hashCode() {
      return this.serverName.hashCode();
    }

So the grouping happens at region server level.
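
As a rough, self-contained illustration of why keying the map this way
groups by server rather than by region, here is a minimal sketch. It does
not use the HBase classes: Loc is a hypothetical stand-in for
HRegionLocation, a list of row keys stands in for MultiAction<R>, and the
row, region and server names are made up. It also assumes that equals()
agrees with hashCode() and looks only at the server name, since a HashMap
needs both to match before two keys share an entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HRegionLocation: identity depends on the server only.
final class Loc {
    final String regionName;
    final String serverName;

    Loc(String regionName, String serverName) {
        this.regionName = regionName;
        this.serverName = serverName;
    }

    @Override
    public int hashCode() {
        return serverName.hashCode();
    }

    // Assumption: equals() ignores the region, consistent with hashCode().
    @Override
    public boolean equals(Object o) {
        return o instanceof Loc && ((Loc) o).serverName.equals(serverName);
    }
}

final class GroupByServerSketch {
    // Each input entry is {rowKey, regionName, serverName}.
    // Same get-or-create pattern as in the snippet quoted above.
    static Map<Loc, List<String>> group(String[][] rowsWithLocation) {
        Map<Loc, List<String>> actionsByServer = new HashMap<Loc, List<String>>();
        for (String[] r : rowsWithLocation) {
            Loc loc = new Loc(r[1], r[2]);
            List<String> actions = actionsByServer.get(loc);
            if (actions == null) {
                actions = new ArrayList<String>();
                actionsByServer.put(loc, actions);
            }
            actions.add(r[0]);
        }
        return actionsByServer;
    }

    public static void main(String[] args) {
        String[][] gets = {
            {"row-a", "region-1", "rs1"},
            {"row-b", "region-2", "rs1"},
            {"row-c", "region-3", "rs2"},
        };
        // Three regions on two servers end up in two batches.
        System.out.println(group(gets).size() + " batches");
    }
}
```

In this sketch the rows from region-1 and region-2 land in the same batch
because both regions are hosted on rs1.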
Cheers
On Wed, Jul 31, 2013 at 11:00 AM, Pablo Medina <[email protected]> wrote:
> Isn't that a job done by the multiGet at the client side? I mean, when you
> provide a list of gets the client groups them by region and region
> server and then submits a job to its executor in order to call the region
> servers in parallel. Is that what you mean?
>
>
>
> 2013/7/31 Ted Yu <[email protected]>
>
> > From the information Demian provided in the first email:
> >
> > bq. a table containing 20 million keys splitted automatically by HBase
> > in 4 regions and balanced in 3 region servers
> >
> > I think the number of regions should be increased through (manual)
> > splitting so that the data is spread more evenly across servers.
> >
> > If the Gets are scattered across the whole key space, there is an
> > optimization the client can do: group the Gets by region boundary
> > and issue one multi get per region.
> >
> > Please also refer to http://hbase.apache.org/book.html#rowkey.design,
> > especially 6.3.2.
> >
> > Cheers
> >
> > On Wed, Jul 31, 2013 at 10:14 AM, Dhaval Shah
> > <[email protected]> wrote:
> >
> > > Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems
> > > like the 500 Gets are executed sequentially on the region server.
> > >
> > > Also 3k requests per minute = 50 requests per second. Assuming your
> > > requests take 1 sec (which seems really long but who knows) then you
> > > need at least 50 threads/region server handlers to handle these. The
> > > default for that number on some older versions of HBase is 10, which
> > > means you are running out of threads. Which brings up the following
> > > questions:
> > > What version of HBase are you running?
> > > How many region server handlers do you have?
> > >
> > > Regards,
> > > Dhaval
> > >
> > >
> > > ----- Original Message -----
> > > From: Demian Berjman <[email protected]>
> > > To: [email protected]
> > > Cc:
> > > Sent: Wednesday, 31 July 2013 11:12 AM
> > > Subject: Re: help on key design
> > >
> > > Thanks for the responses!
> > >
> > > > why don't you use a scan
> > > I'll try that and compare it.
> > >
> > > > How much memory do you have for your region servers? Have you enabled
> > > > block caching? Is your CPU spiking on your region servers?
> > > Block caching is enabled. CPU and memory don't seem to be a problem.
> > >
> > > We think we are saturating a region because of the quantity of keys
> > > requested. In that case my question is whether asking for 500+ keys
> > > per request is a normal scenario.
> > >
> > > Cheers,
> > >
> > >
> > > On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina
> > > <[email protected]> wrote:
> > >
> > > > The scan can be an option if the cost of scanning undesired cells and
> > > > discarding them through filters is lower than that of accessing those
> > > > keys individually. I would say that as the number of 'undesired'
> > > > cells decreases, the scan's overall performance/efficiency improves.
> > > > It all depends on how the keys are designed to be grouped together.
> > > >
> > > > 2013/7/30 Ted Yu <[email protected]>
> > > >
> > > > > Please also go over http://hbase.apache.org/book.html#perf.reading
> > > > >
> > > > > Cheers
> > > > >
> > > > > > On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > If all your keys are grouped together, why don't you use a scan
> > > > > > > with start/end key specified? A sequential scan can
> > > > > > > theoretically be faster than MultiGet lookups (assuming your
> > > > > > > grouping is tight, you can also use filters with the scan to
> > > > > > > give better performance)
> > > > > > >
> > > > > > > How much memory do you have for your region servers? Have you
> > > > > > > enabled block caching? Is your CPU spiking on your region
> > > > > > > servers?
> > > > > > >
> > > > > > > If you are saturating the resources on your *hot* region server
> > > > > > > then yes, having more region servers will help. If not, then
> > > > > > > something else is the bottleneck and you probably need to dig
> > > > > > > further
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Dhaval
> > > > > >
> > > > > >
> > > > > > ________________________________
> > > > > > From: Demian Berjman <[email protected]>
> > > > > > To: [email protected]
> > > > > > Sent: Tuesday, 30 July 2013 4:37 PM
> > > > > > Subject: help on key design
> > > > > >
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I would like to explain our use case of HBase, the row key
> > > > > > > design, and the problems we are having, so anyone can give us
> > > > > > > some help:
> > > > > >
> > > > > > > The first thing we noticed is that our data set is small
> > > > > > > compared to other cases we read about on the list and in
> > > > > > > forums. We have a table containing 20 million keys, split
> > > > > > > automatically by HBase into 4 regions and balanced across 3
> > > > > > > region servers. We have designed our key to keep together the
> > > > > > > set of keys requested by our app. That is, when we request a
> > > > > > > set of keys we expect them to be grouped together to improve
> > > > > > > data locality and block cache efficiency.
> > > > > >
> > > > > > > The second thing we noticed, compared to other cases, is that
> > > > > > > we retrieve a bunch of keys per request (approx. 500). Thus,
> > > > > > > during our peaks (3k requests per minute), we have a lot of
> > > > > > > requests going to a particular region server and asking for a
> > > > > > > lot of keys. That results in poor response times (on the order
> > > > > > > of seconds). Currently we are using multi gets.
> > > > > >
> > > > > > > We think an improvement would be to spread the keys
> > > > > > > (introducing a randomized component into them) across more
> > > > > > > region servers, so each rs will have to handle fewer keys and
> > > > > > > probably fewer requests. That way the multi gets will be
> > > > > > > spread over the region servers.
> > > > > >
> > > > > > Our questions:
> > > > > >
> > > > > > > 1. Is this design of asking for so many keys on each request
> > > > > > > correct (if you need high performance)?
> > > > > > > 2. What about splitting across more region servers? Is it a
> > > > > > > good idea? How could we accomplish this? We thought of
> > > > > > > applying some hashing...
> > > > > >
> > > > > > Thanks in advance!
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>