Why do you have regions that large? The 0.92 default was 1G (admittedly, that
was much too small); the 0.98 default is 10G, which should be good in most
cases. Mappers divide their work based on regions, so very large regions lead
to more uneven execution time, unless you truly have a very large amount of
data. Compactions are in units of regions, etc.
Can I ask how much data you have overall (i.e. how many of these 75G regions
you have)?
Thanks.
-- Lars
From: Dejan Menges <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Monday, May 4, 2015 1:31 AM
Subject: Re: Right value for hbase.rpc.timeout
Hi Ted,
Max filesize for a region is set to 75G in our case. Regarding the split
policy, we most likely use ConstantSizeRegionSplitPolicy
<http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html>
(it's 0.98.0 with a bunch of patches, and that should be the default one).
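For reference, this is where those two knobs live in hbase-site.xml (a sketch;
the 80530636800 value is just our 75G expressed in bytes, not a
recommendation):

```xml
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- 75G in bytes: 75 * 1024^3 -->
  <value>80530636800</value>
</property>
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
```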
Also, regarding the link you sent me - in 98.3 I can't find the default
value for hbase.regionserver.lease.period anywhere. Is this parameter still
called that?
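In case it helps others on the list, this is roughly what I'm considering for
the timeout side (a sketch of hbase-site.xml; 120000 ms follows the 120-second
suggestion earlier in the thread, and I'm assuming
hbase.client.scanner.timeout.period is the newer name for the lease period,
which is the question above):

```xml
<property>
  <name>hbase.rpc.timeout</name>
  <!-- default is 60000 (60 s); doubled per the suggestion in this thread -->
  <value>120000</value>
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <!-- assumed successor of hbase.regionserver.lease.period -->
  <value>120000</value>
</property>
```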
On Thu, Apr 30, 2015 at 11:27 PM Ted Yu <[email protected]> wrote:
> Please take a look at 98.3 under
> http://hbase.apache.org/book.html#trouble.client
>
> BTW what's the value for hbase.hregion.max.filesize ?
> Which split policy do you use ?
>
> Cheers
>
> On Thu, Apr 30, 2015 at 6:59 AM, Dejan Menges <[email protected]>
> wrote:
>
> > Basically how I came to this question - this happened super rarely, and
> > we narrowed it down to hotspotting. The map was timing out on three
> > regions which were 4-5 times bigger than the other regions for the same
> > table, and a region split fixed this.
> >
> > However, I was just wondering whether there are maybe some
> > recommendations about this, as it's also super hard to reproduce the
> > same situation to retest it.
> >
> > On Thu, Apr 30, 2015 at 3:56 PM Michael Segel <[email protected]>
> > wrote:
> >
> > > There is no single ‘right’ value.
> > >
> > > As you pointed out… some of your Mapper.map() iterations are taking
> > > longer than 60 seconds.
> > >
> > > The first thing is to determine why that happens. (It could be normal,
> > > or it could be bad code on your developers’ part. We don’t know.)
> > >
> > > The other thing is that if you determine that your code is perfect and
> > > it does what you want it to do… and it’s a major part of your use
> > > case… you then increase your timeouts to 120 seconds.
> > >
> > > The reason why it’s a tough issue is that we don’t know what hardware
> > > you are using. How many nodes… code quality… etc. Too many factors.
> > >
> > >
> > > > On Apr 30, 2015, at 6:51 AM, Dejan Menges <[email protected]>
> > > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > What's the best practice to calculate this value for your cluster,
> > > > if there is some?
> > > >
> > > > In some situations we saw that some maps were taking more than the
> > > > default 60 seconds, which made a specific map job fail (once it
> > > > failed, it also failed on every one of the configured retries).
> > > >
> > > > I would like to tune the RPC parameters a bit, but googling and
> > > > looking into the HBase Book doesn't tell me how to calculate the
> > > > right values, or what else to look at besides hbase.rpc.timeout.
> > > >
> > > > Thanks a lot,
> > > > Dejan
> > >
> > >
> >
>