FYI, scanner caching defaults to 1000 in Phoenix, but as folks have pointed out, that's not relevant in this case because only a single row is returned from the server for a COUNT(*) query.
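In case it's useful, here's where that knob lives in the plain HBase 0.94 client API. This is a minimal sketch, not anything from the thread: the quorum comes from whatever hbase-site.xml is on the classpath, and the table name t_96 is borrowed from the discussion below.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScannerCachingSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Client-wide default for rows fetched per scanner RPC.
        conf.setInt("hbase.client.scanner.caching", 1000);

        HTable table = new HTable(conf, "t_96");
        Scan scan = new Scan();
        // Per-scan override. For a server-side aggregate like COUNT(*)
        // this barely matters: each region returns a single row.
        scan.setCaching(1000);

        ResultScanner scanner = table.getScanner(scan);
        long rows = 0;
        for (Result r : scanner) {
            rows++;
        }
        scanner.close();
        table.close();
        System.out.println("rows scanned: " + rows);
    }
}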
On Sat, Dec 21, 2013 at 2:51 PM, Kristoffer Sjögren <[email protected]> wrote:

> Yeah, I'm doing a count(*) query on the 96 region table. Do you mean to
> check network traffic between RS?
>
> From debugging Phoenix code I can see that there are 96 scans sent, and
> each response returned back to the client contains only the sum of rows,
> which are then aggregated and returned. So the traffic between the client
> and each RS is very small.
>
>
> On Sat, Dec 21, 2013 at 11:35 PM, lars hofhansl <[email protected]> wrote:
>
> > Thanks Kristoffer,
> >
> > yeah, that's the right metric. I would put my bet on the slower network.
> > But you're also doing a select count(*) query in Phoenix, right? So
> > nothing should really be sent across the network.
> >
> > When you do the queries, can you check whether there is any network
> > traffic?
> >
> > -- Lars
> >
> >
> > ________________________________
> > From: Kristoffer Sjögren <[email protected]>
> > To: [email protected]; lars hofhansl <[email protected]>
> > Sent: Saturday, December 21, 2013 1:28 PM
> > Subject: Re: Performance tuning
> >
> >
> > @pradeep scanner caching should not be an issue since the data
> > transferred to the client is tiny.
> >
> > @lars Yes, the data might be small for this particular case :-)
> >
> > I have checked everything I can think of on the RS (CPU, network, HBase
> > console, uptime etc.) and nothing stands out, except for the pings
> > (network pings).
> > There are 5 regions on RS 7, 18, 19, and 23; the others have 4.
> > hdfsBlocksLocalityIndex=100 on all RS (was that the correct metric?)
> >
> > -Kristoffer
> >
> >
> > On Sat, Dec 21, 2013 at 9:44 PM, lars hofhansl <[email protected]> wrote:
> >
> > > Hi Kristoffer,
> > > For this particular problem: are many regions on the same
> > > RegionServers? Did you profile those RegionServers? Anything weird on
> > > that box?
> > > Slower pings might well be an issue. How's the data locality? (You can
> > > check on a RegionServer's overview page.)
> > > If needed, you can issue a major compaction to reestablish local data
> > > on all RegionServers.
> > >
> > > 32 cores matched with only 4GB of RAM is a bit weird, but with your
> > > tiny dataset it doesn't matter anyway.
> > >
> > > 10m rows across 96 regions is just about 100k rows per region. You
> > > won't see many of the nice properties of HBase.
> > > Try with 100m (or better, 1bn) rows. Then we're talking. For anything
> > > below that you wouldn't want to use HBase anyway.
> > > (100k rows I could scan on my phone with a Perl script in less than
> > > 1s.)
> > >
> > > With "ping" do you mean an actual network ping, or some operation on
> > > top of HBase?
> > >
> > > -- Lars
> > >
> > >
> > > ________________________________
> > > From: Kristoffer Sjögren <[email protected]>
> > > To: [email protected]
> > > Sent: Saturday, December 21, 2013 11:17 AM
> > > Subject: Performance tuning
> > >
> > >
> > > Hi
> > >
> > > I have been performance tuning HBase 0.94.6 running Phoenix 2.2.0 for
> > > the last couple of days and need some help.
> > >
> > > Background:
> > >
> > > - 23-machine cluster, 32 cores, 4GB heap per RS.
> > > - Table t_24 has 24 online regions (24 salt buckets).
> > > - Table t_96 has 96 online regions (96 salt buckets).
> > > - 10.5 million rows per table.
> > > - Count query: select count(*) from ...
> > > - Group-by query: select A, B, C, sum(D) from ...
> > >   where (A = 1 and T >= 0 and T <= 2147482800) group by A, B, C;
> > >
> > > What I found ultimately is that region servers 19, 20, 21, 22, and 23
> > > are consistently 2-3x slower than the others. This hurts overall
> > > latency pretty badly, since queries are executed in parallel on the RS
> > > and then aggregated at the client (through Phoenix). In Hannibal,
> > > regions spread out evenly over region servers, according to salt
> > > buckets (a Phoenix feature: pre-created regions and a rowkey prefix).
> > >
> > > As far as I can tell, there is no network or hardware configuration
> > > divergence between the machines. No CPU, network, or other notable
> > > divergence in Ganglia. No RS metric differences in the HBase master
> > > console.
> > >
> > > The only thing that may be of interest is that pings (within the
> > > cluster) to the bad RS are about 2-3x slower, around 0.050ms vs
> > > 0.130ms. Not sure if this is significant, but I get a bad feeling
> > > about it since it matches exactly the RS that stood out in my
> > > performance tests.
> > >
> > > Any ideas of how I might find the source of this problem?
> > >
> > > Cheers,
> > > -Kristoffer
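For anyone trying to reproduce the per-RS skew, here's a rough sketch of timing both queries through the Phoenix JDBC driver. Assumptions on my part: the pre-Apache (Phoenix 2.2-era) driver class, a hypothetical ZooKeeper quorum "zk-host", and the columns A, B, C, D, T exactly as they appear in the queries quoted above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQueryTiming {
    public static void main(String[] args) throws Exception {
        // Phoenix 2.2 shipped under the com.salesforce namespace; later
        // Apache releases moved to org.apache.phoenix.jdbc.PhoenixDriver.
        Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver");
        Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");

        String[] queries = {
            "SELECT COUNT(*) FROM t_96",
            "SELECT A, B, C, SUM(D) FROM t_96"
                + " WHERE A = 1 AND T >= 0 AND T <= 2147482800"
                + " GROUP BY A, B, C"
        };

        for (String sql : queries) {
            long start = System.currentTimeMillis();
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(sql);
            long rows = 0;
            while (rs.next()) {
                rows++;
            }
            rs.close();
            stmt.close();
            System.out.println(rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms: " + sql);
        }
        conn.close();
    }
}

End-to-end timing only shows the aggregate, though: since Phoenix fans out one scan per salt bucket and waits for the slowest one, the per-RS skew is better seen by comparing scan latencies in each RegionServer's own metrics.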

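On Lars' suggestion upthread about reestablishing locality: a major compaction can also be kicked off from the client API rather than the shell. A sketch, assuming the 0.94 HBaseAdmin API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompactT96 {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Asynchronous request: the RegionServers rewrite each region's
        // HFiles locally, pulling hdfsBlocksLocalityIndex back toward 100.
        admin.majorCompact("t_96");
        admin.close();
    }
}

Probably unnecessary in this particular case, since hdfsBlocksLocalityIndex is already 100 on all RS, but worth keeping in the back pocket.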