FYI, scanner caching defaults to 1000 in Phoenix, but as folks have pointed out, that's not relevant in this case because only a single row is returned from the server for a COUNT(*) query.
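In case it's useful, here's where that knob lives in the plain HBase 0.94 client API. This is a minimal sketch, not anything from the thread: the quorum comes from whatever hbase-site.xml is on the classpath, and the table name t_96 is borrowed from the discussion below.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScannerCachingSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Client-wide default for rows fetched per scanner RPC.
        conf.setInt("hbase.client.scanner.caching", 1000);

        HTable table = new HTable(conf, "t_96");
        Scan scan = new Scan();
        // Per-scan override. For a server-side aggregate like COUNT(*)
        // this barely matters: each region returns a single row.
        scan.setCaching(1000);

        ResultScanner scanner = table.getScanner(scan);
        long rows = 0;
        for (Result r : scanner) {
            rows++;
        }
        scanner.close();
        table.close();
        System.out.println("rows scanned: " + rows);
    }
}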
On Sat, Dec 21, 2013 at 2:51 PM, Kristoffer Sjögren <[email protected]> wrote:

> Yeah, I'm doing a count(*) query on the 96 region table. Do you mean to
> check network traffic between RS?
>
> From debugging Phoenix code I can see that there are 96 scans sent, and
> each response returned back to the client contains only the sum of rows,
> which are then aggregated and returned. So the traffic between the client
> and each RS is very small.
>
>
> On Sat, Dec 21, 2013 at 11:35 PM, lars hofhansl <[email protected]> wrote:
>
> > Thanks Kristoffer,
> >
> > yeah, that's the right metric. I would put my bet on the slower network.
> > But you're also doing a select count(*) query in Phoenix, right? So
> > nothing should really be sent across the network.
> >
> > When you do the queries, can you check whether there is any network
> > traffic?
> >
> > -- Lars
> >
> >
> > ________________________________
> > From: Kristoffer Sjögren <[email protected]>
> > To: [email protected]; lars hofhansl <[email protected]>
> > Sent: Saturday, December 21, 2013 1:28 PM
> > Subject: Re: Performance tuning
> >
> >
> > @pradeep scanner caching should not be an issue since the data
> > transferred to the client is tiny.
> >
> > @lars Yes, the data might be small for this particular case :-)
> >
> > I have checked everything I can think of on the RS (CPU, network, HBase
> > console, uptime etc.) and nothing stands out, except for the pings
> > (network pings).
> > There are 5 regions on RS 7, 18, 19, and 23; the others have 4.
> > hdfsBlocksLocalityIndex=100 on all RS (was that the correct metric?)
> >
> > -Kristoffer
> >
> >
> > On Sat, Dec 21, 2013 at 9:44 PM, lars hofhansl <[email protected]> wrote:
> >
> > > Hi Kristoffer,
> > > For this particular problem: are many regions on the same
> > > RegionServers? Did you profile those RegionServers? Anything weird on
> > > that box?
> > > Slower pings might well be an issue. How's the data locality? (You can
> > > check on a RegionServer's overview page.)
> > > If needed, you can issue a major compaction to reestablish local data
> > > on all RegionServers.
> > >
> > > 32 cores matched with only 4GB of RAM is a bit weird, but with your
> > > tiny dataset it doesn't matter anyway.
> > >
> > > 10m rows across 96 regions is just about 100k rows per region. You
> > > won't see many of the nice properties of HBase.
> > > Try with 100m (or better, 1bn) rows. Then we're talking. For anything
> > > below that you wouldn't want to use HBase anyway.
> > > (100k rows I could scan on my phone with a Perl script in less than
> > > 1s.)
> > >
> > > With "ping" do you mean an actual network ping, or some operation on
> > > top of HBase?
> > >
> > > -- Lars
> > >
> > >
> > > ________________________________
> > > From: Kristoffer Sjögren <[email protected]>
> > > To: [email protected]
> > > Sent: Saturday, December 21, 2013 11:17 AM
> > > Subject: Performance tuning
> > >
> > >
> > > Hi
> > >
> > > I have been performance tuning HBase 0.94.6 running Phoenix 2.2.0 for
> > > the last couple of days and need some help.
> > >
> > > Background:
> > >
> > > - 23-machine cluster, 32 cores, 4GB heap per RS.
> > > - Table t_24 has 24 online regions (24 salt buckets).
> > > - Table t_96 has 96 online regions (96 salt buckets).
> > > - 10.5 million rows per table.
> > > - Count query: select count(*) from ...
> > > - Group-by query: select A, B, C, sum(D) from ...
> > >   where (A = 1 and T >= 0 and T <= 2147482800) group by A, B, C;
> > >
> > > What I found ultimately is that region servers 19, 20, 21, 22, and 23
> > > are consistently 2-3x slower than the others. This hurts overall
> > > latency pretty badly, since queries are executed in parallel on the RS
> > > and then aggregated at the client (through Phoenix). In Hannibal,
> > > regions spread out evenly over region servers, according to salt
> > > buckets (a Phoenix feature: pre-created regions and a rowkey prefix).
> > >
> > > As far as I can tell, there is no network or hardware configuration
> > > divergence between the machines. No CPU, network, or other notable
> > > divergence in Ganglia. No RS metric differences in the HBase master
> > > console.
> > >
> > > The only thing that may be of interest is that pings (within the
> > > cluster) to the bad RS are about 2-3x slower, around 0.050ms vs
> > > 0.130ms. Not sure if this is significant, but I get a bad feeling
> > > about it since it matches exactly the RS that stood out in my
> > > performance tests.
> > >
> > > Any ideas of how I might find the source of this problem?
> > >
> > > Cheers,
> > > -Kristoffer
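For anyone trying to reproduce the per-RS skew, here's a rough sketch of timing both queries through the Phoenix JDBC driver. Assumptions on my part: the pre-Apache (Phoenix 2.2-era) driver class, a hypothetical ZooKeeper quorum "zk-host", and the columns A, B, C, D, T exactly as they appear in the queries quoted above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQueryTiming {
    public static void main(String[] args) throws Exception {
        // Phoenix 2.2 shipped under the com.salesforce namespace; later
        // Apache releases moved to org.apache.phoenix.jdbc.PhoenixDriver.
        Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver");
        Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");

        String[] queries = {
            "SELECT COUNT(*) FROM t_96",
            "SELECT A, B, C, SUM(D) FROM t_96"
                + " WHERE A = 1 AND T >= 0 AND T <= 2147482800"
                + " GROUP BY A, B, C"
        };

        for (String sql : queries) {
            long start = System.currentTimeMillis();
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(sql);
            long rows = 0;
            while (rs.next()) {
                rows++;
            }
            rs.close();
            stmt.close();
            System.out.println(rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms: " + sql);
        }
        conn.close();
    }
}

End-to-end timing only shows the aggregate, though: since Phoenix fans out one scan per salt bucket and waits for the slowest one, the per-RS skew is better seen by comparing scan latencies in each RegionServer's own metrics.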

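On Lars' suggestion upthread about reestablishing locality: a major compaction can also be kicked off from the client API rather than the shell. A sketch, assuming the 0.94 HBaseAdmin API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompactT96 {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Asynchronous request: the RegionServers rewrite each region's
        // HFiles locally, pulling hdfsBlocksLocalityIndex back toward 100.
        admin.majorCompact("t_96");
        admin.close();
    }
}

Probably unnecessary in this particular case, since hdfsBlocksLocalityIndex is already 100 on all RS, but worth keeping in the back pocket.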