Btw, I have tried different numbers of rows with similar symptoms on the bad RS.
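
To take Phoenix out of the picture while comparing the good and bad RS, timing a raw scan with the plain 0.94 client is probably the simplest test. A minimal sketch (t_96 is the table from the thread below; the caching value of 1000 is just a guess, the 0.94 default is 1):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class RawScanTimer {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t_96");
        Scan scan = new Scan();
        scan.setCaching(1000);       // rows fetched per RPC; the 0.94 default is 1
        scan.setCacheBlocks(false);  // don't pollute the block cache for a one-off scan
        long start = System.currentTimeMillis();
        long rows = 0;
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                rows++;
            }
        } finally {
            scanner.close();
            table.close();
        }
        System.out.println(rows + " rows in " + (System.currentTimeMillis() - start) + " ms");
    }
}

If the same RegionServers are 2-3x slower on a raw scan as well, the problem is below Phoenix.
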
On Sat, Dec 21, 2013 at 10:28 PM, Kristoffer Sjögren <[email protected]> wrote:

> @pradeep scanner caching should not be an issue since the data transferred to
> the client is tiny.
>
> @lars Yes, the data might be small for this particular case :-)
>
> I have checked everything I can think of on the RS (CPU, network, HBase
> console, uptime etc.) and nothing stands out, except for the pings (network
> pings).
> There are 5 regions on 7, 18, 19, and 23; the others have 4.
> hdfsBlocksLocalityIndex=100 on all RS (was that the correct metric?)
>
> -Kristoffer
>
>
> On Sat, Dec 21, 2013 at 9:44 PM, lars hofhansl <[email protected]> wrote:
>
>> Hi Kristoffer,
>> For this particular problem: are many regions on the same RegionServers?
>> Did you profile those RegionServers? Anything weird on that box?
>> Slower pings might well be an issue. How's the data locality? (You can
>> check on a RegionServer's overview page.)
>> If needed, you can issue a major compaction to reestablish local data on
>> all RegionServers.
>>
>> 32 cores matched with only 4G of RAM is a bit weird, but with your tiny
>> dataset it doesn't matter anyway.
>>
>> 10m rows across 96 regions is just about 100k rows per region. You won't
>> see many of the nice properties of HBase.
>> Try with 100m (or better, 1bn) rows. Then we're talking. For anything
>> below this you wouldn't want to use HBase anyway.
>> (100k rows I could scan on my phone with a Perl script in less than 1s.)
>>
>> With "ping" do you mean an actual network ping, or some operation on top of
>> HBase?
>>
>> -- Lars
>>
>>
>> ________________________________
>> From: Kristoffer Sjögren <[email protected]>
>> To: [email protected]
>> Sent: Saturday, December 21, 2013 11:17 AM
>> Subject: Performance tuning
>>
>> Hi
>>
>> I have been performance tuning HBase 0.94.6 running Phoenix 2.2.0 for the
>> last couple of days and need some help.
>>
>> Background:
>>
>> - 23-machine cluster, 32 cores, 4GB heap per RS.
>> - Table t_24 has 24 online regions (24 salt buckets).
>> - Table t_96 has 96 online regions (96 salt buckets).
>> - 10.5 million rows per table.
>> - Count query: select count(*) from ...
>> - Group by query: select A, B, C, sum(D) from ... where (A = 1 and T >= 0
>>   and T <= 2147482800) group by A, B, C;
>>
>> What I found ultimately is that region servers 19, 20, 21, 22 and 23 are
>> consistently 2-3x slower than the others. This hurts overall latency pretty
>> badly since queries are executed in parallel on the RS and then aggregated
>> at the client (through Phoenix). In Hannibal, regions are spread out evenly
>> over region servers according to salt buckets (a Phoenix feature that
>> pre-creates regions and adds a rowkey prefix).
>>
>> As far as I can tell, there is no network or hardware configuration
>> divergence between the machines. No CPU, network or other notable
>> divergence in Ganglia. No RS metric differences in the HBase master console.
>>
>> The only thing that may be of interest is that pings (within the cluster)
>> to the bad RS are about 2-3x slower, around 0.050ms vs 0.130ms. Not sure if
>> this is significant, but I get a bad feeling about it since it matches
>> exactly with the RS that stood out in my performance tests.
>>
>> Any ideas of how I might find the source of this problem?
>>
>> Cheers,
>> -Kristoffer
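
For reference, the major compaction lars suggests above can be issued from the shell (major_compact 't_96') or from the Java client. A minimal sketch with the 0.94 HBaseAdmin API, again assuming table t_96:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CompactTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Asynchronous request; each RS rewrites its store files locally,
            // which is what restores HDFS block locality.
            admin.majorCompact("t_96");
        } finally {
            admin.close();
        }
    }
}

The request is asynchronous, so hdfsBlocksLocalityIndex only recovers once the compactions actually finish on each RS.
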
