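What is your scanner caching set to? I haven't worked with Phoenix, so I'm not sure what defaults, if any, it uses. In HBase 0.94, I believe the default scanner caching (hbase.client.scanner.caching) is 1, meaning each call for the next row is a separate round trip to the region server. This could be exacerbating your problem, since any extra per-request latency is then paid once per row.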
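To make that concrete, here is a rough back-of-the-envelope sketch (not a measurement) of how much pure network wait a scan accumulates at different caching values. The two round-trip times are the ping figures from your mail; the row count of 100,000 is a made-up number for illustration, and the function names are hypothetical. Since your queries are executed on the region servers and only aggregated at the client, the number of rows actually crossing the wire may be far smaller, so treat this as an illustration of the scaling only. On the client side the knob is Scan.setCaching(int) or the hbase.client.scanner.caching property; I don't know how Phoenix sets either.

```python
import math


def scan_round_trips(rows: int, caching: int) -> int:
    """Round trips needed to fetch `rows` rows when each RPC returns `caching` rows."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return math.ceil(rows / caching)


def network_wait_seconds(rows: int, caching: int, rtt_ms: float) -> float:
    """Time spent purely waiting on round trips, ignoring server work and payload size."""
    return scan_round_trips(rows, caching) * rtt_ms / 1000.0


if __name__ == "__main__":
    rows = 100_000  # hypothetical number of rows returned to the client
    for caching in (1, 100, 1000):
        good = network_wait_seconds(rows, caching, 0.050)  # ping to a "good" RS
        bad = network_wait_seconds(rows, caching, 0.130)   # ping to a "bad" RS
        print(f"caching={caching}: good RS {good:.3f}s, bad RS {bad:.3f}s")
```

The point is only that with caching at 1 the latency gap is multiplied by the row count, while a larger caching value divides the number of round trips and makes the same gap much less visible.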

On Sat, Dec 21, 2013 at 7:52 PM, Kristoffer Sjögren <sto...@gmail.com> wrote:

> Yes, I'm waiting on a response from them. It's just.. the ping difference
> is tiny while the scan difference is huge, 2sec vs 4sec.
>
> Note the ping I mentioned is within the cluster. Ping from outside into
> the cluster has hardly any (if at all) noticeable difference.
>
> On Sat, Dec 21, 2013 at 8:37 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>
> > Do you know if machines 19-23 are on a different rack? It seems to me
> > that your problem might be a networking problem. Whether it is hardware,
> > configuration or something else entirely, I'm not sure. It might be
> > worthwhile to talk to your systems administrator to see why pings to
> > these machines are slow. What are the pings like from a bad RS to
> > another bad RS?
> >
> > On Sat, Dec 21, 2013 at 7:17 PM, Kristoffer Sjögren <sto...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > I have been performance tuning HBase 0.94.6 running Phoenix 2.2.0 the
> > > last couple of days and need some help.
> > >
> > > Background.
> > >
> > > - 23 machine cluster, 32 cores, 4GB heap per RS.
> > > - Table t_24 has 24 online regions (24 salt buckets).
> > > - Table t_96 has 96 online regions (96 salt buckets).
> > > - 10.5 million rows per table.
> > > - Count query - select (*) from ...
> > > - Group by query - select A, B, C, sum(D) from ... where (A = 1 and
> > >   T >= 0 and T <= 2147482800) group by A, B, C;
> > >
> > > What I found ultimately is that region servers 19, 20, 21, 22 and 23
> > > are consistently 2-3x slower than the others. This hurts overall
> > > latency pretty badly since queries are executed in parallel on the RS
> > > and then aggregated at the client (through Phoenix). In Hannibal,
> > > regions are spread out evenly over the region servers, according to
> > > salt buckets (a Phoenix feature: pre-created regions and a rowkey
> > > prefix).
> > >
> > > As far as I can tell, there is no network or hardware configuration
> > > divergence between the machines. No CPU, network or other notable
> > > divergence in Ganglia. No RS metric differences in the HBase master
> > > console.
> > >
> > > The only thing that may be of interest is that pings (within the
> > > cluster) to the bad RS are about 2-3x slower, around 0.050ms vs
> > > 0.130ms. Not sure if this is significant, but I get a bad feeling
> > > about it since it matches exactly the RS that stood out in my
> > > performance tests.
> > >
> > > Any ideas of how I might find the source of this problem?
> > >
> > > Cheers,
> > > -Kristoffer