Back-of-the-envelope math - assuming disks that can sustain 120 MB/s - suggests 
you'd need about 17 disks 100% busy in total to pull 120 GB off the disks in 
60s (i.e. at least 6 servers completely utilizing all of their disks). How many 
servers do you have? HBase/HDFS will likely not quite max out all disks, so your 
10 machines are cutting it close.
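
Spelled out (rough numbers, treating 1 GB as 1000 MB):

    120 GB / 60 s                   = 2 GB/s aggregate read rate needed
    2 GB/s / 120 MB/s per disk      ≈ 17 disks at 100% busy
    17 disks / ~3 disks per server  ≈ 6 servers with all disks busy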

Not concerned about the 250 regions - at least not for this.

Are all machines/disks/CPUs equally busy? Is the table salted?
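
(For reference, salting is set at table creation time with the SALT_BUCKETS 
option; the table and column names below are just placeholders:)

    -- SALT_BUCKETS spreads rows across key-prefix buckets so scans
    -- parallelize more evenly across region servers
    CREATE TABLE example_events (
        id  BIGINT NOT NULL PRIMARY KEY,
        val VARCHAR
    ) SALT_BUCKETS = 10;
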
Note that HBase's block cache stores data uncompressed, and hence your dataset 
likely does not fit into the aggregate block cache. Your query might run 
slightly better with the /*+ NO_CACHE */ hint.
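
For example (EXAMPLE_EVENTS is just a placeholder for your table name):

    -- NO_CACHE keeps the full scan from churning the block cache
    SELECT /*+ NO_CACHE */ COUNT(1) FROM example_events;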

Now, from your 187541ms number, things look worse, though. Do you have OpenTSDB 
or Ganglia recording metrics for that cluster? If so, can you share some graphs 
of IO/CPU during the query time? Any chance you could attach a profiler to one 
of the busy region servers, or at least get us a stack trace?

Thanks. 

-- Lars

      From: Abe Weinograd <[email protected]>
 To: user <[email protected]>; lars hofhansl <[email protected]> 
 Sent: Monday, October 13, 2014 9:30 AM
 Subject: Re: count on large table
   
Hi Lars,
Thanks for following up.
Table Size - 120G doing a du on HDFS. We are using Snappy compression on the 
table.
Column Family - We have 1 column family for all columns and are using the 
Phoenix default one.
Regions - right now we have a ton of regions (250) because we pre-split to help 
out bulk loads. I haven't collapsed them yet, but in a DEV environment that is 
configured the same way, we have ~50 regions and experience the same 
performance issues. I am planning on squaring this away and trying again.
Resource Utilization - Really high CPU usage on the region servers, and 
noticing a spike in IO too.
Based on your questions and what I know, the number of regions needs to be 
reduced (merged/compacted) first, though I am not sure this is going to solve 
my issue. The data nodes in HDFS have three 1TB disks, so I am not convinced 
that my IO is the bottleneck here.

Thanks,
Abe


On Thu, Oct 9, 2014 at 8:36 PM, lars hofhansl <[email protected]> wrote:

Hi Abe,

this is interesting.

How big are your rows (i.e. how much data is in the table; you can tell with du 
in HDFS)? And how many columns do you have? How many column families?
How many regions are in this table? (you can tell that through the HBase 
HMaster UI page)
When you execute the query, are all HBase region servers busy? Do you see IO, 
or just high CPU?

Client batching won't help with an aggregate (such as count) where not much 
data is transferred back to the client.

Thanks.
-- Lars

      From: Abe Weinograd <[email protected]>
 To: user <[email protected]> 
 Sent: Wednesday, October 8, 2014 9:15 AM
 Subject: Re: count on large table
   
Good point. I have to figure out how to do that in a SQL tool like SQuirreL or 
Workbench.
Is there any obvious thing I can do to help tune this? I know that's a loaded 
question. My client scanner batches are 1000 (also tried 10000 with no luck).

Thanks,
Abe


On Tue, Oct 7, 2014 at 9:09 PM, [email protected] <[email protected]> 
wrote:

Hi, Abe

Maybe setting the following property would help...

<property>
    <name>phoenix.query.timeoutMs</name>
    <value>3600000</value>
</property>
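
(If I'm not mistaken, phoenix.query.timeoutMs is a client-side setting, so it 
should go in an hbase-site.xml on the classpath of the client/JDBC driver 
issuing the query, not only on the servers; 3600000 ms is one hour.)
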
Thanks,
Sun


From: Abe Weinograd
Date: 2014-10-08 04:34
To: user
Subject: count on large table

I have a table with 1B rows. I know this is very specific to my environment, 
but just doing a SELECT COUNT(1) on the table never finished.

We have a 10-node cluster with the RS heap size at 26 GiB and skewed towards 
the block cache. In the RS logs, I see a lot of these:
2014-10-07 16:27:04,942 WARN org.apache.hadoop.ipc.RpcServer: 
(responseTooSlow): 
{"processingtimems":22770,"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","client":"10.10.0.10:44791","starttimems":1412713602172,"queuetimems":0,"class":"HRegionServer","responsesize":8,"method":"Scan"}

They stop eventually, but the query times out and the query tool reports: 
org.apache.phoenix.exception.PhoenixIOException: 187541ms passed since the last 
invocation, timeout is currently set to 60000
Any ideas of where I can start in order to figure this out?
Using Phoenix 4.1 on CDH 5.1 (HBase 0.98.1).
Thanks,
Abe




   



  
