It seems to be disabled across the cluster.

Abe
On Tue, Oct 21, 2014 at 9:04 AM, Puneet Kumar Ojha <[email protected]> wrote:

> Please check IPv6; if enabled, disable it, then synchronize ntpd and restart. It might help.
>
> From: Abe Weinograd [mailto:[email protected]]
> Sent: Tuesday, October 21, 2014 6:19 PM
> To: user; lars hofhansl
> Subject: Re: count on large table
>
> Hi Lars,
>
> We have 10 Region Servers with two 1TB disks on each. The table is not salted, but we pre-split regions when we bulk load so that we force equal distribution of our data. The data is relatively evenly distributed across our region servers, with no one Region Server being the "long tail."
>
> I don't have any metrics from Ganglia. We are running CDH on EC2, for what it is worth. The CPUs spike to 100% and IO jumps pretty equally on the Region Servers. Attached is the RS log from one of them when all I am doing on the entire cluster is a COUNT in Phoenix.
>
> Thanks again for your help,
> Abe
>
> On Tue, Oct 14, 2014 at 3:10 AM, lars hofhansl <[email protected]> wrote:
>
> Back-of-the-envelope math - assuming disks that can sustain 120MB/s - suggests you'd need about 17 disks 100% busy in total to pull 120GB off the disks in 60s (i.e. at least 6 servers completely utilizing all of their disks). How many servers do you have? HBase/HDFS will likely not quite max out all disks, so your 10 machines are cutting that close.
>
> Not concerned about the 250 regions - at least not for this.
>
> Are all machines/disks/CPUs equally busy? Is the table salted?
> Note that HBase's block cache stores data uncompressed, and hence your dataset likely does not fit into the aggregate block cache. Your query might run slightly better with the /*+ NO_CACHE */ hint.
>
> Now from your 187541ms number, things look worse, though.
> Do you have OpenTSDB or Ganglia to record metrics of that cluster? If so, can you share some graphs of IO/CPU during the query time?
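The /*+ NO_CACHE */ hint Lars mentions goes directly into the count query itself; a minimal sketch, with MY_TABLE standing in as a placeholder table name:

```sql
-- Ask Phoenix/HBase not to populate the block cache with blocks read
-- by this one-off full scan, so already-hot cached data is not evicted.
-- MY_TABLE is a placeholder for the actual table name.
SELECT /*+ NO_CACHE */ COUNT(1) FROM MY_TABLE;
```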
> Any chance to attach a profiler to one of the busy region servers, or at least get us a stack trace?
>
> Thanks.
>
> -- Lars
>
> ------------------------------
> From: Abe Weinograd <[email protected]>
> To: user <[email protected]>; lars hofhansl <[email protected]>
> Sent: Monday, October 13, 2014 9:30 AM
> Subject: Re: count on large table
>
> Hi Lars,
>
> Thanks for following up.
>
> Table size - 120GB, doing a du on HDFS. We are using Snappy compression on the table.
> Column family - We have 1 column family for all columns and are using the Phoenix default one.
> Regions - Right now we have a ton of regions (250) because we pre-split to help out bulk loads. I haven't collapsed them yet, but in a DEV environment that is configured the same way, we have ~50 regions and experience the same performance issues. I am planning on squaring this away and trying again.
> Resource utilization - Really high CPU usage on the region servers, and noticing a spike in IO too.
>
> Based on your questions and what I know, the # of regions needs to be compacted first, though I am not sure this is going to solve my issue. The data nodes in HDFS have 3 1TB disks, so I am not convinced that my IO is the bottleneck here.
>
> Thanks,
> Abe
>
> On Thu, Oct 9, 2014 at 8:36 PM, lars hofhansl <[email protected]> wrote:
>
> Hi Abe,
>
> this is interesting.
>
> How big are your rows (i.e. how much data is in the table; you can tell with du in HDFS)? And how many columns do you have? Any column families?
> How many regions are in this table? (You can tell that through the HBase HMaster UI page.)
> When you execute the query, are all HBase region servers busy? Do you see IO, or just high CPU?
>
> Client batching won't help with an aggregate (such as count) where not much data is transferred back to the client.
>
> Thanks.
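Salting, which Lars asks about, is declared when the table is created; a minimal Phoenix DDL sketch, with placeholder table and column names (10 buckets is only an assumption to match a 10-node cluster):

```sql
-- SALT_BUCKETS prepends a one-byte hash to each row key so writes and
-- parallel scans spread evenly across region servers instead of
-- hotspotting. Table/column names and the bucket count are placeholders.
CREATE TABLE MY_TABLE (
    ID  BIGINT NOT NULL PRIMARY KEY,
    VAL VARCHAR
) SALT_BUCKETS = 10;
```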
> -- Lars
>
> ------------------------------
> From: Abe Weinograd <[email protected]>
> To: user <[email protected]>
> Sent: Wednesday, October 8, 2014 9:15 AM
> Subject: Re: count on large table
>
> Good point. I have to figure out how to do that in a SQL tool like SQuirreL or Workbench.
>
> Is there any obvious thing I can do to help tune this? I know that's a loaded question. My client scanner batches are 1000 (also tried 10000 with no luck).
>
> Thanks,
> Abe
>
> On Tue, Oct 7, 2014 at 9:09 PM, [email protected] <[email protected]> wrote:
>
> Hi, Abe
>
> Maybe setting the following property would help...
>
> <property>
>   <name>phoenix.query.timeoutMs</name>
>   <value>3600000</value>
> </property>
>
> Thanks,
> Sun
>
> ------------------------------
> From: Abe Weinograd <[email protected]>
> Date: 2014-10-08 04:34
> To: user <[email protected]>
> Subject: count on large table
>
> I have a table with 1B rows. I know this is very specific to my environment, but just doing a SELECT COUNT(1) on the table never finishes.
>
> We have a 10-node cluster with the RS heap size at 26GiB, skewed towards the block cache. In the RS logs, I see a lot of these:
>
> 2014-10-07 16:27:04,942 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":22770,"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","client":"10.10.0.10:44791","starttimems":1412713602172,"queuetimems":0,"class":"HRegionServer","responsesize":8,"method":"Scan"}
>
> They stop eventually, but the query times out and the query tool reports: org.apache.phoenix.exception.PhoenixIOException: 187541ms passed since the last invocation, timeout is currently set to 60000
>
> Any ideas of where I can start in order to figure this out?
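Raising phoenix.query.timeoutMs as Sun suggests typically has to be paired with the matching HBase client timeouts, or the scan can still fail underneath Phoenix with the 60000ms default seen in the exception above; a client-side hbase-site.xml sketch (the 3600000ms values are illustrative, not a recommendation):

```xml
<!-- Client-side hbase-site.xml; all values are illustrative. -->
<!-- Phoenix-level query timeout, in milliseconds. -->
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>3600000</value>
</property>
<!-- HBase RPC timeout, raised to match the Phoenix query timeout. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>3600000</value>
</property>
<!-- Scanner lease/timeout period for long-running scans. -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>3600000</value>
</property>
```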
> Using Phoenix 4.1 on CDH 5.1 (HBase 0.98.1).
>
> Thanks,
> Abe
