See http://phoenix.apache.org/update_statistics.html for more info.
Thanks,
James

On Thursday, January 15, 2015, [email protected] <[email protected]> wrote:
> Hi, James
> Really appreciated your detailed illustration. I issued UPDATE
> STATISTICS <table>, reran the count(*) query, and found that query
> performance improved. Does the statistics collection affect all
> queries, or only aggregate queries? How does that command improve
> query performance? It would be great if you could explain a little :)
>
> Our problem still occurs for now and we need to investigate the query
> more deeply. I will check the configurations you provided in the later
> tests.
>
> Thanks,
> Sun
>
> CertusNet
>
> From: James Taylor
> Date: 2015-01-15 17:46
> To: user
> Subject: Re: Re: rpc timeout when count on large table
> Those settings (one or the other - you wouldn't set both) drive the
> amount of parallelization done (i.e. the number or byte size of each
> parallel chunk).
>
> What do you get when you run the following queries?
> SELECT COUNT(*) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = '<your full table name>';
> SELECT SUM(GUIDE_POSTS_COUNT) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = '<your full table name>';
>
> As a test, try adding the following config parameter to
> hbase-site.xml on each region server:
> <property>
>   <name>phoenix.stats.guidepost.per.region</name>
>   <value>1</value>
> </property>
>
> After setting it, bounce your cluster and run the following to update
> your stats:
> UPDATE STATISTICS <your full table name>
>
> Then run your count(*) query again and see if there's any impact. Try
> setting phoenix.stats.guidepost.per.region successively higher to
> 2, 4, 8 (following the above steps) and see if it makes a difference
> in your query performance.
> Thanks,
> James
>
> On Thu, Jan 15, 2015 at 1:23 AM, [email protected] <[email protected]> wrote:
> > Hi, James
> > Yes, we are running 4.2.2.
> > Neither of these two configs is overridden. Do these configurations
> > only affect stats collection?
> > I had not checked the region server logs to see whether any major
> > compaction was running.
> >
> > Just curious about the query performance, since counting on that
> > table worked fine previously.
> >
> > Thanks,
> > Sun.
> >
> > CertusNet
> >
> > From: James Taylor
> > Date: 2015-01-15 17:10
> > To: user
> > Subject: Re: rpc timeout when count on large table
> > You're on 4.2.2, Sun? Have you overridden either of
> > phoenix.stats.guidepost.width or phoenix.stats.guidepost.per.region?
> > These control the size of each parallel scan. I assume you've run a
> > major compaction on the table at some point?
> >
> > Thanks,
> > James
> >
> > On Wed, Jan 14, 2015 at 7:06 PM, [email protected] <[email protected]> wrote:
> >> Hi, all
> >>
> >> When counting on a large table, we got the following exception:
> >> org.apache.hadoop.hbase.ipc.RpcClient$CallTimeoutException: Call id=,
> >> waitTime=69714 rpcTimeout=60000
> >>
> >> How can that be resolved? The table size comes to 17.3G according
> >> to hdfs dfs -du. The table has 90+ columns and only one column
> >> family, F. The compression codec is snappy.
> >>
> >> Thanks,
> >> Sun.
> >>
> >> CertusNet
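[Editor's note: separately from the guidepost tuning discussed above, the CallTimeoutException in the original report can also be worked around by raising the client-side timeouts. A minimal sketch of the relevant hbase-site.xml settings on the Phoenix client follows; the 600000 ms values are illustrative, not a recommendation.]

```xml
<!-- hbase-site.xml on the Phoenix client; values are illustrative -->
<property>
  <!-- HBase RPC timeout; the error above shows the 60000 ms default being hit -->
  <name>hbase.rpc.timeout</name>
  <value>600000</value>
</property>
<property>
  <!-- Overall Phoenix query timeout -->
  <name>phoenix.query.timeoutMs</name>
  <value>600000</value>
</property>
```

Raising timeouts only hides the slowness, of course; the guidepost tuning above addresses the underlying lack of parallelism.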
