Renee - I had to deal with a similar scenario recently. I wouldn't do it manually, its easier to make mistake this way.
I would recommend you do incremental stats for each partition. Thanks, Krishna C Thirumalasetty On Tue, Aug 28, 2018 at 1:33 PM Renee Schnabel <renee.schna...@availity.com> wrote: > Hi all, > > > > We recently had an Cloudera Impala consult for an onsite consultation and he > told us under no circumstances to run Compute Incremental Stats, but rather > to compute the stats manually by running Alter table commands. I understand > why incremental stats is a performance hazard based on the memory > requirements. I am however concerned about manually updating stats, > especially as our tables are all partitioned. I know that work is being done > to improve compute incremental stats, but would love to understand what > everyone is doing in the meantime. We have Impala (impalad version 2.12) > over Kudu( kudu 1.7.0-cdh5.15.0) > > > > > > Our tables are way too large (63 billion rows +) to run compute stats on a > regular basis. I have run some comparative queries based on tables having > stats vs not having stats and there is obviously a marked difference in > memory and performance when stats do not exist. > > > > If this is the only way, I have some questions around doing this manually. Is > the general consensus to run them to update the total number of rows for the > table and then for the partitions as well? For example: alter table > big_table_part partition(year=2016, month=4) set tblproperties > ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true'); > > alter table big_table_part set tblproperties ('numRows'='1030000', > 'STATS_GENERATED_VIA_STATS_TASK'='true'); > > > > And should one include updating all the other columnar statistical > information around numDVs, numNulls, avgSize, maxSize? > > > > Also, the consensus is to do this when a table volume has changed by 30%, > so in my mind this means keeping track of the table growth in a separate > table possibly, unless there is some other function that I have not > stumbled upon. > > > > I would certainly appreciate any real-world examples and feedback > regarding how this is realistically being done in production. > > > > Thank you, > > > > > > *Renee Schnabel* > Senior Database Administrator/Team Lead > > Availity | 5555 Gate Parkway, Jacksonville FL 32256 > P 904.891.9405 > > renee.schna...@availity.com > > > > [image: 00 AV Logo RGB] <http://www.availity.com/> > > > > [image: Facebook_Icon_resized_greyscale] > <http://www.facebook.com/home.php#!/pages/Availity/607761835906201> [image: > Linked_In_Icon_resized_greyscale] > <http://www.linkedin.com/company/availity>[image: > Twitter_Icon_resized_greyscale] <https://twitter.com/AvailityHRD> > > > > > > > > *The information contained in this e-mail may be privileged and > confidential under applicable law. It is intended solely for the use of the > person or firm named above. If the reader of this e-mail is not the > intended recipient, please notify us immediately by returning the e-mail to > the originating e-mail address. Availity, LLC is not responsible for errors > or omissions in this e-mail message. Any personal comments made in this > e-mail do not reflect the views of Availity, LLC.* > > >