Renee - I had to deal with a similar scenario recently. I wouldn't do it
manually, its easier to make mistake this way.

I would recommend you do incremental stats for each partition.


Thanks,

Krishna C Thirumalasetty

On Tue, Aug 28, 2018 at 1:33 PM Renee Schnabel <renee.schna...@availity.com>
wrote:

> Hi all,
>
>
>
> We recently had an Cloudera Impala consult for an onsite consultation and he 
> told us under no circumstances to run Compute Incremental Stats, but rather 
> to compute the stats manually by running Alter table commands. I understand 
> why incremental stats is a performance hazard based on the memory 
> requirements. I am however concerned about manually updating stats, 
> especially as our tables are all partitioned. I know that work is being done 
> to improve compute incremental stats, but would love to understand what 
> everyone is doing in the meantime. We have Impala (impalad version 2.12)  
> over Kudu( kudu 1.7.0-cdh5.15.0)
>
>
>
>
>
> Our tables are way too large (63 billion rows +) to run compute stats on a
> regular basis. I have run some comparative queries based on tables having
> stats vs not having stats and there is obviously a marked difference in
> memory and performance when stats do not exist.
>
>
>
> If this is the only way, I have some questions around doing this manually. Is 
> the general consensus to run them to update the total number of rows for the 
> table and then for the partitions as well?  For example: alter table 
> big_table_part partition(year=2016, month=4) set tblproperties 
> ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
>
> alter table big_table_part  set tblproperties ('numRows'='1030000',
> 'STATS_GENERATED_VIA_STATS_TASK'='true');
>
>
>
> And should one include updating all the other columnar statistical
> information around  numDVs, numNulls, avgSize, maxSize?
>
>
>
> Also, the consensus is to do this when a table volume has changed by 30%,
> so in my mind this means keeping track of the table growth in a separate
> table possibly, unless there is some other function that I have not
> stumbled upon.
>
>
>
> I would certainly appreciate any real-world examples and feedback
> regarding how this is realistically being done in production.
>
>
>
> Thank you,
>
>
>
>
>
> *Renee Schnabel*
> Senior Database Administrator/Team Lead
>
> Availity | 5555 Gate Parkway, Jacksonville FL 32256
> P 904.891.9405
>
> renee.schna...@availity.com
>
>
>
> [image: 00 AV Logo RGB] <http://www.availity.com/>
>
>
>
> [image: Facebook_Icon_resized_greyscale]
> <http://www.facebook.com/home.php#!/pages/Availity/607761835906201> [image:
> Linked_In_Icon_resized_greyscale]
> <http://www.linkedin.com/company/availity>[image:
> Twitter_Icon_resized_greyscale] <https://twitter.com/AvailityHRD>
>
>
>
>
>
>
>
> *The information contained in this e-mail may be privileged and
> confidential under applicable law. It is intended solely for the use of the
> person or firm named above. If the reader of this e-mail is not the
> intended recipient, please notify us immediately by returning the e-mail to
> the originating e-mail address. Availity, LLC is not responsible for errors
> or omissions in this e-mail message. Any personal comments made in this
> e-mail do not reflect the views of Availity, LLC.*
>
>
>

Reply via email to