Hello everybody!

(I am running Impala 2.8.0, as shipped with Cloudera Express 5.11.1.)

I now understand that computing stats for our tables is _highly_
recommended, so I have decided to make sure we do.

On my quest to do so, I started with a first `COMPUTE INCREMENTAL STATS
my_big_partitioned_parquet_table` and ran into:

> HiveServer2Error: AnalysisException: Incremental stats size estimate
exceeds 200.00MB. Please try COMPUTE STATS instead.

I found out that this limit can be raised, so I set
`inc_stats_size_limit_bytes` to 1073741824 (1 GB), but then got:
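For reference, this is roughly how I set the flag (a sketch only; I am
showing the daemons started by hand here, whereas on CDH this would
normally go through a Cloudera Manager safety valve):

```shell
# Raise the incremental-stats size limit from the 200 MB default to 1 GB.
# This is a startup flag, so it takes effect only after a restart; I
# believe it needs to be set on both catalogd and the impalad daemons.
catalogd --inc_stats_size_limit_bytes=1073741824 &
impalad  --inc_stats_size_limit_bytes=1073741824 &
```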

> HiveServer2Error: AnalysisException: Incremental stats size estimate
exceeds 1.00GB. Please try COMPUTE STATS instead.

So I ended up trying a plain `COMPUTE STATS` on the whole table instead of
computing them incrementally, but I still hit memory limits while it was
computing counts, even with my `mem_limit` at 34359738368 (32 GB):

> Process: memory limit exceeded. Limit=32.00 GB Total=48.87 GB Peak=51.97
GB
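(That is the process-wide limit; I had set it at daemon startup, roughly
like this — again a sketch, since in practice it is configured through
Cloudera Manager:)

```shell
# Give each impalad a 32 GB process memory limit.
impalad --mem_limit=34359738368 &
```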

1. Am I correct in assuming that, even without enough memory, the query
should spill to disk and just run slower instead of OOMing?
2. Any other recommendations on how I could go about computing stats on my
big partitioned Parquet table?
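One idea I have been toying with, in case it helps frame the discussion:
scripting per-partition `COMPUTE INCREMENTAL STATS` statements so each
statement only touches one partition. A sketch, assuming a single string
partition column — `dt` and the table name are stand-ins for my actual
schema, and I suspect the size estimate may still apply to the table as a
whole, so this might not dodge the limit:

```python
def per_partition_stats(table, part_col, part_values):
    """Build one COMPUTE INCREMENTAL STATS statement per partition,
    so stats can be refreshed one partition at a time (e.g. fed to
    impala-shell -q) instead of in a single large statement."""
    return [
        f"COMPUTE INCREMENTAL STATS {table} PARTITION ({part_col}='{v}')"
        for v in part_values
    ]

# Example: generate the statements for two daily partitions.
for stmt in per_partition_stats(
    "my_big_partitioned_parquet_table", "dt", ["2017-01-01", "2017-01-02"]
):
    print(stmt)
```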

Thanks a lot!
Thoralf
