The documentation has a good overview of the limitations and caveats:
https://impala.apache.org/docs/build/html/topics/impala_perf_stats.html#perf_stats_incremental
On Thu, Jan 18, 2018 at 7:29 PM, Fawze Abujaber <fawz...@gmail.com> wrote:
> Hi,
>
> I didn't see in the documentation of incremental compute stats any
> limitations.
>
> Is it a size limit or a memory limit (200 MB)?
>
> Why should COMPUTE STATS succeed and incremental compute stats not?
>
> I'm upgrading my cluster on Sunday, and incremental compute stats was
> one of the incentives :(
>
> On Fri, 19 Jan 2018 at 4:13 Mostafa Mokhtar <mmokh...@cloudera.com> wrote:
>
>> Hi,
>>
>> Do you mind sharing the query profile for the query that failed with OOM?
>> There should be some clues as to why the OOM is happening.
>>
>> Thanks,
>> Mostafa
>>
>> On Thu, Jan 18, 2018 at 5:54 PM, Thoralf Gutierrez <
>> thoralfgutier...@gmail.com> wrote:
>>
>>> Hello everybody!
>>>
>>> (I am using Impala 2.8.0, out of Cloudera Express 5.11.1)
>>>
>>> I now understand that we are _highly_ recommended to compute stats for
>>> our tables, so I have decided to make sure we do.
>>>
>>> On my quest to do so, I started with a first `COMPUTE INCREMENTAL STATS
>>> my_big_partitioned_parquet_table` and ran into:
>>>
>>> > HiveServer2Error: AnalysisException: Incremental stats size estimate
>>> exceeds 200.00MB. Please try COMPUTE STATS instead.
>>>
>>> I found out that we could increase this limit, so I set
>>> inc_stats_size_limit_bytes to 1073741824 (1 GB):
>>>
>>> > HiveServer2Error: AnalysisException: Incremental stats size estimate
>>> exceeds 1.00GB. Please try COMPUTE STATS instead.
>>>
>>> So I ended up trying to COMPUTE STATS for the whole table instead of
>>> incrementally, but I still hit memory limits when computing counts, with
>>> my mem_limit at 34359738368 (32 GB):
>>>
>>> > Process: memory limit exceeded. Limit=32.00 GB Total=48.87 GB
>>> Peak=51.97 GB
>>>
>>> 1. Am I correct to assume that even if I did not have enough memory, the
>>> query should spill to disk and just be slower instead of OOMing?
>>>
>>> 2. Any other recommendation on how else I could go about computing some
>>> stats on my big partitioned parquet table?
>>>
>>> Thanks a lot!
>>> Thoralf
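[For anyone hitting the same AnalysisException: the Impala documentation linked
above cites roughly 400 bytes of incremental stats metadata per column per
partition, so the estimate that gets compared against inc_stats_size_limit_bytes
grows with partitions × columns. A back-of-envelope sketch, assuming that
approximate figure and a hypothetical table shape:]

```python
# Rough estimate of the incremental-stats metadata footprint that
# COMPUTE INCREMENTAL STATS keeps per table. The ~400 bytes per column
# per partition figure comes from the Impala docs and is approximate,
# not an exact accounting.
BYTES_PER_COLUMN_PER_PARTITION = 400  # approximate, per Impala docs

def incremental_stats_bytes(num_partitions: int, num_columns: int) -> int:
    """Estimated incremental-stats metadata size for one table, in bytes."""
    return num_partitions * num_columns * BYTES_PER_COLUMN_PER_PARTITION

# Hypothetical wide, heavily partitioned table: 20,000 partitions x 300 columns
# blows well past the 200 MB default inc_stats_size_limit_bytes.
default_limit = 200 * 1024 * 1024
estimate = incremental_stats_bytes(num_partitions=20_000, num_columns=300)
print(estimate)                  # 2400000000 (~2.4 GB)
print(estimate > default_limit)  # True
```

[If the estimate lands anywhere near the limit, per-partition
`COMPUTE INCREMENTAL STATS ... PARTITION (...)` or plain `COMPUTE STATS`
tends to be the safer route than raising the limit.]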