Hello everybody,

Did anything catch your eye in the two profiles attached to my last email?
We're still blocked and can't even run COMPUTE STATS once on our tables :-/
I am really curious why it OOMs instead of spilling to disk.
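In case it makes reproducing easier, here is the whole sequence condensed
into one impala-shell session. Note that inc_stats_size_limit_bytes is an
impalad startup flag rather than a query option, so on our cluster it went
into the daemon's command-line arguments and needed a restart:

    -- impalad startup flag (needs a daemon restart), raising the
    -- incremental stats metadata cap from the default 200MB to 1GB:
    --   --inc_stats_size_limit_bytes=1073741824

    -- Per-session query option in impala-shell: 32GB memory limit.
    SET MEM_LIMIT=34359738368;

    -- Attempt 1: fails at analysis once the incremental stats size
    -- estimate exceeds the cap (200MB by default, 1GB after the flag).
    COMPUTE INCREMENTAL STATS my_big_partitioned_parquet_table;

    -- Attempt 2: full (non-incremental) stats; this is what OOMs in
    -- the count/NDV child queries despite the 32GB process limit.
    COMPUTE STATS my_big_partitioned_parquet_table;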
Thanks,
Thoralf

On Fri, 19 Jan 2018 at 08:24 Thoralf Gutierrez <thoralfgutier...@gmail.com> wrote:

> Hey Mostafa,
>
> Here are two query profiles on two different tables where COMPUTE STATS
> OOMed at different steps. The first one OOMed on the first stats query
> (counts) and the second one OOMed on the second stats query (NDV, MAX,
> etc.).
>
> Don't be fooled by the corrupt Parquet error; you can still see the
> exceeded memory limit further down. I believe I am hitting
> https://issues.apache.org/jira/browse/IMPALA-5197
>
> Thanks a lot in advance!
> Thoralf
>
> On Thu, 18 Jan 2018 at 22:30 Alexander Behm <alex.b...@cloudera.com> wrote:
>
>> The documentation has a good overview of the limitations and caveats:
>>
>> https://impala.apache.org/docs/build/html/topics/impala_perf_stats.html#perf_stats_incremental
>>
>> On Thu, Jan 18, 2018 at 7:29 PM, Fawze Abujaber <fawz...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I didn't see any limitations mentioned in the documentation for
>>> incremental compute stats.
>>>
>>> Is it a size limit or a memory limit (200 MB)?
>>>
>>> Why should compute stats succeed and incremental compute stats not?
>>>
>>> I'm upgrading my cluster on Sunday, and incremental compute stats was
>>> one of the incentives :(
>>>
>>> On Fri, 19 Jan 2018 at 4:13 Mostafa Mokhtar <mmokh...@cloudera.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Do you mind sharing the query profile for the query that failed with
>>>> OOM? There should be some clues there as to why the OOM is happening.
>>>>
>>>> Thanks,
>>>> Mostafa
>>>>
>>>> On Thu, Jan 18, 2018 at 5:54 PM, Thoralf Gutierrez <
>>>> thoralfgutier...@gmail.com> wrote:
>>>>
>>>>> Hello everybody!
>>>>>
>>>>> (I am using Impala 2.8.0, out of Cloudera Express 5.11.1.)
>>>>>
>>>>> I now understand that we are _highly_ recommended to compute stats
>>>>> for our tables, so I have decided to make sure we do.
>>>>>
>>>>> On my quest to do so, I started with a first `COMPUTE INCREMENTAL
>>>>> STATS my_big_partitioned_parquet_table` and ran into:
>>>>>
>>>>> > HiveServer2Error: AnalysisException: Incremental stats size
>>>>> > estimate exceeds 200.00MB. Please try COMPUTE STATS instead.
>>>>>
>>>>> I found out that this limit can be increased, so I set
>>>>> inc_stats_size_limit_bytes to 1073741824 (1 GB), only to hit:
>>>>>
>>>>> > HiveServer2Error: AnalysisException: Incremental stats size
>>>>> > estimate exceeds 1.00GB. Please try COMPUTE STATS instead.
>>>>>
>>>>> So I ended up trying to COMPUTE STATS for the whole table instead of
>>>>> incrementally, but I still hit memory limits when computing counts,
>>>>> with my mem_limit at 34359738368 (32 GB):
>>>>>
>>>>> > Process: memory limit exceeded. Limit=32.00 GB Total=48.87 GB
>>>>> > Peak=51.97 GB
>>>>>
>>>>> 1. Am I correct to assume that even if I did not have enough memory,
>>>>> the query should spill to disk and just be slower, instead of OOMing?
>>>>> 2. Any other recommendations on how I could go about computing some
>>>>> stats on my big partitioned parquet table?
>>>>>
>>>>> Thanks a lot!
>>>>> Thoralf
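P.S. Regarding my question 2 in the quoted thread above: one stopgap we are
considering, based on the "setting statistics manually" part of the docs
page Alexander linked. This is only a sketch: the row count below is a
placeholder to be filled in from a manual COUNT(*), and it only gives the
planner table cardinality, not per-column NDVs:

    -- Stopgap sketch: compute the row count ourselves, then write it
    -- into the table properties so the planner at least has cardinality.
    SELECT COUNT(*) FROM my_big_partitioned_parquet_table;

    -- '1234567890' is a placeholder for the count returned above.
    ALTER TABLE my_big_partitioned_parquet_table
    SET TBLPROPERTIES (
      'numRows' = '1234567890',
      'STATS_GENERATED_VIA_STATS_TASK' = 'true'
    );

    -- Per the same docs page, this also works per partition if
    -- finer-grained counts are needed:
    -- ALTER TABLE my_big_partitioned_parquet_table
    --   PARTITION (part_col = '...') SET TBLPROPERTIES ('numRows' = '...');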