Re: Computing stats on big partitioned parquet tables

2018-01-23 Thread Mostafa Mokhtar
At first glance, I believe you have too many partitions, which will cause you problems down the line. The recommendation is to keep this number in the 10K range; more partitions mean more load on HMS and a larger metadata footprint. If the goal is to speed up read queries, I recommend you check this blog:

Re: Computing stats on big partitioned parquet tables

2018-01-23 Thread Tim Armstrong
It looks like there's a bunch of untracked memory (the profile shows that the tracked memory consumption of the query operators is actually quite low). Given the number of files, I suspect that there's some sort of metadata or control structure that is accumulating throughout query execution. I'd b

Re: Computing stats on big partitioned parquet tables

2018-01-23 Thread Thoralf Gutierrez
Hello everybody, Did anything catch your eye in the two profiles attached to my last email? We're still blocked and can't even COMPUTE STATS once for our tables :-/ I am really curious why it OOMs instead of spilling to disk? Thanks, Thoralf On Fri, 19 Jan 2018 at 08:24 Thoralf Gutierrez wrote:

Re: Computing stats on big partitioned parquet tables

2018-01-19 Thread Thoralf Gutierrez
Hey Mostafa, Here are two query profiles on two different tables where COMPUTE STATS OOMed at different steps. The first one OOMed on the first stats query (counts) and the second one OOMed on the second stats query (NDV, MAX, etc). Don't be fooled by the corrupt parquet error, you can still see

Re: Computing stats on big partitioned parquet tables

2018-01-18 Thread Alexander Behm
The documentation has a good overview of the limitations and caveats: https://impala.apache.org/docs/build/html/topics/impala_perf_stats.html#perf_stats_incremental On Thu, Jan 18, 2018 at 7:29 PM, Fawze Abujaber wrote: > Hi, > > I didn’t in the documentation of the incremental compute stats any >

Re: Computing stats on big partitioned parquet tables

2018-01-18 Thread Fawze Abujaber
Hi, I didn’t see any limitations in the documentation for incremental compute stats. Is it a size limit or a memory limit (200 MB)? Why would compute stats succeed and incremental compute stats not? I’m upgrading my cluster on Sunday, as incremental compute stats was one of the incentives :(
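
For context on the size question above, here is a rough back-of-the-envelope sketch in Python of how incremental stats metadata might grow with table shape. It assumes roughly 400 bytes of metadata per column per partition (an approximation taken from the Impala performance documentation) and uses the 200 MB figure mentioned in this thread as the threshold. The function names and the example table shape are hypothetical, and this is not how Impala itself computes the size.

```python
# Rough estimate of incremental-stats metadata size for a single table.
# Assumption: ~400 bytes per column per partition (approximate figure from
# the Impala docs); real sizes vary with column types and Impala version.
BYTES_PER_COLUMN_PER_PARTITION = 400

# Threshold mentioned in the thread (200 MB); treated here as an assumption.
LIMIT_BYTES = 200 * 1024 * 1024


def incremental_stats_bytes(num_partitions: int, num_columns: int) -> int:
    """Approximate metadata bytes held for incremental stats."""
    return num_partitions * num_columns * BYTES_PER_COLUMN_PER_PARTITION


def exceeds_limit(num_partitions: int, num_columns: int,
                  limit_bytes: int = LIMIT_BYTES) -> bool:
    """True if the estimate is larger than the given limit."""
    return incremental_stats_bytes(num_partitions, num_columns) > limit_bytes


if __name__ == "__main__":
    # Hypothetical table: 10,000 partitions and 100 columns.
    size = incremental_stats_bytes(10_000, 100)
    print(f"estimated size: {size / 1024**2:.1f} MiB")
    print("over limit:", exceeds_limit(10_000, 100))
```

Under these assumptions the estimate scales with partitions times columns, so a wide table can pass the threshold even at a partition count in the 10K range.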

Re: Computing stats on big partitioned parquet tables

2018-01-18 Thread Mostafa Mokhtar
Hi, Do you mind sharing the query profile for the query that failed with OOM? There should be some clues as to why the OOM is happening. Thanks Mostafa On Thu, Jan 18, 2018 at 5:54 PM, Thoralf Gutierrez < thoralfgutier...@gmail.com> wrote: > Hello everybody! > > (I am using Impala 2.8.0, out o