Re: Incremental stats and catalogd data serialization

2018-03-08 Thread Alexander Behm
Hi Miguel, the memory requirement stems from the incremental stats needed to compute the number of distinct values for each column incrementally. If you set the stats manually via ALTER TABLE, then no such incremental stats exist, so there's no memory issue. For manually setting stats, I'd recom

Re: Incremental stats and catalogd data serialization

2018-03-08 Thread Miguel Figueiredo
Hi Alex, Thanks for the feedback. I will try the new version and the new way of computing stats when possible. In the meantime we are thinking of computing the stats manually. If we compute stats per partition and for the whole table, will we encounter the same memory limit? Should we compute stats f

Re: Incremental stats and catalogd data serialization

2018-03-07 Thread Alexander Behm
Using incremental stats in your scenario is extremely dangerous and I highly recommend against it. That limitation was put in place to guard clusters against service downtime due to serializing huge tables and hitting JVM limits like the 2GB max array size. Even if the catalogd and impalads stay u
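
To get a rough feel for why incremental stats on a large table can run into the 2GB array limit mentioned above, here is a back-of-the-envelope sketch. It assumes the figure of roughly 400 bytes of incremental-stats metadata per column per partition that the Impala documentation cites; the function names and the example table shape are hypothetical, and the estimate ignores all other table metadata, so treat it as a lower bound rather than an exact serialized size.

```python
# Rough lower-bound estimate of incremental stats metadata for one table.
# ASSUMPTION: ~400 bytes per column per partition, the approximate figure
# given in the Impala documentation. Real serialized sizes will differ.
APPROX_BYTES_PER_COLUMN_PER_PARTITION = 400

# A Java array is indexed by a signed 32-bit int, so a single byte array
# cannot hold more than 2**31 - 1 bytes (about 2 GiB).
JVM_MAX_ARRAY_BYTES = 2**31 - 1


def estimate_incremental_stats_bytes(num_partitions, num_columns,
                                     bytes_per_entry=APPROX_BYTES_PER_COLUMN_PER_PARTITION):
    """Return the estimated incremental stats size in bytes (hypothetical helper)."""
    return num_partitions * num_columns * bytes_per_entry


def exceeds_jvm_array_limit(num_partitions, num_columns):
    """True if the estimate alone would not fit in a single JVM byte array."""
    return estimate_incremental_stats_bytes(num_partitions, num_columns) > JVM_MAX_ARRAY_BYTES


if __name__ == "__main__":
    # Hypothetical table shapes, chosen only for illustration.
    for partitions, columns in [(10_000, 100), (100_000, 100)]:
        size = estimate_incremental_stats_bytes(partitions, columns)
        print(partitions, columns, size, exceeds_jvm_array_limit(partitions, columns))
```

Under these assumptions, a table with 10,000 partitions and 100 columns already carries about 400 MB of incremental stats, and ten times as many partitions pushes the estimate past what a single JVM byte array can hold.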