Thanks for pointing this out. Kicked off a run of this. Shall get back with breakdowns and how it goes.
-- Piyush From: Mostafa Mokhtar <[email protected]> Reply-To: "[email protected]" <[email protected]> Date: Wednesday, January 3, 2018 at 2:13 PM To: "[email protected]" <[email protected]> Subject: Re: Issues running compute incremental stats in Impala - alter not possible When you looked at the Web-UI while compute stats was running what did you find? In general compute stats has 3 phases 1. Get row count per partition using something like select count(*), partition_id from foo group by partition_id 2. SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size (CN), Avg size (CN) from foo 3. Persist the captured stats in Hive Meta store For steps #1 and #2 you should be able to get a good read on progress using the "Scan Progress" column. Step 3 if you tail /var/log/catalogd/catalogd.INFO you should see the progress there. And if the table is large in terms of on disk size I expect steps #2 to dominate the time. If the operation is CPU not IO bound increasing mt_dop should give you good speedup, yet I don't recommend a value greater than 16. On Wed, Jan 3, 2018 at 11:03 AM, Piyush Narang <[email protected]<mailto:[email protected]>> wrote: Thanks Alex and Mostafa. I tried running compute stats full a couple of weeks back on this table and it was still going 4 hours later (and I didn’t see any progress indication on the Impala web UI). I’ll try and hunt through the Hive metastore log files and see if I can find anything. Is there something specific you’re looking for in the show create table output? I can dump that here (rather than the full table definition and details which is pretty verbose and I might need to check if it’s ok to share externally). -- Piyush From: Mostafa Mokhtar <[email protected]<mailto:[email protected]>> Reply-To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Date: Wednesday, January 3, 2018 at 1:48 PM To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Subject: Re: Issues running compute incremental stats in Impala - alter not possible Also check the Hive Metastore log files. In general if the table has a large number of partitions incremental stats will have very large overhead in terms of metadata. I would recommend running "compute stats bi_ full" then manually set the row count for newly added partitions whenever possible. On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <[email protected]<mailto:[email protected]>> wrote: Thanks for the report. I have not seen this issue. Looks like the alter RPC is rejected by the Hive Metastore. Maybe looking into the Hive/Metastore logs would help. The SHOW CREATE TABLE output might also help us debug. On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <[email protected]<mailto:[email protected]>> wrote: Hi folks, I’m running into some issues when I try to compute incremental stats in Impala that I was hoping someone would be able to help with. I’m able to ‘compute stats’ in Impala on my smaller tables just fine. When I try computing stats incrementally for one of my larger tables, I seem to be running into this error: > compute incremental stats bi_ full partition > (param1=0,day='2017-10-04',hour=00,host_platform='EU'); Query: compute incremental stats bi_full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU') WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive Metastore: CAUSED BY: InvalidOperationException: alter is not possible Looking at impalad.INFO and catalogd.INFO I don’t see any additional details. I verified that I’m the owner of the tables in HDFS. Has anyone run into this issue in the past? Any workarounds? Thanks, -- Piyush
