Hi Quanlong,

Thanks for your response.
Yes, we are using DROP PARTITION with an expression on a few Impala internal tables. We have more than 200 external tables, all partitioned by year, month, and day, and each has a different retention policy; we manage the retention on the HDFS side by deleting the files.

It seems there is no straightforward solution to my issue. I also noticed that running DROP PARTITION and then RECOVER PARTITIONS gets me back to the same situation, even though the HDFS files no longer exist. For example, if I have a table with three partitions (year=2018, 2019, and 2020) and HDFS files only for 2019 and 2020, then after running DROP PARTITION year<=2018 I see only the 2019 and 2020 partitions; but if I then run RECOVER PARTITIONS, year=2018 shows up again, and I assume this is related to the cache.

On Mon, May 25, 2020 at 5:55 PM Quanlong Huang <huangquanl...@gmail.com> wrote:

> Hi Fawze,
>
>> 1) Are these partitions impacting the Hive metastore performance and the
>> memory used by the metastore?
>
> Do you drop the partitions after you clean the parquet files? If not, the
> partition metadata still exists in the Hive metastore, which may impact its
> performance as the partition count piles up.
>
>> 2) Is there a way to drop these partitions without running ALTER TABLE
>> xxx DROP IF EXISTS PARTITION? I believe DROP and CREATE TABLE could also
>> make this change, but that is a heavy solution.
>
> I think explicitly dropping these partitions is the simplest way. If your
> requirement is to do this in one statement, you can use expressions in the
> partition clause to specify several partitions at once, e.g. ALTER TABLE
> xxx DROP IF EXISTS PARTITION (year = 2018 AND month = 4 AND day < 15).
>
>> 3) Is ALTER TABLE xxxx RECOVER PARTITIONS looking at the newly created
>> partitions?
>
> Yes, it compares the cached metadata with all the partition directories on
> HDFS to get the diff, then creates those partitions in Hive and loads
> their metadata.
> Quanlong
>
> On Fri, May 22, 2020 at 5:09 PM Fawze Abujaber <fawz...@gmail.com> wrote:
>
>> Hello community,
>>
>> I have a Spark job that writes parquet files partitioned by year, month,
>> and day. On top of these parquet files I am creating different Impala
>> external tables, and I have a retention job that cleans the parquet files
>> on a daily basis. On the Impala side I am running a daily ALTER TABLE
>> xxxxxx RECOVER PARTITIONS and an hourly REFRESH TABLE xxxx. When I run
>> SHOW PARTITIONS, I see old partitions, and they show up with zero size
>> and zero files.
>>
>> 1) Are these partitions impacting the Hive metastore performance and the
>> memory used by the metastore?
>>
>> 2) Is there a way to drop these partitions without running ALTER TABLE
>> xxx DROP IF EXISTS PARTITION? I believe DROP and CREATE TABLE could also
>> make this change, but that is a heavy solution.
>>
>> 3) Is ALTER TABLE xxxx RECOVER PARTITIONS looking at the newly created
>> partitions?
>>
>> --
>> Take Care
>> Fawze Abujaber

--
Take Care
Fawze Abujaber
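
For reference, the drop-with-expression approach discussed in the thread can be sketched as follows. The database/table name mydb.events and the HDFS path are hypothetical placeholders, not from the thread:

```sql
-- Drop all partitions older than 2019 in a single statement, using an
-- expression in the partition clause (hypothetical table name).
ALTER TABLE mydb.events DROP IF EXISTS PARTITION (year <= 2018);

-- Per Quanlong's description, RECOVER PARTITIONS diffs the cached
-- metadata against the partition *directories* on HDFS, so a dropped
-- partition whose directory still exists on HDFS would be re-added.
-- The partition directory itself would also need to be removed, e.g.
-- (hypothetical path):
--   hdfs dfs -rm -r /warehouse/mydb.db/events/year=2018
ALTER TABLE mydb.events RECOVER PARTITIONS;
```

If RECOVER PARTITIONS does work this way, that would also explain the observation above: when the retention job deletes only the files, the empty year=2018 directory survives, and RECOVER PARTITIONS brings the partition back even though no data exists.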