> For example, if I have a table with 3 partitions (year=2018, 2019 and
> 2020) and HDFS files only for 2019 and 2020, then after running DROP
> PARTITION year<=2018 I see only the partitions for 2019 and 2020. But if
> I then run RECOVER PARTITIONS, I see 2018 again, and I assume this is
> related to the cache.

This should not happen if you have deleted the partition directory for
year=2018. Could you make sure that the location of partition (year=2018)
no longer exists? The locations are shown in the last column of SHOW
PARTITIONS.
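A minimal sketch of the check described above, assuming an illustrative
table named events partitioned by year (the table name and cutoff are
placeholders, not taken from the thread):

    -- The last column of the output is the HDFS location of each
    -- partition; confirm the directory listed for year=2018 is gone
    SHOW PARTITIONS events;

    -- Once the year=2018 directory is really gone, dropping the partition
    -- is permanent: RECOVER PARTITIONS finds no directory to re-create it
    ALTER TABLE events DROP IF EXISTS PARTITION (year <= 2018);
    ALTER TABLE events RECOVER PARTITIONS;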
On Tue, May 26, 2020 at 11:46 AM Fawze Abujaber <fawz...@gmail.com> wrote:

> Hi Quanlong,
>
> Thanks for your response.
>
> Yes, we are using DROP PARTITION with an expression on a few Impala
> internal tables.
>
> We have more than 200 external tables, all partitioned by year, month and
> day, and each has a different retention policy. We manage the retention
> on the HDFS side by deleting the files.
>
> It seems there is no straightforward solution for my issue. I also
> noticed that running DROP PARTITION and then RECOVER PARTITIONS gets me
> back to the same situation, even though the HDFS files no longer exist.
>
> For example, if I have a table with 3 partitions (year=2018, 2019 and
> 2020) and HDFS files only for 2019 and 2020, then after running DROP
> PARTITION year<=2018 I see only the partitions for 2019 and 2020. But if
> I then run RECOVER PARTITIONS, I see 2018 again, and I assume this is
> related to the cache.
>
> On Mon, May 25, 2020 at 5:55 PM Quanlong Huang <huangquanl...@gmail.com>
> wrote:
>
>> Hi Fawze,
>>
>>> 1) Are these partitions impacting the Hive metastore performance and
>>> the memory used by the metastore?
>>
>> Do you drop the partitions after you clean the parquet files? If not,
>> the partition metadata still exists in the Hive metastore, which may
>> impact its performance as the partition count piles up.
>>
>>> 2) Is there a way to drop these partitions without running ALTER TABLE
>>> xxx DROP IF EXISTS PARTITION? I believe DROP and CREATE TABLE could
>>> also make this change, but that is a heavy solution.
>>
>> I think explicitly dropping these partitions is the simplest way. If
>> your requirement is to do this in one statement, you can use expressions
>> in the partition clause to specify several partitions at once, e.g.
>> ALTER TABLE xxx DROP IF EXISTS PARTITION (year = 2018 AND month = 4 AND
>> day < 15).
>>
>>> 3) Is ALTER TABLE xxxx RECOVER PARTITIONS looking at newly created
>>> partitions?
>>
>> Yes, it compares the cached partition metadata with all partition
>> directories on HDFS to get the diff, then creates those partitions in
>> Hive and loads their metadata.
>>
>> Quanlong
>>
>> On Fri, May 22, 2020 at 5:09 PM Fawze Abujaber <fawz...@gmail.com>
>> wrote:
>>
>>> Hello community,
>>>
>>> I have a Spark job that writes parquet files partitioned by year, month
>>> and day. On top of these parquet files I create different Impala
>>> external tables. A retention job cleans the parquet files on a daily
>>> basis. On the Impala side I run a daily ALTER TABLE xxxxxx RECOVER
>>> PARTITIONS and an hourly REFRESH TABLE xxxx. When I run SHOW
>>> PARTITIONS, I still see the old partitions, and they show up with zero
>>> size and zero files.
>>>
>>> 1) Are these partitions impacting the Hive metastore performance and
>>> the memory used by the metastore?
>>>
>>> 2) Is there a way to drop these partitions without running ALTER TABLE
>>> xxx DROP IF EXISTS PARTITION? I believe DROP and CREATE TABLE could
>>> also make this change, but that is a heavy solution.
>>>
>>> 3) Is ALTER TABLE xxxx RECOVER PARTITIONS looking at newly created
>>> partitions?
>>>
>>> --
>>> Take Care
>>> Fawze Abujaber
>>
>
> --
> Take Care
> Fawze Abujaber
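For reference, a hedged sketch of the retention flow discussed in this
thread, assuming an illustrative external table named events partitioned
by year, month and day; the table name and cutoff values are placeholders
and should be adjusted to the actual schema and retention policy:

    -- Drop stale partition metadata in bulk with an expression, matching
    -- whatever the HDFS retention job has already deleted
    ALTER TABLE events DROP IF EXISTS PARTITION (year < 2019);
    ALTER TABLE events DROP IF EXISTS PARTITION (year = 2019 AND month < 6);

    -- Daily: pick up partition directories newly written by the Spark job
    ALTER TABLE events RECOVER PARTITIONS;

    -- Hourly: reload file metadata for partitions that received new files
    REFRESH events;

    -- Inspect the remaining partitions: file count, size and the HDFS
    -- location are shown per partition
    SHOW PARTITIONS events;

As noted above, the DROP PARTITION statements should run after the HDFS
directories have been removed; otherwise RECOVER PARTITIONS will re-create
the partitions from the directories it still finds.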