>
> For example, if I have a table with 3 partitions (year=2018, 2019, and 2020)
> and HDFS files only for 2019 and 2020, then after running DROP PARTITION
> year<=2018 I see only the partitions for 2019 and 2020. But if I run RECOVER
> PARTITIONS, 2018 shows up again, and I assume this is related to the cache.
>
This should not happen if you have deleted the partition directory of
year=2018. Could you make sure that the location of partition (year=2018)
no longer exists? The locations are shown in the last column of SHOW
PARTITIONS.
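To illustrate why a partition can reappear, here is a simplified sketch of the diff that RECOVER PARTITIONS performs (a hypothetical model for illustration, not Impala's actual implementation): any partition directory still present under the table location is re-registered, even if it is empty because a retention job deleted its data files but left the directory behind.

```python
import os
import tempfile

def recover_partitions(cached_partitions, table_dir):
    """Simplified model of ALTER TABLE ... RECOVER PARTITIONS:
    re-register every partition directory found under the table
    location, regardless of whether it contains any data files."""
    on_disk = {d for d in os.listdir(table_dir)
               if os.path.isdir(os.path.join(table_dir, d))}
    return cached_partitions | on_disk

# Simulate a table directory with three partition directories.
table_dir = tempfile.mkdtemp()
for part in ("year=2018", "year=2019", "year=2020"):
    os.makedirs(os.path.join(table_dir, part))

# Data files exist only for 2019 and 2020; year=2018 is an empty
# directory (the retention job removed its files, not the directory).
open(os.path.join(table_dir, "year=2019", "data.parq"), "w").close()
open(os.path.join(table_dir, "year=2020", "data.parq"), "w").close()

# Cached metadata after DROP PARTITION year<=2018.
cached = {"year=2019", "year=2020"}

# year=2018 reappears because its directory still exists on disk.
print(recover_partitions(cached, table_dir))
```

This is why removing the partition *directory* itself, not just the files inside it, is what keeps RECOVER PARTITIONS from resurrecting a dropped partition.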

On Tue, May 26, 2020 at 11:46 AM Fawze Abujaber <fawz...@gmail.com> wrote:

> Hi Quanlong,
>
> Thanks for your response,
>
> Yes, we are using DROP PARTITION with an expression on a few Impala internal
> tables.
>
> We have more than 200 external tables, all partitioned by year, month, and
> day, and each has a different retention policy; we manage the retention on
> the HDFS side by deleting the files.
> It seems there is no straightforward solution to my issue. I also noticed
> that running DROP PARTITION and then RECOVER PARTITIONS gets me back to the
> same situation, even though the HDFS files no longer exist.
>
> For example, if I have a table with 3 partitions (year=2018, 2019, and 2020)
> and HDFS files only for 2019 and 2020, then after running DROP PARTITION
> year<=2018 I see only the partitions for 2019 and 2020. But if I run RECOVER
> PARTITIONS, 2018 shows up again, and I assume this is related to the cache.
>
> On Mon, May 25, 2020 at 5:55 PM Quanlong Huang <huangquanl...@gmail.com>
> wrote:
>
>> Hi Fawze,
>>
>> 1) Are these partitions impacting the Hive metastore performance and the
>>> memory used by the metastore?
>>
>> Do you drop the partitions after you clean the parquet files? If not, the
>> partition metadata still exists in the Hive metastore, which may impact its
>> performance as the number of partitions piles up.
>>
>> 2) Is there a way to drop these partitions without running ALTER TABLE
>>> xxx DROP IF EXISTS PARTITION? I believe DROP and CREATE TABLE can also
>>> make this change, but that is a heavy solution.
>>>
>> I think explicitly dropping these partitions is the simplest way. If your
>> requirement is to do this in one statement, you can use an expression in
>> the partition clause to specify several partitions at once, e.g. ALTER
>> TABLE xxx DROP IF EXISTS PARTITION (year = 2018 AND month = 4 AND day < 15).
>>
>> 3) Does ALTER TABLE xxxx RECOVER PARTITIONS look for the newly created
>>> partitions?
>>>
>> Yes, it compares the cached partition metadata with all partition
>> directories on HDFS to get the diff, then creates those partitions in Hive
>> and loads their metadata.
>>
>> Quanlong
>>
>> On Fri, May 22, 2020 at 5:09 PM Fawze Abujaber <fawz...@gmail.com> wrote:
>>
>>> Hello community,
>>>
>>> I have a Spark job that writes parquet files partitioned by year, month,
>>> and day. On top of these parquet files I'm creating different Impala
>>> external tables. I have a retention job that cleans the parquet files on
>>> a daily basis. On the Impala side I'm running a daily ALTER TABLE xxxxxx
>>> RECOVER PARTITIONS and an hourly REFRESH TABLE xxxx. When I run SHOW
>>> PARTITIONS I see old partitions, and they show up with zero size and zero
>>> files.
>>>
>>> 1) Are these partitions impacting the Hive metastore performance and the
>>> memory used by the metastore?
>>>
>>> 2) Is there a way to drop these partitions without running ALTER TABLE
>>> xxx DROP IF EXISTS PARTITION? I believe DROP and CREATE TABLE can also
>>> make this change, but that is a heavy solution.
>>>
>>> 3) Does ALTER TABLE xxxx RECOVER PARTITIONS look for the newly created
>>> partitions?
>>>
>>> --
>>> Take Care
>>> Fawze Abujaber
>>>
>>
>
> --
> Take Care
> Fawze Abujaber
>
