I already checked this. The data is no longer in HDFS, and yes, the location
is shown in SHOW PARTITIONS.

On Tue, May 26, 2020 at 10:01 AM Quanlong Huang <huangquanl...@gmail.com>
wrote:

>> For example: if I have a table with 3 partitions (year=2018, 2019 and 2020)
>> but HDFS files only for 2019 and 2020, then after running DROP PARTITION
>> year<=2018 I see only the 2019 and 2020 partitions. But if I then run
>> RECOVER PARTITIONS, 2018 shows up again, and I assume this is related to
>> the cache.
>>
> This should not happen if you have deleted the partition directory of
> year=2018. Could you confirm that the location of partition (year=2018)
> no longer exists? The locations are shown in the last column of SHOW
> PARTITIONS.
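>
> Roughly, the check would look like this (table name and partition value are
> just placeholders, adjust to yours):
>
>   SHOW PARTITIONS my_table;
>   -- note the Location value of the year=2018 row, then verify that this
>   -- directory is really gone on HDFS, e.g. with: hdfs dfs -ls <location>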
>
> On Tue, May 26, 2020 at 11:46 AM Fawze Abujaber <fawz...@gmail.com> wrote:
>
>> Hi Quanlong,
>>
>> Thanks for your response,
>>
>> Yes, we are using DROP PARTITION with an expression on a few Impala
>> internal tables.
>>
>> We have more than 200 external tables, all partitioned by year, month
>> and day, each with a different retention policy, and we manage the
>> retention on the HDFS side by deleting the files.
>> There seems to be no straightforward solution to my issue. I also noticed
>> that running DROP PARTITION and then RECOVER PARTITIONS gets me back to
>> the same situation even though the HDFS files do not exist.
>>
>> For example: if I have a table with 3 partitions (year=2018, 2019 and 2020)
>> but HDFS files only for 2019 and 2020, then after running DROP PARTITION
>> year<=2018 I see only the 2019 and 2020 partitions. But if I then run
>> RECOVER PARTITIONS, 2018 shows up again, and I assume this is related to
>> the cache.
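>>
>> A rough sketch of the sequence I mean (table name is a placeholder):
>>
>>   ALTER TABLE my_table DROP IF EXISTS PARTITION (year <= 2018);
>>   SHOW PARTITIONS my_table;              -- only 2019 and 2020 are listed
>>   ALTER TABLE my_table RECOVER PARTITIONS;
>>   SHOW PARTITIONS my_table;              -- year=2018 shows up again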
>>
>> On Mon, May 25, 2020 at 5:55 PM Quanlong Huang <huangquanl...@gmail.com>
>> wrote:
>>
>>> Hi Fawze,
>>>
>>>> 1) Are these partitions impacting the Hive metastore performance and
>>>> the memory used by the metastore?
>>>
>>> Do you drop the partitions after you clean the Parquet files? If not,
>>> the partition metadata still exists in the Hive metastore, which may
>>> impact its performance as the partition count piles up.
>>>
>>>> 2) Is there a way to drop these partitions without running ALTER TABLE
>>>> xxx DROP IF EXISTS PARTITION? I believe DROP and CREATE TABLE could also
>>>> make this change, but that is a heavy solution.
>>>>
>>>>
>>> I think explicitly dropping these partitions is the simplest way. If your
>>> requirement is to do this in one statement, you can use expressions in
>>> the partition clause to specify several partitions at once, e.g. ALTER
>>> TABLE xxx DROP IF EXISTS PARTITION (year = 2018 AND month = 4 AND day < 15).
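>>>
>>> For instance, something along these lines should drop a whole range of
>>> old partitions in one statement (table name and cut-off values are
>>> placeholders):
>>>
>>>   -- everything from 2018 and earlier
>>>   ALTER TABLE my_table DROP IF EXISTS PARTITION (year <= 2018);
>>>   -- or a narrower range, e.g. the first quarter of 2019
>>>   ALTER TABLE my_table DROP IF EXISTS PARTITION (year = 2019 AND month <= 3);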
>>>
>>>> 3) Does ALTER TABLE xxxx RECOVER PARTITIONS look at the newly created
>>>> partitions?
>>>>
>>>>
>>> Yes, it compares the cached partition metadata with all partition
>>> directories on HDFS to get the diff, then creates those partitions in
>>> Hive and loads their metadata.
>>>
>>> Quanlong
>>>
>>> On Fri, May 22, 2020 at 5:09 PM Fawze Abujaber <fawz...@gmail.com>
>>> wrote:
>>>
>>>> Hello community,
>>>>
>>>> I have a Spark job that writes Parquet files partitioned by year, month
>>>> and day. On top of these Parquet files I'm creating different Impala
>>>> external tables. I have a retention job that cleans the Parquet files on
>>>> a daily basis. On the Impala side I'm running a daily ALTER TABLE xxxxxx
>>>> RECOVER PARTITIONS and an hourly REFRESH TABLE xxxx. When I run SHOW
>>>> PARTITIONS, I see old partitions and they show up with zero size and
>>>> zero number of files.
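>>>>
>>>> The maintenance statements are roughly the following (table name is a
>>>> placeholder):
>>>>
>>>>   -- daily, after the retention job has deleted old Parquet directories
>>>>   ALTER TABLE my_table RECOVER PARTITIONS;
>>>>   -- hourly, to pick up new files in the existing partitions
>>>>   REFRESH my_table;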
>>>>
>>>> 1) Are these partitions impacting the Hive metastore performance and
>>>> the memory used by the metastore?
>>>>
>>>> 2) Is there a way to drop these partitions without running ALTER TABLE
>>>> xxx DROP IF EXISTS PARTITION? I believe DROP and CREATE TABLE could also
>>>> make this change, but that is a heavy solution.
>>>>
>>>> 3) Does ALTER TABLE xxxx RECOVER PARTITIONS look at the newly created
>>>> partitions?
>>>>
>>>> --
>>>> Take Care
>>>> Fawze Abujaber
>>>>
>>>
>>
>> --
>> Take Care
>> Fawze Abujaber
>>
>

-- 
Take Care
Fawze Abujaber
