Hey Fawze,

RECOVER PARTITIONS is cheaper to execute, but it only picks up partition
directories that Impala does not know about yet, so it helps just once
for each new partition. If you keep adding files to existing partitions,
per-partition REFRESH is the best bet.
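
As a rough sketch of the two options for your year/month/day layout (the
table name analytics.events is made up for illustration, and if I remember
right the PARTITION clause on REFRESH needs Impala 2.7 or later):

```sql
-- Hypothetical table name; partition columns taken from your description.

-- Option 1: the Spark job created a brand-new day directory.
-- Registers any partition directories Impala has not seen yet.
ALTER TABLE analytics.events RECOVER PARTITIONS;

-- Option 2: the Spark job added files to a partition Impala already knows.
-- Reloads file metadata for that one partition only.
REFRESH analytics.events PARTITION (year=2019, month=2, day=6);
```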

HTH

On Wed, 6 Feb 2019 at 09:27, Fawze Abujaber <fawz...@gmail.com> wrote:
>
> Hi Community,
>
> I'm always working to improve our Impala usage and resource
> consumption, and I would like to decide between ALTER TABLE ... RECOVER
> PARTITIONS and the REFRESH statement in terms of running time and
> resources, especially since REFRESH can be run on specific partitions. I
> have a Spark job that adds files to HDFS, partitioned by year, month and day.
>
> To automatically detect new partition directories added through Hive or HDFS 
> operations:
>
> In CDH 5.5 / Impala 2.3 and higher, the RECOVER PARTITIONS clause scans a 
> partitioned table to detect if any new partition directories were added 
> outside of Impala, such as by Hive ALTER TABLE statements or by hdfs dfs or 
> hadoop fs commands. The RECOVER PARTITIONS clause automatically recognizes 
> any data files present in these new directories, the same as the REFRESH 
> statement does.
>
>
> --
> Take Care
> Fawze Abujaber
