If the table is partitioned by year, month, and day, but not by hour,
running recover partitions every hour is not a good idea.
Recover partitions only loads metadata when it discovers a new partition;
for existing partitions, even if there is new data, recover partitions will
ignore it, so the table metadata can become out of date and queries will
return wrong results.
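
For example, with a hypothetical table mydb.events: when the hourly Spark
job writes a new parquet file into a partition directory that already
exists, recover partitions sees no new directory and reloads nothing:

ALTER TABLE mydb.events RECOVER PARTITIONS;
-- No new partition directory was created for year=2018/month=3/day=17,
-- so the new file is not picked up and query results stay stale.
SELECT COUNT(*) FROM mydb.events
WHERE year = 2018 AND month = 3 AND day = 17;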

If the Spark job is not running very frequently, you can run REFRESH to
refresh a specific partition after job completion, or run it once per
hour:

REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...])]
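
For example (table name and partition values here are hypothetical), after
the hourly job finishes writing into the current day's partition:

REFRESH mydb.events PARTITION (year=2018, month=3, day=17);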


On Sat, Mar 17, 2018 at 1:10 AM, Fawze Abujaber <[email protected]> wrote:

> Hello Guys,
>
> I have parquet files that a Spark job generates. I'm defining an
> external table on these parquet files which is partitioned by year,
> month and day. The Spark job feeds these tables each hour.
>
> I have a cron job that runs every hour and executes the command:
>
>  alter table $(table_name) recover partitions
>
> I'm looking for other solutions, if Impala provides any, such as a
> configuration setting; for example, I'm wondering whether I need to
> educate the end users to use the -r option to refresh the table.
>
> Are there any other solutions for recovering partitions?
