Don't use the -r option to impala-shell! That option was a mistake and it's removed in Impala 3.0. The problem is that it does a global invalidate metadata, which is expensive because it forces all table metadata to be reloaded, not just the metadata for the table you care about.
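
If you switch the cron job to the per-partition REFRESH suggested in the quoted message, something like the sketch below could build the statement for the current day's partition. This is only a sketch under assumptions: it assumes the partition columns are literally named year, month and day and hold integers, and my_db.events is a made-up table name. It passes the statement to impala-shell with -q and does not use -r.

```python
from datetime import datetime
from typing import List


def refresh_statement(table: str, when: datetime) -> str:
    """Build a REFRESH statement for the year/month/day partition containing `when`.

    Assumes integer partition columns named year, month and day (an assumption,
    adjust to the real table definition).
    """
    return (
        f"REFRESH {table} "
        f"PARTITION (year={when.year}, month={when.month}, day={when.day})"
    )


def impala_shell_command(table: str, when: datetime) -> List[str]:
    """Build an impala-shell argument list that runs the statement via -q."""
    return ["impala-shell", "-q", refresh_statement(table, when)]


if __name__ == "__main__":
    # Hypothetical table name; substitute the real external table.
    cmd = impala_shell_command("my_db.events", datetime.now())
    print(cmd)
    # To actually run it from the cron job:
    #   import subprocess; subprocess.run(cmd, check=True)
```

A partition that Impala has not seen yet (the first load of a new day) still has to be added first, for example with recover partitions, since that is the case where it does load metadata.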

On 19 Mar. 2018 10:35, "Juan" <any...@gmail.com> wrote:

> If the table is partitioned by year, month, day, but not hour, running
> recover partitions is not a good idea.
> Recover partitions only loads metadata when it discovers a new partition;
> for existing partitions, even if there is new data, recover partitions will
> ignore them. So the table metadata could be out of date and queries will
> return wrong results.
>
> If the Spark job is not running very frequently, you can run refresh table
> to refresh a specific partition after job completion, or run it once
> per hour.
>
> REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...])]
>
>
> On Sat, Mar 17, 2018 at 1:10 AM, Fawze Abujaber <fawz...@gmail.com> wrote:
>
>> Hello Guys,
>>
>> I have Parquet files that a Spark job generates. I'm defining an
>> external table on these Parquet files, which is partitioned by year,
>> month and day. The Spark job feeds these tables each hour.
>>
>> I have a cron job that runs every hour and runs the command:
>>
>> alter table $(table_name) recover partitions
>>
>> I'm looking for other solutions in Impala, if there are any, such as a
>> configuration setting. For example, I'm wondering whether I need to
>> educate the end users to use the -r option to refresh the table.
>>
>>
>> Is there any other solution for recover partitions?