If the table is partitioned by year, month, and day but not hour, running RECOVER PARTITIONS is not a good idea. RECOVER PARTITIONS only loads metadata when it discovers a new partition; for existing partitions it ignores newly arrived data, so the table metadata can become stale and queries will return wrong results.
If the Spark job does not run very frequently, you can run REFRESH on the specific partition after each job completes, or run it once per hour:

REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...])]

On Sat, Mar 17, 2018 at 1:10 AM, Fawze Abujaber <[email protected]> wrote:
> Hello Guys,
>
> I have parquet files that a Spark job generates. I'm defining an
> external table on these parquet files, partitioned by year, month, and
> day. The Spark job feeds these tables each hour.
>
> I have a cron job that runs every hour and executes the command:
>
> alter table $(table_name) recover partitions
>
> I'm looking for other solutions, if Impala offers any, like a
> configuration setting; for example, I'm thinking whether I need to
> educate the end users to use the -r option to refresh the table.
>
> Is there any other solution for recovering partitions?
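As a sketch of the per-partition approach: the hourly cron job could build a REFRESH statement for just the current day's partition instead of recovering all partitions. This is a minimal illustration, assuming a year/month/day-partitioned table; the table name `mydb.events` and the helper `refresh_stmt` are hypothetical, not part of the thread.

```python
from datetime import datetime

def refresh_stmt(table: str, ts: datetime) -> str:
    """Build an Impala REFRESH statement for the year/month/day
    partition that a given timestamp falls into."""
    return (
        f"REFRESH {table} PARTITION "
        f"(year={ts.year}, month={ts.month}, day={ts.day})"
    )

# The cron job would pass the resulting statement to impala-shell,
# e.g.: impala-shell -q "<statement>"
stmt = refresh_stmt("mydb.events", datetime(2018, 3, 17))
print(stmt)  # REFRESH mydb.events PARTITION (year=2018, month=3, day=17)
```

Refreshing only one partition keeps the metadata load small compared with a full-table REFRESH or RECOVER PARTITIONS every hour.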
