Did you have a different option in mind that might suit your needs better?

These are your options for discovering metadata changes external to Impala:

  refresh <table>
  refresh <table> PARTITION (partition_spec)
  invalidate metadata <table>
  alter table <table> recover partitions
  invalidate metadata (don't do this)

Those commands all do different things, so it really depends on your goals. If you want new files/partitions to be incrementally discovered by Impala, then use refresh.

On Mon, Mar 19, 2018 at 12:49 PM, Fawze Abujaber <[email protected]> wrote:

> Thanks Tim and Juan,
>
> So no options other than running the refresh statement each hour or to let
> the spark job run it after writing the parquet files.
>
> On Mon, Mar 19, 2018 at 9:34 PM, Tim Armstrong <[email protected]>
> wrote:
>
>> Don't use the -r option to impala-shell! That option was a mistake and
>> it's removed in impala 3.0. The problem is that it does a global invalidate
>> which is expensive because it requires reloading all metadata.
>>
>> On 19 Mar. 2018 10:35, "Juan" <[email protected]> wrote:
>>
>>> If the table is partitioned by year, month, day, but not hour, running
>>> recover partitions is not a good idea.
>>> Recover partitions only load metadata when it discovers a new partition,
>>> for existing partitions, even if there is new data, recover partitions will
>>> ignore them. so the table metadata could be out-of-date and queries will
>>> return wrong result.
>>>
>>> If the spark job is not running very frequently, you can run refresh
>>> table to refresh a specific partition after job completion. or running it
>>> once per hour.
>>>
>>> REFRESH [db_name.]table_name [PARTITION (key_col1=val1 [, key_col2=val2...])]
>>>
>>>
>>> On Sat, Mar 17, 2018 at 1:10 AM, Fawze Abujaber <[email protected]>
>>> wrote:
>>>
>>>> Hello Guys,
>>>>
>>>> I have a parquet files that a Spark job generates, i'm defining an
>>>> external table on these parquet files which portioned by year.month and
>>>> day, The Spark job feeds these tables each hour.
>>>>
>>>> I have a cron job that running each one hour and run the command:
>>>>
>>>> alter table $(table_name) recover partitions
>>>>
>>>> I'm looking for other solutions if there is by impala, like
>>>> configuration, for example i'm thinking if i need to educate the end users
>>>> to use -r option to refresh the table.
>>>>
>>>>
>>>> Is there any other solutions for recover partitions?
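
For the hourly case discussed in this thread, here is a minimal sketch of how the per-partition refresh statement could be generated after each Spark write. It is only an illustration: the function name build_refresh_statement, the table and database names, and the assumption that the partition columns are integers literally named year, month and day are all hypothetical and not taken from the original setup.

```python
from datetime import datetime


def build_refresh_statement(table, when, db=None):
    """Build a REFRESH ... PARTITION statement for the day partition
    that contains the timestamp `when`.

    Assumes (hypothetically) integer partition columns named
    year, month and day, matching the layout described in the thread.
    """
    qualified = f"{db}.{table}" if db else table
    spec = f"year={when.year}, month={when.month}, day={when.day}"
    return f"REFRESH {qualified} PARTITION ({spec})"


if __name__ == "__main__":
    # Hypothetical table and database names, for illustration only.
    print(build_refresh_statement("events", datetime(2018, 3, 19, 12), db="analytics"))
```

The resulting string could then be submitted by the Spark job after it finishes writing, or by the existing cron job, for example through impala-shell's -q option. Refreshing only the day partition that received new files avoids reloading metadata for the rest of the table.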
