Re: Parquet Metadata table on Rolling window
No, it does not. We rebuild metadata for all directories under mydata when you query /mydata. If new files get added anywhere in the whole hierarchy, metadata gets regenerated for all of them. However, if you query only /mydata/3 and nothing is changed under 3, no metadata is generated.

Thanks
Padma

> On Oct 16, 2017, at 12:33 PM, François Méthot wrote:
>
> Thanks Padma,
>
> Would we benefits at all from running metadata on directories that we know
> we will never modify?
>
> We would end up with:
> /mydata/3/(Metadata generated...)
> /mydata/4/(Metadata generated...)
> /mydata/.../(Metadata generated...)
> /mydata/109/(Metadata generated...)
> /mydata/110/(Current partition : Metadata NOT generated yet...)
>
> When users query /mydata, would drill take advantage of the metadata
> available in each subfolder?
>
> Francois
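As a rough illustration of the point above (a directory that is queried on its own and has not changed does not get its metadata regenerated), here is a minimal Python sketch that builds one REFRESH TABLE METADATA statement per sealed partition and skips the newest one. The helper names, the `dfs` storage plugin name, and the assumption that the highest-numbered directory is the only one still receiving files are illustrative, not taken from this thread. Per the reply, this would only pay off for queries that target individual partition directories rather than /mydata as a whole.

```python
def sealed_partitions(partitions):
    """Return every partition except the newest, which still receives files.

    Assumption: partition names sort in chronological order.
    """
    ordered = sorted(partitions)
    return ordered[:-1]


def refresh_statements(root, partitions, plugin="dfs"):
    """Build one REFRESH TABLE METADATA statement per sealed partition.

    `plugin` is a hypothetical storage plugin name; adjust to your setup.
    """
    return [
        f"REFRESH TABLE METADATA {plugin}.`{root}/{p}`"
        for p in sealed_partitions(partitions)
    ]


if __name__ == "__main__":
    # Partitions 3..110 as in the thread; 110 is the current hour and is skipped.
    for stmt in refresh_statements("/mydata", range(3, 111)):
        print(stmt)
```

The statements would then be submitted through whatever Drill client you already use; that part is left out because it depends on the deployment.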
Re: Parquet Metadata table on Rolling window
Thanks Padma,

Would we benefit at all from generating metadata on directories that we know we will never modify?

We would end up with:
/mydata/3/(Metadata generated...)
/mydata/4/(Metadata generated...)
/mydata/.../(Metadata generated...)
/mydata/109/(Metadata generated...)
/mydata/110/(Current partition: Metadata NOT generated yet...)

When users query /mydata, would Drill take advantage of the metadata available in each subfolder?

Francois
Re: Parquet Metadata table on Rolling window
Unfortunately, we do not do incremental metadata updates. If new files are getting added constantly, REFRESH TABLE METADATA will not help.

Thanks
Padma

> On Oct 5, 2017, at 5:36 PM, François Méthot wrote:
>
> Hi,
>
> I have been using Drill for more than a year now; we are running 1.10.
>
> My queries can spend from 5 to 10 minutes on planning because I am dealing
> with lots of files in HDFS (then 5 to 60 minutes on execution).
>
> I maintain a rolling window of data partitioned by the epoch seconds
> rounded to the hour.
> /mydata/3/ -> Next partition to be deleted (nightly check)
> /mydata/4/
> /mydata/.../
> /mydata/109/
> /mydata/110/ -> current hour, this is where new parquet files are added
>
> I am considering using REFRESH TABLE METADATA.
> Is it beneficial at all in a situation where new files are added
> constantly (but only to the latest partition; older partitions are set in
> stone)?
> Will Drill detect that new files are added to the latest partition (110)?
> - Will it trigger a metadata refresh on all the directories, or just on
> /mydata/110?
>
> Thanks for your help
> François