Re: Parquet Metadata table on Rolling window
No, it does not. We rebuild metadata for all directories under /mydata when you query /mydata. If new files get added anywhere in the hierarchy, metadata gets regenerated for all of them. However, if you query only /mydata/3 and nothing has changed under /mydata/3, no metadata is regenerated.

Thanks
Padma

> On Oct 16, 2017, at 12:33 PM, François Méthot wrote:
>
> Thanks Padma,
>
> Would we benefit at all from running metadata on directories that we know
> we will never modify?
>
> We would end up with:
> /mydata/3/(Metadata generated...)
> /mydata/4/(Metadata generated...)
> /mydata/.../(Metadata generated...)
> /mydata/109/(Metadata generated...)
> /mydata/110/(Current partition : Metadata NOT generated yet...)
>
> When users query /mydata, would Drill take advantage of the metadata
> available in each subfolder?
>
> Francois
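The rule described in the reply above can be written down as a small toy model. This is only a sketch of the stated behaviour, not Drill's implementation; the function name `metadata_regenerated` and the idea of passing in a set of changed directories are assumptions made for illustration.

```python
def metadata_regenerated(query_path, changed_dirs):
    """Toy model: metadata is regenerated for everything under query_path
    if any directory at or below query_path has changed; otherwise nothing
    is regenerated. (Hypothetical helper, not a Drill API.)"""
    prefix = query_path.rstrip("/") + "/"
    return any((d.rstrip("/") + "/").startswith(prefix) for d in changed_dirs)


# New files are only ever added to the current partition.
changed = {"/mydata/110"}

print(metadata_regenerated("/mydata", changed))    # True: the whole tree is rebuilt
print(metadata_regenerated("/mydata/3", changed))  # False: nothing changed under /mydata/3
```

Under this model, a query against /mydata pays the regeneration cost whenever the current partition has received new files, while a query addressed directly to an unchanged partition does not.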
Re: Parquet Metadata table on Rolling window
Thanks Padma,

Would we benefit at all from running metadata on directories that we know we will never modify?

We would end up with:
/mydata/3/(Metadata generated...)
/mydata/4/(Metadata generated...)
/mydata/.../(Metadata generated...)
/mydata/109/(Metadata generated...)
/mydata/110/(Current partition : Metadata NOT generated yet...)

When users query /mydata, would Drill take advantage of the metadata available in each subfolder?

Francois
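A minimal sketch of how the per-directory statements for the layout above could be generated. It assumes the storage plugin is named `dfs` and that the partitions are addressed by absolute path; `refresh_statements` is a hypothetical helper, and whether queries against /mydata would benefit from metadata generated this way is exactly the question being asked.

```python
def refresh_statements(partitions, current, root="/mydata", plugin="dfs"):
    """Build one REFRESH TABLE METADATA statement per closed partition,
    skipping the partition that is still receiving new files.
    (Hypothetical helper; `dfs` and the path layout are assumptions.)"""
    return [
        f"REFRESH TABLE METADATA {plugin}.`{root}/{p}`"
        for p in sorted(partitions)
        if p != current
    ]


# Partitions 3..110 exist; 110 is the current hour and is left out.
stmts = refresh_statements(range(3, 111), current=110)
print(stmts[0])    # REFRESH TABLE METADATA dfs.`/mydata/3`
print(len(stmts))  # 107
```

The statements are returned without a trailing semicolon; add one if they are pasted into a shell that expects it.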
Re: Parquet Metadata table on Rolling window
Unfortunately, we do not do incremental metadata updates. If new files are getting added constantly, REFRESH TABLE METADATA will not help.

Thanks
Padma

> On Oct 5, 2017, at 5:36 PM, François Méthot wrote:
>
> Hi,
>
> I have been using Drill for more than a year now; we are running 1.10.
>
> My queries can spend 5 to 10 minutes in planning because I am dealing
> with lots of files in HDFS (then 5 to 60 minutes in execution).
>
> I maintain a rolling window of data partitioned by the epoch seconds
> rounded to the hour.
> /mydata/3/ -> Next partition to be deleted (nightly check)
> /mydata/4/
> /mydata/.../
> /mydata/109/
> /mydata/110/ -> current hour, this is where new parquet files are added
>
> I am considering using REFRESH TABLE METADATA.
> Is it beneficial at all in a situation where new files are added
> constantly (but only to the latest partition; older partitions are set in
> stone)?
> Will Drill detect that new files are added to the latest partition (110)?
> Will it trigger a metadata refresh on all the directories, or just on
> /mydata/110?
>
>
> Thanks for your help
> François
Parquet Metadata table on Rolling window
Hi,

I have been using Drill for more than a year now; we are running 1.10.

My queries can spend 5 to 10 minutes in planning because I am dealing with lots of files in HDFS (then 5 to 60 minutes in execution).

I maintain a rolling window of data partitioned by the epoch seconds rounded to the hour.
/mydata/3/ -> Next partition to be deleted (nightly check)
/mydata/4/
/mydata/.../
/mydata/109/
/mydata/110/ -> current hour, this is where new parquet files are added

I am considering using REFRESH TABLE METADATA.
Is it beneficial at all in a situation where new files are added constantly (but only to the latest partition; older partitions are set in stone)?
Will Drill detect that new files are added to the latest partition (110)?
Will it trigger a metadata refresh on all the directories, or just on /mydata/110?

Thanks for your help
François