Re: Parquet Metadata table on Rolling window

2017-10-16 Thread Padma Penumarthy
No, it does not. We rebuild metadata for all directories under /mydata when you
query /mydata. If new files get added anywhere in the whole hierarchy, metadata
gets regenerated for all of them.

However, if you query only /mydata/3 and nothing under it has changed,
no metadata is generated.
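The rule described above can be modeled roughly as follows. This is a simplified sketch, not Drill's code. It assumes that staleness is decided by comparing the modification times of the directories under the *queried* path against the time the metadata cache was written, and that a stale cache is rebuilt for the whole queried subtree (no incremental update). `Dir`, `newest_mtime` and `needs_full_rebuild` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dir:
    """Hypothetical in-memory stand-in for an HDFS directory: its
    last-modified time (epoch seconds) and its subdirectories."""
    mtime: int
    children: dict[str, Dir] = field(default_factory=dict)


def newest_mtime(d: Dir) -> int:
    """Latest modification time anywhere in the subtree rooted at d."""
    return max([d.mtime] + [newest_mtime(c) for c in d.children.values()])


def needs_full_rebuild(queried: Dir, cache_mtime: int) -> bool:
    """Simplified rule: if any directory under the queried path changed after
    the metadata cache was written, the metadata for the whole queried
    subtree is regenerated."""
    return newest_mtime(queried) > cache_mtime


# Cache written at t=90; a new file then lands in /mydata/110 at t=100.
mydata = Dir(mtime=10, children={
    "3": Dir(mtime=50),
    "4": Dir(mtime=60),
    "110": Dir(mtime=100),
})
print(needs_full_rebuild(mydata, 90))                # querying /mydata
print(needs_full_rebuild(mydata.children["3"], 90))  # querying /mydata/3 only
```

In this model, the first call returns `True` because /mydata/110 changed, so the whole /mydata tree is rebuilt. The second returns `False` because nothing under /mydata/3 changed after the cache was written. This is also why pre-generated metadata in the older subfolders would not help a query on /mydata under this rule.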

Thanks
Padma

> On Oct 16, 2017, at 12:33 PM, François Méthot  wrote:
> 
> Thanks Padma,
> 
> Would we benefit at all from generating metadata for directories that we know
> we will never modify?
> 
> We would end up with:
> /mydata/3/(Metadata generated...)
> /mydata/4/(Metadata generated...)
> /mydata/.../(Metadata generated...)
> /mydata/109/(Metadata generated...)
> /mydata/110/(Current partition: metadata NOT generated yet...)
> 
> When users query /mydata, would Drill take advantage of the metadata
> available in each subfolder?
> 
> Francois



Re: Parquet Metadata table on Rolling window

2017-10-16 Thread François Méthot
Thanks Padma,

Would we benefit at all from generating metadata for directories that we know
we will never modify?

We would end up with:
/mydata/3/(Metadata generated...)
/mydata/4/(Metadata generated...)
/mydata/.../(Metadata generated...)
/mydata/109/(Metadata generated...)
/mydata/110/(Current partition: metadata NOT generated yet...)

When users query /mydata, would Drill take advantage of the metadata
available in each subfolder?

Francois


Re: Parquet Metadata table on Rolling window

2017-10-05 Thread Padma Penumarthy
Unfortunately, we do not do incremental metadata updates.
If new files are constantly being added, REFRESH TABLE METADATA will not help.

Thanks
Padma


> On Oct 5, 2017, at 5:36 PM, François Méthot  wrote:
> 
> Hi,
> 
> I have been using Drill for more than a year now; we are running 1.10.
> 
> My queries can spend 5 to 10 minutes in planning because I am dealing
> with lots of files in HDFS (then 5 to 60 minutes in execution).
> 
> I maintain a rolling window of data partitioned by epoch seconds
> rounded to the hour:
> /mydata/3/   -> Next partition to be deleted (nightly check)
> /mydata/4/
> /mydata/.../
> /mydata/109/
> /mydata/110/ -> current hour, this is where new parquet files are added
> 
> I am considering using REFRESH TABLE METADATA.
> Is it beneficial at all in a situation where new files are added
> constantly (but only to the latest partition; older partitions are set in
> stone)?
> Will Drill detect that new files are added to the latest partition (110)?
> Will it trigger a metadata refresh on the whole directory, or just on
> /mydata/110?
> 
> 
> Thanks for your help
> François
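The rolling-window layout described in the original question can be sketched as below. This is a hypothetical illustration, not François's actual tooling. It assumes that "rounded to the hour" means truncated to the start of the hour and that each partition directory is named after that value (the directory names in the listing above are abbreviated and are not reproduced here). `partition_key` and `partitions_to_delete` are made-up helper names.

```python
from __future__ import annotations

HOUR = 3600  # seconds per hour


def partition_key(epoch_seconds: int) -> int:
    """Epoch seconds truncated to the start of the hour; assumed here to be
    the partition directory name."""
    return epoch_seconds - epoch_seconds % HOUR


def partitions_to_delete(existing: list[int], now: int, window_hours: int) -> list[int]:
    """Partition keys that have fallen out of a rolling window of
    `window_hours` hours ending at the current (still growing) hour."""
    oldest_kept = partition_key(now) - (window_hours - 1) * HOUR
    return sorted(k for k in existing if k < oldest_kept)
```

A nightly job could list the existing partition keys, call `partitions_to_delete`, and remove the returned directories. In this layout, only the partition for `partition_key(now)` receives new files, which is exactly the case where the whole-tree metadata rebuild discussed in this thread is triggered.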