Re: Parquet Metadata table on Rolling window

2017-10-16 Thread Padma Penumarthy
No, it does not. We rebuild metadata for all directories under /mydata when you
query /mydata. If new files are added anywhere in the hierarchy, metadata is
regenerated for all of them.

However, if you query only /mydata/3 and nothing has changed under /mydata/3,
no metadata is regenerated.

Thanks
Padma
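
The behavior described in the reply above can be pictured with a small toy
model. This is only a sketch of the rule as stated in the thread, not Drill's
actual code: the Directory class, its timestamp fields, and dirs_to_rebuild
are hypothetical names made up for illustration.

```python
# Toy model of the behavior described in this thread; not Drill's actual code.
from dataclasses import dataclass
from typing import List


@dataclass
class Directory:
    """Hypothetical stand-in for one partition directory."""
    name: str
    last_modified: int      # time of the most recent file addition
    metadata_built_at: int  # time the metadata was last generated

    def is_stale(self) -> bool:
        # Files were added after the metadata was generated.
        return self.last_modified > self.metadata_built_at


def dirs_to_rebuild(queried: List[Directory]) -> List[str]:
    """Return the directories whose metadata is regenerated for one query.

    There is no incremental update: a single stale directory in the
    queried set causes regeneration for every directory in that set.
    """
    if any(d.is_stale() for d in queried):
        return [d.name for d in queried]
    return []


partitions = [
    Directory("/mydata/3", last_modified=10, metadata_built_at=100),
    Directory("/mydata/109", last_modified=90, metadata_built_at=100),
    Directory("/mydata/110", last_modified=120, metadata_built_at=100),
]

# Querying /mydata covers every partition; 110 is stale, so all are rebuilt.
rebuilt_for_root = dirs_to_rebuild(partitions)

# Querying only /mydata/3, which is unchanged, rebuilds nothing.
rebuilt_for_3 = dirs_to_rebuild(partitions[:1])
```

In this model the cost of a query on /mydata grows with the total number of
directories, which matches the point that constantly arriving files defeat the
metadata cache at the top level.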




Re: Parquet Metadata table on Rolling window

2017-10-16 Thread François Méthot
Thanks Padma,

Would we benefit at all from generating metadata on directories that we know
we will never modify?

We would end up with:
/mydata/3/(Metadata generated...)
/mydata/4/(Metadata generated...)
/mydata/.../(Metadata generated...)
/mydata/109/(Metadata generated...)
/mydata/110/(Current partition : Metadata NOT generated yet...)

When users query /mydata, would Drill take advantage of the metadata
available in each subfolder?

Francois
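
A minimal sketch of how the per-directory statements for the frozen partitions
could be generated. Per the reply in this thread, this would only help queries
that target an individual subdirectory such as /mydata/3, not queries on
/mydata. The storage plugin name dfs and the helper refresh_statements are
assumptions for illustration; only the partitions named explicitly in the
message are listed, and the elided ones are left out.

```python
from typing import List


def refresh_statements(root: str, partitions: List[str],
                       plugin: str = "dfs") -> List[str]:
    """Build one REFRESH TABLE METADATA statement per frozen partition.

    `plugin` is an assumed storage plugin name; adjust it to the
    workspace that actually exposes `root`.
    """
    return [
        f"REFRESH TABLE METADATA {plugin}.`{root}/{name}`;"
        for name in partitions
    ]


# Only partitions that will never change again; the current hour
# (/mydata/110 in the message) is deliberately excluded.
frozen = ["3", "4", "109"]
statements = refresh_statements("/mydata", frozen)

for stmt in statements:
    print(stmt)
```

The statements would be submitted through whichever Drill client is already in
use; that part is left out because it depends on the deployment.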


Re: Parquet Metadata table on Rolling window

2017-10-05 Thread Padma Penumarthy
Unfortunately, we do not do incremental metadata updates.
If new files are being added constantly, REFRESH TABLE METADATA will not help.

Thanks
Padma





Parquet Metadata table on Rolling window

2017-10-05 Thread François Méthot
Hi,

I have been using Drill for more than a year now; we are running 1.10.

My queries can spend 5 to 10 minutes in planning because I am dealing
with lots of files in HDFS (then 5 to 60 minutes in execution).

I maintain a rolling window of data partitioned by the epoch seconds
rounded to the hour.
/mydata/3/   -> Next partition to be deleted (nightly check)
/mydata/4/
/mydata/.../
/mydata/109/
/mydata/110/ -> current hour, this is where new parquet files are added
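
For illustration, the hourly partitioning described above could be computed
roughly as follows. This is a sketch under assumptions: rounding is taken to
mean rounding down to the start of the hour, and hour_partition and
partition_dir are hypothetical helper names. The short directory names in the
listing (3, 4, 110) are kept as written in the message and are not derived by
this code.

```python
SECONDS_PER_HOUR = 3600


def hour_partition(epoch_seconds: int) -> int:
    """Round epoch seconds down to the start of the containing hour."""
    return epoch_seconds - (epoch_seconds % SECONDS_PER_HOUR)


def partition_dir(epoch_seconds: int, root: str = "/mydata") -> str:
    """Directory that a record with this timestamp would be written to."""
    return f"{root}/{hour_partition(epoch_seconds)}/"


# Two timestamps inside the same hour map to the same directory.
same_hour = partition_dir(7200) == partition_dir(7200 + 3599)

# One second later crosses the hour boundary into a new directory.
next_hour = partition_dir(7200 + 3600)
```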

I am considering using REFRESH TABLE METADATA.
Is it beneficial at all in a situation where new files are added
constantly (but only to the latest partition; older partitions are set in
stone)?
- Will Drill detect that new files are added to the latest partition (110)?
- Will it trigger a metadata refresh on all the directories, or just on
/mydata/110?


Thanks for your help
François