Hi,

Is there no way to merge the files in Drill if it creates lots of small
files?
AFAIK, partitioning improves performance. In my case the partitioning is
based on year, month, day, and hour, and my queries also put the
partition column values in the WHERE clause.
Drill should just go and read those files, which should in turn improve
query performance.
In this use case it shouldn't matter whether it creates small files or big
files, until we query on a non-partition column.

Can somebody shed light on whether my understanding of Apache Drill is
correct?


Thanks,
Divya



On 17 August 2017 at 22:55, Andries Engelbrecht <[email protected]>
wrote:

> Do you partition the table?
> You may want to sort (order by) on the columns you partition, or just
> order by in any case on the column(s) you are most likely going to use for
> predicates. It increases the CTAS time, but normally will improve the query
> performance quite a bit.
>
> Yes, a large number of files does affect query performance. Using
> metadata caching helps improve the query planning time a lot.
>
> --Andries
>
>
> On 8/16/17, 11:12 PM, "Divya Gehlot" <[email protected]> wrote:
>
>     Hi,
>     I have a CTAS with partitioning on 4 columns, and when I save it, it
>     creates lots of small files (~102290), where each file is only KBs
>     in size.
>
>     My questions are:
>     1. Do lots of small files reduce performance while reading the
>     data in Drill?
>     2. If yes, how can I merge the small Parquet files?
>
>
>
>     Thanks,
>     Divya
>
>
>
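
The advice quoted above (partition, sort on the columns used in
predicates, then cache the metadata) could be sketched in Drill SQL
roughly as follows. This is only an illustration under assumptions: the
source path `/data/events_raw`, the table name `events_by_day`, and the
columns yr, mth, dy, hr and payload are hypothetical and do not come from
this thread. It also assumes that rewriting the data with a coarser
partitioning (three columns instead of four) is acceptable, which will
generally produce fewer and larger files.

```sql
-- Hypothetical names throughout; adjust to the real workspace and schema.
ALTER SESSION SET `store.format` = 'parquet';

-- Rewrite the data with one fewer partition column (hr is dropped from
-- PARTITION BY) and sort on the columns most often used in WHERE clauses.
CREATE TABLE dfs.tmp.`events_by_day`
PARTITION BY (yr, mth, dy)
AS
SELECT yr, mth, dy, hr, payload
FROM dfs.`/data/events_raw`
ORDER BY yr, mth, dy, hr;

-- Build the Parquet metadata cache so planning does not have to read
-- the footer of every file.
REFRESH TABLE METADATA dfs.tmp.`events_by_day`;
```

Queries that filter on yr, mth and dy should still be pruned to the
matching partitions, and a filter on hr can benefit from the sort order.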
