Also, what is the cardinality of the partition field? If you have lots of
partitions, you will have lots of files...
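
To put a rough number on that, here is a minimal sketch. The cardinalities below are made up, not taken from your table, and the helper names are hypothetical. It also assumes that each writer fragment can emit its own file for every partition combination it sees, which is my understanding of how partitioned CTAS behaves, so treat the result as an upper bound only.

```python
from math import prod


def max_partition_combinations(cardinalities):
    """Upper bound on distinct partition-key combinations.

    `cardinalities` holds the number of distinct values in each
    partition column (hypothetical inputs, one entry per column).
    """
    return prod(cardinalities)


def max_output_files(cardinalities, writer_fragments=1):
    """Upper bound on files written, assuming every writer fragment
    may write one file per partition combination (an assumption)."""
    return max_partition_combinations(cardinalities) * writer_fragments


# Hypothetical cardinalities for a 4-column partition.
example = [12, 31, 24, 10]
print(max_partition_combinations(example))               # 89280
print(max_output_files(example, writer_fragments=4))     # 357120
```

The real count is usually lower because not every combination occurs in the data, but the product grows quickly with each extra partition column.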


On Thu, Aug 17, 2017 at 9:55 AM, Andries Engelbrecht <aengelbre...@mapr.com>
wrote:

> Do you partition the table?
> You may want to sort (ORDER BY) on the columns you partition by, or in any
> case order by the column(s) you are most likely going to use for
> predicates. It increases the CTAS time, but normally will improve the query
> performance quite a bit.
>
> Yes, a large number of files does affect the query performance. Using
> metadata caching helps improve the query planning time a lot.
>
> --Andries
>
>
> On 8/16/17, 11:12 PM, "Divya Gehlot" <divya.htco...@gmail.com> wrote:
>
>     Hi,
>     I have a CTAS with partitioning on 4 columns, and when I save it, it
>     creates lots of small files (~102290), where the size of each file is
>     in KBs.
>
>     My questions are:
>     1. Do lots of small files reduce the performance while reading the
>     data in Drill?
>     2. If yes, how can I merge the small Parquet files?
>
>
>
>     Thanks,
>     Divya
>
>
>
