Also, what is the cardinality of the partition field? If you have lots of partitions, you will have lots of files...
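
As a rough illustration of why the cardinality matters, here is a back-of-the-envelope sketch in Python. It assumes (my assumption, not something confirmed in this thread) that each writer fragment can emit one file per distinct combination of partition values it sees, so the worst case is the product of the per-column cardinalities times the number of writer fragments. The function name and the example numbers are hypothetical.

```python
from math import prod


def estimate_max_files(cardinalities, writer_fragments=1):
    """Worst-case number of output files for a partitioned CTAS.

    cardinalities: distinct-value counts, one per partition column.
    writer_fragments: parallel writers (assumed to each write one file
    per partition-value combination they receive).
    """
    if writer_fragments < 1:
        raise ValueError("writer_fragments must be at least 1")
    # Every combination of partition values can become its own directory/file,
    # and each parallel writer may produce a separate file for it.
    return prod(cardinalities) * writer_fragments


# Hypothetical cardinalities for 4 partition columns, written by 4 fragments.
print(estimate_max_files([10, 12, 31, 5], writer_fragments=4))
```

Even modest per-column cardinalities multiply quickly, which would be consistent with ending up with on the order of 100,000 small files.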

On Thu, Aug 17, 2017 at 9:55 AM, Andries Engelbrecht <aengelbre...@mapr.com> wrote:

> Do you partition the table?
> You may want to sort (order by) on the columns you partition, or just
> order by in any case on the column(s) you are most likely going to use for
> predicates. It increases the CTAS time, but normally will improve the query
> performance quite a bit.
>
> Yes a large number of files does affect the query performance, using
> metadata caching helps improve the query planning time a lot.
>
> --Andries
>
>
> On 8/16/17, 11:12 PM, "Divya Gehlot" <divya.htco...@gmail.com> wrote:
>
>     Hi,
>     I have CTAS with partition on 4 columns and when I save it it creates
>     lots of small files ~ 102290 where size of each file is in KBs.
>
>     My queries are:
>     1. Does the lots of small files reduce the performance while reading
>        the data in Drill?
>     2. If yes, how can I merge the small parquet files?
>
>
>     Thanks,
>     Divya