Do you partition the table? If so, you may want to sort (ORDER BY) on the partition columns, or in any case on the column(s) you are most likely to use in predicates. This increases the CTAS time, but it normally improves query performance quite a bit.
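As a rough sketch of what a sorted CTAS could look like: the workspace, path, table, and column names below are all hypothetical placeholders (nothing here comes from the actual table in this thread), so adjust them to your own schema.

```sql
-- Hypothetical names throughout: dfs.tmp.`events_sorted`, dfs.`/data/events`,
-- and the columns event_date, region, customer_id, amount are placeholders.
-- The ORDER BY lists the partition columns first, then the column assumed
-- to be used most often in WHERE clauses.
CREATE TABLE dfs.tmp.`events_sorted`
PARTITION BY (event_date, region)
AS
SELECT event_date, region, customer_id, amount
FROM dfs.`/data/events`
ORDER BY event_date, region, customer_id;
```

Note that the partition columns have to appear in the SELECT list, and the sort is what makes the CTAS slower to run.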
Yes, a large number of files does affect query performance. Using metadata caching helps improve the query planning time a lot.

--Andries

On 8/16/17, 11:12 PM, "Divya Gehlot" <[email protected]> wrote:

Hi,
I have a CTAS with partitioning on 4 columns, and when I save it, it creates lots of small files (~102290), each only KBs in size.
My questions are:
1. Do the many small files reduce performance when reading the data in Drill?
2. If yes, how can I merge the small Parquet files?

Thanks,
Divya
