Do you partition the table?
You may want to sort (ORDER BY) on the columns you partition by, or in any 
case on the column(s) you are most likely to use in predicates. 
This increases the CTAS time, but it normally improves query performance 
quite a bit.
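
As a rough sketch of the sorted CTAS described above: the workspace, table, 
path, and column names here are hypothetical placeholders, and whether the 
extra sort pays off depends on your data and your predicates.

```sql
-- Hypothetical names throughout; substitute your own workspace, path, and columns.
-- Write Parquet output (Parquet is also Drill's default CTAS format).
ALTER SESSION SET `store.format` = 'parquet';

-- Partition on the columns used for pruning and sort on the same columns,
-- so rows with matching predicate values are written next to each other.
CREATE TABLE dfs.tmp.`events_sorted`
PARTITION BY (event_year, country)
AS
SELECT event_year, country, user_id, amount
FROM dfs.`/data/events_raw`
ORDER BY event_year, country;
```

Note that the PARTITION BY columns have to appear in the SELECT list.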

Yes, a large number of files does affect query performance. Using metadata 
caching helps improve query planning time a lot.
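
For the metadata caching, something along these lines should work, assuming 
the Parquet table lives at a hypothetical location like the one below:

```sql
-- Hypothetical table location; point this at your own Parquet table directory.
-- Builds (or rebuilds) the Parquet metadata cache file that the planner reads
-- instead of opening the footer of every file.
REFRESH TABLE METADATA dfs.tmp.`events_sorted`;
```

The cache only needs to be generated once up front; re-run the command if you 
want to rebuild it yourself after the files change.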

--Andries


On 8/16/17, 11:12 PM, "Divya Gehlot" <[email protected]> wrote:

    Hi,
    I have a CTAS that partitions on 4 columns, and when I run it, it creates
    lots of small files (~102290), each only a few KB in size.
    
    My questions are:
    1. Do lots of small files reduce performance when reading the
    data in Drill?
    2. If yes, how can I merge the small Parquet files?
    
    
    
    Thanks,
    Divya
    
