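Apply a sort (an ORDER BY) in your CTAS; this will force the data down to a single stream before writing, so one writer produces the output instead of many parallel writers each emitting a small file.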
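A minimal sketch of what that could look like. The workspace, path, table, and column names here (dfs.tmp, daily_agg, /data/events, event_date, clicks) are hypothetical placeholders, not taken from your setup:

```sql
-- Hypothetical names throughout; substitute your own workspace, path, and columns.
-- The ORDER BY at the end of the SELECT is the sort that funnels the
-- result into a single stream before the Parquet writer runs.
CREATE TABLE dfs.tmp.`daily_agg` AS
SELECT event_date,
       COUNT(*)    AS event_count,
       SUM(clicks) AS total_clicks
FROM dfs.`/data/events`
GROUP BY event_date
ORDER BY event_date;
```

The sort costs some extra work, but for an aggregate table of a megabyte or two that should be small next to the planning time spent on hundreds of tiny files.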

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Thu, Jun 23, 2016 at 10:23 AM, John Omernik <[email protected]> wrote:

> When I have a small query writing smaller data (like aggregate tables for
> faster aggregates for dashboards, etc.), it appears to write a ton of small
> files. Not sure why; maybe it's just how the join worked out. I have a
> "day" that is 1.5 MB in total size, but 400 files total. This seems
> excessive.
>
> While I don't have the "small files" issue because I run MapR-FS, having
> 400 files that make up 1.5 MB of total data kills me in the planning phase.
> How can I get Drill, when doing a CTAS, to go through a round of
> consolidation on the Parquet files?
>
> Thanks
>
> John
