[ https://issues.apache.org/jira/browse/DRILL-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers reopened DRILL-5025:
--------------------------------

Reopening - issue applies just to "first generation" spill files.

> ExternalSortBatch provides weak control over spill file size
> ------------------------------------------------------------
>
>                 Key: DRILL-5025
>                 URL: https://issues.apache.org/jira/browse/DRILL-5025
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> The ExternalSortBatch (ESB) operator sorts records while spilling to disk to
> control memory use. The size of the spill file is not easy to control. It is
> a function of the accumulated batches size (half of the accumulated total),
> which is determined by either the memory budget or the
> {{drill.exec.sort.external.group.size}} parameter. (But, even with the
> parameter, the actual file size is still half the accumulated batches.)
>
> The proposed solution is to provide an explicit parameter that sets the
> maximum spill file size: {{drill.exec.sort.external.spill.size}}. If the ESB
> needs to spill more than this amount of data, ESB should split the spill into
> multiple files.
>
> The spill.size should be in bytes (or MB). (A size in records makes the file
> size data-dependent, which would not be helpful.)
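
To illustrate the proposed splitting rule, here is a minimal sketch in Java. It is not Drill code: the class {{SpillPlanner}} and the method {{planSpillFiles}} are hypothetical names invented for this example, and it only models the size accounting (which batches land in which file), not the merge or the actual file I/O. It assumes the limit is given in bytes, as suggested above, and that a single batch larger than the limit is written to a file of its own rather than being split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Drill: groups batches into spill files
// so that each file stays at or below a byte limit where possible.
public class SpillPlanner {

    /**
     * @param batchSizes   sizes of the batches to spill, in bytes, in spill order
     * @param maxFileBytes stand-in for the proposed spill.size limit, in bytes
     * @return one inner list per spill file, holding the batch sizes written to it
     */
    public static List<List<Long>> planSpillFiles(List<Long> batchSizes, long maxFileBytes) {
        if (maxFileBytes <= 0) {
            throw new IllegalArgumentException("maxFileBytes must be positive");
        }
        List<List<Long>> files = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;

        for (long size : batchSizes) {
            // Start a new file when adding this batch would exceed the limit.
            // A batch larger than the limit still gets a file of its own.
            if (!current.isEmpty() && currentBytes + size > maxFileBytes) {
                files.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            files.add(current);
        }
        return files;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Five 40 MB batches with a 100 MB limit.
        List<Long> batches = Arrays.asList(40 * mb, 40 * mb, 40 * mb, 40 * mb, 40 * mb);
        List<List<Long>> files = planSpillFiles(batches, 100 * mb);
        System.out.println(files.size() + " spill files");
    }
}
```

Because the limit is expressed in bytes, the number of files depends only on the batch sizes, not on the record count or row width, which is the data-independence argued for above.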