[ https://issues.apache.org/jira/browse/DRILL-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers updated DRILL-5211:
-------------------------------
    Comment: was deleted

(was: Attached are two proposals. The first provides background information about the issue, including the solutions considered. The second is a detailed proposal for enforcing vector size limits at the lowest levels of the code: the vectors themselves and the "mutator" that writes data to the vectors. To follow are higher-level proposals for creating a new version of the scan batch operator, and related mechanisms, to allow us to retrofit readers with the size-aware "mutator".)

> Queries fail due to direct memory fragmentation
> -----------------------------------------------
>
>                 Key: DRILL-5211
>                 URL: https://issues.apache.org/jira/browse/DRILL-5211
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.9.0
>
>         Attachments: ApacheDrillMemoryFragmentationBackground.pdf, ApacheDrillVectorSizeLimits.pdf
>
>
> Consider a test of the external sort as follows:
> * Direct memory: 3 GB
> * Input file: 18 GB, with one Varchar column of 8K width
>
> The sort runs, spilling to disk. Once all data arrives, the sort begins to merge the results. But, to do that, it must first do an intermediate merge. For example, in this sort, there are 190 spill files, but only 19 can be merged at a time. (Each merge file contains 128 MB batches, and only 19 can fit in memory, giving a total footprint of 2.5 GB, well below the 3 GB limit.)
>
> Yet, when loading batch xx, Drill fails with an OOM error. At that point, total available direct memory is 3,817,865,216 bytes. (Obtained from {{maxMemory}} in the {{Bits}} class in the JDK.)
>
> It appears that Drill wants to allocate 58,257,868 bytes, but the {{totalCapacity}} (again in {{Bits}}) is already 3,800,769,206, causing an OOM.
>
> The problem is that, at this point, the external sort should not ask the system for more memory. The allocator for the external sort holds just 1,192,350,366 bytes before the allocation request. Plenty of spare memory should be available, released when the in-memory batches were spilled to disk prior to merging. Indeed, earlier in the run, the sort had reached a peak memory usage of 2,710,716,416 bytes. This memory should be available for reuse during merging, and is more than sufficient to satisfy the particular request in question.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
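The arithmetic behind the failure can be illustrated with a simplified model of the JDK's direct-memory accounting. This is a sketch only, not the actual JDK code: the real {{java.nio.Bits.reserveMemory}} also triggers reference processing and retries before throwing {{OutOfMemoryError}}, and the class and field names below are used here only to mirror the report. The point it demonstrates is that the OOM decision is made against the JVM-wide running total of direct capacity, not against any one operator's allocator balance.

```java
// Simplified model of the JDK's java.nio.Bits reservation check:
// an allocation fails when the global running total of direct-buffer
// capacity would exceed the direct-memory limit, regardless of how
// much memory an individual Drill allocator believes it has free.
public class DirectReservationModel {
    private final long maxMemory;  // models Bits.maxMemory (the limit)
    private long totalCapacity;    // models Bits.totalCapacity (in use)

    public DirectReservationModel(long maxMemory) {
        this.maxMemory = maxMemory;
    }

    /** Returns true if the reservation fits; false models the OOM path. */
    public boolean reserve(long size) {
        if (totalCapacity + size > maxMemory) {
            return false;  // the real Bits throws OutOfMemoryError here
        }
        totalCapacity += size;
        return true;
    }

    public static void main(String[] args) {
        // Figures from the report: a 3,817,865,216-byte limit,
        // 3,800,769,206 bytes already reserved JVM-wide,
        // and a 58,257,868-byte request from the sort.
        DirectReservationModel bits =
                new DirectReservationModel(3_817_865_216L);
        bits.reserve(3_800_769_206L);
        System.out.println(bits.reserve(58_257_868L)); // prints "false"
    }
}
```

Run this way, the model fails the 58 MB request even though the sort's own allocator holds only about 1.2 GB, matching the behavior described above: memory released by one allocator is not necessarily returned to the global pool that {{Bits}} tracks.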