Zelaine, thanks for the suggestion. I added this option both to
drill-override.conf and in the session, and this time the query did stay
running much longer, but it still eventually failed with the same error,
although with much different memory values.
(org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate buffer of size 134217728 due to memory limit. Current allocation: 10653214316
org.apache.drill.exec.memory.BaseAllocator.buffer():220
org.apache.drill.exec.memory.BaseAllocator.buffer():195
org.apache.drill.exec.vector.VarCharVector.reAlloc():425
org.apache.drill.exec.vector.VarCharVector.copyFromSafe():278
org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe():379
org.apache.drill.exec.test.generated.PriorityQueueCopierGen8.doCopy():22
org.apache.drill.exec.test.generated.PriorityQueueCopierGen8.next():76
org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill():1408
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill():1376
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.spillFromMemory():1339
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.processBatch():831
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():618
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():660
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():137
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():144
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
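
As a sanity check that the option actually took effect in both places, a query against the sys.options table can help (a sketch; the exact column layout varies across Drill versions):

```sql
-- Show the managed-sort option at every scope where it is set
-- (BOOT reflects drill-override.conf, SESSION reflects ALTER SESSION)
SELECT name, type, bool_val
FROM sys.options
WHERE name LIKE '%disable_managed%';
```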
At first I didn't change planner.width.max_per_query; the default on a
32-core machine works out to 23. That query failed after 34 minutes. I then
tried setting planner.width.max_per_query=1, and this query also failed,
but of course took longer, about 2 hours. In both cases,
planner.memory.max_query_memory_per_node was set to 230G.
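
For reference, the planner settings described above were applied with statements along these lines (a sketch; the 230G value is written out in bytes, since the option takes a long):

```sql
-- Per-node memory budget for buffered operators: 230 GB, in bytes
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 246960619520;

-- Second run only: force query parallelism down to a single fragment
ALTER SESSION SET `planner.width.max_per_query` = 1;
```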
On Mon, May 1, 2017 at 11:09 AM, Zelaine Fong <[email protected]> wrote:
> Nate,
>
> The Jira you’ve referenced relates to the new external sort, which is not
> enabled by default, as it is still going through some additional testing.
> If you’d like to try it to see if it resolves your problem, you’ll need to
> set “sort.external.disable_managed” as follows in your
> drill-override.conf file:
>
> drill.exec: {
> cluster-id: "drillbits1",
> zk.connect: "localhost:2181",
> sort.external.disable_managed: false
> }
>
> and run the following query:
>
> ALTER SESSION SET `exec.sort.disable_managed` = false;
>
> -- Zelaine
>
> On 5/1/17, 7:44 AM, "Nate Butler" <[email protected]> wrote:
>
>     We keep running into this issue when trying to issue a query with
>     hashagg disabled. When I look at system memory usage, though, Drill
>     doesn't seem to be using much of it but still hits this error.
>
>     Our environment:
>
>     - 1 r3.8xl
>     - 1 drillbit, version 1.10.0, configured with 4GB of heap and 230G
>     of direct memory
>     - Data stored on S3 as compressed CSV
>
>     I've tried increasing planner.memory.max_query_memory_per_node to
>     230G and lowering planner.width.max_per_query to 1, and it still
>     fails.
>
>     We've applied the patch from this bug in the hopes that it would
>     resolve the issue, but it hasn't:
>
>     https://issues.apache.org/jira/browse/DRILL-5226
>
> Stack Trace:
>
>     (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate buffer of size 16777216 due to memory limit. Current allocation: 8445952
>     org.apache.drill.exec.memory.BaseAllocator.buffer():220
>     org.apache.drill.exec.memory.BaseAllocator.buffer():195
>     org.apache.drill.exec.vector.VarCharVector.reAlloc():425
>     org.apache.drill.exec.vector.VarCharVector.copyFromSafe():278
>     org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe():379
>     org.apache.drill.exec.test.generated.PriorityQueueCopierGen328.doCopy():22
>     org.apache.drill.exec.test.generated.PriorityQueueCopierGen328.next():75
>     org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill():602
>     org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():428
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():137
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext():144
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
>
> Is there something I'm missing here? Any help/direction would be
> appreciated.
>
> Thanks,
> Nate
>