Hi Nate,

I’ll give you three separate suggestions. The first two build on the discussion 
with Zelaine. The third gets at a separate problem that could be the root cause.

First, let’s discuss logging. When we hit a bug like this, the logs are 
incredibly useful for learning what is going on. Turn on debug logging. If you 
are familiar with Java logging, you only need to enable the debug level for 
the org.apache.drill.exec.physical.impl.xsort.managed package. Then, look for 
lines that say “ExternalSortBatch”.
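For example, if you are using Drill’s standard Logback configuration, adding 
something like this to conf/logback.xml should enable it (the appender name 
here is an assumption; use whatever your logback.xml already defines):

```xml
<!-- Enable debug output for the managed external sort only. -->
<logger name="org.apache.drill.exec.physical.impl.xsort.managed" additivity="false">
  <level value="debug" />
  <appender-ref ref="FILE" /> <!-- assumed appender name -->
</logger>
```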

You will see a number of entries early on that identify the amount of memory 
available to the sort, the size of the incoming batches, and how we will slice 
up memory. Please post those lines to your JIRA entry.

Then, later, you’ll see an entry for the OOM error. Review the preceding 
entries to get a sense of where the sort was: was it still reading and spilling 
data from upstream (the sort phase)? Or had it gotten to the merge phase, in 
which we reread spilled data?

The log entries, while cryptic at first glance, make a bit more sense after 
you scan through the full set. Please post those lines along with summary info.

Also, the query profile will tell you how much memory was actually used at the 
time of the OOM. You can compare that with the “budget” explained in the log 
file entry mentioned above.

Second, we can better define how Drill works with sort memory to help you 
properly configure your setup.

Here is some background.

* Your system has some amount of memory. In your case, 230 GB.
* To allocate memory to the sort, Drill does not look at actual system memory. 
Instead, it uses planner.memory.max_query_memory_per_node. (The idea is that 
you set this value to, roughly, system memory / number of concurrent queries.)
* Drill divides up memory to compute per-sort memory as: query memory per node 
/ no. of slices / no. of sorts in the query.
* In your system, the number of slices is 23, so each fragment gets 10 GB of 
memory.
* If your query has a single sort, then each sort gets 10 GB of memory.
* However, memory per query is capped by the boot-time drill.memory.top.max 
option (see below), which defaults to 20 GB. Not an issue here, but it becomes 
one if the numbers above come out differently.
* Changing planner.width.max_per_query has no effect on memory.
* You’d ideally change planner.width.max_per_node to 1 to run the query 
single-threaded. But, due to the item above, no sort will get more than 20 GB 
anyway.
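To make the arithmetic concrete, here is a rough sketch of the budget 
calculation as described above. This is an assumed simplification for 
illustration, not the actual planner code, and the way the 20 GB cap interacts 
with the division is my reading of the behavior:

```java
// Back-of-the-envelope sketch of the sort memory budget described above.
// Assumed simplification; not the actual Drill planner code.
public class SortMemoryMath {
  static final long GB = 1024L * 1024 * 1024;

  static long perSortMemory(long queryMemPerNode, int slices, int sortsPerQuery,
                            long topMax) {
    // Query memory is divided across slices (fragments), then across
    // the sorts within each fragment.
    long perSort = queryMemPerNode / slices / sortsPerQuery;
    // The boot-time drill.memory.top.max cap keeps any one sort from
    // growing past the global limit.
    return Math.min(perSort, topMax);
  }

  public static void main(String[] args) {
    // 230 GB query memory, 23 slices, one sort: 10 GB per sort.
    System.out.println(perSortMemory(230 * GB, 23, 1, 20 * GB) / GB); // 10
    // Single-threaded (width 1): capped at 20 GB, not 230 GB.
    System.out.println(perSortMemory(230 * GB, 1, 1, 20 * GB) / GB);  // 20
  }
}
```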

For the actual code, see [1].

Despite all this, the original 10 GB allocation should likely be plenty; the 
sort is supposed to spill. How much it spills depends on your input data size. 
When sorting, performance is affected by memory:

* If your data is smaller than sort memory, sorting happens in memory, and 
performance is optimal.
* If your data is larger than memory, but smaller than 8x memory, you’ll get a 
“single generation” spill/merge and performance should be no worse than 3x an 
in-memory sort. (1x is the original data read, then another 1x for spill, and 
a third 1x for read/merge.)
* If your data is larger than 8x memory, sorting will need multiple generations 
of spill/merge/re-spill, and run-time will increase accordingly.
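As a rough rule of thumb, the three cases above can be sketched like this (an 
assumed model for illustration, not Drill’s actual spill logic):

```java
// Rough model of how data size vs. sort memory affects the spill strategy,
// per the three cases above. Illustration only; not Drill's actual logic.
public class SpillModel {
  static String sortStrategy(long dataSize, long sortMemory) {
    if (dataSize <= sortMemory) {
      return "in-memory";              // optimal: no spill at all
    }
    if (dataSize <= 8 * sortMemory) {
      return "single-generation";      // one spill/merge pass, ~3x the I/O
    }
    return "multi-generation";         // repeated spill/merge/re-spill
  }
}
```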

Some options:

* Set planner.width.max_per_node to 1 to run the query single-threaded. This 
will use all memory for the single sort.
* But, we’ve got that pesky 20 GB global cap. So, change your 
drill-override.conf file as follows:

drill.memory.top.max: 100000000000

(Sorry for all the zeros. It is supposed to be 100 GB. We really should switch 
to a better format to specify memory…) 100 GB seems plenty without going larger.

You can verify that these changes take effect by looking for the log line that 
explains the managed sort’s memory calculations (when debug logging is enabled).

Third, all that said, I wonder if the problem is elsewhere. Yes, you are 
getting an Out of Memory (OOM) error. But, not in the usual place that 
indicates a sort issue. Instead, you are getting it in the allocation of a 
“value vector.” This raises some questions:

* How big is your input data (size on disk)?
* How many columns?
* How wide are your VarChar columns, on average?

You mentioned data is compressed CSV. With typical 8x compression, actual data 
sorted will be ~8x your on-disk size.

The column width question is critical. I see that the vector is trying to 
allocate 16 MB of data, which suggests that your column widths are 250 bytes or 
larger. If so, we are probably looking at a different error that happens to be 
showing up while sorting.
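Here is the back-of-the-envelope arithmetic behind that estimate, assuming a 
batch holds Drill’s usual maximum of 64K values (a sketch, not Drill code):

```java
// Why a 16 MB vector allocation implies wide VarChar columns:
// a value vector holds up to 64K values per batch, so the average
// width is roughly allocation size / 65,536. Illustrative sketch only.
public class VectorWidthMath {
  static long avgColumnWidth(long allocationBytes, int valuesPerBatch) {
    return allocationBytes / valuesPerBatch;
  }

  public static void main(String[] args) {
    System.out.println(avgColumnWidth(16 * 1024 * 1024, 65536)); // 256
  }
}
```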

Once we see the details of your data size, we can determine if we should focus 
more closely in that area.

Thanks,

- Paul

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/util/MemoryAllocationUtilities.java

> On May 2, 2017, at 10:47 AM, rahul challapalli <[email protected]> 
> wrote:
> 
> This is clearly a bug and like zelaine suggested the new sort is still work
> in progress. We have a few similar bugs open for the new sort. I could have
> pointed to the jira's but unfortunately JIRA is not working for me due to
> firewall issues.
> 
> Another suggestion is build drill from the latest master and try it out, if
> you are willing to spend some time. But again there is no guarantee yet.
> 
> Please go ahead and raise a new jira. If it is a duplicate, I will mark it
> as such later. Thank You.
> 
> - Rahul
> 
> On Tue, May 2, 2017 at 8:24 AM, Nate Butler <[email protected]> wrote:
> 
>> Zelaine, thanks for the suggestion. I added this option both to the
>> drill-override and in the session and this time the query did stay running
>> for much longer but it still eventually failed with the same error,
>> although much different memory values.
>> 
>>  (org.apache.drill.exec.exception.OutOfMemoryException) Unable to
>> allocate
>> buffer of size 134217728 due to memory limit. Current allocation:
>> 10653214316
>>    org.apache.drill.exec.memory.BaseAllocator.buffer():220
>>    org.apache.drill.exec.memory.BaseAllocator.buffer():195
>>    org.apache.drill.exec.vector.VarCharVector.reAlloc():425
>>    org.apache.drill.exec.vector.VarCharVector.copyFromSafe():278
>>    org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe():379
>>    org.apache.drill.exec.test.generated.PriorityQueueCopierGen8.
>> doCopy():22
>>    org.apache.drill.exec.test.generated.PriorityQueueCopierGen8.next():76
>> 
>> org.apache.drill.exec.physical.impl.xsort.managed.
>> CopierHolder$BatchMerger.next():234
>> 
>> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.
>> doMergeAndSpill():1408
>> 
>> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.
>> mergeAndSpill():1376
>> 
>> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.
>> spillFromMemory():1339
>> 
>> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.
>> processBatch():831
>> 
>> org.apache.drill.exec.physical.impl.xsort.managed.
>> ExternalSortBatch.loadBatch():618
>> 
>> org.apache.drill.exec.physical.impl.xsort.managed.
>> ExternalSortBatch.load():660
>> 
>> org.apache.drill.exec.physical.impl.xsort.managed.
>> ExternalSortBatch.innerNext():559
>>    org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>    org.apache.drill.exec.record.AbstractRecordBatch.next():119
>>    org.apache.drill.exec.record.AbstractRecordBatch.next():109
>> 
>> org.apache.drill.exec.physical.impl.aggregate.
>> StreamingAggBatch.innerNext():137
>>    org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>> 
>> org.apache.drill.exec.physical.impl.partitionsender.
>> PartitionSenderRootExec.innerNext():144
>>    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>>    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>>    java.security.AccessController.doPrivileged():-2
>>    javax.security.auth.Subject.doAs():422
>>    org.apache.hadoop.security.UserGroupInformation.doAs():1657
>>    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>>    org.apache.drill.common.SelfCleaningRunnable.run():38
>>    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>>    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>>    java.lang.Thread.run():745 (state=,code=0)
>> 
>> At first I didn't change planner.width.max_per_query and the default on a
>> 32 core machine makes it 23. This query failed after 34 minutes. I then
>> tried setting planner.width.max_per_query=1 and this query also failed but
>> of course took took longer, about 2 hours. In both cases,
>> planner.memory.max_query_memory_per_node was set to 230G.
>> 
>> 
>> On Mon, May 1, 2017 at 11:09 AM, Zelaine Fong <[email protected]> wrote:
>> 
>>> Nate,
>>> 
>>> The Jira you’ve referenced relates to the new external sort, which is not
>>> enabled by default, as it is still going through some additional testing.
>>> If you’d like to try it to see if it resolves your problem, you’ll need
>> to
>>> set “sort.external.disable_managed” as follows  in your
>>> drill-override.conf file:
>>> 
>>> drill.exec: {
>>>  cluster-id: "drillbits1",
>>>  zk.connect: "localhost:2181",
>>>  sort.external.disable_managed: false
>>> }
>>> 
>>> and run the following query:
>>> 
>>> ALTER SESSION SET `exec.sort.disable_managed` = false;
>>> 
>>> -- Zelaine
>>> 
>>> On 5/1/17, 7:44 AM, "Nate Butler" <[email protected]> wrote:
>>> 
>>>    We keep running into this issue when trying to issue a query with
>>> hashagg
>>>    disabled. When I look at system memory usage though, drill doesn't
>>> seem to
>>>    be using much of it but still hits this error.
>>> 
>>>    Our environment:
>>> 
>>>    - 1 r3.8xl
>>>    - 1 drillbit version 1.10.0 configured with 4GB of Heap and 230G of
>>> Direct
>>>    - Data stored on S3 is compressed CSV
>>> 
>>>    I've tried increasing planner.memory.max_query_memory_per_node to
>>> 230G and
>>>    lowered planner.width.max_per_query to 1 and it still fails.
>>> 
>>>    We've applied the patch from this bug in the hopes that it would
>>> resolve
>>>    the issue but it hasn't:
>>> 
>>>    https://issues.apache.org/jira/browse/DRILL-5226
>>> 
>>>    Stack Trace:
>>> 
>>>      (org.apache.drill.exec.exception.OutOfMemoryException) Unable to
>>> allocate
>>>    buffer of size 16777216 due to memory limit. Current allocation:
>>> 8445952
>>>        org.apache.drill.exec.memory.BaseAllocator.buffer():220
>>>        org.apache.drill.exec.memory.BaseAllocator.buffer():195
>>>        org.apache.drill.exec.vector.VarCharVector.reAlloc():425
>>>        org.apache.drill.exec.vector.VarCharVector.copyFromSafe():278
>>>        org.apache.drill.exec.vector.NullableVarCharVector.
>>> copyFromSafe():379
>>> 
>>>    org.apache.drill.exec.test.generated.PriorityQueueCopierGen328.
>>> doCopy():22
>>>        org.apache.drill.exec.test.generated.PriorityQueueCopierGen328.
>>> next():75
>>> 
>>>    org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.
>>> mergeAndSpill():602
>>> 
>>>    org.apache.drill.exec.physical.impl.xsort.
>>> ExternalSortBatch.innerNext():428
>>>        org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>>        org.apache.drill.exec.record.AbstractRecordBatch.next():119
>>>        org.apache.drill.exec.record.AbstractRecordBatch.next():109
>>> 
>>>    org.apache.drill.exec.physical.impl.aggregate.
>>> StreamingAggBatch.innerNext():137
>>>        org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>>        org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>>> 
>>>    org.apache.drill.exec.physical.impl.partitionsender.
>>> PartitionSenderRootExec.innerNext():144
>>>        org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>>        org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>>>        org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>>>        java.security.AccessController.doPrivileged():-2
>>>        javax.security.auth.Subject.doAs():422
>>>        org.apache.hadoop.security.UserGroupInformation.doAs():1657
>>>        org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>>>        org.apache.drill.common.SelfCleaningRunnable.run():38
>>>        java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>>>        java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>>>        java.lang.Thread.run():745 (state=,code=0)
>>> 
>>>    Is there something I'm missing here? Any help/direction would be
>>>    appreciated.
>>> 
>>>    Thanks,
>>>    Nate
>>> 
>>> 
>>> 
>> 
