[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-13 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164240#comment-16164240
 ] 

Robert Hou edited comment on DRILL-5670 at 9/13/17 7:21 AM:


I tried disabling exchanges and got a different error. It looks like the sort 
did not complete in this case.
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false
alter session set `planner.memory.max_query_memory_per_node` = 482344960
alter session set `planner.width.max_per_node` = 1
alter session set `planner.width.max_per_query` = 1
alter session set `planner.disable_exchanges` = true
select count(*) from (select * from 
dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
 
columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520],
 columns[1410], 
columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
 
columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
 columns[3210] ) d where d.col433 = 'sjka skjf'
{noformat}

This is the error from drillbit.log:
{noformat}
2017-09-12 17:36:53,155 [26478262-f0a7-8fc1-1887-4f27071b9c0f:frag:0:0] ERROR 
o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
Incoming batch size: 409305088, available memory: 482344960
{noformat}
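
A rough sketch of the kind of guard that produces this message (hypothetical 
names and logic, not the actual ExternalSortBatch code): merging needs room for 
the incoming batch plus comparable working space, so a batch this close to the 
budget trips the check.

{code}
// Hedged illustration only -- hypothetical, not Drill's actual check.
long incomingBatchSize = 409_305_088L;   // from the log line above
long availableMemory   = 482_344_960L;   // sort memory budget

// Merging "two batches" needs roughly twice the incoming batch size.
if (2 * incomingBatchSize > availableMemory) {
  System.err.println("Insufficient memory to merge two batches. Incoming batch size: "
      + incomingBatchSize + ", available memory: " + availableMemory);
}
{code}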

Here is the plan:
{noformat}
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-03  Project($f0=[0])
00-04SelectionVectorRemover
00-05  Filter(condition=[=(ITEM($0, 'col433'), 'sjka skjf')])
00-06Project(T8¦¦*=[$0])
00-07  SelectionVectorRemover
00-08Sort(sort0=[$1], sort1=[$2], sort2=[$3], sort3=[$4], 
sort4=[$5], sort5=[$6], sort6=[$7], sort7=[$8], sort8=[$9], sort9=[$10], 
sort10=[$11], sort11=[$12], sort12=[$9], sort13=[$13], sort14=[$14], 
sort15=[$15], sort16=[$16], sort17=[$17], sort18=[$18], sort19=[$19], 
sort20=[$20], sort21=[$21], sort22=[$12], sort23=[$22], sort24=[$23], 
sort25=[$24], sort26=[$25], sort27=[$26], sort28=[$27], sort29=[$28], 
sort30=[$29], sort31=[$30], sort32=[$31], sort33=[$32], sort34=[$33], 
sort35=[$34], sort36=[$35], sort37=[$36], sort38=[$37], sort39=[$38], 
sort40=[$39], sort41=[$40], sort42=[$41], sort43=[$42], sort44=[$43], 
sort45=[$44], sort46=[$45], sort47=[$46], dir0=[ASC], dir1=[ASC], dir2=[ASC], 
dir3=[ASC], dir4=[ASC], dir5=[ASC], dir6=[ASC], dir7=[ASC], dir8=[ASC], 
dir9=[ASC], dir10=[ASC], dir11=[ASC], dir12=[ASC], dir13=[ASC], dir14=[ASC], 
dir15=[ASC], dir16=[ASC], dir17=[ASC], dir18=[ASC], dir19=[ASC], dir20=[ASC], 
dir21=[ASC], dir22=[ASC], dir23=[ASC], dir24=[ASC], dir25=[ASC], dir26=[ASC], 
dir27=[ASC], dir28=[ASC], dir29=[ASC], dir30=[ASC], dir31=[ASC], dir32=[ASC], 
dir33=[ASC], dir34=[ASC], dir35=[ASC], dir36=[ASC], dir37=[ASC], dir38=[ASC], 
dir39=[ASC], dir40=[ASC], dir41=[ASC], dir42=[ASC], dir43=[ASC], dir44=[ASC], 
dir45=[ASC], dir46=[ASC], dir47=[ASC])
00-09  Project(T8¦¦*=[$0], EXPR$1=[ITEM($1, 450)], 
EXPR$2=[ITEM($1, 330)], EXPR$3=[ITEM($1, 230)], EXPR$4=[ITEM($1, 220)], 
EXPR$5=[ITEM($1, 110)], EXPR$6=[ITEM($1, 90)], EXPR$7=[ITEM($1, 80)], 
EXPR$8=[ITEM($1, 70)], EXPR$9=[ITEM($1, 40)], EXPR$10=[ITEM($1, 10)], 
EXPR$11=[ITEM($1, 20)], EXPR$12=[ITEM($1, 30)], EXPR$13=[ITEM($1, 50)], 
EXPR$14=[ITEM($1, 454)], EXPR$15=[ITEM($1, 413)], EXPR$16=[ITEM($1, 940)], 
EXPR$17=[ITEM($1, 834)], EXPR$18=[ITEM($1, 73)], EXPR$19=[ITEM($1, 140)], 
EXPR$20=[ITEM($1, 104)], EXPR$21=[ITEM($1, )], EXPR$22=[ITEM($1, 2420)], 
EXPR$23=[ITEM($1, 1520)], EXPR$24=[ITEM($1, 1410)], EXPR$25=[ITEM($1, 1110)], 
EXPR$26=[ITEM($1, 1290)], EXPR$27=[ITEM($1, 2380)], EXPR$28=[ITEM($1, 705)], 
EXPR$29=[ITEM($1, 45)], EXPR$30=[ITEM($1, 1054)], EXPR$31=[ITEM($1, 2430)], 
EXPR$32=[ITEM($1, 420)], EXPR$33=[ITEM($1, 404)], EXPR$34=[ITEM($1, 3350)], 
EXPR$35=[ITEM($1, )], EXPR$36=[ITEM($1, 153)], EXPR$37=[ITEM($1, 356)], 
EXPR$38=[ITEM($1, 84)], EXPR$39=[ITEM($1, 745)], EXPR$40=[ITEM($1, 1450)], 
EXPR$41=[ITEM($1, 103)], EXPR$42=[ITEM($1, 2065)], EXPR$43=[ITEM($1, 343)], 
EXPR$44=[ITEM($1, 3420)], EXPR$45=[ITEM($1, 530)], EXPR$46=[ITEM($1, 3210)])
00-10Project(T8¦¦*=[$0], columns=[$1])
00-11  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/resource-manager/3500cols.tbl, 
numFiles=1, columns=[`*`], 
files=[maprfs:///drill/testdata/resource-manager/3500cols.tbl]]])
{noformat}


was (Author: rhou):
I tried disabling exchanges and got a different error.  It 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-09 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159757#comment-16159757
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/9/17 7:42 PM:


The OOM occurs during the merge phase:

{code}
Completed spill: memory = 0
Starting merge phase. Runs = 62, Alloc. memory = 0
Read 100 records in 73169 us; size = 8480768, memory = 8481024
...
Read 100 records in 81261 us; size = 8480768, memory = 525807872
{code}

Here the "Read 100 records" indicates the sort is loading the first batch of 
each of 62 spill files. We see that the first spilled batch was 4,736,187 when 
written (previous comment), but requires 8,481,024 bytes when read. This is 
larger than the estimate of 7,215,150 that the calcs estimated. The average 
load size is 8,480,772 bytes. The 1,265,622 byte delta per batch adds up to a 
78,468,572 byte error over the 62 batches.

Then, when the code tries to allocate the output batch of 348 records 
(16,739,148 bytes), memory is exhausted. (The limit is 530,579,456 bytes.)

A quick fix is to use the "max buffer size" in computing the number of batches 
that can be merged. (The max buffer size is computed as twice the data size, 
which assumes a 50% internal fragmentation of the memory buffer.)
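
As a rough illustration of that fix (plain arithmetic with hypothetical names, 
not the actual sort code), sizing the merge by the gross batch size caps the 
fan-in well below the 62 runs that failed:

{code}
// Hedged sketch: limit merge fan-in by the gross ("max buffer") batch size.
long netSpillBatch   = 4_810_100L;        // net spill batch size from the logs
long grossSpillBatch = 2 * netSpillBatch; // assume 50% internal fragmentation
long outputBatch     = 16_739_148L;       // merge output batch (net)
long mergeMemory     = 482_344_960L;      // allocator limit

long maxRuns = (mergeMemory - outputBatch) / grossSpillBatch;  // ~48
// The actual fix settles on 46 runs (log below), presumably after extra margins.
{code}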

This turns out to work:

{code}
Starting merge phase. Runs = 46, Alloc. memory = 0
...
Read 100 records in 62918 us; size = 8480768, memory = 408187648
...
End of sort. Total write bytes: 86075714089, Total read bytes: 86075714089
Results: 1 records, 2 batches, 12,786,071 ms
{code}

Left unanswered is *why* Drill consumes 2x memory when reading a spilled batch 
from disk.


was (Author: paul.rogers):
The OOM occurs during the merge phase:

{code}
Completed spill: memory = 0
Starting merge phase. Runs = 62, Alloc. memory = 0
Read 100 records in 73169 us; size = 8480768, memory = 8481024
...
Read 100 records in 81261 us; size = 8480768, memory = 525807872
{code}

Here the "Read 100 records" indicates the sort is loading the first batch of 
each of 62 spill files. We see that the first spilled batch was 4,736,187 when 
written (previous comment), but requires 8,481,024 bytes when read. This is 
larger than the estimate of 7,215,150 that the calcs estimated. The average 
load size is 8,480,772 bytes. The 1,265,622 byte delta per batch adds up to a 
78,468,572 byte error over the 62 batches.

Then, when the code tries to allocate the output batch of 348 records, 16739148 
bytes, memory is exhausted. (The limit is 530,579,456 bytes.)

A quick fix is to use the "max buffer size" in computing the number of batches 
that can be merged. (The max buffer size is computed as twice the data size, 
which assumes a 50% internal fragmentation of the memory buffer.)

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallira...@gmail.com
> 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159638#comment-16159638
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/9/17 4:53 AM:


Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

From the file summary:

{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}

From this we see the size was 4,736,187 bytes per batch, or 47,362 bytes per 
row. This number has no internal fragmentation and so should match our "net" 
record size estimate. Our net estimate is 48,101, so we're pretty close. The 
discrepancy should be explained, but our estimate is conservative, and so is 
safe for memory calcs.
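
The per-row figure follows directly from the spill summary (plain arithmetic, 
not Drill code):

{code}
// Worked numbers from the file summary quoted above.
long bytesWritten    = 246_281_737L;  // total bytes written to the spill file
int  batchesSpilled  = 52;
int  recordsPerBatch = 100;

long bytesPerBatch = bytesWritten / batchesSpilled;    // 4,736,187
long bytesPerRow   = bytesPerBatch / recordsPerBatch;  // 47,361 (~47,362 rounded)
// The "net" estimate of 48,101 bytes per row sits just above this, i.e. it is
// conservative.
{code}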


was (Author: paul.rogers):
Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

From the file summary:

{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}

From this we see size was 4,736,187 bytes per batch, or 47,362 per row. This 
number has no internal fragmentation and so should match our "net" record size 
estimate. Our net estimate is 48,101, so we're pretty close. The error should 
be explained, but our estimate is conservative, and so is safe for memory 
calcs.

The question is, how does the 4.6 MB per batch balloon to 16 MB or more on read?

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallira...@gmail.com
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> git.remote.origin.url=g...@github.com\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +---+-+
> |  ok   |   summary   |
> 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159638#comment-16159638
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/9/17 1:36 AM:


Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

From the file summary:

{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}

From this we see size was 4,736,187 bytes per batch, or 47,362 per row. This 
number has no internal fragmentation and so should match our "net" record size 
estimate. Our net estimate is 48,101, so we're pretty close. The error should 
be explained, but our estimate is conservative, and so is safe for memory 
calcs.

The question is, how does the 4.6 MB per batch balloon to 16 MB or more on read?


was (Author: paul.rogers):
Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallira...@gmail.com
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> git.remote.origin.url=g...@github.com\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | exec.sort.disable_managed updated.   |
> +-------+--------------------------------------+
> 1 row selected (1.044 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 482344960;
> +---++
> |  ok   |  summary   |
> 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158101#comment-16158101
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/9/17 1:18 AM:


On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}
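
The arithmetic behind the oversized incoming batch (plain numbers from the log 
above, not Drill code):

{code}
// Why the default reader batch overwhelms the sort budget.
int  records       = 8096;      // text reader default batch size
long grossRowWidth = 83_754L;   // gross row width from the batch-size log
long batchSize     = records * grossRowWidth;  // ~678 MB, in line with 678,068,224
long sortBudget    = 482_344_960L;             // available sort memory
// A single incoming batch already exceeds the sort's entire budget, hence
// "Insufficient memory to merge two batches".
{code}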

To recreate the original case, I changed the hard-coded batch size to 1023. The 
query now runs past the above problem (but hits another problem, described later):

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

Note that the actual density is about 56%; an apparent overflow in the density 
calculation reports only 8%.
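
One plausible (assumed) cause is 32-bit overflow in a percentage calculation; 
the sketch below lands in the right ballpark, though the actual Drill code may 
compute density differently:

{code}
// Hedged illustration only: int overflow in a data/total percentage.
int dataSize  = 49_207_323;   // data size from the batch-size log above
int totalSize = 86_353_920;   // total size from the same log

int  badDensity  = dataSize * 100 / totalSize;         // overflows int: ~7
long goodDensity = (long) dataSize * 100 / totalSize;  // 56
{code}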

The above calcs are identical to those seen on the QA cluster.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] -> [8192]
{code}

This call seems to come from the text reader, which relies on the default 
vector allocation:

{code}
  @Override
  public void allocate(Map<String, ValueVector> vectorMap) throws OutOfMemoryException {
    for (final ValueVector v : vectorMap.values()) {
      v.allocateNew();
    }
  }
{code}

The above does not specify the number of values to allocate. There seems to be 
some attempt to allocate the same size as the previous allocation, but this 
does not explain the offset vector behavior. Also, if the algorithm really is 
"allocate the last amount", it creates a ratchet effect: the vector only grows 
across batches, retaining the largest size yet seen. A spike in the data then 
results in wasted space (low-density batches). This is something to investigate 
elsewhere.

Further, the logic does not seem to work with the offset vector, causing 
reallocations. Something else to investigate.
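
A minimal sketch of the ratchet effect described above (hypothetical names, not 
the actual ValueVector code): if each allocation is sized from the largest 
allocation seen so far, one spike in the data inflates every later batch.

{code}
// Hedged illustration of "allocate the last amount" behavior.
class RatchetingAllocator {
  private int lastAllocation = 4096;   // initial buffer size

  int nextAllocation(int bytesNeeded) {
    // Never shrinks: remembers the largest size seen so far.
    lastAllocation = Math.max(lastAllocation, bytesNeeded);
    return lastAllocation;   // smaller later batches get low-density buffers
  }
}
{code}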

The run takes about an hour. The good news: I can reproduce the problem on the Mac:

{code}
Starting merge phase. Runs = 62, Alloc. memory = 0
End of sort. Total write bytes: 82135266715, Total read bytes: 82135266715
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}


was (Author: paul.rogers):
On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, changed the hard-coded batch size to 1023. The 
query now runs past the above problem (but runs into a problem described later):

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

Note that actual density is 56%. There seems to be an overflow issue that gives 
a number of only 8%.

The above calcs are identical to that seen on the QA cluster.

Note that the CSV reader is producing low-density batches: this is a problem to 
be resolved elsewhere.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158101#comment-16158101
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/9/17 1:13 AM:


On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, changed the hard-coded batch size to 1023. The 
query now runs past the above problem (but runs into a problem described later):

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

Note that actual density is 56%. There seems to be an overflow issue that gives 
a number of only 8%.

The above calcs are identical to that seen on the QA cluster.

Note that the CSV reader is producing low-density batches: this is a problem to 
be resolved elsewhere.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] -> [8192]
{code}

This code seems to come from the text reader which relies on default vector 
allocation:

{code}
  @Override
  public void allocate(Map<String, ValueVector> vectorMap) throws OutOfMemoryException {
    for (final ValueVector v : vectorMap.values()) {
      v.allocateNew();
    }
  }
{code}

The above does not say the number of values to allocate. There seems to be some 
attempt to allocate the same size as the previous allocation, but this does not 
explain the offset vector behavior. Also, if the algorithm is, indeed, 
"allocate the last amount", then the algorithm creates a ratchet effect: the 
vector will only grow in size over batches, retaining the largest size yet 
seen. Spikes in data can result in wasted space (low density batches.) This is 
something to investigate elsewhere.

Further, the logic does not seem to work with the offset vector, causing 
reallocations. Something else to investigate.

The run takes about an hour. Good news, I can reproduce the problem on the Mac:

{code}
Starting merge phase. Runs = 62, Alloc. memory = 0
End of sort. Total write bytes: 82135266715, Total read bytes: 82135266715
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}


was (Author: paul.rogers):
On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, changed the hard-coded batch size to 1023. The 
query now runs past the above problem (but runs into a problem described later):

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

The above calcs are identical to that seen on the QA cluster.

Note that the CSV reader is producing low-density batches: this is a problem to 
be resolved elsewhere.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157917#comment-16157917
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/8/17 11:26 PM:
-

Analysis:

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216,
mSort batch size = 65535
Memory config: Allocator limit = 482344960
Actual batch schema & sizes {
...
  Records: 1023, Total size: 51667936, Data size: 49207323, Gross row width: 
50507, Net row width: 48101, Density: 13}
Input Batch Estimates: record size = 48101 bytes; net = 51667936 bytes, gross = 
77501904, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
...
Completed load phase: read 978 batches, spilled 194 times, total input bytes: 
49162397056
Starting consolidate phase. Batches = 978, Records = 100, Memory = 
378422096, In-memory batches 8, spilled runs 194
Starting merge phase. Runs = 62, Alloc. memory = 0
...
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}

Here, the batch size is 1023 records, which means some operator must have 
resized batches after the scanner read them (the scanner reads batches of 8K 
rows). Still, the sort saw all 1 million records.

Let's do the math. The sort is trying to merge 62 runs of 7,215,150 bytes per 
batch, or 447,339,300 bytes total, producing an output batch of 16,739,148 
bytes, for a total memory usage of 464,078,448 bytes. The sort has 482,344,960 
bytes total. So the math works out.

The error message does not say which batch causes the OOM. At the time of OOM, 
the total memory is 525,809,920. So, something is off.

The vector being allocated is 16 MB, but we expected each batch to be ~7MB in 
size. Something does not balance.

The spill batch size is:

{code}
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100;
spill file = 268435456 bytes
{code}

That is, 100 records need about 7 MB, assuming 50% average internal 
fragmentation. But somehow we actually try to allocate 16 MB for 100 records, 
roughly 160,000 bytes per record, almost four times our expected data size. 
Clearly, something is off.
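
Putting numbers on the mismatch (plain arithmetic, not Drill code):

{code}
// Planned merge-phase memory versus the observed failure point.
long grossSpillBatch = 7_215_150L;   // expected gross size per spilled run
int  runs            = 62;
long outputBatch     = 16_739_148L;  // merge output batch (net)

long planned = runs * grossSpillBatch + outputBatch;  // 464,078,448
long limit   = 482_344_960L;                          // allocator limit
// planned fits under the limit, yet the OOM shows 525,809,920 bytes actually
// allocated, so spilled batches must occupy far more memory on read than the
// estimate assumes.
{code}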


was (Author: paul.rogers):
Analysis:

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216,
mSort batch size = 65535
Memory config: Allocator limit = 482344960
Actual batch schema & sizes {
...
  Records: 1023, Total size: 51667936, Data size: 49207323, Gross row width: 
50507, Net row width: 48101, Density: 13}
Input Batch Estimates: record size = 48101 bytes; net = 51667936 bytes, gross = 
77501904, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
...
Completed load phase: read 978 batches, spilled 194 times, total input bytes: 
49162397056
Starting consolidate phase. Batches = 978, Records = 100, Memory = 
378422096, In-memory batches 8, spilled runs 194
Starting merge phase. Runs = 62, Alloc. memory = 0
...
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}

Here, batch size is 1023, which means that some operator must have resized 
batches after the scanner read them (the scanner reads batches of 8K rows.) 
However, this sort saw all 1 million records.

Let's do the math. The sort is trying to merge 62 runs of 7,215,150 bytes per 
batch or 447,339,300 bytes total, producing an output batch of size 16,739,148, 
for a total memory usage of 464,078,448 bytes. The sort has 482,344,960 bytes 
total. So the math works out.

The error message does not say which batch causes the OOM. At the time of OOM, 
the total memory is 525,809,920. So, something is off.

The vector being allocated is 16 MB, but we expected each batch to be ~7MB in 
size. Something does not balance.

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158101#comment-16158101
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/8/17 11:23 PM:
-

On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, changed the hard-coded batch size to 1023. The 
query now runs past the above problem (but runs into a problem described later):

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

The above calcs are identical to that seen on the QA cluster.

Note that the CSV reader is producing low-density batches: this is a problem to 
be resolved elsewhere.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] -> [8192]
{code}

This code seems to come from the text reader which relies on default vector 
allocation:

{code}
  @Override
  public void allocate(Map<String, ValueVector> vectorMap) throws OutOfMemoryException {
    for (final ValueVector v : vectorMap.values()) {
      v.allocateNew();
    }
  }
{code}

The above does not say the number of values to allocate. There seems to be some 
attempt to allocate the same size as the previous allocation, but this does not 
explain the offset vector behavior. Also, if the algorithm is, indeed, 
"allocate the last amount", then the algorithm creates a ratchet effect: the 
vector will only grow in size over batches, retaining the largest size yet 
seen. Spikes in data can result in wasted space (low density batches.) This is 
something to investigate elsewhere.

Further, the logic does not seem to work with the offset vector, causing 
reallocations. Something else to investigate.

The run takes about an hour. Good news, I can reproduce the problem on the Mac:

{code}
Starting merge phase. Runs = 62, Alloc. memory = 0
End of sort. Total write bytes: 82135266715, Total read bytes: 82135266715
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}


was (Author: paul.rogers):
On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, changed the hard-coded batch size to 1023. The 
query now runs:

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

The above calcs are identical to that seen on the QA cluster.

Note that the CSV reader is producing low-density batches: this is a problem to 
be resolved elsewhere.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] -> [8192]
{code}

This code seems to come from the text reader which relies on default vector 
allocation:

{code}
  @Override
  public void allocate(Map

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-07 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157917#comment-16157917
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/8/17 1:39 AM:


Analysis:

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216,
mSort batch size = 65535
Memory config: Allocator limit = 482344960
Actual batch schema & sizes {
...
  Records: 1023, Total size: 51667936, Data size: 49207323, Gross row width: 
50507, Net row width: 48101, Density: 13}
Input Batch Estimates: record size = 48101 bytes; net = 51667936 bytes, gross = 
77501904, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
...
Completed load phase: read 978 batches, spilled 194 times, total input bytes: 
49162397056
Starting consolidate phase. Batches = 978, Records = 100, Memory = 
378422096, In-memory batches 8, spilled runs 194
Starting merge phase. Runs = 62, Alloc. memory = 0
...
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}

Here, the batch size is 1023 records, which means some operator must have 
resized batches after the scanner read them (the scanner reads batches of 8K 
rows). Still, the sort saw all 1 million records.

Let's do the math. The sort is trying to merge 62 runs of 7,215,150 bytes per 
batch or 447,339,300 bytes total, producing an output batch of size 16,739,148, 
for a total memory usage of 464,078,448 bytes. The sort has 482,344,960 bytes 
total. So the math works out.

The error message does not say which batch causes the OOM. At the time of OOM, 
the total memory is 525,809,920. So, something is off.

The vector being allocated is 16 MB, but we expected each batch to be ~7MB in 
size. Something does not balance.


was (Author: paul.rogers):
Analysis:

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216,
mSort batch size = 65535
Memory config: Allocator limit = 482344960
Actual batch schema & sizes {
...
  Records: 1023, Total size: 51667936, Data size: 49207323, Gross row width: 
50507, Net row width: 48101, Density: 13}
Input Batch Estimates: record size = 48101 bytes; net = 51667936 bytes, gross = 
77501904, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
...
Completed load phase: read 978 batches, spilled 194 times, total input bytes: 
49162397056
Starting consolidate phase. Batches = 978, Records = 100, Memory = 
378422096, In-memory batches 8, spilled runs 194
Starting merge phase. Runs = 62, Alloc. memory = 0
...
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}

Let's do the math. The sort is trying to merge 62 runs of 7,215,150 bytes per 
batch or 447,339,300 bytes total, producing an output batch of size 16,739,148, 
for a total memory usage of 464,078,448 bytes. The sort has 482,344,960 bytes 
total. So the math works out.

The error message does not say which batch causes the OOM. At the time of OOM, 
the total memory is 525,809,920. So, something is off.

The vector being allocated is 16 MB, but we expected each batch to be ~7MB in 
size. Something does not balance.

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-08-31 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149908#comment-16149908
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/1/17 2:33 AM:


When I run the query (with the repeated column fix):

{code}
ExternalSortBatch - Insufficient memory to merge two batches.
Incoming batch size: 678,068,224, available memory: 482,344,960
{code}

So, this one is a misconfiguration issue.

However, even when I gave the sort 3 GB, the OOM error persisted (without the 
memory warning).

It turns out, however, that the OOM now occurs in the {{ProjectRecordBatch}} 
code. Error message:

{code}
Unable to allocate buffer of size 536,870,912 due to memory limit. Current 
allocation: 32768
{code}

The allocator that failed:

{code}
Allocator(op:0:0:10:Project) 1,000,000/32,768/1,342,242,816/10,000,000,000 
(res/actual/peak/limit)
  child allocators: 0
  ledgers: 1
ledger[2209] allocator: op:0:0:10:Project), isOwning: true, size: 32768, 
references: 1, life: 1602515109470380..0, allocatorManager: [1448, life: 
1602515109440991..0] holds 1 buffers. 
DrillBuf[2905], udle: [1449 0..32768]
  reservations: 0
{code}

This problem occurs on the fifth batch read by the sort (four batches already 
buffered).

Once I increased JVM direct memory to 4 GB, the query ran, so the above error 
is due to a true OOM. (The error message should say that the OOM is due to 
system memory exhaustion rather than a memory limit.)
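
For reference, the numbers in the allocator dump support that conclusion (plain 
arithmetic, not Drill code):

{code}
// The Project operator's own allocator limit is nowhere near exhausted.
long requested = 536_870_912L;     // buffer size that failed to allocate
long current   = 32_768L;          // operator's current allocation (actual)
long opLimit   = 10_000_000_000L;  // operator limit from the dump

boolean operatorLimitHit = current + requested > opLimit;  // false
// Since the operator limit is not the problem, the failure comes from the JVM's
// direct memory pool; raising direct memory to 4 GB (e.g. via the standard
// -XX:MaxDirectMemorySize=4g flag) lets the query run.
{code}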

The query now completes. Be warned: it took almost an hour on a Mac with SSD 
storage.


was (Author: paul.rogers):
When I run the query (with the repeated column fix):

{code}
ExternalSortBatch - Insufficient memory to merge two batches.
Incoming batch size: 678,068,224, available memory: 482,344,960
{code}

So, this one is a misconfiguration issue.

However, even when I gave the sort 3 GB, the OOM error persisted (without the 
memory warning).

Turns out, however, that the OOM is now in the {{ProjectRecordBatch}} code. 
Error message:

{code}
Unable to allocate buffer of size 536,870,912 due to memory limit. Current 
allocation: 32768
{code}

The allocator that failed:

{code}
Allocator(op:0:0:10:Project) 1,000,000/32,768/1,342,242,816/10,000,000,000 
(res/actual/peak/limit)
  child allocators: 0
  ledgers: 1
ledger[2209] allocator: op:0:0:10:Project), isOwning: true, size: 32768, 
references: 1, life: 1602515109470380..0, allocatorManager: [1448, life: 
1602515109440991..0] holds 1 buffers. 
DrillBuf[2905], udle: [1449 0..32768]
  reservations: 0
{code}

This problem occurs on the fifth batch read by the sort (four batches already 
buffered).

Once I increased JVM direct memory to 4 GB, the query ran, so the above error 
is due to a true OOM. (The error message should say that the OOM is due to 
system memory exhaustion rather than a memory limit.)

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
>