[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1125 ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r172105917 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -199,12 +422,18 @@ public String toString() { .append(", per-array: ") .append(estElementCountPerArray); } - buf .append(", std size: ") - .append(stdSize) - .append(", actual size: ") - .append(estSize) - .append(", data size: ") - .append(dataSize) + buf .append(", std size per entry: ") + .append(getStdDataSizePerEntry()) + .append(", std net size per entry: ") + .append(getStdNetSizePerEntry()) + .append(", data size per entry: ") + .append(getDataSizePerEntry()) + .append(", net size per entry: ") + .append(getNetSizePerEntry()) + .append(", totalDataSize: ") + .append(getTotalDataSize()) + .append(", totalNetSize: ") + .append(getTotalNetSize()) --- End diff -- Extra info is always good. This info is printed by the sort operator when debug logging is enabled. (It was invaluable when investigating issues found by QA tests.) Suggestion: find a way to compact the labels. Maybe: ``` Per entry: std size: xx, std net: xxx; Totals: data size: xxx, net size: xxx ``` Also, `totalDataSize` and `totalNetSize` should have spaces: they are labels, not variable names here. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r172105268 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -525,4 +763,11 @@ public VectorInitializer buildVectorInitializer() { } return initializer; } + + public void allocateVectors(VectorContainer container, int recordCount) { +for (VectorWrapper w : container) { + ColumnSize colSize = columnSizes.get(w.getField().getName()); + colSize.allocateVector(w.getValueVector(), recordCount); +} + } --- End diff -- Very nice clean-up and enhancements. Now I don't feel so bad about this stuff being quick & dirty... ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r172104808 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -76,110 +82,327 @@ * greater than (but unlikely) same as the row count. */ -public final int valueCount; +private final int valueCount; /** - * Total number of elements for a repeated type, or 1 if this is - * a non-repeated type. That is, a batch of 100 rows may have an - * array with 10 elements per row. In this case, the element count - * is 1000. + * Total number of elements for a repeated type, or same as + * valueCount if this is a non-repeated type. That is, a batch + * of 100 rows may have an array with 10 elements per row. + * In this case, the element count is 1000. */ -public final int elementCount; +private int elementCount; /** - * Size of the top level value vector. For map and repeated list, - * this is just size of offset vector. + * The estimated, average number of elements per parent value. + * Always 1 for a non-repeated type. For a repeated type, + * this is the average entries per array (per repeated element). */ -public int dataSize; + +private float estElementCountPerArray; /** - * Total size of the column includes the sum total of memory for all - * value vectors representing the column. + * Indicates if it is variable width column. + * For map columns, this is true if any of the children is variable + * width column. */ -public int netSize; + +private boolean isVariableWidth; /** - * The estimated, average number of elements per parent value. - * Always 1 for a non-repeated type. For a repeated type, - * this is the average entries per array (per repeated element). + * Indicates if cardinality is repeated(top level only). + */ + +private boolean isRepeated; + +/** + * Indicates if cardinality is optional i.e. nullable(top level only). + */ +private boolean isOptional; + +/** + * Child columns if this is a map column. + */ +private Mapchildren = CaseInsensitiveMap.newHashMap(); + +/** + * std pure data size per entry from Drill metadata, based on type. + * Does not include metadata vector overhead we add for cardinality, + * variable length etc. + * For variable-width columns, we use 50 as std size for entry width. + * For repeated column, we assume repetition of 10. + */ +public int getStdDataSizePerEntry() { + int stdDataSize; + + try { +stdDataSize = TypeHelper.getSize(metadata.getType()); + +// For variable width, typeHelper includes offset vector width. Adjust for that. +if (isVariableWidth) { + stdDataSize -= OFFSET_VECTOR_WIDTH; +} + +if (isRepeated) { + stdDataSize = stdDataSize * STD_REPETITION_FACTOR; +} + } catch (Exception e) { +// For unsupported types, just set stdSize to 0. +// Map, Union, List etc. +stdDataSize = 0; + } + + // Add sizes of children. + for (ColumnSize columnSize : children.values()) { +stdDataSize += columnSize.getStdDataSizePerEntry(); + } + + if (isRepeatedList()) { +stdDataSize = stdDataSize * STD_REPETITION_FACTOR; + } + + return stdDataSize; +} + +/** + * std net size per entry taking into account additional metadata vectors + * we add on top for variable length, cardinality etc. + * For variable-width columns, we use 50 as std data size for entry width. + * For repeated column, we assume repetition of 10. + */ +public int getStdNetSizePerEntry() { + int stdNetSize; + try { +stdNetSize = TypeHelper.getSize(metadata.getType()); + } catch (Exception e) { +stdNetSize = 0; + } + + if (isOptional) { +stdNetSize += BIT_VECTOR_WIDTH; + } + + if (isRepeated) { +stdNetSize = (stdNetSize * STD_REPETITION_FACTOR) + OFFSET_VECTOR_WIDTH; + } + + for (ColumnSize columnSize : children.values()) { +stdNetSize += columnSize.getStdNetSizePerEntry(); + } + + if (isRepeatedList()) { +stdNetSize = (stdNetSize * STD_REPETITION_FACTOR) + OFFSET_VECTOR_WIDTH; +
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r172105435 --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/RepeatedMapVector.java --- @@ -608,4 +608,22 @@ public void collectLedgers(Set ledgers) { public void toNullable(ValueVector nullableVector) { throw new UnsupportedOperationException(); } + + @Override + public int getPayloadByteCount(int valueCount) { +if (valueCount == 0) { + return 0; +} + +int count = 0; + +int entryCount = offsets.getAccessor().get(valueCount); +count += offsets.getPayloadByteCount(valueCount); --- End diff -- ``` int count = offsets.get... ``` ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r172105104 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -76,110 +82,327 @@ * greater than (but unlikely) same as the row count. */ -public final int valueCount; +private final int valueCount; /** - * Total number of elements for a repeated type, or 1 if this is - * a non-repeated type. That is, a batch of 100 rows may have an - * array with 10 elements per row. In this case, the element count - * is 1000. + * Total number of elements for a repeated type, or same as + * valueCount if this is a non-repeated type. That is, a batch + * of 100 rows may have an array with 10 elements per row. + * In this case, the element count is 1000. */ -public final int elementCount; +private int elementCount; /** - * Size of the top level value vector. For map and repeated list, - * this is just size of offset vector. + * The estimated, average number of elements per parent value. + * Always 1 for a non-repeated type. For a repeated type, + * this is the average entries per array (per repeated element). */ -public int dataSize; + +private float estElementCountPerArray; /** - * Total size of the column includes the sum total of memory for all - * value vectors representing the column. + * Indicates if it is variable width column. + * For map columns, this is true if any of the children is variable + * width column. */ -public int netSize; + +private boolean isVariableWidth; /** - * The estimated, average number of elements per parent value. - * Always 1 for a non-repeated type. For a repeated type, - * this is the average entries per array (per repeated element). + * Indicates if cardinality is repeated(top level only). + */ + +private boolean isRepeated; + +/** + * Indicates if cardinality is optional i.e. nullable(top level only). + */ +private boolean isOptional; + +/** + * Child columns if this is a map column. + */ +private Mapchildren = CaseInsensitiveMap.newHashMap(); + +/** + * std pure data size per entry from Drill metadata, based on type. + * Does not include metadata vector overhead we add for cardinality, + * variable length etc. + * For variable-width columns, we use 50 as std size for entry width. + * For repeated column, we assume repetition of 10. + */ +public int getStdDataSizePerEntry() { + int stdDataSize; + + try { +stdDataSize = TypeHelper.getSize(metadata.getType()); + +// For variable width, typeHelper includes offset vector width. Adjust for that. +if (isVariableWidth) { + stdDataSize -= OFFSET_VECTOR_WIDTH; +} + +if (isRepeated) { + stdDataSize = stdDataSize * STD_REPETITION_FACTOR; +} + } catch (Exception e) { +// For unsupported types, just set stdSize to 0. +// Map, Union, List etc. +stdDataSize = 0; + } + + // Add sizes of children. + for (ColumnSize columnSize : children.values()) { +stdDataSize += columnSize.getStdDataSizePerEntry(); + } + + if (isRepeatedList()) { +stdDataSize = stdDataSize * STD_REPETITION_FACTOR; + } + + return stdDataSize; +} + +/** + * std net size per entry taking into account additional metadata vectors + * we add on top for variable length, cardinality etc. + * For variable-width columns, we use 50 as std data size for entry width. + * For repeated column, we assume repetition of 10. + */ +public int getStdNetSizePerEntry() { + int stdNetSize; + try { +stdNetSize = TypeHelper.getSize(metadata.getType()); + } catch (Exception e) { +stdNetSize = 0; + } + + if (isOptional) { +stdNetSize += BIT_VECTOR_WIDTH; + } + + if (isRepeated) { +stdNetSize = (stdNetSize * STD_REPETITION_FACTOR) + OFFSET_VECTOR_WIDTH; + } + + for (ColumnSize columnSize : children.values()) { +stdNetSize += columnSize.getStdNetSizePerEntry(); + } + + if (isRepeatedList()) { +stdNetSize = (stdNetSize * STD_REPETITION_FACTOR) + OFFSET_VECTOR_WIDTH; +
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r172103839 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java --- @@ -492,8 +504,12 @@ private void allocateBatch(boolean newSchema) { } else { container.zeroVectors(); } + +// Allocate memory for the vectors. +// This will iteratively allocate memory for all nested columns underneath. for (VectorWrapper w : container) { - AllocationHelper.allocateNew(w.getValueVector(), Character.MAX_VALUE); + RecordBatchSizer.ColumnSize colSize = mergeJoinMemoryManager.getColumnSize(w.getField().getName()); + colSize.allocateVector(w.getValueVector(), mergeJoinMemoryManager.getOutputRowCount()); --- End diff -- Nice! Much simpler. Just one nit: might as well get the row count once and store it in a local variable to avoid getting the same number repeatedly. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r172104598 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -76,110 +82,327 @@ * greater than (but unlikely) same as the row count. */ -public final int valueCount; +private final int valueCount; /** - * Total number of elements for a repeated type, or 1 if this is - * a non-repeated type. That is, a batch of 100 rows may have an - * array with 10 elements per row. In this case, the element count - * is 1000. + * Total number of elements for a repeated type, or same as + * valueCount if this is a non-repeated type. That is, a batch + * of 100 rows may have an array with 10 elements per row. + * In this case, the element count is 1000. */ -public final int elementCount; +private int elementCount; /** - * Size of the top level value vector. For map and repeated list, - * this is just size of offset vector. + * The estimated, average number of elements per parent value. + * Always 1 for a non-repeated type. For a repeated type, + * this is the average entries per array (per repeated element). */ -public int dataSize; + +private float estElementCountPerArray; /** - * Total size of the column includes the sum total of memory for all - * value vectors representing the column. + * Indicates if it is variable width column. + * For map columns, this is true if any of the children is variable + * width column. */ -public int netSize; + +private boolean isVariableWidth; /** - * The estimated, average number of elements per parent value. - * Always 1 for a non-repeated type. For a repeated type, - * this is the average entries per array (per repeated element). + * Indicates if cardinality is repeated(top level only). + */ + +private boolean isRepeated; --- End diff -- Might be fun to check out the new metadata classes added for the result set loader. They parse the `MajorType` to pull out this kind of information. You could embed an instance of the `ColumnMetadata` class here to provide this detailed information. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r172104395 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -76,110 +82,327 @@ * greater than (but unlikely) same as the row count. */ -public final int valueCount; +private final int valueCount; /** - * Total number of elements for a repeated type, or 1 if this is - * a non-repeated type. That is, a batch of 100 rows may have an - * array with 10 elements per row. In this case, the element count - * is 1000. + * Total number of elements for a repeated type, or same as + * valueCount if this is a non-repeated type. That is, a batch + * of 100 rows may have an array with 10 elements per row. + * In this case, the element count is 1000. */ -public final int elementCount; +private int elementCount; /** - * Size of the top level value vector. For map and repeated list, - * this is just size of offset vector. + * The estimated, average number of elements per parent value. + * Always 1 for a non-repeated type. For a repeated type, + * this is the average entries per array (per repeated element). */ -public int dataSize; + +private float estElementCountPerArray; --- End diff -- Perhaps `estCardinality`? I've been using the term "cardinality" in new code to refer to array sizes. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r171999276 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -245,16 +251,30 @@ private void buildVectorInitializer(VectorInitializer initializer) { else if (width > 0) { initializer.variableWidth(name, width); } + + for (ColumnSize columnSize : childColumnSizes.values()) { +columnSize.buildVectorInitializer(initializer); + } } + } public static ColumnSize getColumn(ValueVector v, String prefix) { return new ColumnSize(v, prefix); } + public ColumnSize getColumn(String name) { +return allColumnSizes.get(name); + } + public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB - private MapcolumnSizes = CaseInsensitiveMap.newHashMap(); + // This keeps information for all columns i.e. all top columns and nested columns underneath + private Map allColumnSizes = CaseInsensitiveMap.newHashMap(); --- End diff -- yes, I got rid of allColumnSizes. We will have only top level columns. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r171999148 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -418,11 +438,13 @@ private void measureColumn(ValueVector v, String prefix) { netRowWidthCap50 += ! colSize.isVariableWidth ? colSize.estSize : 8 /* offset vector */ + roundUpToPowerOf2(Math.min(colSize.estSize,50)); // above change 8 to 4 after DRILL-5446 is fixed + +return colSize; } - private void expandMap(AbstractMapVector mapVector, String prefix) { + private void expandMap(ColumnSize colSize, AbstractMapVector mapVector, String prefix) { for (ValueVector vector : mapVector) { - measureColumn(vector, prefix); + colSize.childColumnSizes.put(prefix + vector.getField().getName(), measureColumn(vector, prefix)); --- End diff -- I made the change to maintain trees of columns for trees of maps. Any one who wants to access the lower level columns, they get top level column and have to walk through the tree. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r169859674 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -245,16 +251,30 @@ private void buildVectorInitializer(VectorInitializer initializer) { else if (width > 0) { initializer.variableWidth(name, width); } + + for (ColumnSize columnSize : childColumnSizes.values()) { +columnSize.buildVectorInitializer(initializer); + } } + } public static ColumnSize getColumn(ValueVector v, String prefix) { return new ColumnSize(v, prefix); } + public ColumnSize getColumn(String name) { +return allColumnSizes.get(name); + } + public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB - private MapcolumnSizes = CaseInsensitiveMap.newHashMap(); + // This keeps information for all columns i.e. all top columns and nested columns underneath + private Map allColumnSizes = CaseInsensitiveMap.newHashMap(); --- End diff -- I'm a bit confused. I saw above that a `ColumnSize` has a list of children. Why are the children repeated here, introducing the naming issues described below? Given the column alias issues, and how column index is used elsewhere, can this just be an index of top-level columns? Then, to size map vectors (the only one that has the nesting issue), use the code for recursion that already exists in the `VectorInitializer`? ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r169860846 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java --- @@ -266,12 +270,18 @@ public void addComplexWriter(ComplexWriter writer) { complexWriters.add(writer); } - private boolean doAlloc() { -//Allocate vv in the allocationVectors. + private boolean doAlloc(int recordCount) { + +//Allocate v in the allocationVectors. for (ValueVector v : this.allocationVectors) { - if (!v.allocateNewSafe()) { -return false; - } + // build vector initializer for the column. + // This will iteratively include all nested columns underneath. + RecordBatchSizer.ColumnSize colSize = flattenMemoryManager.getColumnSize(v.getField().getName()); + VectorInitializer initializer = new VectorInitializer(); + colSize.buildVectorInitializer(initializer); + // Allocate memory for the vector. If it is map, it will allocate memory + // for all nested child columns as well. + initializer.allocateVector(v, "", recordCount); --- End diff -- While this code can be made to work, it does introduce a more complex path than was intended. The idea is that a `VectorInirializer` is a recursive structure. The top-level one holds hints for the row. Nested instances can hold data for maps. Because this class was meant to be temporary, it holds "hints": the information is used if available, defaults used if the hints are not available. So, a better approach would be to assemble a `VectorInitializer` for the output row in one step. Then, apply it to the entire row in another step. Further if we do that, we don't have to create initializer objects for every vector allocation; we can reuse the same set if the output row sizes don't change. Further, the code will be easier to reason about since we won't have two distinct paths. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r169857960 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -418,11 +438,13 @@ private void measureColumn(ValueVector v, String prefix) { netRowWidthCap50 += ! colSize.isVariableWidth ? colSize.estSize : 8 /* offset vector */ + roundUpToPowerOf2(Math.min(colSize.estSize,50)); // above change 8 to 4 after DRILL-5446 is fixed + +return colSize; } - private void expandMap(AbstractMapVector mapVector, String prefix) { + private void expandMap(ColumnSize colSize, AbstractMapVector mapVector, String prefix) { for (ValueVector vector : mapVector) { - measureColumn(vector, prefix); + colSize.childColumnSizes.put(prefix + vector.getField().getName(), measureColumn(vector, prefix)); --- End diff -- This is subject to aliasing. Suppose I have two maps: ``` aa(b) a(ab) ``` When I add the child vectors, both will produce a combined name of `aab`. We can't use dots n names for the same reason: ``` a.b(c) a(b.c) ``` Both will produce `a.b.c`. In the new "result set loader" code, all places that handle trees of columns use actual trees of maps. A crude-but-effecive solution is to use a non-legal name character. The only valid one is the back-tick since we use that in SQL to quote names. If we do that, we now have ``` aa`b a`ab a.b`c a`b.c ``` And the names are now un-aliased. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r169860310 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -245,16 +251,30 @@ private void buildVectorInitializer(VectorInitializer initializer) { else if (width > 0) { initializer.variableWidth(name, width); } + --- End diff -- There may be a but in the original code above this line. ``` String name = prefix + metadata.getName(); ... initializer.variableWidthArray(name, ... ``` First problem: this method does not recurse into the child vectors of maps. Second, if it did, since each initializer is supposed to hold a name relative to its parent tuple (row or map), the prefix is not needed. The vector initializer code was originally created for sort and none of the tests for sort include maps, so I probably did not catch the fact that maps don't work correctly with the original code. Suggestion: add unit tests for this stuff. This code is getting very complex and tests would be very helpful to catch these subtle bugs. Disclaimer: this is all supposed to be replaced with the ResultSetLoader code. The vector allocation code there is much cleaner, does handle maps, and has been unit tested. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r169858105 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -108,6 +108,12 @@ public final float estElementCountPerArray; public final boolean isVariableWidth; +public MapchildColumnSizes = CaseInsensitiveMap.newHashMap(); --- End diff -- Nit: `children` might be shorter and simpler. It is clear that the children are sizes from the member type. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1125#discussion_r169859837 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorInitializer.java --- @@ -97,7 +97,7 @@ public void allocateBatch(VectorAccessible va, int recordCount) { } } - private void allocateVector(ValueVector vector, String prefix, int recordCount) { + public void allocateVector(ValueVector vector, String prefix, int recordCount) { --- End diff -- Note the error in the member variable in this class: ``` private Maphints = new HashMap<>(); ``` Should be a case insensitive map. ---
[GitHub] drill pull request #1125: DRILL-6126: Allocate memory for value vectors upfr...
GitHub user ppadma opened a pull request: https://github.com/apache/drill/pull/1125 DRILL-6126: Allocate memory for value vectors upfront in flatten operator Made changes to allocate memory upfront for flatten operator based on sizing calculations. Need to do allocation of single column (can be nested) for a particular record count and allocation hints. Refactored the code a bit for that. Instead of assuming worst case fragmentation factor of 2, changed the logic to round down the number of rows calculated to nearest power of two. This will allow us to pack value vectors more densely and will help with memory utilization. RepeatedMapvector and RepeatedListVector are extended from RepeatedFixedWidthVectorLike. This is wrong and causing problems in Allocation logic (allocatePrecomputedChildCount in AllocationHelper more specifically). Fixed that. This PR has 2 commits. One for all of the above and second one for DRILL-6162: Enhance record batch sizer to retain nesting information. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ppadma/drill DRILL-6126 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1125.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1125 commit 58c6b9ad584e56c71d982feaaa43ad32b5011eef Author: Padma PenumarthyDate: 2018-02-21T17:33:12Z DRILL-6162: Enhance record batch sizer to retain nesting information for map columns. commit f7c09131179b75d10ffe195785c9aef3b9c7ed97 Author: Padma Penumarthy Date: 2018-02-21T17:35:47Z DRILL-6126: Allocate memory for value vectors upfront in flatten operator ---