[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1175 ---
[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/1175#discussion_r175629518 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -321,10 +321,8 @@ public ColumnSize(ValueVector v, String prefix) { // Calculate pure data size. if (isVariableWidth) { -UInt4Vector offsetVector = ((RepeatedValueVector) v).getOffsetVector(); -int innerValueCount = offsetVector.getAccessor().get(valueCount); VariableWidthVector dataVector = ((VariableWidthVector) ((RepeatedValueVector) v).getDataVector()); -totalDataSize = dataVector.getOffsetVector().getAccessor().get(innerValueCount); +totalDataSize = dataVector.getCurrentSizeInBytes(); --- End diff -- @paul-rogers - I don't think `totalDataSize` includes both offset vector side and bytes size. It was meant to only include **pure data size only** for all entries in that column and that's what comment also suggests. Instead `totalNetSize` includes the size for data and offset vector which is used for computing the rowWidth. ---
[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1175#discussion_r175619764 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java --- @@ -321,10 +321,8 @@ public ColumnSize(ValueVector v, String prefix) { // Calculate pure data size. if (isVariableWidth) { -UInt4Vector offsetVector = ((RepeatedValueVector) v).getOffsetVector(); -int innerValueCount = offsetVector.getAccessor().get(valueCount); VariableWidthVector dataVector = ((VariableWidthVector) ((RepeatedValueVector) v).getDataVector()); -totalDataSize = dataVector.getOffsetVector().getAccessor().get(innerValueCount); +totalDataSize = dataVector.getCurrentSizeInBytes(); --- End diff -- Good improvement. The original code exposes far too much of the implementation. After all these changes, does the "dataSize" include both the offset vector and bytes? It should, else calls will be wrong. There are supposed to be three sizes: * Payload size: actual data bytes. * Data size: data + offsets + bits * Overall size: full length of all vectors. Payload size is what the user sees. Data size is how we calculate row width (since the rows must contain the overhead bytes). Vector length, here, only helps compute density, but is generated elsewhere. The point is, keep all three in mind, but keep the code separate. Otherwise, it is *very* easy to get confused and have the calculations blow up... ---
[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...
GitHub user sohami opened a pull request: https://github.com/apache/drill/pull/1175 DRILL-6262: IndexOutOfBoundException in RecordBatchSize for empty var⦠â¦iableWidthVector You can merge this pull request into a Git repository by running: $ git pull https://github.com/sohami/drill DRILL-6262 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1175.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1175 commit dbde22ea37c486b483601ab551e7fc7c23fb96b8 Author: Sorabh HamirwasiaDate: 2018-03-16T23:57:12Z DRILL-6262: IndexOutOfBoundException in RecordBatchSize for empty variableWidthVector ---