[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

2018-03-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1175


---


[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

2018-03-19 Thread sohami
Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/1175#discussion_r175629518
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java 
---
@@ -321,10 +321,8 @@ public ColumnSize(ValueVector v, String prefix) {
 
   // Calculate pure data size.
   if (isVariableWidth) {
-UInt4Vector offsetVector = ((RepeatedValueVector) 
v).getOffsetVector();
-int innerValueCount = 
offsetVector.getAccessor().get(valueCount);
 VariableWidthVector dataVector = ((VariableWidthVector) 
((RepeatedValueVector) v).getDataVector());
-totalDataSize = 
dataVector.getOffsetVector().getAccessor().get(innerValueCount);
+totalDataSize = dataVector.getCurrentSizeInBytes();
--- End diff --

@paul-rogers - I don't think `totalDataSize` includes both offset vector 
side and bytes size. It was meant to only include **pure data size only** for 
all entries in that column and that's what comment also suggests.

Instead `totalNetSize` includes the size for data and offset vector which 
is used for computing the rowWidth.


---


[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

2018-03-19 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1175#discussion_r175619764
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java 
---
@@ -321,10 +321,8 @@ public ColumnSize(ValueVector v, String prefix) {
 
   // Calculate pure data size.
   if (isVariableWidth) {
-UInt4Vector offsetVector = ((RepeatedValueVector) 
v).getOffsetVector();
-int innerValueCount = 
offsetVector.getAccessor().get(valueCount);
 VariableWidthVector dataVector = ((VariableWidthVector) 
((RepeatedValueVector) v).getDataVector());
-totalDataSize = 
dataVector.getOffsetVector().getAccessor().get(innerValueCount);
+totalDataSize = dataVector.getCurrentSizeInBytes();
--- End diff --

Good improvement. The original code exposes far too much of the 
implementation.

After all these changes, does the "dataSize" include both the offset vector 
and bytes? It should, else calls will be wrong. There are supposed to be three 
sizes:

* Payload size: actual data bytes.
* Data size: data + offsets + bits
* Overall size: full length of all vectors.

Payload size is what the user sees. Data size is how we calculate row width 
(since the rows must contain the overhead bytes). Vector length, here, only 
helps compute density, but is generated elsewhere. The point is, keep all three 
in mind, but keep the code separate. Otherwise, it is *very* easy to get 
confused and have the calculations blow up...


---


[GitHub] drill pull request #1175: DRILL-6262: IndexOutOfBoundException in RecordBatc...

2018-03-19 Thread sohami
GitHub user sohami opened a pull request:

https://github.com/apache/drill/pull/1175

DRILL-6262: IndexOutOfBoundException in RecordBatchSize for empty var…

…iableWidthVector

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohami/drill DRILL-6262

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1175


commit dbde22ea37c486b483601ab551e7fc7c23fb96b8
Author: Sorabh Hamirwasia 
Date:   2018-03-16T23:57:12Z

DRILL-6262: IndexOutOfBoundException in RecordBatchSize for empty 
variableWidthVector




---