[GitHub] [arrow] jacques-n commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be uncha

2020-06-23 Thread GitBox


jacques-n commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r444514257



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuffer(int startIndex, 
int length, BaseVariab
*/
   private void splitAndTransferValidityBuffer(int startIndex, int length,
   BaseVariableWidthVector target) {
-int firstByteSource = BitVectorHelper.byteIndex(startIndex);
-int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
-int byteSizeTarget = getValidityBufferSizeFromCount(length);
-int offset = startIndex % 8;
+if (length <= 0) {

Review comment:
   I'm not understanding why this is okay. It seems like you're leaving 
this function with a different number of references than the other exit points.

##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuffer(int startIndex, 
int length, BaseVariab
*/
   private void splitAndTransferValidityBuffer(int startIndex, int length,
   BaseVariableWidthVector target) {
-int firstByteSource = BitVectorHelper.byteIndex(startIndex);
-int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
-int byteSizeTarget = getValidityBufferSizeFromCount(length);
-int offset = startIndex % 8;
+if (length <= 0) {
+  return;
+}
 
-if (length > 0) {
-  if (offset == 0) {
-// slice
-if (target.validityBuffer != null) {
-  target.validityBuffer.getReferenceManager().release();
-}
-target.validityBuffer = validityBuffer.slice(firstByteSource, 
byteSizeTarget);
-target.validityBuffer.getReferenceManager().retain();
+final int firstByteSource = BitVectorHelper.byteIndex(startIndex);
+final int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
+final int byteSizeTarget = getValidityBufferSizeFromCount(length);
+final int offset = startIndex % 8;
+
+if (offset == 0) {
+  // slice
+  if (target.validityBuffer != null) {
+target.validityBuffer.getReferenceManager().release();
+  }
+  final ArrowBuf slicedValidityBuffer = 
validityBuffer.slice(firstByteSource, byteSizeTarget);
+  target.validityBuffer = transferBuffer(slicedValidityBuffer, 
target.allocator);

Review comment:
   i think you can just return here (or make small modifications to do so) 
avoiding the large else block





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jacques-n commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be uncha

2020-06-11 Thread GitBox


jacques-n commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439003304



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, 
int length, BaseVariab
 final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
 final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
 final int dataLength = end - start;
-target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-for (int i = 0; i < length + 1; i++) {
-  final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * 
OFFSET_WIDTH) - start;
-  target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+if (startIndex == 0) {
+  target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+  target.offsetBuffer.getReferenceManager().retain();

Review comment:
   splitAndTransfer is trying to do best effort transfer of data, not copy. 
However, for some situations some allocations have to occur. This is only true 
for split and transfer. Normal transfers never have to copy data and only about 
ownership. 
   
   I can't actually get to the commit these comments are on to confirm the 
specific details. It makes sense that a split and transfer that starts at the 
same initial offset doesn't need to rewrite the offsets and therefore would do 
a transfer. A clearer/more complete optimization would probably to simply read 
the initial starting offset at the index that we're slicing from and if that is 
zero, don't do the allocate/copy. In that case, if you have a bunch of empty 
strings or null values, you still can do a split and transfer with zero 
copy/allocation.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org