[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-07-10 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r453139681



##
File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
*/
   private void splitAndTransferValidityBuffer(int startIndex, int length,
   BaseVariableWidthVector target) {
-int firstByteSource = BitVectorHelper.byteIndex(startIndex);
-int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
-int byteSizeTarget = getValidityBufferSizeFromCount(length);
-int offset = startIndex % 8;
+if (length <= 0) {

Review comment:
   Maybe, for clarity, I could avoid these changes and simply [fix](https://github.com/apache/arrow/pull/6402/files#diff-db6c4f9e4030c5da8ccbcfe93a41b8a0R775) the validity buffer transfer. Would that be better?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-27 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r446497720



##
File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
*/
   private void splitAndTransferValidityBuffer(int startIndex, int length,
   BaseVariableWidthVector target) {
-int firstByteSource = BitVectorHelper.byteIndex(startIndex);
-int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
-int byteSizeTarget = getValidityBufferSizeFromCount(length);
-int offset = startIndex % 8;
+if (length <= 0) {
+  return;
+}
 
-if (length > 0) {
-  if (offset == 0) {
-// slice
-if (target.validityBuffer != null) {
-  target.validityBuffer.getReferenceManager().release();
-}
-target.validityBuffer = validityBuffer.slice(firstByteSource, byteSizeTarget);
-target.validityBuffer.getReferenceManager().retain();
+final int firstByteSource = BitVectorHelper.byteIndex(startIndex);
+final int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
+final int byteSizeTarget = getValidityBufferSizeFromCount(length);
+final int offset = startIndex % 8;
+
+if (offset == 0) {
+  // slice
+  if (target.validityBuffer != null) {
+target.validityBuffer.getReferenceManager().release();
+  }
+  final ArrowBuf slicedValidityBuffer = validityBuffer.slice(firstByteSource, byteSizeTarget);
+  target.validityBuffer = transferBuffer(slicedValidityBuffer, target.allocator);

Review comment:
   Done in 076e9964740f663a813829a7c436439f6604123f









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-26 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r446429063



##
File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
*/
   private void splitAndTransferValidityBuffer(int startIndex, int length,
   BaseVariableWidthVector target) {
-int firstByteSource = BitVectorHelper.byteIndex(startIndex);
-int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
-int byteSizeTarget = getValidityBufferSizeFromCount(length);
-int offset = startIndex % 8;
+if (length <= 0) {

Review comment:
   > It seems like you're leaving this function with a different number of references than the other exit points.
   
   I'm not sure I understand.
   
   Previously, the whole body of the function was wrapped in one big `if`. I thought returning early would be clearer, and it avoids declaring variables such as `firstByteSource` or `lastByteSource` when they are not needed.
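   A minimal sketch of the point above, under stated assumptions: `RefCountedBuf`, `nestedIf` and `earlyReturn` are hypothetical names, not Arrow code. It compares the original nested-`if` shape with the early-return shape and checks that, for every `length`, both leave the reference count in the same state, because the `length <= 0` path returns before any `retain()` in either version.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model (not Arrow's API) of the two control-flow shapes.
 * Both skip retain() when length <= 0, so all exit points agree.
 */
public class EarlyReturnSketch {

  /** Hypothetical stand-in for a reference-counted buffer. */
  static final class RefCountedBuf {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    void retain() {
      refCnt.incrementAndGet();
    }

    int refCnt() {
      return refCnt.get();
    }
  }

  // Original shape: the whole body is wrapped in one big if.
  static void nestedIf(RefCountedBuf source, int length) {
    if (length > 0) {
      // ... compute byte indices, slice, etc.
      source.retain();
    }
  }

  // Proposed shape: return early, then declare locals only when needed.
  static void earlyReturn(RefCountedBuf source, int length) {
    if (length <= 0) {
      return;
    }
    // ... compute byte indices, slice, etc.
    source.retain();
  }

  public static void main(String[] args) {
    for (int length : new int[] {-1, 0, 3}) {
      RefCountedBuf a = new RefCountedBuf();
      RefCountedBuf b = new RefCountedBuf();
      nestedIf(a, length);
      earlyReturn(b, length);
      System.out.println("length=" + length + " nested=" + a.refCnt() + " early=" + b.refCnt());
    }
  }
}
```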









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366



##
File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+  try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+vector.allocateNew(1024 * 10, 1024);
+
+vector.set(0, STR1);
+vector.set(1, STR2);
+vector.set(2, STR3);
+vector.setValueCount(3);
+
+final long allocatedMem = allocator.getAllocatedMemory();
+final int validityRefCnt = vector.getValidityBuffer().refCnt();
+final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+// split and transfer with slice starting at the beginning: this should not allocate anything new
+vector.splitAndTransferTo(0, 2, target);
+assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+// 2 = validity and offset buffers are stored in the same arrowbuf
+assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Not at all, that's a good question. I've improved the assertions' comments in commit 2e13c5239137e54b0cd128240e6a3d0895c0c36f. Do they make sense?
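   The `+ 2` in the assertion can be illustrated with a simplified model. This is a sketch, not Arrow's real `ArrowBuf` or `ReferenceManager`: `Ledger` and `Buf` are hypothetical, and it assumes, as the test comment says, that the validity and offset buffers are slices of one allocation. Slices report their parent allocation's reference count, so retaining one slice of each buffer raises the shared count by two.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model (not Arrow's API): slices share the reference count
 * of the allocation they were cut from.
 */
public class SharedLedgerSketch {

  /** One underlying allocation with a single shared reference count. */
  static final class Ledger {
    final AtomicInteger refCnt = new AtomicInteger(1);
  }

  /** A view over part of a ledger; reports the ledger's count, like a slice. */
  static final class Buf {
    final Ledger ledger;
    final int offset;
    final int length;

    Buf(Ledger ledger, int offset, int length) {
      this.ledger = ledger;
      this.offset = offset;
      this.length = length;
    }

    Buf slice(int index, int len) {
      return new Buf(ledger, offset + index, len);
    }

    void retain() {
      ledger.refCnt.incrementAndGet();
    }

    int refCnt() {
      return ledger.refCnt.get();
    }
  }

  public static void main(String[] args) {
    // Assumption: validity and offset buffers are carved out of one allocation.
    Buf combined = new Buf(new Ledger(), 0, 256);
    Buf validity = combined.slice(0, 128);
    Buf offsets = combined.slice(128, 128);

    int before = validity.refCnt();

    // Split-and-transfer starting at 0: both target buffers are retained
    // slices of the source instead of new allocations.
    validity.slice(0, 1).retain();
    offsets.slice(0, 12).retain();

    System.out.println("before=" + before + " after=" + validity.refCnt());
  }
}
```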









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439780145



##
File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
 final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
 final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
 final int dataLength = end - start;
-target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-for (int i = 0; i < length + 1; i++) {
-  final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - start;
-  target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+if (startIndex == 0) {
+  target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+  target.offsetBuffer.getReferenceManager().retain();

Review comment:
   > If the testSplitAndTransfer tests were run with two child buffer allocators I think they would fail.
   
   @rymurr there was indeed a problem in that case: the sliced data was not transferred to the target allocator. See 681327e22f6dbda71b78e408b9fee6a8af0040b0 for a test covering it.
   
   > A clearer/more complete optimization would probably to simply read the initial starting offset at the index that we're slicing from and if that is zero, don't do the allocate/copy
   
   @jacques-n I have extended the condition to also cover that case, where the slice is preceded only by empty or null values. See 2e13c5239137e54b0cd128240e6a3d0895c0c36f
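   The condition suggested above can be sketched with plain `int[]` arrays standing in for the offset `ArrowBuf` (the class and method names are hypothetical). The target offsets are the source offsets rebased so that the first one is 0. When the source offset at `startIndex` is already 0, which holds for `startIndex == 0` or when every earlier value is empty or null, rebasing changes nothing and the existing range could be sliced instead of copied.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the offset-buffer shortcut using int[] instead of
 * ArrowBuf. Arrays.copyOfRange stands in for a zero-copy slice here.
 */
public class OffsetSliceSketch {

  /** Slow path: allocate a new array and rebase every offset to start at 0. */
  static int[] copyRelativeOffsets(int[] offsets, int startIndex, int length) {
    final int start = offsets[startIndex];
    int[] target = new int[length + 1];
    for (int i = 0; i < length + 1; i++) {
      target[i] = offsets[startIndex + i] - start;
    }
    return target;
  }

  /** True when the rebased offsets would equal the source offsets. */
  static boolean canShare(int[] offsets, int startIndex) {
    return offsets[startIndex] == 0;
  }

  public static void main(String[] args) {
    // Values: "", null, "ab", "cde" -> the first two occupy no data bytes.
    int[] offsets = {0, 0, 0, 2, 5};

    int startIndex = 2;
    int length = 2;
    int[] copied = copyRelativeOffsets(offsets, startIndex, length);
    int[] shared = Arrays.copyOfRange(offsets, startIndex, startIndex + length + 1);

    System.out.println(canShare(offsets, startIndex)); // true
    System.out.println(Arrays.equals(copied, shared)); // true
  }
}
```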









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439780145



##
File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
 final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
 final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
 final int dataLength = end - start;
-target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-for (int i = 0; i < length + 1; i++) {
-  final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - start;
-  target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+if (startIndex == 0) {
+  target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+  target.offsetBuffer.getReferenceManager().retain();

Review comment:
   > If the testSplitAndTransfer tests were run with two child buffer allocators I think they would fail.
   
   @rymurr there was indeed a problem in that case: the sliced data was not transferred to the target allocator. See da7e3eacaf021580b4aae0f17492c17dbbef44a0 for a test covering it.
   
   > A clearer/more complete optimization would probably to simply read the initial starting offset at the index that we're slicing from and if that is zero, don't do the allocate/copy
   
   @jacques-n I have extended the condition to also cover that case, where the slice is preceded only by empty or null values. See e304816f1e88609ce79a18d45c657901489e0b13









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366



##
File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+  try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+vector.allocateNew(1024 * 10, 1024);
+
+vector.set(0, STR1);
+vector.set(1, STR2);
+vector.set(2, STR3);
+vector.setValueCount(3);
+
+final long allocatedMem = allocator.getAllocatedMemory();
+final int validityRefCnt = vector.getValidityBuffer().refCnt();
+final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+// split and transfer with slice starting at the beginning: this should not allocate anything new
+vector.splitAndTransferTo(0, 2, target);
+assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+// 2 = validity and offset buffers are stored in the same arrowbuf
+assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Not at all, that's a good question. I've improved the assertions' comments in commit e304816f1e88609ce79a18d45c657901489e0b13. Do they make sense?









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366



##
File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+  try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+vector.allocateNew(1024 * 10, 1024);
+
+vector.set(0, STR1);
+vector.set(1, STR2);
+vector.set(2, STR3);
+vector.setValueCount(3);
+
+final long allocatedMem = allocator.getAllocatedMemory();
+final int validityRefCnt = vector.getValidityBuffer().refCnt();
+final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+// split and transfer with slice starting at the beginning: this should not allocate anything new
+vector.splitAndTransferTo(0, 2, target);
+assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+// 2 = validity and offset buffers are stored in the same arrowbuf
+assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Not at all, that's a good question. I've improved the assertion's 
comment in commit e304816f1e88609ce79a18d45c657901489e0b13.









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439778364



##
File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+  try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+vector.allocateNew(1024 * 10, 1024);
+
+vector.set(0, STR1);
+vector.set(1, STR2);
+vector.set(2, STR3);
+vector.setValueCount(3);
+
+final long allocatedMem = allocator.getAllocatedMemory();
+final int validityRefCnt = vector.getValidityBuffer().refCnt();
+final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+// split and transfer with slice starting at the beginning: this should not allocate anything new
+vector.splitAndTransferTo(0, 2, target);
+assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+// 2 = validity and offset buffers are stored in the same arrowbuf
+assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Of course ;o) I'm used to Hamcrest, where the argument order is the opposite...
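   For reference: JUnit's `assertEquals(expected, actual)` takes the expected value first, while Hamcrest's `assertThat(actual, matcher)` takes the actual value first. Swapping JUnit's arguments does not change whether the assertion passes, only how the failure message labels the values, so the first assertion in the test would read `assertEquals(allocatedMem, allocator.getAllocatedMemory())`. The sketch below avoids a JUnit dependency by using a hypothetical `failureMessage` helper that only imitates JUnit 4's message format.

```java
/**
 * Hypothetical helper imitating JUnit 4's "expected:<X> but was:<Y>" message,
 * to show how swapped arguments mislabel the values on failure.
 */
public class AssertOrderSketch {

  /** Returns null when the values match, otherwise a JUnit-style message. */
  static String failureMessage(long expected, long actual) {
    return expected == actual ? null : "expected:<" + expected + "> but was:<" + actual + ">";
  }

  public static void main(String[] args) {
    long allocatedBefore = 4096; // baseline recorded before the split
    long allocatedAfter = 8192;  // pretend the split allocated a new buffer

    // Correct order: the recorded baseline is the expected value.
    System.out.println(failureMessage(allocatedBefore, allocatedAfter));
    // Swapped order: the message blames the wrong value.
    System.out.println(failureMessage(allocatedAfter, allocatedBefore));
  }
}
```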

Review comment:
   Not at all, that's a good question. I've improved the assertion's 
comment in commit .









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-05-15 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r425610823



##
File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
 final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
 final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
 final int dataLength = end - start;
-target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-for (int i = 0; i < length + 1; i++) {
-  final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - start;
-  target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+if (startIndex == 0) {
+  target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+  target.offsetBuffer.getReferenceManager().retain();

Review comment:
   @emkornfield I based the logic of that branch (relying on `retain()`) on the equivalent case below for splitting the validity buffer:
   
   
https://github.com/apache/arrow/blob/29545bcdef35f4151379e72e69ef7ce619f1a517/java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java#L772
   
   However, the value buffer uses `transferOwnership`, which is called from `transferBuffer`:
   
   
https://github.com/apache/arrow/blob/29545bcdef35f4151379e72e69ef7ce619f1a517/java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java#L752
   
   I am not sure about the difference, but from the documentation I understand that `retain` does not transfer ownership to the new allocator. Does that mean the validity buffer case should be updated to use `transferOwnership` as well?
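   A rough mental model of the question above, offered as a simplified sketch rather than a description of Arrow's memory module: `Allocator`, `Ledger`, `retain` and `transferOwnership` below are hypothetical classes and methods that only track which allocator is charged for a buffer. In this model `retain()` adds a reference while the bytes stay charged to the source allocator, whereas a transfer moves the charge to the target allocator. That accounting difference is what shows up when the source and target vectors use different child allocators.

```java
/**
 * Hypothetical accounting model (not Arrow's BufferAllocator/ReferenceManager):
 * only tracks which allocator is charged for each allocation.
 */
public class OwnershipSketch {

  /** Tracks how many bytes are currently charged to it. */
  static final class Allocator {
    final String name;
    long allocated;

    Allocator(String name) {
      this.name = name;
    }
  }

  /** One underlying allocation, charged to exactly one owning allocator. */
  static final class Ledger {
    final long size;
    Allocator owner;
    int refCnt = 1;

    Ledger(Allocator owner, long size) {
      this.owner = owner;
      this.size = size;
      owner.allocated += size;
    }

    /** Adds a reference; the bytes stay charged to the current owner. */
    void retain() {
      refCnt++;
    }

    /** Moves the charge for these bytes from the current owner to the target. */
    void transferOwnership(Allocator target) {
      owner.allocated -= size;
      target.allocated += size;
      owner = target;
    }
  }

  public static void main(String[] args) {
    Allocator source = new Allocator("source");
    Allocator target = new Allocator("target");

    // Sharing via retain(): the extra reference is meant for the target,
    // but the charge stays on the source allocator.
    Ledger viaRetain = new Ledger(source, 64);
    viaRetain.retain();
    System.out.println(source.name + "=" + source.allocated + " " + target.name + "=" + target.allocated);

    // Sharing via transfer: the charge follows the buffer to the target.
    Ledger viaTransfer = new Ledger(source, 32);
    viaTransfer.transferOwnership(target);
    System.out.println(source.name + "=" + source.allocated + " " + target.name + "=" + target.allocated);
  }
}
```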









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-05-08 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r422419677



##
File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
 final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
 final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
 final int dataLength = end - start;
-target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-for (int i = 0; i < length + 1; i++) {
-  final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - start;
-  target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+if (startIndex == 0) {
+  target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+  target.offsetBuffer.getReferenceManager().retain();

Review comment:
   > We should also verify that reference management and accounting is intact after this change
   
   @siddharthteotia I have added a test, but it does not check this yet. Do you have some pointers on how I could check it?
   
   > do you want to transfer the buffer here as well?
   
   @emkornfield What do you mean?




