[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-10 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r452934813



##
File path: dev/archery/archery/integration/runner.py
##
@@ -126,6 +126,8 @@ def _gold_tests(self, gold_dir):
 if f.name == name).skip
 except StopIteration:
 skip = set()
+if name == 'union' and prefix == '0.17.1':

Review comment:
   This is a minor hack to remove Java from 0.17.1 union gold tests. Not 
the most elegant way to do it but doing it correctly would be a non-trivial 
change to archery imho. Thoughts?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r452103046



##
File path: java/vector/src/main/codegen/templates/DenseUnionVector.java
##
@@ -812,12 +757,17 @@ public int getValueCount() {
   }
 
   public boolean isNull(int index) {

Review comment:
   I set `isNull` to always be false and all the tests seem to pass (with 
minor modifications). Take a look and let me know what you think.
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r452086650



##
File path: java/vector/src/main/codegen/templates/DenseUnionVector.java
##
@@ -812,12 +757,17 @@ public int getValueCount() {
   }
 
   public boolean isNull(int index) {

Review comment:
   I am ok with changing this to always return false. However I think this 
means anyone using a Union vector would have to extract the child and check for 
null as an extra step. All other nested vectors have a top level validity 
buffer. I don't know what the ramifications of changing this are?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r452084718



##
File path: java/vector/src/main/codegen/templates/DenseUnionVector.java
##
@@ -283,7 +285,7 @@ public long getValidityBufferAddress() {
   public ArrowBuf getDataBuffer() { throw new UnsupportedOperationException(); 
}
 
   public StructVector getStruct(byte typeId) {
-StructVector structVector = (StructVector) childVectors[typeId];
+StructVector structVector = typeId < 0 ? null : (StructVector) 
childVectors[typeId];

Review comment:
   correct, its a bit strange as there is no way to tell between null and 
not yet allocated. One of hte only benefits of a separate validity buffer. I 
thought this was a good compromise though I dont love it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r452082717



##
File path: java/vector/src/main/codegen/templates/DenseUnionVector.java
##
@@ -268,11 +270,11 @@ public long getDataBufferAddress() {
 
   @Override
   public long getValidityBufferAddress() {
-return validityBuffer.memoryAddress();
+return typeBuffer.memoryAddress();
   }
 
   @Override
-  public ArrowBuf getValidityBuffer() { return validityBuffer; }
+  public ArrowBuf getValidityBuffer() { return typeBuffer; }

Review comment:
   same here
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r452082646



##
File path: java/vector/src/main/codegen/templates/DenseUnionVector.java
##
@@ -268,11 +270,11 @@ public long getDataBufferAddress() {
 
   @Override
   public long getValidityBufferAddress() {
-return validityBuffer.memoryAddress();
+return typeBuffer.memoryAddress();

Review comment:
   Ahh, yes. I copied UnionVector (which returns `typeBuffer`) but I didn't 
really like it. Fixed on both





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-16 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r441052320



##
File path: java/vector/src/main/codegen/templates/UnionVector.java
##
@@ -325,12 +361,45 @@ private void allocateTypeBuffer() {
 typeBuffer.setZero(0, typeBuffer.capacity());
   }
 
+  private void allocateValidityBuffer() {
+validityBuffer = allocator.buffer(validityBufferAllocationSizeInBytes);
+validityBuffer.readerIndex(0);
+validityBuffer.setZero(0, validityBuffer.capacity());
+  }
+
   @Override
   public void reAlloc() {
 internalStruct.reAlloc();
+reallocValidityBuffer();
 reallocTypeBuffer();
   }
 
+  private void reallocValidityBuffer() {
+final long currentBufferCapacity = validityBuffer.capacity();
+long newAllocationSize = currentBufferCapacity * 2;
+if (newAllocationSize == 0) {
+  if (validityBufferAllocationSizeInBytes > 0) {
+newAllocationSize = validityBufferAllocationSizeInBytes;
+  } else {
+newAllocationSize = 
DataSizeRoundingUtil.divideBy8Ceil(BaseValueVector.INITIAL_VALUE_ALLOCATION) * 
2;
+  }

Review comment:
   removed. Probably a copy/paste error





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-16 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r441050322



##
File path: java/vector/src/main/codegen/templates/UnionVector.java
##
@@ -586,7 +686,9 @@ public ValueVector getVectorByType(int typeId) {
   }
 
 public ValueVector getVectorByType(int typeId, ArrowType arrowType) {
-  switch (MinorType.values()[typeId]) {
+  Types.MinorType type = (typeIds[typeId] != null) ?
+  Types.getMinorTypeForArrowType(typeIds[typeId].getType()) : 
Types.MinorType.values()[typeId];
+  switch (type) {

Review comment:
   This switch uses the `typeIds` array to extract the MinorType associated 
with the logical type. If it does not exist in the array (usually because a 
logical type wasn't previously assigned) it defaults to the id of the minor 
type. So Java always uses the MinorType as the type id and allows c++ (and 
others) to send arbitrary type mappings.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-16 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r441047163



##
File path: java/vector/src/main/codegen/templates/UnionVector.java
##
@@ -493,6 +576,19 @@ public void splitAndTransfer(int startIndex, int length) {
   Preconditions.checkArgument(startIndex >= 0 && length >= 0 && startIndex 
+ length <= valueCount,
   "Invalid parameters startIndex: %s, length: %s for valueCount: %s", 
startIndex, length, valueCount);
   to.clear();
+  // transfer validity buffer
+  while (to.getValidityBufferValueCapacity() < length) {
+to.reallocValidityBuffer();
+  }
+  for (int i = 0; i < length; i++) {
+int validity = BitVectorHelper.get(validityBuffer, startIndex + i);
+if (validity == 0) {
+  BitVectorHelper.unsetBit(to.validityBuffer, i);
+} else {
+  BitVectorHelper.setBit(to.validityBuffer, i);
+}

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-16 Thread GitBox


rymurr commented on a change in pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#discussion_r441045982



##
File path: java/vector/src/main/codegen/templates/DenseUnionVector.java
##
@@ -104,6 +105,7 @@
* The index is the type id, and the value is the type field.
*/
   private Field[] typeFields = new Field[Byte.MAX_VALUE + 1];
+  private byte[] typeMapFields = new byte[Byte.MAX_VALUE + 1];

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org