[GitHub] [arrow] kou closed pull request #7431: ARROW-9116: [C++][FOLLOWUP] Add 0-length test for BaseBinaryArray::total_values_length
kou closed pull request #7431: URL: https://github.com/apache/arrow/pull/7431 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] sergeyt commented on pull request #7400: ARROW-9088: [Rust] Make prettyprint optional
sergeyt commented on pull request #7400: URL: https://github.com/apache/arrow/pull/7400#issuecomment-643713603 @houqp `prettytable-rs` depends on `term`, which depends on `dirs`, which is what causes the error. `dirs` actually has [code for wasm32](https://github.com/dirs-dev/dirs-rs/blob/master/src/wasm.rs), but it seems to be broken.
[GitHub] [arrow] github-actions[bot] commented on pull request #7431: ARROW-9116: [C++][FOLLOWUP] Add 0-length test for BaseBinaryArray::total_values_length
github-actions[bot] commented on pull request #7431: URL: https://github.com/apache/arrow/pull/7431#issuecomment-643710945 https://issues.apache.org/jira/browse/ARROW-9116
[GitHub] [arrow] wesm opened a new pull request #7431: ARROW-9116: [C++][FOLLOWUP] Add 0-length test for BaseBinaryArray::total_values_length
wesm opened a new pull request #7431: URL: https://github.com/apache/arrow/pull/7431 This also protects against the offsets buffer being null.
[GitHub] [arrow] github-actions[bot] commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows
github-actions[bot] commented on pull request #7430: URL: https://github.com/apache/arrow/pull/7430#issuecomment-643703705 https://issues.apache.org/jira/browse/ARROW-9126
[GitHub] [arrow] ctring opened a new pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows
ctring opened a new pull request #7430: URL: https://github.com/apache/arrow/pull/7430 On Linux, the Boost configure step allows selecting a subset of the libraries to build via the `--with-libraries=` flag. However, this flag does not exist in the configure script on Windows, so we are currently building all libraries, including those that were trimmed:

```
Component configuration:
- atomic : building
- chrono : building
- container : building
- date_time : building
- exception : building
- filesystem : building
- headers : building
- iostreams : building
- locale : building
- log : building
- mpi : building
- program_options : building
- python : building
- random : building
- regex : building
- serialization : building
- system : building
- test : building
- thread : building
- timer : building
- wave : building
```

To select only the necessary libraries on Windows, we can provide the `--with-` flags to `./b2`. Doing so got me past the Boost build step, but the Thrift build step then failed, this time because the header file `typeof/incr_registration_group.hpp` was missing from the trimmed Boost (this file is required on Windows but not on Linux), so I added that header to the `trim-boost.sh` script. I verified that this now builds on both Windows and Linux with the commands:

```
mkdir build
cd build
cmake .. -DARROW_PARQUET=ON
cmake --build .
```
[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366

## File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
## @@ -886,6 +886,65 @@ public void testNullableFixedType4() {
    * -- VarBinaryVector
    */
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+    try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+      try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+        vector.allocateNew(1024 * 10, 1024);
+
+        vector.set(0, STR1);
+        vector.set(1, STR2);
+        vector.set(2, STR3);
+        vector.setValueCount(3);
+
+        final long allocatedMem = allocator.getAllocatedMemory();
+        final int validityRefCnt = vector.getValidityBuffer().refCnt();
+        final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+        final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+        // split and transfer with slice starting at the beginning: this should not allocate anything new
+        vector.splitAndTransferTo(0, 2, target);
+        assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+        // 2 = validity and offset buffers are stored in the same arrowbuf
+        assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment: Not at all, that's a good question. I've improved the assertion's comment in commit 2e13c5239137e54b0cd128240e6a3d0895c0c36f. Do they make sense?
[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r439780145

## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
## @@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
     final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
     final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
     final int dataLength = end - start;
-    target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-    for (int i = 0; i < length + 1; i++) {
-      final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - start;
-      target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+    if (startIndex == 0) {
+      target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+      target.offsetBuffer.getReferenceManager().retain();

Review comment:

> If the testSplitAndTransfer tests were run with two child buffer allocators I think they would fail.

@rymurr there was indeed a problem in that case where the sliced data was not transferred to the target allocator. See 681327e22f6dbda71b78e408b9fee6a8af0040b0 for a test about that.

> A clearer/more complete optimization would probably to simply read the initial starting offset at the index that we're slicing from and if that is zero, don't do the allocate/copy

@jacques-n I have extended the condition to also include the case of empties/nulls. See 2e13c5239137e54b0cd128240e6a3d0895c0c36f
[GitHub] [arrow] scampi commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
scampi commented on pull request #6402: URL: https://github.com/apache/arrow/pull/6402#issuecomment-643699840 @wesm Thanks all for the comments. Please have a look again.
[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r439780145

## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
## @@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
     final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
     final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
     final int dataLength = end - start;
-    target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-    for (int i = 0; i < length + 1; i++) {
-      final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - start;
-      target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+    if (startIndex == 0) {
+      target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+      target.offsetBuffer.getReferenceManager().retain();

Review comment:

> If the testSplitAndTransfer tests were run with two child buffer allocators I think they would fail.

@rymurr there was indeed a problem in that case where the sliced data was not transferred to the target allocator. See da7e3eacaf021580b4aae0f17492c17dbbef44a0 for a test about that.

> A clearer/more complete optimization would probably to simply read the initial starting offset at the index that we're slicing from and if that is zero, don't do the allocate/copy

@jacques-n I have extended the condition to also include the case of empties/nulls. See e304816f1e88609ce79a18d45c657901489e0b13
[GitHub] [arrow] kou closed pull request #7426: ARROW-9116: [C++] Add BaseBinaryArray::total_values_length
kou closed pull request #7426: URL: https://github.com/apache/arrow/pull/7426
[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366

## File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
## @@ -886,6 +886,65 @@ public void testNullableFixedType4() {
    * -- VarBinaryVector
    */
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+    try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+      try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+        vector.allocateNew(1024 * 10, 1024);
+
+        vector.set(0, STR1);
+        vector.set(1, STR2);
+        vector.set(2, STR3);
+        vector.setValueCount(3);
+
+        final long allocatedMem = allocator.getAllocatedMemory();
+        final int validityRefCnt = vector.getValidityBuffer().refCnt();
+        final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+        final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+        // split and transfer with slice starting at the beginning: this should not allocate anything new
+        vector.splitAndTransferTo(0, 2, target);
+        assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+        // 2 = validity and offset buffers are stored in the same arrowbuf
+        assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment: Not at all, that's a good question. I've improved the assertion's comment in commit e304816f1e88609ce79a18d45c657901489e0b13. Do they make sense?
[GitHub] [arrow] kou closed pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
kou closed pull request #7429: URL: https://github.com/apache/arrow/pull/7429
[GitHub] [arrow] kou commented on pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
kou commented on pull request #7429: URL: https://github.com/apache/arrow/pull/7429#issuecomment-643698751 +1
[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366

## File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
## @@ -886,6 +886,65 @@ public void testNullableFixedType4() {
    * -- VarBinaryVector
    */
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+    try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+      try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+        vector.allocateNew(1024 * 10, 1024);
+
+        vector.set(0, STR1);
+        vector.set(1, STR2);
+        vector.set(2, STR3);
+        vector.setValueCount(3);
+
+        final long allocatedMem = allocator.getAllocatedMemory();
+        final int validityRefCnt = vector.getValidityBuffer().refCnt();
+        final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+        final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+        // split and transfer with slice starting at the beginning: this should not allocate anything new
+        vector.splitAndTransferTo(0, 2, target);
+        assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+        // 2 = validity and offset buffers are stored in the same arrowbuf
+        assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment: Not at all, that's a good question. I've improved the assertion's comment in commit e304816f1e88609ce79a18d45c657901489e0b13.
[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r439778364

## File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
## @@ -886,6 +886,65 @@ public void testNullableFixedType4() {
    * -- VarBinaryVector
    */
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+    try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+      try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+        vector.allocateNew(1024 * 10, 1024);
+
+        vector.set(0, STR1);
+        vector.set(1, STR2);
+        vector.set(2, STR3);
+        vector.setValueCount(3);
+
+        final long allocatedMem = allocator.getAllocatedMemory();
+        final int validityRefCnt = vector.getValidityBuffer().refCnt();
+        final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+        final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+        // split and transfer with slice starting at the beginning: this should not allocate anything new
+        vector.splitAndTransferTo(0, 2, target);
+        assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+        // 2 = validity and offset buffers are stored in the same arrowbuf
+        assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment: Of course ;o) I'm used to hamcrest where that is the opposite...

## File path: java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
## @@ -886,6 +886,65 @@ public void testNullableFixedType4() {
    * -- VarBinaryVector
    */
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+    try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+      try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+        vector.allocateNew(1024 * 10, 1024);
+
+        vector.set(0, STR1);
+        vector.set(1, STR2);
+        vector.set(2, STR3);
+        vector.setValueCount(3);
+
+        final long allocatedMem = allocator.getAllocatedMemory();
+        final int validityRefCnt = vector.getValidityBuffer().refCnt();
+        final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+        final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+        // split and transfer with slice starting at the beginning: this should not allocate anything new
+        vector.splitAndTransferTo(0, 2, target);
+        assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+        // 2 = validity and offset buffers are stored in the same arrowbuf
+        assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment: Not at all, that's a good question. I've improved the assertion's comment in commit .
[GitHub] [arrow] github-actions[bot] commented on pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
github-actions[bot] commented on pull request #7429: URL: https://github.com/apache/arrow/pull/7429#issuecomment-643694502 https://issues.apache.org/jira/browse/ARROW-9125
[GitHub] [arrow] github-actions[bot] commented on pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
github-actions[bot] commented on pull request #7429: URL: https://github.com/apache/arrow/pull/7429#issuecomment-643694452 Revision: 8245d4bead7f7c4656de1d11dd2abc98410d8c21

Submitted crossbow builds: [ursa-labs/crossbow @ actions-318](https://github.com/ursa-labs/crossbow/branches/all?query=actions-318)

|Task|Status|
|----|------|
|test-conda-cpp-valgrind|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-318-github-test-conda-cpp-valgrind)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-318-github-test-conda-cpp-valgrind)|
[GitHub] [arrow] kou commented on pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
kou commented on pull request #7429: URL: https://github.com/apache/arrow/pull/7429#issuecomment-643694248 @github-actions crossbow submit test-conda-cpp-valgrind
[GitHub] [arrow] kou opened a new pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
kou opened a new pull request #7429: URL: https://github.com/apache/arrow/pull/7429
[GitHub] [arrow] kou closed pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly
kou closed pull request #7427: URL: https://github.com/apache/arrow/pull/7427
[GitHub] [arrow] kou commented on pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly
kou commented on pull request #7427: URL: https://github.com/apache/arrow/pull/7427#issuecomment-643691429 +1
[GitHub] [arrow] github-actions[bot] commented on pull request #7428
github-actions[bot] commented on pull request #7428: URL: https://github.com/apache/arrow/pull/7428#issuecomment-643685126 https://issues.apache.org/jira/browse/ARROW-9124
[GitHub] [arrow] houqp commented on pull request #7400: ARROW-9088: [Rust] Make prettyprint optional
houqp commented on pull request #7400: URL: https://github.com/apache/arrow/pull/7400#issuecomment-643684644 It's odd that a pretty print library would have target-specific issues :P
[GitHub] [arrow] houqp opened a new pull request #7428
houqp opened a new pull request #7428: URL: https://github.com/apache/arrow/pull/7428
[GitHub] [arrow] github-actions[bot] commented on pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly
github-actions[bot] commented on pull request #7427: URL: https://github.com/apache/arrow/pull/7427#issuecomment-643683947 https://issues.apache.org/jira/browse/ARROW-9123
[GitHub] [arrow] github-actions[bot] commented on pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly
github-actions[bot] commented on pull request #7427: URL: https://github.com/apache/arrow/pull/7427#issuecomment-643683599 Revision: 145fe31bd773518dcc408e2b46a749d060f6a32a

Submitted crossbow builds: [ursa-labs/crossbow @ actions-317](https://github.com/ursa-labs/crossbow/branches/all?query=actions-317)

|Task|Status|
|----|------|
|wheel-manylinux1-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux1-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux1-cp35m)|
|wheel-manylinux1-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux1-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux1-cp36m)|
|wheel-manylinux1-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux1-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux1-cp37m)|
|wheel-manylinux1-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux1-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux1-cp38)|
|wheel-manylinux2010-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2010-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux2010-cp35m)|
|wheel-manylinux2010-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2010-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux2010-cp36m)|
|wheel-manylinux2010-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2010-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux2010-cp37m)|
|wheel-manylinux2010-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2010-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux2010-cp38)|
|wheel-manylinux2014-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2014-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux2014-cp35m)|
|wheel-manylinux2014-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2014-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux2014-cp36m)|
|wheel-manylinux2014-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2014-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux2014-cp37m)|
|wheel-manylinux2014-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2014-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-317-azure-wheel-manylinux2014-cp38)|
|wheel-osx-cp35m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-317-travis-wheel-osx-cp35m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
|wheel-osx-cp36m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-317-travis-wheel-osx-cp36m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
|wheel-osx-cp37m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-317-travis-wheel-osx-cp37m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
|wheel-osx-cp38|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-317-travis-wheel-osx-cp38.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
|wheel-win-cp35m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-317-appveyor-wheel-win-cp35m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-317-appveyor-wheel-win-cp36m.svg)](https://ci.a
[GitHub] [arrow] kou commented on pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly
kou commented on pull request #7427: URL: https://github.com/apache/arrow/pull/7427#issuecomment-643683436 @github-actions crossbow submit -g wheel
[GitHub] [arrow] kou opened a new pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly
kou opened a new pull request #7427: URL: https://github.com/apache/arrow/pull/7427 ARROW_ZSTD_USE_SHARED was introduced by ARROW-9084. We need to set ARROW_ZSTD_USE_SHARED=OFF explicitly to use the static zstd library.
[GitHub] [arrow] github-actions[bot] commented on pull request #7426: ARROW-9116: [C++] Add BaseBinaryArray::total_values_length
github-actions[bot] commented on pull request #7426: URL: https://github.com/apache/arrow/pull/7426#issuecomment-643674572 https://issues.apache.org/jira/browse/ARROW-9116
[GitHub] [arrow] wesm opened a new pull request #7426: ARROW-9116: [C++] Add BaseBinaryArray::total_values_length
wesm opened a new pull request #7426: URL: https://github.com/apache/arrow/pull/7426
[GitHub] [arrow] wesm commented on a change in pull request #7418: ARROW-9115: [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch
wesm commented on a change in pull request #7418: URL: https://github.com/apache/arrow/pull/7418#discussion_r439765677

## File path: cpp/src/arrow/compute/kernels/scalar_string.cc
## @@ -37,26 +39,108 @@ struct AsciiLength {
   }
 };

-struct AsciiUpper {
-  // XXX: the Scalar codegen path passes template arguments that are unused
-  template
-  static std::string Call(KernelContext*, const util::string_view& val) {
-    std::string result = val.to_string();
-    std::transform(result.begin(), result.end(), result.begin(),
-                   [](unsigned char c) { return std::toupper(c); });
-    return result;
+using TransformFunc = std::function;
+
+void StringDataTransform(KernelContext* ctx, const ExecBatch& batch,
+                         TransformFunc transform, Datum* out) {
+  if (batch[0].kind() == Datum::ARRAY) {
+    const ArrayData& input = *batch[0].array();
+    ArrayData* out_arr = out->mutable_array();
+    // Reuse offsets from input
+    out_arr->buffers[1] = input.buffers[1];
+    int64_t data_nbytes = input.buffers[2]->size();
+    KERNEL_RETURN_IF_ERROR(ctx, ctx->Allocate(data_nbytes).Value(&out_arr->buffers[2]));
+    transform(input.buffers[2]->data(), data_nbytes, out_arr->buffers[2]->mutable_data());

Review comment: https://issues.apache.org/jira/browse/ARROW-9122
[GitHub] [arrow] wesm commented on a change in pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels
wesm commented on a change in pull request #7417: URL: https://github.com/apache/arrow/pull/7417#discussion_r439765330

## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_benchmark.cc

@@ -67,89 +68,35 @@ static void ArrayArrayKernel(benchmark::State& state) {
   for (auto _ : state) {
     ABORT_NOT_OK(Op(lhs, rhs, nullptr).status());
   }
+  state.SetItemsProcessed(state.iterations() * array_size);
 }

 void SetArgs(benchmark::internal::Benchmark* bench) {
-  bench->Unit(benchmark::kMicrosecond);
-
-  for (const auto size : kMemorySizes) {
+  for (const auto size : {kL1Size, kL2Size}) {

Review comment: I have a processor with a 22 MB L3 cache, so generating that much random data is quite expensive. If we want to benchmark arrays that big, we should generate a smaller sample of random data and repeat/tile it to make the bigger array.
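The repeat/tile idea suggested in the review above can be sketched outside of Arrow. The helper below is hypothetical (not part of the benchmark code): it pays the random-number cost only for a small sample, then copies that sample end to end to reach the target size.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Generate `sample_size` random values once, then tile the sample until the
// output holds `target_size` elements. RNG cost is O(sample_size); the rest
// is bulk copying, which is much cheaper for cache-sized benchmark inputs.
std::vector<int64_t> TiledRandomArray(std::size_t target_size,
                                      std::size_t sample_size, uint32_t seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int64_t> dist(-1000, 1000);
  std::vector<int64_t> sample(sample_size);
  for (auto& v : sample) v = dist(rng);

  std::vector<int64_t> out;
  out.reserve(target_size);
  while (out.size() < target_size) {
    // Copy either a full tile or the final partial tile.
    std::size_t n = std::min(sample_size, target_size - out.size());
    out.insert(out.end(), sample.begin(),
               sample.begin() + static_cast<std::ptrdiff_t>(n));
  }
  return out;
}
```

Tiled data repeats with the sample's period, which is usually acceptable for throughput benchmarks but would bias anything sensitive to value distribution over long ranges.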
[GitHub] [arrow] wesm commented on pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels
wesm commented on pull request #7417: URL: https://github.com/apache/arrow/pull/7417#issuecomment-643671002 +1. I fixed a few lingering issues that jumped out at me, will merge this once build passes
[GitHub] [arrow] wesm commented on a change in pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels
wesm commented on a change in pull request #7417: URL: https://github.com/apache/arrow/pull/7417#discussion_r439765195

## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_benchmark.cc

@@ -0,0 +1,92 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "benchmark/benchmark.h"
+
+#include
+
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/benchmark_util.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+
+namespace arrow {
+namespace compute {
+
+constexpr auto kSeed = 0x94378165;
+
+template
+static void AddArrayScalarKernel(benchmark::State& state) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto min = std::numeric_limits::lowest();
+  auto max = std::numeric_limits::max();
+
+  auto rand = random::RandomArrayGenerator(kSeed);
+  auto lhs = std::static_pointer_cast>(
+      rand.Numeric(array_size, min, max, args.null_proportion));
+
+  for (auto _ : state) {
+    ABORT_NOT_OK(Add(lhs, Datum(CType(15))).status());
+  }
+}
+
+template
+static void AddArrayArrayKernel(benchmark::State& state) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto min = std::numeric_limits::lowest();
+  auto max = std::numeric_limits::max();
+
+  auto rand = random::RandomArrayGenerator(kSeed);
+  auto lhs = std::static_pointer_cast>(
+      rand.Numeric(array_size, min, max, args.null_proportion));
+  auto rhs = std::static_pointer_cast>(
+      rand.Numeric(array_size, min, max, args.null_proportion));
+
+  for (auto _ : state) {
+    ABORT_NOT_OK(Add(lhs, rhs).status());
+  }

Review comment: I just did this
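The benchmarks above size arrays by memory footprint rather than element count (args.size / sizeof(CType)), so the same byte budget sweeps cache levels regardless of element width. A tiny standalone check of that arithmetic, with illustrative cache-size constants (the real kL1Size/kL2Size definitions live in Arrow's benchmark utilities and may differ):

```cpp
#include <cstdint>

// Illustrative cache-budget constants; names mirror the diff above but the
// values here are assumptions, not Arrow's actual definitions.
constexpr int64_t kL1Size = 64 * 1024;   // 64 KiB
constexpr int64_t kL2Size = 512 * 1024;  // 512 KiB

// Number of fixed-width elements that fit a given byte budget.
constexpr int64_t ElementsFor(int64_t nbytes, int64_t elem_size) {
  return nbytes / elem_size;
}
```

For example, an L1-sized run over int64 values processes 8192 elements, while the same budget over int32 processes twice as many.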
[GitHub] [arrow] wesm closed issue #7425: feature request: google cloud storage [GCS] filesystem backend
wesm closed issue #7425: URL: https://github.com/apache/arrow/issues/7425
[GitHub] [arrow] wesm commented on issue #7425: feature request: google cloud storage [GCS] filesystem backend
wesm commented on issue #7425: URL: https://github.com/apache/arrow/issues/7425#issuecomment-643669698 We track our granular roadmap on JIRA, here is the issue that's already open https://issues.apache.org/jira/browse/ARROW-1231
[GitHub] [arrow] bobcolner opened a new issue #7425: feature request: google cloud storage [GCS] filesystem backend
bobcolner opened a new issue #7425: URL: https://github.com/apache/arrow/issues/7425 Hi, I'm new. Looking for the correct place to request a google cloud storage [GCS] filesystem backend? Thanks!
[GitHub] [arrow] wesm commented on a change in pull request #7418: ARROW-9115: [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch
wesm commented on a change in pull request #7418: URL: https://github.com/apache/arrow/pull/7418#discussion_r439755593

## File path: cpp/src/arrow/compute/kernels/scalar_string.cc (same StringDataTransform diff hunk as quoted above)

Review comment: Good point. It's not accounted for in the unit tests either, can you open a JIRA issue?
[GitHub] [arrow] pierrebelzile commented on a change in pull request #7418: ARROW-9115: [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch
pierrebelzile commented on a change in pull request #7418: URL: https://github.com/apache/arrow/pull/7418#discussion_r439754427

## File path: cpp/src/arrow/compute/kernels/scalar_string.cc (same StringDataTransform diff hunk as quoted above)

Review comment: Isn't it necessary to consider the offset of a slice?
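The concern in this review thread is that a sliced binary array shares its parent's buffers, so reading the data buffer from byte 0 for `buffers[2]->size()` bytes processes the parent's bytes, not the slice's. A self-contained sketch of where a slice's bytes actually live, using hypothetical stand-in types rather than Arrow's ArrayData:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy model of a sliced variable-length binary array: element i of a slice
// starting at `offset` spans data[offsets[offset + i]] .. data[offsets[offset + i + 1]].
// A transform must start at offsets[offset], not at byte 0 of the data buffer.
struct BinaryView {
  const std::vector<int32_t>* offsets;  // parent offsets buffer (length + 1 entries)
  const std::string* data;              // parent values buffer
  int32_t offset;                       // slice start, in elements
  int32_t length;                       // slice length, in elements

  std::string Value(int32_t i) const {
    int32_t begin = (*offsets)[offset + i];
    int32_t end = (*offsets)[offset + i + 1];
    return data->substr(begin, end - begin);
  }

  // Bytes a data-buffer transform must actually process for this slice.
  int32_t DataBytes() const {
    return (*offsets)[offset + length] - (*offsets)[offset];
  }
};
```

For a parent with offsets {0, 3, 6, 9} over "foobarbaz", the slice at offset 1 with length 2 covers bytes 3..9 ("barbaz"), which is what DataBytes() reports; using the full buffer size (9) instead would read and write outside the slice.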
[GitHub] [arrow] wesm closed pull request #6806: ARROW-8312: [Java][Gandiva] support TreeNode in IN expression
wesm closed pull request #6806: URL: https://github.com/apache/arrow/pull/6806
[GitHub] [arrow] wesm commented on pull request #6806: ARROW-8312: [Java][Gandiva] support TreeNode in IN expression
wesm commented on pull request #6806: URL: https://github.com/apache/arrow/pull/6806#issuecomment-643624781 Thank you
[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels
wesm commented on a change in pull request #7410: URL: https://github.com/apache/arrow/pull/7410#discussion_r439739768

## File path: cpp/src/arrow/compute/kernels/scalar_validity.cc

@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+namespace compute {
+
+struct IsValid {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+    checked_cast(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+    if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+        out->length == arr.length) {
+      out->buffers[1] = arr.buffers[0];
+      return;
+    }

Review comment: If you want to do a zero-copy optimization, you need to configure the kernel to use `MemAllocation::COMPUTED_NO_PREALLOCATE`. This also makes it so you don't have to check the offset and length. You will, however, have to allocate memory when the zero-copy path is not possible.
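The zero-copy-vs-allocate decision in that review can be modeled in miniature. The sketch below is a toy (a byte-per-slot "bitmap" and shared_ptr-owned buffers stand in for Arrow's packed bitmaps and Buffer objects): when the input validity buffer exists and is aligned with the output, the kernel can alias it as the output values; otherwise it must materialize a fresh buffer.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

using Bitmap = std::vector<uint8_t>;  // 1 byte per slot for simplicity

// Toy IsValid: returns a buffer whose entries are 1 for valid slots.
// Zero-copy branch: share ownership of the validity buffer unchanged
// (the consumer keeps reading at the same offset). Fallback: allocate
// `length` slots and copy, treating an absent bitmap as all-valid.
std::shared_ptr<Bitmap> IsValidValues(const std::shared_ptr<Bitmap>& validity,
                                      int64_t offset, int64_t length,
                                      int64_t out_offset) {
  if (validity != nullptr && offset == out_offset) {
    return validity;  // zero copy: no allocation, shared ownership
  }
  auto out = std::make_shared<Bitmap>(length);
  for (int64_t i = 0; i < length; ++i) {
    (*out)[i] = validity ? (*validity)[offset + i] : 1;  // no bitmap => all valid
  }
  return out;
}
```

This mirrors the shape of the advice, not Arrow's API: with a computed, non-preallocated output the kernel owns the choice between aliasing and allocating, so no offset/length check against a preallocated output is needed.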