[GitHub] [arrow] kou closed pull request #7431: ARROW-9116: [C++][FOLLOWUP] Add 0-length test for BaseBinaryArray::total_values_length

2020-06-13 Thread GitBox


kou closed pull request #7431:
URL: https://github.com/apache/arrow/pull/7431


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] sergeyt commented on pull request #7400: ARROW-9088: [Rust] Make prettyprint optional

2020-06-13 Thread GitBox


sergeyt commented on pull request #7400:
URL: https://github.com/apache/arrow/pull/7400#issuecomment-643713603


   @houqp `prettytable-rs` depends on `term`, which depends on `dirs`, which is what causes the error.
   
   Actually, `dirs` has [code for wasm32](https://github.com/dirs-dev/dirs-rs/blob/master/src/wasm.rs), but it seems to be broken.
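
   If prettyprint is gated behind a Cargo feature (as this PR proposes), a wasm32 consumer could simply opt out of it. A hypothetical invocation, assuming the optional feature is enabled by default:

   ```shell
   # Hypothetical: skip default features (including the prettyprint support)
   # when targeting wasm32, so the dirs dependency is never pulled in.
   cargo build --target wasm32-unknown-unknown --no-default-features
   ```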







[GitHub] [arrow] github-actions[bot] commented on pull request #7431: ARROW-9116: [C++][FOLLOWUP] Add 0-length test for BaseBinaryArray::total_values_length

2020-06-13 Thread GitBox


github-actions[bot] commented on pull request #7431:
URL: https://github.com/apache/arrow/pull/7431#issuecomment-643710945


   https://issues.apache.org/jira/browse/ARROW-9116







[GitHub] [arrow] wesm opened a new pull request #7431: ARROW-9116: [C++][FOLLOWUP] Add 0-length test for BaseBinaryArray::total_values_length

2020-06-13 Thread GitBox


wesm opened a new pull request #7431:
URL: https://github.com/apache/arrow/pull/7431


   This also protects against the offsets buffer being null. 







[GitHub] [arrow] github-actions[bot] commented on pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-13 Thread GitBox


github-actions[bot] commented on pull request #7430:
URL: https://github.com/apache/arrow/pull/7430#issuecomment-643703705


   https://issues.apache.org/jira/browse/ARROW-9126







[GitHub] [arrow] ctring opened a new pull request #7430: ARROW-9126: [C++] Fix building trimmed Boost bundle on Windows

2020-06-13 Thread GitBox


ctring opened a new pull request #7430:
URL: https://github.com/apache/arrow/pull/7430


   On Linux, the configuration step of Boost allows selecting a subset of the libraries to build via the `--with-libraries=` flag. However, this flag does not exist in the configure script on Windows, so we are currently building all libraries, including the ones that were supposed to be trimmed:
   ```
   Component configuration:
   
- atomic : building
- chrono : building
- container : building
- date_time : building
- exception : building
- filesystem : building
- headers : building
- iostreams : building
- locale : building
- log : building
- mpi : building
- program_options : building
- python : building
- random : building
- regex : building
- serialization : building
- system : building
- test : building
- thread : building
- timer : building
- wave : building
   ```
   
   To select only the necessary libraries on Windows, we can pass the `--with-` flags to `./b2`. Doing so got me past the Boost build step, but the Thrift build step still failed. This time it was due to a header file, `typeof/incr_registration_group.hpp`, missing from the trimmed Boost (the file is required on Windows but not on Linux); I therefore added that header file to the `trim-boost.sh` script. 
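   
   For illustration, the per-library selection with `./b2` might look like the following (the library names below are placeholders, not the exact set Arrow's build scripts pass):
   
   ```shell
   # Hypothetical b2 invocation building only a few Boost libraries;
   # the exact --with- flags Arrow needs may differ from these examples.
   ./b2 --with-filesystem --with-system --with-regex variant=release stage
   ```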
   
   I verified that this now builds on both Windows and Linux with the following commands:
   ```
   mkdir build
   cd build
   cmake .. -DARROW_PARQUET=ON
   cmake --build .
   ```







[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366



##
File path: 
java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+    try (final VarCharVector target = newVarCharVector("split-target", allocator)) {
+      try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, allocator)) {
+        vector.allocateNew(1024 * 10, 1024);
+
+        vector.set(0, STR1);
+        vector.set(1, STR2);
+        vector.set(2, STR3);
+        vector.setValueCount(3);
+
+        final long allocatedMem = allocator.getAllocatedMemory();
+        final int validityRefCnt = vector.getValidityBuffer().refCnt();
+        final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+        final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+        // split and transfer with slice starting at the beginning: this should not allocate anything new
+        vector.splitAndTransferTo(0, 2, target);
+        assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+        // 2 = validity and offset buffers are stored in the same arrowbuf
+        assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Not at all, that's a good question. I've improved the assertion's comment in commit 2e13c5239137e54b0cd128240e6a3d0895c0c36f. Do they make sense?









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439780145



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab
     final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
     final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
     final int dataLength = end - start;
-    target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-    for (int i = 0; i < length + 1; i++) {
-      final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - start;
-      target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+    if (startIndex == 0) {
+      target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+      target.offsetBuffer.getReferenceManager().retain();

Review comment:
   > If the testSplitAndTransfer tests were run with two child buffer allocators I think they would fail.
   
   @rymurr there was indeed a problem in that case: the sliced data was not transferred to the target allocator. See 681327e22f6dbda71b78e408b9fee6a8af0040b0 for a test covering that.
   
   > A clearer/more complete optimization would probably be to simply read the initial starting offset at the index that we're slicing from and if that is zero, don't do the allocate/copy
   
   @jacques-n I have extended the condition to also cover the case of empties/nulls. See 2e13c5239137e54b0cd128240e6a3d0895c0c36f









[GitHub] [arrow] scampi commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-06-13 Thread GitBox


scampi commented on pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#issuecomment-643699840


   @wesm Thanks all for the comments. Please have a look again.







[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439780145



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, 
int length, BaseVariab
 final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
 final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
 final int dataLength = end - start;
-target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-for (int i = 0; i < length + 1; i++) {
-  final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * 
OFFSET_WIDTH) - start;
-  target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+if (startIndex == 0) {
+  target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+  target.offsetBuffer.getReferenceManager().retain();

Review comment:
   > If the testSplitAndTransfer tests were run with two child buffer allocators I think they would fail.
   
   @rymurr there was indeed a problem in that case: the sliced data was not transferred to the target allocator. See da7e3eacaf021580b4aae0f17492c17dbbef44a0 for a test covering that.
   
   > A clearer/more complete optimization would probably be to simply read the initial starting offset at the index that we're slicing from and if that is zero, don't do the allocate/copy
   
   @jacques-n I have extended the condition to also cover the case of empties/nulls. See e304816f1e88609ce79a18d45c657901489e0b13









[GitHub] [arrow] kou closed pull request #7426: ARROW-9116: [C++] Add BaseBinaryArray::total_values_length

2020-06-13 Thread GitBox


kou closed pull request #7426:
URL: https://github.com/apache/arrow/pull/7426


   







[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366



##
File path: 
java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+try (final VarCharVector target = newVarCharVector("split-target", 
allocator)) {
+  try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, 
allocator)) {
+vector.allocateNew(1024 * 10, 1024);
+
+vector.set(0, STR1);
+vector.set(1, STR2);
+vector.set(2, STR3);
+vector.setValueCount(3);
+
+final long allocatedMem = allocator.getAllocatedMemory();
+final int validityRefCnt = vector.getValidityBuffer().refCnt();
+final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+// split and transfer with slice starting at the beginning: this 
should not allocate anything new
+vector.splitAndTransferTo(0, 2, target);
+assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+// 2 = validity and offset buffers are stored in the same arrowbuf
+assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Not at all, that's a good question. I've improved the assertion's comment in commit e304816f1e88609ce79a18d45c657901489e0b13. Do they make sense?









[GitHub] [arrow] kou closed pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind

2020-06-13 Thread GitBox


kou closed pull request #7429:
URL: https://github.com/apache/arrow/pull/7429


   







[GitHub] [arrow] kou commented on pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind

2020-06-13 Thread GitBox


kou commented on pull request #7429:
URL: https://github.com/apache/arrow/pull/7429#issuecomment-643698751


   +1







[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439778366



##
File path: 
java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+try (final VarCharVector target = newVarCharVector("split-target", 
allocator)) {
+  try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, 
allocator)) {
+vector.allocateNew(1024 * 10, 1024);
+
+vector.set(0, STR1);
+vector.set(1, STR2);
+vector.set(2, STR3);
+vector.setValueCount(3);
+
+final long allocatedMem = allocator.getAllocatedMemory();
+final int validityRefCnt = vector.getValidityBuffer().refCnt();
+final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+// split and transfer with slice starting at the beginning: this 
should not allocate anything new
+vector.splitAndTransferTo(0, 2, target);
+assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+// 2 = validity and offset buffers are stored in the same arrowbuf
+assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Not at all, that's a good question. I've improved the assertion's 
comment in commit e304816f1e88609ce79a18d45c657901489e0b13.









[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-06-13 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439778364



##
File path: 
java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+try (final VarCharVector target = newVarCharVector("split-target", 
allocator)) {
+  try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, 
allocator)) {
+vector.allocateNew(1024 * 10, 1024);
+
+vector.set(0, STR1);
+vector.set(1, STR2);
+vector.set(2, STR3);
+vector.setValueCount(3);
+
+final long allocatedMem = allocator.getAllocatedMemory();
+final int validityRefCnt = vector.getValidityBuffer().refCnt();
+final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+// split and transfer with slice starting at the beginning: this 
should not allocate anything new
+vector.splitAndTransferTo(0, 2, target);
+assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+// 2 = validity and offset buffers are stored in the same arrowbuf
+assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Of course ;o) I'm used to Hamcrest, where the argument order is the opposite...

##
File path: 
java/vector/src/test/java/org/apache/arrow/vector/TestValueVector.java
##
@@ -886,6 +886,65 @@ public void testNullableFixedType4() {
*  -- VarBinaryVector
*/
 
+  @Test /* VarCharVector */
+  public void testSplitAndTransfer1() {
+try (final VarCharVector target = newVarCharVector("split-target", 
allocator)) {
+  try (final VarCharVector vector = newVarCharVector(EMPTY_SCHEMA_PATH, 
allocator)) {
+vector.allocateNew(1024 * 10, 1024);
+
+vector.set(0, STR1);
+vector.set(1, STR2);
+vector.set(2, STR3);
+vector.setValueCount(3);
+
+final long allocatedMem = allocator.getAllocatedMemory();
+final int validityRefCnt = vector.getValidityBuffer().refCnt();
+final int offsetRefCnt = vector.getOffsetBuffer().refCnt();
+final int dataRefCnt = vector.getDataBuffer().refCnt();
+
+// split and transfer with slice starting at the beginning: this 
should not allocate anything new
+vector.splitAndTransferTo(0, 2, target);
+assertEquals(allocator.getAllocatedMemory(), allocatedMem);
+// 2 = validity and offset buffers are stored in the same arrowbuf
+assertEquals(vector.getValidityBuffer().refCnt(), validityRefCnt + 2);

Review comment:
   Not at all, that's a good question. I've improved the assertion's 
comment in commit .









[GitHub] [arrow] github-actions[bot] commented on pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind

2020-06-13 Thread GitBox


github-actions[bot] commented on pull request #7429:
URL: https://github.com/apache/arrow/pull/7429#issuecomment-643694502


   https://issues.apache.org/jira/browse/ARROW-9125







[GitHub] [arrow] github-actions[bot] commented on pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind

2020-06-13 Thread GitBox


github-actions[bot] commented on pull request #7429:
URL: https://github.com/apache/arrow/pull/7429#issuecomment-643694452


   Revision: 8245d4bead7f7c4656de1d11dd2abc98410d8c21
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-318](https://github.com/ursa-labs/crossbow/branches/all?query=actions-318)
   
   |Task|Status|
   |---|---|
   |test-conda-cpp-valgrind|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-318-github-test-conda-cpp-valgrind)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-318-github-test-conda-cpp-valgrind)|







[GitHub] [arrow] kou commented on pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind

2020-06-13 Thread GitBox


kou commented on pull request #7429:
URL: https://github.com/apache/arrow/pull/7429#issuecomment-643694248


   @github-actions crossbow submit test-conda-cpp-valgrind







[GitHub] [arrow] kou opened a new pull request #7429: ARROW-9125: [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind

2020-06-13 Thread GitBox


kou opened a new pull request #7429:
URL: https://github.com/apache/arrow/pull/7429


   







[GitHub] [arrow] kou closed pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly

2020-06-13 Thread GitBox


kou closed pull request #7427:
URL: https://github.com/apache/arrow/pull/7427


   







[GitHub] [arrow] kou commented on pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly

2020-06-13 Thread GitBox


kou commented on pull request #7427:
URL: https://github.com/apache/arrow/pull/7427#issuecomment-643691429


   +1







[GitHub] [arrow] github-actions[bot] commented on pull request #7428: ARROW-9124: [Rust][Datafusion] optimize DFParser::parse_sql to take query string as

2020-06-13 Thread GitBox


github-actions[bot] commented on pull request #7428:
URL: https://github.com/apache/arrow/pull/7428#issuecomment-643685126


   https://issues.apache.org/jira/browse/ARROW-9124







[GitHub] [arrow] houqp commented on pull request #7400: ARROW-9088: [Rust] Make prettyprint optional

2020-06-13 Thread GitBox


houqp commented on pull request #7400:
URL: https://github.com/apache/arrow/pull/7400#issuecomment-643684644


   It's odd that a pretty print library would have target-specific issues :P







[GitHub] [arrow] houqp opened a new pull request #7428: ARROW-9124: [Rust][Datafusion] optimize DFParser::parse_sql to take query string as

2020-06-13 Thread GitBox


houqp opened a new pull request #7428:
URL: https://github.com/apache/arrow/pull/7428


   







[GitHub] [arrow] github-actions[bot] commented on pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly

2020-06-13 Thread GitBox


github-actions[bot] commented on pull request #7427:
URL: https://github.com/apache/arrow/pull/7427#issuecomment-643683947


   https://issues.apache.org/jira/browse/ARROW-9123







[GitHub] [arrow] github-actions[bot] commented on pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly

2020-06-13 Thread GitBox


github-actions[bot] commented on pull request #7427:
URL: https://github.com/apache/arrow/pull/7427#issuecomment-643683599


   Revision: 145fe31bd773518dcc408e2b46a749d060f6a32a
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-317](https://github.com/ursa-labs/crossbow/branches/all?query=actions-317)
   
   |Task|Status|
   |---|---|
   |wheel-manylinux1-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux1-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux1-cp35m)|
   |wheel-manylinux1-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux1-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux1-cp36m)|
   |wheel-manylinux1-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux1-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux1-cp37m)|
   |wheel-manylinux1-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux1-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux1-cp38)|
   |wheel-manylinux2010-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2010-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux2010-cp35m)|
   |wheel-manylinux2010-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2010-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux2010-cp36m)|
   |wheel-manylinux2010-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2010-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux2010-cp37m)|
   |wheel-manylinux2010-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2010-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux2010-cp38)|
   |wheel-manylinux2014-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2014-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux2014-cp35m)|
   |wheel-manylinux2014-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2014-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux2014-cp36m)|
   |wheel-manylinux2014-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2014-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux2014-cp37m)|
   |wheel-manylinux2014-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-317-azure-wheel-manylinux2014-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-317-azure-wheel-manylinux2014-cp38)|
   |wheel-osx-cp35m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-317-travis-wheel-osx-cp35m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |wheel-osx-cp36m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-317-travis-wheel-osx-cp36m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |wheel-osx-cp37m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-317-travis-wheel-osx-cp37m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |wheel-osx-cp38|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-317-travis-wheel-osx-cp38.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |wheel-win-cp35m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-317-appveyor-wheel-win-cp35m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
   |wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-317-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
   

[GitHub] [arrow] kou commented on pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly

2020-06-13 Thread GitBox


kou commented on pull request #7427:
URL: https://github.com/apache/arrow/pull/7427#issuecomment-643683436


   @github-actions crossbow submit -g wheel







[GitHub] [arrow] kou opened a new pull request #7427: ARROW-9123: [Python][wheel] Use libzstd.a explicitly

2020-06-13 Thread GitBox


kou opened a new pull request #7427:
URL: https://github.com/apache/arrow/pull/7427


   ARROW_ZSTD_USE_SHARED was introduced by ARROW-9084. We need to set
   ARROW_ZSTD_USE_SHARED=OFF explicitly to use the static zstd library.
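   
   As an illustration, the configure step could pass the flag explicitly like this (the other options shown are placeholders, not the wheel scripts' actual flags):
   
   ```shell
   # Hypothetical configure invocation: force linking against the static
   # libzstd.a instead of the shared library; other flags are placeholders.
   cmake .. -DARROW_PYTHON=ON -DARROW_ZSTD_USE_SHARED=OFF
   ```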







[GitHub] [arrow] github-actions[bot] commented on pull request #7426: ARROW-9116: [C++] Add BaseBinaryArray::total_values_length

2020-06-13 Thread GitBox


github-actions[bot] commented on pull request #7426:
URL: https://github.com/apache/arrow/pull/7426#issuecomment-643674572


   https://issues.apache.org/jira/browse/ARROW-9116







[GitHub] [arrow] wesm opened a new pull request #7426: ARROW-9116: [C++] Add BaseBinaryArray::total_values_length

2020-06-13 Thread GitBox


wesm opened a new pull request #7426:
URL: https://github.com/apache/arrow/pull/7426


   







[GitHub] [arrow] wesm commented on a change in pull request #7418: ARROW-9115: [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch

2020-06-13 Thread GitBox


wesm commented on a change in pull request #7418:
URL: https://github.com/apache/arrow/pull/7418#discussion_r439765677



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -37,26 +39,108 @@ struct AsciiLength {
   }
 };
 
-struct AsciiUpper {
-  // XXX: the Scalar codegen path passes template arguments that are unused
-  template <typename... Ignored>
-  static std::string Call(KernelContext*, const util::string_view& val) {
-    std::string result = val.to_string();
-    std::transform(result.begin(), result.end(), result.begin(),
-                   [](unsigned char c) { return std::toupper(c); });
-    return result;
+using TransformFunc = std::function<void(const uint8_t*, int64_t, uint8_t*)>;
+
+void StringDataTransform(KernelContext* ctx, const ExecBatch& batch,
+                         TransformFunc transform, Datum* out) {
+  if (batch[0].kind() == Datum::ARRAY) {
+    const ArrayData& input = *batch[0].array();
+    ArrayData* out_arr = out->mutable_array();
+    // Reuse offsets from input
+    out_arr->buffers[1] = input.buffers[1];
+    int64_t data_nbytes = input.buffers[2]->size();
+    KERNEL_RETURN_IF_ERROR(ctx, ctx->Allocate(data_nbytes).Value(&out_arr->buffers[2]));
+    transform(input.buffers[2]->data(), data_nbytes, out_arr->buffers[2]->mutable_data());

Review comment:
   https://issues.apache.org/jira/browse/ARROW-9122









[GitHub] [arrow] wesm commented on a change in pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels

2020-06-13 Thread GitBox


wesm commented on a change in pull request #7417:
URL: https://github.com/apache/arrow/pull/7417#discussion_r439765330



##
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_benchmark.cc
##
@@ -67,89 +68,35 @@ static void ArrayArrayKernel(benchmark::State& state) {
   for (auto _ : state) {
 ABORT_NOT_OK(Op(lhs, rhs, nullptr).status());
   }
+  state.SetItemsProcessed(state.iterations() * array_size);
 }
 
 void SetArgs(benchmark::internal::Benchmark* bench) {
-  bench->Unit(benchmark::kMicrosecond);
-
-  for (const auto size : kMemorySizes) {
+  for (const auto size : {kL1Size, kL2Size}) {

Review comment:
   I have a processor with 22MB L3 cache so generating that much random 
data is quite expensive. If we want to benchmark arrays that big we should 
generate a smaller sample of random data and repeat/tile it to make the bigger 
array. 
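
   The repeat/tile approach suggested above can be sketched as follows. This
   is a standalone illustration, not Arrow's benchmark code; `TiledRandom` is
   a hypothetical helper. Only `small_n` values are ever drawn from the RNG,
   so filling an arbitrarily large array stays cheap:

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <random>
   #include <vector>

   // Generate a small block of random values once, then tile (repeat) it to
   // fill a much larger array. The large array has the same value
   // distribution per block but costs only small_n RNG draws.
   std::vector<int32_t> TiledRandom(size_t small_n, size_t total_n, uint32_t seed) {
     std::mt19937 rng(seed);
     std::uniform_int_distribution<int32_t> dist(-1000, 1000);
     std::vector<int32_t> small(small_n);
     for (auto& v : small) v = dist(rng);

     std::vector<int32_t> big(total_n);
     for (size_t i = 0; i < total_n; ++i) {
       big[i] = small[i % small_n];  // tile the small block
     }
     return big;
   }

   int main() {
     const size_t small_n = 64;
     const size_t total_n = 1 << 20;  // ~4 MiB of int32, but only 64 RNG draws
     auto big = TiledRandom(small_n, total_n, 12345);
     assert(big.size() == total_n);
     assert(big[small_n + 3] == big[3]);  // pattern repeats every small_n values
     return 0;
   }
   ```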









[GitHub] [arrow] wesm commented on pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels

2020-06-13 Thread GitBox


wesm commented on pull request #7417:
URL: https://github.com/apache/arrow/pull/7417#issuecomment-643671002


   +1. I fixed a few lingering issues that jumped out at me; will merge this
   once the build passes.







[GitHub] [arrow] wesm commented on a change in pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels

2020-06-13 Thread GitBox


wesm commented on a change in pull request #7417:
URL: https://github.com/apache/arrow/pull/7417#discussion_r439765195



##
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_benchmark.cc
##
@@ -0,0 +1,92 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "benchmark/benchmark.h"
+
+#include 
+
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/benchmark_util.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+
+namespace arrow {
+namespace compute {
+
+constexpr auto kSeed = 0x94378165;
+
+template <typename ArrowType, typename CType = typename ArrowType::c_type>
+static void AddArrayScalarKernel(benchmark::State& state) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto min = std::numeric_limits<CType>::lowest();
+  auto max = std::numeric_limits<CType>::max();
+
+  auto rand = random::RandomArrayGenerator(kSeed);
+  auto lhs = std::static_pointer_cast<NumericArray<ArrowType>>(
+      rand.Numeric<ArrowType>(array_size, min, max, args.null_proportion));
+
+  for (auto _ : state) {
+    ABORT_NOT_OK(Add(lhs, Datum(CType(15))).status());
+  }
+}
+
+template <typename ArrowType, typename CType = typename ArrowType::c_type>
+static void AddArrayArrayKernel(benchmark::State& state) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto min = std::numeric_limits<CType>::lowest();
+  auto max = std::numeric_limits<CType>::max();
+
+  auto rand = random::RandomArrayGenerator(kSeed);
+  auto lhs = std::static_pointer_cast<NumericArray<ArrowType>>(
+      rand.Numeric<ArrowType>(array_size, min, max, args.null_proportion));
+  auto rhs = std::static_pointer_cast<NumericArray<ArrowType>>(
+      rand.Numeric<ArrowType>(array_size, min, max, args.null_proportion));
+
+  for (auto _ : state) {
+ABORT_NOT_OK(Add(lhs, rhs).status());
+  }

Review comment:
   I just did this









[GitHub] [arrow] wesm closed issue #7425: feature request: google cloud storage [GCS] filesystem backend

2020-06-13 Thread GitBox


wesm closed issue #7425:
URL: https://github.com/apache/arrow/issues/7425


   







[GitHub] [arrow] wesm commented on issue #7425: feature request: google cloud storage [GCS] filesystem backend

2020-06-13 Thread GitBox


wesm commented on issue #7425:
URL: https://github.com/apache/arrow/issues/7425#issuecomment-643669698


   We track our granular roadmap on JIRA; here is the issue that's already open:
   
   https://issues.apache.org/jira/browse/ARROW-1231







[GitHub] [arrow] bobcolner opened a new issue #7425: feature request: google cloud storage [GCS] filesystem backend

2020-06-13 Thread GitBox


bobcolner opened a new issue #7425:
URL: https://github.com/apache/arrow/issues/7425


   Hi, I'm new. 
   
   I'm looking for the correct place to request a Google Cloud Storage (GCS)
   filesystem backend.
   
   Thanks!
   
   







[GitHub] [arrow] wesm commented on a change in pull request #7418: ARROW-9115: [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch

2020-06-13 Thread GitBox


wesm commented on a change in pull request #7418:
URL: https://github.com/apache/arrow/pull/7418#discussion_r439755593



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -37,26 +39,108 @@ struct AsciiLength {
   }
 };
 
-struct AsciiUpper {
-  // XXX: the Scalar codegen path passes template arguments that are unused
-  template <typename... Ignored>
-  static std::string Call(KernelContext*, const util::string_view& val) {
-    std::string result = val.to_string();
-    std::transform(result.begin(), result.end(), result.begin(),
-                   [](unsigned char c) { return std::toupper(c); });
-    return result;
+using TransformFunc = std::function<void(const uint8_t*, int64_t, uint8_t*)>;
+
+void StringDataTransform(KernelContext* ctx, const ExecBatch& batch,
+                         TransformFunc transform, Datum* out) {
+  if (batch[0].kind() == Datum::ARRAY) {
+    const ArrayData& input = *batch[0].array();
+    ArrayData* out_arr = out->mutable_array();
+    // Reuse offsets from input
+    out_arr->buffers[1] = input.buffers[1];
+    int64_t data_nbytes = input.buffers[2]->size();
+    KERNEL_RETURN_IF_ERROR(ctx, ctx->Allocate(data_nbytes).Value(&out_arr->buffers[2]));
+    transform(input.buffers[2]->data(), data_nbytes, out_arr->buffers[2]->mutable_data());

Review comment:
   Good point. It's not accounted for in the unit tests either, can you 
open a JIRA issue?
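
   To see why a nonzero slice offset matters here, a minimal standalone
   sketch (no Arrow dependency; `FakeBinaryArray` is a made-up stand-in for
   Arrow's `ArrayData` layout, not Arrow API):

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <string>
   #include <vector>

   // A binary array stores all values concatenated in one data buffer plus
   // an offsets buffer. A slice keeps the *same* buffers but with a nonzero
   // logical offset, so a transform that starts at byte 0 of the data
   // buffer would also touch bytes that precede the slice.
   struct FakeBinaryArray {
     std::vector<int32_t> offsets;  // parent's length + 1 entries
     std::string data;              // concatenated values
     int64_t offset = 0;            // logical slice offset, in elements
     int64_t length = 0;

     // Byte range of this slice's values inside `data`.
     int32_t values_begin() const { return offsets[offset]; }
     int32_t values_end() const { return offsets[offset + length]; }
   };

   int main() {
     // Parent array ["ab", "cde", "f"] -> offsets {0, 2, 5, 6}, data "abcdef".
     FakeBinaryArray parent{{0, 2, 5, 6}, "abcdef", 0, 3};

     // Slice off the first element: ["cde", "f"].
     FakeBinaryArray slice = parent;
     slice.offset = 1;
     slice.length = 2;

     // The slice's values start at byte 2, not byte 0.
     assert(slice.values_begin() == 2);
     assert(slice.values_end() == 6);
     assert(slice.data.substr(slice.values_begin(),
                              slice.values_end() - slice.values_begin()) == "cdef");
     return 0;
   }
   ```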









[GitHub] [arrow] pierrebelzile commented on a change in pull request #7418: ARROW-9115: [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch

2020-06-13 Thread GitBox


pierrebelzile commented on a change in pull request #7418:
URL: https://github.com/apache/arrow/pull/7418#discussion_r439754427



##
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##
@@ -37,26 +39,108 @@ struct AsciiLength {
   }
 };
 
-struct AsciiUpper {
-  // XXX: the Scalar codegen path passes template arguments that are unused
-  template <typename... Ignored>
-  static std::string Call(KernelContext*, const util::string_view& val) {
-    std::string result = val.to_string();
-    std::transform(result.begin(), result.end(), result.begin(),
-                   [](unsigned char c) { return std::toupper(c); });
-    return result;
+using TransformFunc = std::function<void(const uint8_t*, int64_t, uint8_t*)>;
+
+void StringDataTransform(KernelContext* ctx, const ExecBatch& batch,
+                         TransformFunc transform, Datum* out) {
+  if (batch[0].kind() == Datum::ARRAY) {
+    const ArrayData& input = *batch[0].array();
+    ArrayData* out_arr = out->mutable_array();
+    // Reuse offsets from input
+    out_arr->buffers[1] = input.buffers[1];
+    int64_t data_nbytes = input.buffers[2]->size();
+    KERNEL_RETURN_IF_ERROR(ctx, ctx->Allocate(data_nbytes).Value(&out_arr->buffers[2]));
+    transform(input.buffers[2]->data(), data_nbytes, out_arr->buffers[2]->mutable_data());

Review comment:
   Isn't it necessary to consider the offset of a slice?









[GitHub] [arrow] wesm closed pull request #6806: ARROW-8312: [Java][Gandiva] support TreeNode in IN expression

2020-06-13 Thread GitBox


wesm closed pull request #6806:
URL: https://github.com/apache/arrow/pull/6806


   







[GitHub] [arrow] wesm commented on pull request #6806: ARROW-8312: [Java][Gandiva] support TreeNode in IN expression

2020-06-13 Thread GitBox


wesm commented on pull request #6806:
URL: https://github.com/apache/arrow/pull/6806#issuecomment-643624781


   Thank you







[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-13 Thread GitBox


wesm commented on a change in pull request #7410:
URL: https://github.com/apache/arrow/pull/7410#discussion_r439739768



##
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/compute/kernels/common.h"
+
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_ops.h"
+
+namespace arrow {
+namespace compute {
+
+struct IsValid {
+  static void Call(KernelContext* ctx, const Scalar& in, Scalar* out) {
+    checked_cast<BooleanScalar*>(out)->value = in.is_valid;
+  }
+
+  static void Call(KernelContext* ctx, const ArrayData& arr, ArrayData* out) {
+    if (arr.buffers[0] != nullptr && out->offset == arr.offset &&
+        out->length == arr.length) {
+      out->buffers[1] = arr.buffers[0];
+      return;
+    }

Review comment:
   If you want to do a zero-copy optimization, you need to configure the 
kernel to use  `MemAllocation::COMPUTED_NO_PREALLOCATE`. This also makes it so 
you don't have to check the offset and length. You will however have to 
allocate memory when the zero copy path is not possible
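
   The zero-copy-versus-allocate trade-off described above can be sketched
   without the Arrow kernel machinery. This is a toy model; `Buffer` and
   `IsValidBitmap` are illustrative names, not Arrow API:

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <memory>
   #include <vector>

   using Buffer = std::vector<uint8_t>;

   // When the input's validity bitmap covers exactly the output range, the
   // output can share that buffer instead of allocating and copying. When
   // there is no bitmap (meaning "all valid"), memory must be allocated and
   // filled — the non-zero-copy path.
   std::shared_ptr<Buffer> IsValidBitmap(const std::shared_ptr<Buffer>& validity,
                                         size_t nbytes) {
     if (validity != nullptr) {
       return validity;  // zero copy: share the input bitmap
     }
     // No bitmap: every element is valid, so materialize an all-ones bitmap.
     return std::make_shared<Buffer>(nbytes, 0xFF);
   }

   int main() {
     auto bitmap = std::make_shared<Buffer>(Buffer{0xAA});
     auto out = IsValidBitmap(bitmap, 1);
     assert(out.get() == bitmap.get());  // same allocation, no copy

     auto out2 = IsValidBitmap(nullptr, 1);
     assert((*out2)[0] == 0xFF);  // freshly allocated, all valid
     return 0;
   }
   ```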




