[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7876: ARROW-9615: [Rust] Added kernel to compute length of a string.

2020-08-11 Thread GitBox


jorgecarleitao commented on a change in pull request #7876:
URL: https://github.com/apache/arrow/pull/7876#discussion_r469002405



##
File path: rust/arrow/src/compute/kernels/length.rs
##
@@ -0,0 +1,185 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines kernel for length of a string array
+
+use crate::array::*;
+use crate::{
+datatypes::DataType,
+datatypes::UInt32Type,
+error::{ArrowError, Result},
+};
+use std::sync::Arc;
+
+/// Returns an array of UInt32 denoting the number of characters in each 
string in the array.
+///
+/// * this only accepts StringArray
+/// * lenght of null is null.
+/// * length is in number of bytes
+pub fn length(array: ) -> Result {
+match array.data_type() {
+DataType::Utf8 => {
+// note: offsets are stored as u8, but they can be interpreted as 
u32
+let offsets = array.data_ref().clone().buffers()[0].clone();
+// this is a 30% improvement over iterating over u8s and building 
u32

Review comment:
   I kept this here as a justification for using `unsafe`. By making it 
explicit in a comment, an auditor of this code will have an easier time 
understand why the unsafe is being considered in this particular case, instead 
of having to spend the time trying to refactor out the `unsafe` only to find a 
performance drop.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao commented on pull request #7879: ARROW-9618: [Rust] [DataFusion] Made it easier to write optimizers

2020-08-11 Thread GitBox


jorgecarleitao commented on pull request #7879:
URL: https://github.com/apache/arrow/pull/7879#issuecomment-672568598


   Thanks, @andygrove . This is now rebased.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove commented on pull request #7879: ARROW-9618: [Rust] [DataFusion] Made it easier to write optimizers

2020-08-11 Thread GitBox


andygrove commented on pull request #7879:
URL: https://github.com/apache/arrow/pull/7879#issuecomment-672564651


   I did a quick review of this PR and it looks good I think. @jorgecarleitao 
would you mind rebasing it and I'll find time this week for a more thorough 
review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove commented on pull request #7797: ARROW-4189 [Rust] Added coverage report.

2020-08-11 Thread GitBox


andygrove commented on pull request #7797:
URL: https://github.com/apache/arrow/pull/7797#issuecomment-672564300


   @jorgecarleitao So I saw that #7799 was merged. What does that mean for this 
PR now?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-08-11 Thread GitBox


liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468987274



##
File path: java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java
##
@@ -76,6 +97,10 @@ private void appendNodes(FieldVector vector, 
List nodes, List

[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-08-11 Thread GitBox


liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468987159



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtil.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * Utilities for data compression/decompression.
+ */
+public class CompressionUtil {
+
+  private CompressionUtil() {
+  }
+
+  /**
+   * Creates the {@link ArrowBodyCompression} object, given the {@link 
CompressionCodec}.
+   * The implementation of this method should depend on the values of {@link 
CompressionType#names}.
+   */
+  public static ArrowBodyCompression createBodyCompression(CompressionCodec 
codec) {
+switch (codec.getCodecName()) {
+  case "default":
+return DefaultCompressionCodec.DEFAULT_BODY_COMPRESSION;
+  case "LZ4_FRAME":
+return new ArrowBodyCompression((byte) 0, 
BodyCompressionMethod.BUFFER);

Review comment:
   Changed to CompressionType.LZ4_FRAME and CompressionType.ZSTD, 
respectively. Thanks for your kind reminder. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-08-11 Thread GitBox


liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468986947



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
##
@@ -408,11 +408,15 @@ public static ArrowRecordBatch 
deserializeRecordBatch(RecordBatch recordBatchFB,
   ArrowBuf vectorBuffer = body.slice(bufferFB.offset(), bufferFB.length());
   buffers.add(vectorBuffer);
 }
+
+ArrowBodyCompression bodyCompression =
+new ArrowBodyCompression(recordBatchFB.compression().codec(), 
recordBatchFB.compression().method());

Review comment:
   According to our current implementation. The compression object cannot 
be null.
   For the sake of safety, we added check here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-08-11 Thread GitBox


liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468986440



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/compression/DefaultCompressionCodec.java
##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * The default compression codec that does no compression.
+ */
+public class DefaultCompressionCodec implements CompressionCodec {
+
+  public static final DefaultCompressionCodec INSTANCE = new 
DefaultCompressionCodec();
+
+  public static final byte COMPRESSION_TYPE = -1;
+
+  public static final ArrowBodyCompression DEFAULT_BODY_COMPRESSION =
+  new ArrowBodyCompression(COMPRESSION_TYPE, BodyCompressionMethod.BUFFER);
+
+  private DefaultCompressionCodec() {
+  }
+
+  @Override
+  public ArrowBuf compress(BufferAllocator allocator, ArrowBuf 
unCompressedBuffer) {

Review comment:
   I agree with your point. This makes it easier to incorporate the codec 
into the framework. 
   Added comments in the JavaDoc. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-08-11 Thread GitBox


liyafan82 commented on a change in pull request #7326:
URL: https://github.com/apache/arrow/pull/7326#discussion_r468986032



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/compression/DefaultCompressionCodec.java
##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.compression;
+
+import org.apache.arrow.flatbuf.BodyCompressionMethod;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.message.ArrowBodyCompression;
+
+/**
+ * The default compression codec that does no compression.
+ */
+public class DefaultCompressionCodec implements CompressionCodec {

Review comment:
   Agreed and accepted. NoCompressionCodec is a better name. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove closed pull request #7930: ARROW-9691: [Rust] [DataFusion] Make sql_statement_to_plan method public

2020-08-11 Thread GitBox


andygrove closed pull request #7930:
URL: https://github.com/apache/arrow/pull/7930


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove closed pull request #7935: ARROW-9696: [Rust] [DataFusion] fix nested binary expressions

2020-08-11 Thread GitBox


andygrove closed pull request #7935:
URL: https://github.com/apache/arrow/pull/7935


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove closed pull request #7924: ARROW-9653: [Rust][DataFusion] Do not error in planner with SQL has multiple group by expressions

2020-08-11 Thread GitBox


andygrove closed pull request #7924:
URL: https://github.com/apache/arrow/pull/7924


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7936: ARROW-9679: [Rust] [DataFusion] More efficient creation of final batch from HashAggregateExec [DRAFT]

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7936:
URL: https://github.com/apache/arrow/pull/7936#issuecomment-672408880


   https://issues.apache.org/jira/browse/ARROW-9679



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove commented on pull request #7936: ARROW-9679: [Rust] [DataFusion] More efficient creation of final batch from HashAggregateExec [DRAFT]

2020-08-11 Thread GitBox


andygrove commented on pull request #7936:
URL: https://github.com/apache/arrow/pull/7936#issuecomment-672393908


   @alamb @jorgecarleitao @houqp fyi, since you've all been contributing to 
DataFusion lately



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove opened a new pull request #7936: ARROW-9679: [Rust] [DataFusion] More efficient creation of final batch from HashAggregateExec [DRAFT]

2020-08-11 Thread GitBox


andygrove opened a new pull request #7936:
URL: https://github.com/apache/arrow/pull/7936


   This isn't quite there yet. I hope to wrap this up tomorrow.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn edited a comment on pull request #7898: ARROW-9642: [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the starting bit width of the indices

2020-08-11 Thread GitBox


mrkn edited a comment on pull request #7898:
URL: https://github.com/apache/arrow/pull/7898#issuecomment-672392348


   @pitrou @wesm I found that `DictionaryBuilderCase` rejects that the value 
type is `NullType` though there is the specialization of 
`DictionaryBuilderBase` with `T=NullType`.  Is this intentional?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on pull request #7898: ARROW-9642: [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the starting bit width of the indices

2020-08-11 Thread GitBox


mrkn commented on pull request #7898:
URL: https://github.com/apache/arrow/pull/7898#issuecomment-672392348


   @pitrou @wesm I found that DictionaryBuilderCase rejects that the value type 
is NullType though there is the specialization of DictionaryBuilderBase with 
T=NullType.  Is this intentional?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7935: ARROW-9696: [Rust] [DataFusion] fix nested binary expressions

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7935:
URL: https://github.com/apache/arrow/pull/7935#issuecomment-672284943


   https://issues.apache.org/jira/browse/ARROW-9696



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mcassels opened a new pull request #7935: ARROW-9696: [Rust] [DataFusion] fix nested binary expressions

2020-08-11 Thread GitBox


mcassels opened a new pull request #7935:
URL: https://github.com/apache/arrow/pull/7935


   Nested binary expressions like `(a+b)/2` were previously supported and were 
broken by the upgrade to sqlparser 0.6.1.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7934: ARROW-9695: Improve comments on LogicalPlan enum variants

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7934:
URL: https://github.com/apache/arrow/pull/7934#issuecomment-672264125


   https://issues.apache.org/jira/browse/ARROW-9695



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb opened a new pull request #7934: ARROW-9695: Improve comments on LogicalPlan enum variants

2020-08-11 Thread GitBox


alamb opened a new pull request #7934:
URL: https://github.com/apache/arrow/pull/7934


   While reviewing the DataFusion code more (thanks again for this great piece 
of work) I have some small suggestions on improving the documentation that I 
would have personally appreciated while reading the code. I wanted to 
contribute them back



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7929: ARROW-9602: [R] Improve cmake detection in Linux build

2020-08-11 Thread GitBox


nealrichardson closed pull request #7929:
URL: https://github.com/apache/arrow/pull/7929


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou closed pull request #7931: ARROW-9692: [Python] Fix distutils-related warning

2020-08-11 Thread GitBox


pitrou closed pull request #7931:
URL: https://github.com/apache/arrow/pull/7931


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7931: ARROW-9692: [Python] Fix distutils-related warning

2020-08-11 Thread GitBox


pitrou commented on pull request #7931:
URL: https://github.com/apache/arrow/pull/7931#issuecomment-672166964


   +1, will merge



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

2020-08-11 Thread GitBox


corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-672112334


   > @corleyma Can you apply this patch? 
https://gist.github.com/pitrou/0dc970a2238b9c19b5f8fb991d2fb8f7
   
   Whoops, didn't notice you were playing in this fork too.  I'll quit being 
destructive and let you guys squash on merge.  Patch applied.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7933: [Release] Cherry-pick commits to 1.0.x maintenance branch

2020-08-11 Thread GitBox


kszucs commented on pull request #7933:
URL: https://github.com/apache/arrow/pull/7933#issuecomment-672077311


   I was using `-X theirs` during the cherry-pick which I'm going to remove to 
have explicit resolution about possible cherry-pick conflicts.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7933: [Release] Cherry-pick commits to 1.0.x maintenance branch

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7933:
URL: https://github.com/apache/arrow/pull/7933#issuecomment-672062600


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs opened a new pull request #7933: [Release] Cherry-pick commits to 1.0.x maintenance branch

2020-08-11 Thread GitBox


kszucs opened a new pull request #7933:
URL: https://github.com/apache/arrow/pull/7933


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7932: [Developer] Refactor archery release scripts [WIP]

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7932:
URL: https://github.com/apache/arrow/pull/7932#issuecomment-672039769


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs opened a new pull request #7932: [Developer] Refactor archery release scripts [WIP]

2020-08-11 Thread GitBox


kszucs opened a new pull request #7932:
URL: https://github.com/apache/arrow/pull/7932


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou closed pull request #7920: ARROW-9638: [C++][Compute] Implement mode kernel

2020-08-11 Thread GitBox


pitrou closed pull request #7920:
URL: https://github.com/apache/arrow/pull/7920


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

2020-08-11 Thread GitBox


pitrou commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-672026104


   This PR is definitely a bit ad hoc, but it's required to fix the observed 
regression. I think we need to find a strategy later on for dealing with 
manipulations of GPU-located arrays. I hope that NVidia can invest some design 
and discussion time in that.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7909: ARROW-9659: [C++] Fix RecordBatchStreamReader when source is CudaBufferReader

2020-08-11 Thread GitBox


pitrou commented on pull request #7909:
URL: https://github.com/apache/arrow/pull/7909#issuecomment-672024248


   I rebased and added some changes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7931: ARROW-9692: [Python] Fix distutils-related warning

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7931:
URL: https://github.com/apache/arrow/pull/7931#issuecomment-672007067


   Revision: 8175df5754f677e213629e360e20611a492b9921
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-478](https://github.com/ursa-labs/crossbow/branches/all?query=actions-478)
   
   |Task|Status|
   ||--|
   
|wheel-manylinux1-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux1-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux1-cp35m)|
   
|wheel-manylinux1-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux1-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux1-cp36m)|
   
|wheel-manylinux1-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux1-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux1-cp37m)|
   
|wheel-manylinux1-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux1-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux1-cp38)|
   
|wheel-manylinux2010-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux2010-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux2010-cp35m)|
   
|wheel-manylinux2010-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux2010-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux2010-cp36m)|
   
|wheel-manylinux2010-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux2010-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux2010-cp37m)|
   
|wheel-manylinux2010-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux2010-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux2010-cp38)|
   
|wheel-manylinux2014-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux2014-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux2014-cp35m)|
   
|wheel-manylinux2014-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux2014-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux2014-cp36m)|
   
|wheel-manylinux2014-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux2014-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux2014-cp37m)|
   
|wheel-manylinux2014-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-478-azure-wheel-manylinux2014-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-478-azure-wheel-manylinux2014-cp38)|
   
|wheel-osx-cp35m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-478-travis-wheel-osx-cp35m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp36m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-478-travis-wheel-osx-cp36m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp37m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-478-travis-wheel-osx-cp37m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp38|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-478-travis-wheel-osx-cp38.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-win-cp35m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-478-appveyor-wheel-win-cp35m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-478-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
   

[GitHub] [arrow] mrkn commented on a change in pull request #7898: ARROW-9642: [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the starting bit width of the indices

2020-08-11 Thread GitBox


mrkn commented on a change in pull request #7898:
URL: https://github.com/apache/arrow/pull/7898#discussion_r468657312



##
File path: cpp/src/arrow/array/builder_dict.h
##
@@ -409,6 +409,10 @@ class DictionaryBuilder : public 
internal::DictionaryBuilderBase;
   using BASE::BASE;
 
+  Status ExpandIndexByteWidth(uint8_t new_index_byte_width) {
+return BASE::indices_builder_.ExpandIntSize(new_index_byte_width);

Review comment:
   I see. I try to rewrite this way.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7931: ARROW-9692: [Python] Fix distutils-related warning

2020-08-11 Thread GitBox


pitrou commented on pull request #7931:
URL: https://github.com/apache/arrow/pull/7931#issuecomment-672005624


   @github-actions crossbow submit -g wheel



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mrkn commented on a change in pull request #7898: ARROW-9642: [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the starting bit width of the indices

2020-08-11 Thread GitBox


mrkn commented on a change in pull request #7898:
URL: https://github.com/apache/arrow/pull/7898#discussion_r468656539



##
File path: cpp/src/arrow/array/array_dict_test.cc
##
@@ -22,6 +22,8 @@
 #include 
 #include 
 
+#include 

Review comment:
   This should be removed before committing but I forgot.
   Thanks for catching this!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


nealrichardson commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-672004398


   > If I disable the arrow unity build, then it succeeds in linking aws-sdk, 
but then it fails building the vendored jemalloc. If I disable both jemalloc 
and unity-build, then everything works for me.
   
   Where do you see that?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


nealrichardson commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-672004049


   In https://travis-ci.org/github/ursa-labs/crossbow/builds/716930465 it seems 
that cmake can't find aws-c-common, though I see it is downloaded, so I wonder 
if we're hitting that cmake issue I referenced yesterday @jeroen.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao commented on pull request #7930: ARROW-9691: [Rust] [DataFusion] Make sql_statement_to_plan method public

2020-08-11 Thread GitBox


jorgecarleitao commented on pull request #7930:
URL: https://github.com/apache/arrow/pull/7930#issuecomment-672001189


   LGTM!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove commented on pull request #7930: ARROW-9691: [Rust] [DataFusion] Make sql_statement_to_plan method public

2020-08-11 Thread GitBox


andygrove commented on pull request #7930:
URL: https://github.com/apache/arrow/pull/7930#issuecomment-671999147


   @alamb @jorgecarleitao fyi



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove closed pull request #7918: ARROW-9521: [Rust][DataFusion] Handle custom CSV file extensions

2020-08-11 Thread GitBox


andygrove closed pull request #7918:
URL: https://github.com/apache/arrow/pull/7918


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove closed pull request #7925: ARROW-9683: [Rust][DataFusion] Add debug printing to physical plans and associated types

2020-08-11 Thread GitBox


andygrove closed pull request #7925:
URL: https://github.com/apache/arrow/pull/7925


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jeroen commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


jeroen commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-671994419


   If I disable the arrow unity build, then it succeeds in linking aws-sdk, but 
then it fails building the vendored jemalloc. If I disable both jemalloc and 
unity-build, then everything works for me.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7931: ARROW-9692: [Python] Fix distutils-related warning

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7931:
URL: https://github.com/apache/arrow/pull/7931#issuecomment-671982553


   https://issues.apache.org/jira/browse/ARROW-9692



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7929: ARROW-9602: [R] Improve cmake detection in Linux build

2020-08-11 Thread GitBox


nealrichardson commented on a change in pull request #7929:
URL: https://github.com/apache/arrow/pull/7929#discussion_r468626923



##
File path: r/tools/linuxlibs.R
##
@@ -315,8 +315,13 @@ build_libarrow <- function(src_dir, dst_dir) {
 }
 
 ensure_cmake <- function() {
-  cmake <- Sys.which("cmake")
-  if (!nzchar(cmake) || cmake_version() < 3.2) {
+  cmake <- find_cmake(c(
+Sys.getenv("CMAKE"),
+Sys.which("cmake"),
+Sys.which("cmake3")
+  ))
+
+  if (is.null(cmake)) {
 # If not found, download it
 cat(" cmake\n")
 CMAKE_VERSION <- Sys.getenv("CMAKE_VERSION", "3.16.2")

Review comment:
   Right, will do. Can't remember where I got that version number from 
before--probably one of our CI jobs.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7931: ARROW-9692: [Python] Fix distutils-related warning

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7931:
URL: https://github.com/apache/arrow/pull/7931#issuecomment-671980613


   Revision: 8175df5754f677e213629e360e20611a492b9921
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-477](https://github.com/ursa-labs/crossbow/branches/all?query=actions-477)
   
   |Task|Status|
   ||--|
   |test-conda-python-3.6|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.6)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.6)|
   |test-conda-python-3.6-pandas-0.23|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.6-pandas-0.23)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.6-pandas-0.23)|
   |test-conda-python-3.7|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7)|
   |test-conda-python-3.7-dask-latest|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-dask-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-dask-latest)|
   |test-conda-python-3.7-hdfs-2.9.2|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-hdfs-2.9.2)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-hdfs-2.9.2)|
   |test-conda-python-3.7-kartothek-latest|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-kartothek-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-kartothek-latest)|
   |test-conda-python-3.7-kartothek-master|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-kartothek-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-kartothek-master)|
   |test-conda-python-3.7-pandas-latest|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-pandas-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-pandas-latest)|
   |test-conda-python-3.7-pandas-master|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-pandas-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-pandas-master)|
   |test-conda-python-3.7-spark-branch-3.0|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-spark-branch-3.0)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-spark-branch-3.0)|
   |test-conda-python-3.7-turbodbc-latest|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-turbodbc-latest)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-turbodbc-latest)|
   |test-conda-python-3.7-turbodbc-master|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.7-turbodbc-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.7-turbodbc-master)|
   |test-conda-python-3.8|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.8)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.8)|
   |test-conda-python-3.8-dask-master|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.8-dask-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.8-dask-master)|
   |test-conda-python-3.8-jpype|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-477-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-477-github-test-conda-python-3.8-jpype)|
   |test-conda-python-3.8-pandas-latest|[![Github 

[GitHub] [arrow] github-actions[bot] commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-671979365


   Revision: 3e72b7ad35355b66ae9b34d2d26dbcb879d19f82
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-476](https://github.com/ursa-labs/crossbow/branches/all?query=actions-476)
   
   |Task|Status|
   ||--|
   
|homebrew-cpp-autobrew|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-476-travis-homebrew-cpp-autobrew.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7931: ARROW-9692: [Python] Fix distutils-related warning

2020-08-11 Thread GitBox


pitrou commented on pull request #7931:
URL: https://github.com/apache/arrow/pull/7931#issuecomment-671977144


   @github-actions crossbow submit -g python



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #7929: ARROW-9602: [R] Improve cmake detection in Linux build

2020-08-11 Thread GitBox


nealrichardson commented on a change in pull request #7929:
URL: https://github.com/apache/arrow/pull/7929#discussion_r468620814



##
File path: r/tools/linuxlibs.R
##
@@ -315,8 +315,13 @@ build_libarrow <- function(src_dir, dst_dir) {
 }
 
 ensure_cmake <- function() {
-  cmake <- Sys.which("cmake")
-  if (!nzchar(cmake) || cmake_version() < 3.2) {
+  cmake <- find_cmake(c(
+Sys.getenv("CMAKE"),
+Sys.which("cmake"),
+Sys.which("cmake3")

Review comment:
   Apparently so; it was in the report on 
https://issues.apache.org/jira/browse/ARROW-9602





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou opened a new pull request #7931: ARROW-9692: [Python] Fix distutils-related warning

2020-08-11 Thread GitBox


pitrou opened a new pull request #7931:
URL: https://github.com/apache/arrow/pull/7931


   Setuptools should be imported before distutils.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


nealrichardson commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-671975955


   @github-actions crossbow submit homebrew-cpp-autobrew



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jeroen commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


jeroen commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-671973724


   Maybe there is a problem with the unity builds of aws-sdk-cpp or arrow



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

2020-08-11 Thread GitBox


pitrou commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r468605319



##
File path: cpp/src/arrow/filesystem/s3fs_test.cc
##
@@ -197,6 +201,44 @@ TEST(S3Options, FromUri) {
   ASSERT_RAISES(Invalid, S3Options::FromUri("s3:///foo/bar/", ));
 }
 
+TEST(S3Options, FromAccessKey) {
+  S3Options options;
+
+  // session token is optional and should default to empty string
+  options = S3Options::FromAccessKey("access", "secret");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "");
+
+  options = S3Options::FromAccessKey("access", "secret", "token");
+  ASSERT_EQ(options.GetAccessKey(), "access");
+  ASSERT_EQ(options.GetSecretKey(), "secret");
+  ASSERT_EQ(options.GetSessionToken(), "token");
+}
+
+TEST(S3Options, FromAssumeRole) {
+  S3Options options;
+
+  // we set this envvar to speed up tests by ensuring
+  // DefaultAWSCredentialsProviderChain does not query (inaccessible)
+  // EC2 metadata endpoint
+  ASSERT_OK(SetEnvVar("AWS_EC2_METADATA_DISABLED", "true"));

Review comment:
   So, the right way to do this would be to create a test fixture:
   ```c++
   class S3OptionsTest : public ::testing::Test {
public:
 void SetUp() {
   ASSERT_OK(SetEnvVar("AWS_EC2_METADATA_DISABLED", "true"));
 }
 void TearDown() {
   ASSERT_OK(DelEnvVar("AWS_EC2_METADATA_DISABLED"));
 }
   };
   ```
   And then to replace every `TEST(S3Options, ...)` with `TEST_F(S3OptionsTest, 
...)`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou edited a comment on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

2020-08-11 Thread GitBox


pitrou edited a comment on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671959521


   >  Is there anything I need to do to get the CI systems to pick up the new 
conda-forge packages?
   
   I don't think so, perhaps there is a CDN in-between that delays the package 
updates.
   
   Edit: no, that's because the conda packages are installed in a Docker image. 
We'll need to wait for the Docker image to be rebuilt...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou edited a comment on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

2020-08-11 Thread GitBox


pitrou edited a comment on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671963588


   @corleyma Can you apply this patch? 
https://gist.github.com/pitrou/0dc970a2238b9c19b5f8fb991d2fb8f7



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-671963778


   Revision: 9a8364dd3e89052f2925f34fc591f5e17563ab90
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-475](https://github.com/ursa-labs/crossbow/branches/all?query=actions-475)
   
   |Task|Status|
   ||--|
   
|homebrew-cpp-autobrew|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-475-travis-homebrew-cpp-autobrew.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

2020-08-11 Thread GitBox


pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671963588


   @corleyma Can you apply this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


nealrichardson commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-671961594


   @github-actions crossbow submit homebrew-cpp-autobrew



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

2020-08-11 Thread GitBox


pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671961201


   @corleyma Hmm, it seems that by force-pushing you deleted the changes I made 
yesterday.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7803: ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options

2020-08-11 Thread GitBox


pitrou commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671959521


   >  Is there anything I need to do to get the CI systems to pick up the new 
conda-forge packages?
   
   I don't think so, perhaps there is a CDN in-between that delays the package 
updates. Let's try again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7920: ARROW-9638: [C++][Compute] Implement mode kernel

2020-08-11 Thread GitBox


pitrou commented on pull request #7920:
URL: https://github.com/apache/arrow/pull/7920#issuecomment-671958639


   Rebased and pushed a tiny change. Will merge if/when CI is green.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on a change in pull request #7876: ARROW-9615: [Rust] Added kernel to compute length of a string.

2020-08-11 Thread GitBox


nevi-me commented on a change in pull request #7876:
URL: https://github.com/apache/arrow/pull/7876#discussion_r468582014



##
File path: rust/arrow/src/compute/kernels/length.rs
##
@@ -0,0 +1,185 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines kernel for length of a string array
+
+use crate::array::*;
+use crate::{
+datatypes::DataType,
+datatypes::UInt32Type,
+error::{ArrowError, Result},
+};
+use std::sync::Arc;
+
+/// Returns an array of UInt32 denoting the number of characters in each 
string in the array.
+///
+/// * this only accepts StringArray
+/// * lenght of null is null.
+/// * length is in number of bytes
+pub fn length(array: ) -> Result {
+match array.data_type() {
+DataType::Utf8 => {
+// note: offsets are stored as u8, but they can be interpreted as 
u32
+let offsets = array.data_ref().clone().buffers()[0].clone();
+// this is a 30% improvement over iterating over u8s and building 
u32

Review comment:
   nit: we could probably remove this comment





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nevi-me commented on a change in pull request #7876: ARROW-9615: [Rust] Added kernel to compute length of a string.

2020-08-11 Thread GitBox


nevi-me commented on a change in pull request #7876:
URL: https://github.com/apache/arrow/pull/7876#discussion_r468581694



##
File path: rust/arrow/src/compute/kernels/length.rs
##
@@ -0,0 +1,185 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines kernel for length of a string array
+
+use crate::array::*;
+use crate::{
+datatypes::DataType,
+datatypes::UInt32Type,
+error::{ArrowError, Result},
+};
+use std::sync::Arc;
+
+/// Returns an array of UInt32 denoting the number of characters in each 
string in the array.
+///
+/// * this only accepts StringArray
+/// * lenght of null is null.

Review comment:
   There's a typo here





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7930: ARROW-9691: [Rust] [DataFusion] Make sql_statement_to_plan method public

2020-08-11 Thread GitBox


github-actions[bot] commented on pull request #7930:
URL: https://github.com/apache/arrow/pull/7930#issuecomment-671948694


   https://issues.apache.org/jira/browse/ARROW-9691



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] andygrove opened a new pull request #7930: ARROW-9691: [Rust] [DataFusion] Make sql_statement_to_plan method public

2020-08-11 Thread GitBox


andygrove opened a new pull request #7930:
URL: https://github.com/apache/arrow/pull/7930


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed issue #7910: OSError: Could not connect to socket

2020-08-11 Thread GitBox


wesm closed issue #7910:
URL: https://github.com/apache/arrow/issues/7910


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7928: ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows

2020-08-11 Thread GitBox


pitrou commented on pull request #7928:
URL: https://github.com/apache/arrow/pull/7928#issuecomment-671932929


   No idea. Can we have the entire output from `make`?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7768: ARROW-9475: [Java] Clean up usages of BaseAllocator, use BufferAllocator in…

2020-08-11 Thread GitBox


rymurr commented on a change in pull request #7768:
URL: https://github.com/apache/arrow/pull/7768#discussion_r468539432



##
File path: 
java/memory/memory-core/src/main/java/org/apache/arrow/memory/BufferAllocator.java
##
@@ -126,6 +134,30 @@ BufferAllocator newChildAllocator(
*/
   long getHeadroom();
 
+  /**
+   * Forcibly allocate bytes. Returns whether the allocation fit within limits.
+   *
+   * @param size to increase
+   * @return Whether the allocation fit within limits.
+   */
+  boolean forceAllocate(long size);

Review comment:
   :+1:  sorry, missed that





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7908: ARROW-9358: [Integration] remove generated_large_batch.json

2020-08-11 Thread GitBox


pitrou commented on pull request #7908:
URL: https://github.com/apache/arrow/pull/7908#issuecomment-671836605


   No urgency at all. This is just an improvement in development comfort.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7908: ARROW-9358: [Integration] remove generated_large_batch.json

2020-08-11 Thread GitBox


emkornfield commented on pull request #7908:
URL: https://github.com/apache/arrow/pull/7908#issuecomment-671835723


   Is there an urgency for this?  Have my hands full at work today and tomorrow 
at least, i can try to look later in the week?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7929: ARROW-9602: [R] Improve cmake detection in Linux build

2020-08-11 Thread GitBox


pitrou commented on a change in pull request #7929:
URL: https://github.com/apache/arrow/pull/7929#discussion_r468419449



##
File path: r/tools/linuxlibs.R
##
@@ -315,8 +315,13 @@ build_libarrow <- function(src_dir, dst_dir) {
 }
 
 ensure_cmake <- function() {
-  cmake <- Sys.which("cmake")
-  if (!nzchar(cmake) || cmake_version() < 3.2) {
+  cmake <- find_cmake(c(
+Sys.getenv("CMAKE"),
+Sys.which("cmake"),
+Sys.which("cmake3")
+  ))
+
+  if (is.null(cmake)) {
 # If not found, download it
 cat(" cmake\n")
 CMAKE_VERSION <- Sys.getenv("CMAKE_VERSION", "3.16.2")

Review comment:
   You probably want to bump that version: latest is either 3.18.1, 3.17.4 
or 3.16.8.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #7929: ARROW-9602: [R] Improve cmake detection in Linux build

2020-08-11 Thread GitBox


pitrou commented on a change in pull request #7929:
URL: https://github.com/apache/arrow/pull/7929#discussion_r468418886



##
File path: r/tools/linuxlibs.R
##
@@ -315,8 +315,13 @@ build_libarrow <- function(src_dir, dst_dir) {
 }
 
 ensure_cmake <- function() {
-  cmake <- Sys.which("cmake")
-  if (!nzchar(cmake) || cmake_version() < 3.2) {
+  cmake <- find_cmake(c(
+Sys.getenv("CMAKE"),
+Sys.which("cmake"),
+Sys.which("cmake3")

Review comment:
   Does `cmake3` actually exist in the wild?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] corleyma edited a comment on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

2020-08-11 Thread GitBox


corleyma edited a comment on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671774899


   > I've submitted a PR for conda-forge at 
[conda-forge/aws-sdk-cpp-feedstock#124](https://github.com/conda-forge/aws-sdk-cpp-feedstock/pull/124)
   
   Thanks for that @pitrou.  Is there anything I need to do to get the CI 
systems to pick up the new conda-forge packages?  I see relevant releases here: 
https://anaconda.org/conda-forge/aws-sdk-cpp/files?version=1.7.164, but I 
recently pushed a change that is still failing conda builds.  It looks like the 
previous release is still be used for the respective platforms.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] corleyma commented on pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

2020-08-11 Thread GitBox


corleyma commented on pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#issuecomment-671774899


   > I've submitted a PR for conda-forge at 
[conda-forge/aws-sdk-cpp-feedstock#124](https://github.com/conda-forge/aws-sdk-cpp-feedstock/pull/124)
   
   Thanks for that @pitrou.  Is there anything I need to do to get the CI 
systems to pick up the new conda-forge packages?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] corleyma commented on a change in pull request #7803: ARROW-9517: [C++/Python] Add optional session_token to S3Options

2020-08-11 Thread GitBox


corleyma commented on a change in pull request #7803:
URL: https://github.com/apache/arrow/pull/7803#discussion_r468369686



##
File path: cpp/src/arrow/filesystem/s3fs.h
##
@@ -62,10 +75,18 @@ struct ARROW_EXPORT S3Options {
   void ConfigureAnonymousCredentials();
 
   /// Configure with explicit access and secret key.
-  void ConfigureAccessKey(const std::string& access_key, const std::string& 
secret_key);
+  void ConfigureAccessKey(const std::string& access_key, const std::string& 
secret_key,
+  const std::string& session_token = "");
+
+  /// Configure with credentials from an assumed role.
+  void ConfigureAssumeRoleCredentials(
+  const std::string& role_arn, const std::string& session_name = "",
+  const std::string& external_id = "", int load_frequency = 900,
+  const std::shared_ptr& stsClient = nullptr);

Review comment:
   I inferred this is something to do with compliance with C++/CLI; 
replaced with existing NULLPTR macro.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-08-11 Thread GitBox


kiszk commented on pull request #7507:
URL: https://github.com/apache/arrow/pull/7507#issuecomment-671751774


   @kou I addressed your two comments.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org