[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-03-04 Thread GitBox


mathyingzhou commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-791209166


   @pitrou Yup I found your changes to Random 6 days ago in Arrow-11662. Things 
did break after that. Now I’m trying to figure out whether it is my ORC writer 
or Decimal128 generator that needs to be fixed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-03-04 Thread GitBox


liyafan82 commented on a change in pull request #8949:
URL: https://github.com/apache/arrow/pull/8949#discussion_r588067931



##
File path: java/compression/pom.xml
##
@@ -0,0 +1,56 @@
+
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+  4.0.0
+  
+org.apache.arrow
+arrow-java-root
+4.0.0-SNAPSHOT
+  
+  arrow-compression
+  Arrow Compression
+  (Experimental/Contrib) A library for working with the 
compression/decompression of Arrow data.
+
+  
+
+  org.apache.arrow
+  arrow-format
+  ${project.version}
+
+
+  org.apache.arrow
+  arrow-vector
+  ${project.version}
+  ${arrow.vector.classifier}
+
+
+  org.apache.arrow
+  arrow-memory-core
+  ${project.version}
+
+
+  org.apache.arrow
+  arrow-memory-unsafe
+  ${project.version}
+  test
+
+
+  org.apache.commons
+  commons-compress
+  1.20
+
+
+  io.netty

Review comment:
   The `vector` module requires this library. If we do not add it in the 
dependency list, we get an errow when building with maven:
   
   ```
   Used undeclared dependencies found: 
io.netty:netty-common:jar:4.1.48.Final:compile
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-03-04 Thread GitBox


liyafan82 commented on pull request #8949:
URL: https://github.com/apache/arrow/pull/8949#issuecomment-791193697


   > @liyafan82 nice work. Left a few comments about API and structure let me 
know what you think.
   
   @emkornfield Thanks a lot for your comments. I will resolve them one by one 
ASAP. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on issue #9628: write_feather incorrectly deletes files

2021-03-04 Thread GitBox


emkornfield commented on issue #9628:
URL: https://github.com/apache/arrow/issues/9628#issuecomment-791183396


   First one seems reasonable to me.
   
   Second one, I'm not sure about it.  This seems intentional: 
https://github.com/apache/arrow/commit/96f3d6176d8c95717f4ff45e4226161de3168b05 
but this could be revisited (maybe discuss on dev@ mailing list before 
investing time to fix)
   
   Please open JIRAs for each when you get  chance.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-03-04 Thread GitBox


emkornfield commented on a change in pull request #8949:
URL: https://github.com/apache/arrow/pull/8949#discussion_r588044636



##
File path: 
java/compression/src/main/java/org/apache/arrow/compression/CommonsCompressionFactory.java
##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.compression;
+
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.vector.compression.CompressionCodec;
+import org.apache.arrow.vector.compression.NoCompressionCodec;
+
+/**
+ * A factory implementation based on Apache Commons library.
+ */
+public class CommonsCompressionFactory implements CompressionCodec.Factory {
+
+  public static final CommonsCompressionFactory INSTANCE = new 
CommonsCompressionFactory();
+
+  @Override
+  public CompressionCodec createCodec(byte codecType) {

Review comment:
   It seems like byte isn't the right interface here.  Can we decouple from 
the flatbuf enum for the factory?

##
File path: 
java/compression/src/main/java/org/apache/arrow/compression/Lz4CompressionCodec.java
##
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+import org.apache.arrow.flatbuf.CompressionType;
+import org.apache.arrow.memory.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.util.MemoryUtil;
+import org.apache.arrow.util.Preconditions;
+import org.apache.arrow.vector.compression.CompressionCodec;
+import org.apache.arrow.vector.compression.CompressionUtil;
+import 
org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorInputStream;
+import 
org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorOutputStream;
+import org.apache.commons.compress.utils.IOUtils;
+
+import io.netty.util.internal.PlatformDependent;
+
+/**
+ * Compression codec for the LZ4 algorithm.
+ */
+public class Lz4CompressionCodec implements CompressionCodec {
+
+  @Override
+  public ArrowBuf compress(BufferAllocator allocator, ArrowBuf 
uncompressedBuffer) {
+Preconditions.checkArgument(uncompressedBuffer.writerIndex() <= 
Integer.MAX_VALUE,
+"The uncompressed buffer size exceeds the integer limit");
+
+if (uncompressedBuffer.writerIndex() == 0L) {
+  // shortcut for empty buffer
+  ArrowBuf compressedBuffer = 
allocator.buffer(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH);
+  compressedBuffer.setLong(0, 0);
+  
compressedBuffer.writerIndex(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH);
+  uncompressedBuffer.close();
+  return compressedBuffer;
+}
+
+try {
+  ArrowBuf compressedBuffer = doCompress(allocator, uncompressedBuffer);
+  long compressedLength = compressedBuffer.writerIndex() - 
CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH;
+  if (compressedLength > uncompressedBuffer.writerIndex()) {
+// compressed buffer is larger, send the raw buffer
+compressedBuffer.close();
+compressedBuffer = CompressionUtil.compressRawBuffer(allocator, 
uncompressedBuffer);
+  }
+
+  uncompressedBuffer.close();
+  return compressedBuffer;
+} catch (IOException e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  private ArrowBuf doCompress(BufferAllocator allocator, 

[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791158165


   Revision: 6eda5896c638ca1c7af577c02ec56bd802a2fe42
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-186](https://github.com/ursacomputing/crossbow/branches/all?query=actions-186)
   
   |Task|Status|
   ||--|
   
|test-r-without-dataset-parquet-s3|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-186-azure-test-r-without-dataset-parquet-s3)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-186-azure-test-r-without-dataset-parquet-s3)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791158004


   Revision: 6eda5896c638ca1c7af577c02ec56bd802a2fe42
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-185](https://github.com/ursacomputing/crossbow/branches/all?query=actions-185)
   
   |Task|Status|
   ||--|
   
|conda-linux-gcc-py36-cpu-r36|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-linux-gcc-py36-cpu-r36)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-linux-gcc-py36-cpu-r36)|
   
|conda-linux-gcc-py37-cpu-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-linux-gcc-py37-cpu-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-linux-gcc-py37-cpu-r40)|
   
|conda-osx-clang-py36-r36|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-osx-clang-py36-r36)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-osx-clang-py36-r36)|
   
|conda-osx-clang-py37-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-osx-clang-py37-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-osx-clang-py37-r40)|
   
|conda-win-vs2017-py36-r36|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-win-vs2017-py36-r36)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-win-vs2017-py36-r36)|
   
|conda-win-vs2017-py37-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-win-vs2017-py37-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-win-vs2017-py37-r40)|
   |homebrew-r-autobrew|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-185-github-homebrew-r-autobrew)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-185-github-homebrew-r-autobrew)|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-185-github-test-r-linux-as-cran)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-185-github-test-r-linux-as-cran)|
   
|test-r-rhub-ubuntu-gcc-release|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rhub-ubuntu-gcc-release)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rhub-ubuntu-gcc-release)|
   
|test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rocker-r-base-latest)|
   
|test-r-rstudio-r-base-3.6-bionic|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rstudio-r-base-3.6-bionic)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rstudio-r-base-3.6-bionic)|
   
|test-r-rstudio-r-base-3.6-centos7-devtoolset-8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)|
   
|test-r-rstudio-r-base-3.6-centos8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rstudio-r-base-3.6-centos8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rstudio-r-base-3.6-centos8)|
   
|test-r-rstudio-r-base-3.6-opensuse15|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rstudio-r-base-3.6-opensuse15)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rstudio-r-base-3.6-opensuse15)|
   

[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791155609


   @github-actions crossbow submit test-r-without-dataset-parquet-s3



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook removed a comment on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook removed a comment on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791154080


   @github-actions crossbow submit test-r-minimal-build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791154662


   @github-actions crossbow submit -g r



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791154080


   @github-actions crossbow submit test-r-minimal-build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] michalursa commented on pull request #9621: ARROW-11591: [C++][Compute] Grouped aggregation

2021-03-04 Thread GitBox


michalursa commented on pull request #9621:
URL: https://github.com/apache/arrow/pull/9621#issuecomment-791153213


   > Before digging into the details too much, my main issue with what I see is 
that I don't agree with making hash aggregation a callable function through 
`CallFunction`.
   > 
   > In the context of a query engine, the interface for this operator looks 
something like:
   > 
   > ```
   > class ExecNode {
   >  public:
   >   virtual void Push(int node_index, const ExecBatch& batch) = 0;
   >   virtual void PushDone(int node_index) = 0;
   > };
   > 
   > class HashAggregationNode : public ExecNode {
   >   ...
   > }; 
   > ```
   > 
   > (some query engines use a "pull"-based model, in which the data flow is 
inverted — there are pros and cons to both approaches, see 
https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf)
   > 
   > If you want a simple one-shot version of the algorithm (rather than a 
general "streaming" one like the above), then you can break the input data into 
record batches of the desired size (e.g. 4K - 64K rows, depending on 
heuristics) and then push the chunks into the node that you create (note that 
the HashAggregationNode should push its result into a terminal "OutputNode" 
when you invoke `hash_agg_node->PushDone(0)`).
   > 
   > The API can be completely crude / preliminary, but would it be possible to 
use the query-engine-type approach for this? I think it would be best to start 
taking some strides in this direction rather than bolting this onto the array 
function execution machinery which doesn't make sense in a query processing 
context (because aggregation is fundamentally a streaming algorithm)
   > 
   > On the hash aggregation functions themselves, perhaps it makes sense to 
add a `HASH_AGGREGATE` function type and define the kernel interface for these 
functions, then look up these functions using the general dispatch machinery?
   
   I like these points. Let me break into subtopics what I think we are facing 
now and what we should think about for the future. I read the comment above as 
advocating bringing support for two concepts: a) relational operators (that are 
building blocks of a query execution pipeline or a more general query execution 
plan) and b) pipelines / streaming processing. 
   
   **1. How we got here**
   
   Part of the problem is that we don't have a problem just yet. We didn't bite 
a big enough chunk of work to force us to do the appropriate refactoring. And 
the problem I am referring to is that of pipelining, which we do not have 
support for yet. From what I understand, currently we cannot bind together 
multiple operations into a single processing pipeline executed using a single 
Arrow function call, computing expressions on the fly without persisting their 
results for an entire set of rows. 
   
   Related to the pipelining is the fact that in this PR we do not stream the 
group by output. That way, at some level of abstraction (squinting eyes) we can 
treat it as a scalar aggregate (we output a single item which happens to be an 
entire collection of arrays; variants make it possible).
   
   But to me the bigger problem here is that we are mixing together two 
separate worlds: scalar operators and relational operators and that muddies the 
general picture. That mixing happens as a consequence of treating the group by 
as a modified scalar aggregate in the code.
   
   ** 2. Scalar operators vs relational operators**
   
   One way to think about it is that scalar expressions (compositions of scalar 
operators) are like C++ lambdas (e.g. comparison of two values) provided in a 
call to C++ STL algorithms / containers (e.g. std::sort) while relational 
operators correspond to these algorithms / containers (except that they 
additionally have support for composability - creating execution pipelines / 
trees / DAGs). 
   
   The point of confusion may come from the fact that once you vectorize scalar 
expressions (which you probably want to do, for performance reasons), their 
interfaces start looking very similar (if not the same) as would some special 
relational operators (namely: filter, project and scalar aggregate). I claim 
that current kernel executor classes are relational operators in disguise - 
project (aka compute scalar) and reduce (aka scalar aggregate). 
   Relational operators inside current group by
   Interestingly, group by present in this PR, can itself be treated as a DAG 
of multiple relational operators with some kind of pipelines connecting them. 
The input batch is first processed by the code that assigns integer group id to 
every row based on key columns. The output of that is then processed by zero, 
one or more group by aggregate operators that update aggregate related 
accumulators in their internal arrays. At the end of the input, the output 
array related to group id mapping component is concatenated to output arrays 
for individual group by aggregate operators to produce output 

[GitHub] [arrow] xuanqing94 commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

2021-03-04 Thread GitBox


xuanqing94 commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-791152745


   @joeyac Sorry to bring it up again... but I am deeply confused about this 
line of code:
   `auto column_struct = 
std::static_pointer_cast(table->GetColumnByName("column")->chunk(0));
   `
   since GetColumnByName returns a shared_ptr to ChunkedArray, would accessing 
only the chunk(0) returns incomplete data? 
   like if the data file is huge and truncated (from the document I couldn't 
really get the idea). 
   Could you say a bit more about this? Thanks a lot!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on pull request #9633: ARROW-11560: [C++][FlightRPC] fix mutex error on SIGINT

2021-03-04 Thread GitBox


cyb70289 commented on pull request #9633:
URL: https://github.com/apache/arrow/pull/9633#issuecomment-791150699


   Also update PR comment to match latest code? `semaphore` -> `pipe`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook removed a comment on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook removed a comment on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791139873


   @github-actions crossbow submit -g r



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r588021512



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   The build looks good after 9479c6c:
   
https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-184-azure-test-r-minimal-build





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791139873


   @github-actions crossbow submit -g r



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


kou closed pull request #9553:
URL: https://github.com/apache/arrow/pull/9553


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791117825


   Revision: 9479c6ce079c7230b0e290dcb5658436d1d62fde
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-184](https://github.com/ursacomputing/crossbow/branches/all?query=actions-184)
   
   |Task|Status|
   ||--|
   
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-184-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-184-azure-test-r-minimal-build)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791117362


   @github-actions crossbow submit test-r-minimal-build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587998671



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   Ok, thanks. I tried a few different things just for kicks, then used 
`-e` per your suggestion in 9479c6ce079c7230b0e290dcb5658436d1d62fde





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791110809


   Revision: 4068f471efc55584dbfea5df454af19dfcb1492d
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-183](https://github.com/ursacomputing/crossbow/branches/all?query=actions-183)
   
   |Task|Status|
   ||--|
   |test-r-install-macos|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-183-github-test-r-install-macos)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-183-github-test-r-install-macos)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


jonkeane commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791106222


   @github-actions crossbow submit test-r-install-macos
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791104150


   Revision: a810b3fce373c7f5090e367e5a6afc70db60414a
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-182](https://github.com/ursacomputing/crossbow/branches/all?query=actions-182)
   
   |Task|Status|
   ||--|
   
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-182-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-182-azure-test-r-minimal-build)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791103746


   @github-actions crossbow submit test-r-minimal-build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791095801


   Revision: 27c514cb1b14e7125e8ba50714c6480c93dc231a
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-181](https://github.com/ursacomputing/crossbow/branches/all?query=actions-181)
   
   |Task|Status|
   ||--|
   
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-181-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-181-azure-test-r-minimal-build)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook removed a comment on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook removed a comment on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-790887608


   @github-actions crossbow submit -g r



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791095464


   @github-actions crossbow submit test-r-minimal-build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


nealrichardson commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587961901



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   Ah yeah, I think it's the same problem--note (for comparison) that 
`R_ORG` et al. are in the .env file, and they're environment variables for 
docker-compose, not passed into the docker container.
   
   The fact that `ARROW_R_DEV` is in the env file, and then passed through in 
docker-compose.yml to the build environments, may just be due to my ignorance 
of all of this at the time I added it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#issuecomment-791077751


   Revision: 2e6905a7262a6139c52c4007fd95f58b378e71ca
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-180](https://github.com/ursacomputing/crossbow/branches/all?query=actions-180)
   
   |Task|Status|
   ||--|
   |test-build-vcpkg-win|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-180-github-test-build-vcpkg-win)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-180-github-test-build-vcpkg-win)|
   |wheel-manylinux2010-cp36m|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-180-github-wheel-manylinux2010-cp36m)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-180-github-wheel-manylinux2010-cp36m)|
   |wheel-windows-cp36m|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-180-github-wheel-windows-cp36m)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-180-github-wheel-windows-cp36m)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #9442: ARROW-6818: [DOC] Remove reference to Apache Drill design docs

2021-03-04 Thread GitBox


nealrichardson closed pull request #9442:
URL: https://github.com/apache/arrow/pull/9442


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587956206



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   Ok, if folks might be running these commands locally then I can see how 
that would cause damage.
   Looks like it didn't work anyhow. I will investigate.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587955204



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment:
   Changed to `FILEPATH` in 32295895be14b799eba9f3150c4c7b973e14dee3





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#issuecomment-791072177


   @github-actions crossbow submit test-build-vcpkg-win 
wheel-manylinux2010-cp36m wheel-windows-cp36m



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587952686



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment:
   Oh, I see, thanks. I'll make that change to `FILEPATH`  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791061098


   Revision: 9856d8aa91f4f334897e00333b560150fbe54296
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-179](https://github.com/ursacomputing/crossbow/branches/all?query=actions-179)
   
   |Task|Status|
   ||--|
   |test-r-install-macos|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-179-github-test-r-install-macos)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-179-github-test-r-install-macos)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


jonkeane commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791044239


   @github-actions crossbow submit test-r-install-macos



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #9442: ARROW-6818: [DOC] Remove reference to Apache Drill design docs

2021-03-04 Thread GitBox


nealrichardson commented on pull request #9442:
URL: https://github.com/apache/arrow/pull/9442#issuecomment-791043943


   @pitrou can you give this a review/merge please?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] martinblostein edited a comment on pull request #9442: ARROW-6818: [DOC] Remove reference to Apache Drill design docs

2021-03-04 Thread GitBox


martinblostein edited a comment on pull request #9442:
URL: https://github.com/apache/arrow/pull/9442#issuecomment-791042920


   Is there anything I can do here? Seems like a straightforward change.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] martinblostein commented on pull request #9442: ARROW-6818: [DOC] Remove reference to Apache Drill design docs

2021-03-04 Thread GitBox


martinblostein commented on pull request #9442:
URL: https://github.com/apache/arrow/pull/9442#issuecomment-791042920


   Is there anything I can do here? Seems like low-hanging fruit.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


kou commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587924707



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")
+endif()
+message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}")
+message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}")
+
+# --
+# Get VCPKG_TARGET_TRIPLET
+
+if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET)
+  set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}")
+endif()
+# Explicitly set manifest mode on if it is not set and vcpkg.json exists
+if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS 
"${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json")
+  set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest")
+  message(STATUS "vcpkg.json manifest found. Using VCPKG_MANIFEST_MODE: ON")
+endif()
+# vcpkg can install packages in two different places
+set(_INST_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg_installed") # try here 
first

Review comment:
   Oh, sorry.





This is an 

[GitHub] [arrow] kou commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


kou commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587924544



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment:
   Thanks. I understand.
   
   For `FILEPATH`, `CMAKE_TOOLCHAIN_FILE` is for CMake not vcpkg. It's defined 
as `FILEPATH` in `Modules/CMakeDetermineSystem.cmake`: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/CMakeDetermineSystem.cmake#L126-131





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-04 Thread GitBox


nealrichardson commented on pull request #9586:
URL: https://github.com/apache/arrow/pull/9586#issuecomment-791039254


   I'll add a couple more tests, particularly around error handling, but I'd 
like to move on to other issues and get this merged so that others can test it 
out more widely. cc @ianmcook @jonkeane 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#issuecomment-791037231


   Revision: d7f0e2236b55ce0ecd2815ccd07e7235288f385b
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-178](https://github.com/ursacomputing/crossbow/branches/all?query=actions-178)
   
   |Task|Status|
   ||--|
   |test-build-vcpkg-win|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-178-github-test-build-vcpkg-win)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-178-github-test-build-vcpkg-win)|
   |wheel-manylinux2010-cp36m|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-178-github-wheel-manylinux2010-cp36m)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-178-github-wheel-manylinux2010-cp36m)|
   |wheel-windows-cp36m|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-178-github-wheel-windows-cp36m)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-178-github-wheel-windows-cp36m)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#issuecomment-791036882


   @github-actions crossbow submit test-build-vcpkg-win 
wheel-manylinux2010-cp36m wheel-windows-cp36m



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


nealrichardson commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587918983



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   I think the only danger to appending to the .env file is that if you ran 
this locally, wouldn't it modify the file locally?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791036468


   Revision: 318371fbaa07f1b6fcdf71f58d7d5460a931da13
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-177](https://github.com/ursacomputing/crossbow/branches/all?query=actions-177)
   
   |Task|Status|
   ||--|
   
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-177-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-177-azure-test-r-minimal-build)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-791036107


   @github-actions crossbow submit test-r-minimal-build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587917624



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   Oops, need to append not overwrite. Fixed in 
318371fbaa07f1b6fcdf71f58d7d5460a931da13





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587917115



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   Got it, thanks. Let's see if it works to add them to `.env` 
(066ce065124cf7dcaa08225cae83177594da3965) and if not then I'll do it with `-e`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791031621


   Revision: d1f7611e496c99c7c8d9f32c4b0d35fe4fee0b38
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-176](https://github.com/ursacomputing/crossbow/branches/all?query=actions-176)
   
   |Task|Status|
   ||--|
   |test-r-install-macos|[![Github 
Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-176-github-test-r-install-macos)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-176-github-test-r-install-macos)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


jonkeane commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791031149


   @github-actions crossbow submit test-r-install-macos



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


jonkeane commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791027678


   @github-actions crossbow submit test-r-install-macos



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587899030



##
File path: run-cmake-format.py
##
@@ -52,6 +52,7 @@
 'cpp/cmake_modules/FindRapidJSONAlt.cmake',
 'cpp/cmake_modules/FindSnappyAlt.cmake',
 'cpp/cmake_modules/FindThrift.cmake',
+'cpp/cmake_modules/Usevcpkg.cmake',

Review comment:
   Done in d7f0e2236b55ce0ecd2815ccd07e7235288f385b





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


jonkeane commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791019223


   @github-actions crossbow submit test-r-install-macos



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587898227



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")
+endif()
+message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}")
+message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}")
+
+# --
+# Get VCPKG_TARGET_TRIPLET
+
+if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET)
+  set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}")
+endif()
+# Explicitly set manifest mode on if it is not set and vcpkg.json exists
+if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS 
"${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json")
+  set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest")
+  message(STATUS "vcpkg.json manifest found. Using VCPKG_MANIFEST_MODE: ON")
+endif()
+# vcpkg can install packages in two different places
+set(_INST_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg_installed") # try here 
first

Review comment:
   Fixed in 8543407dd614a38cd0a577063e8d790cc07e044f





[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


nealrichardson commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587897840



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   Unfortunately this didn't pass through all the way: 
https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=1721=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=6c939d89-0d1a-51f2-8b30-091a7a82e98c=240
   
   Setting them here affects the environment in which docker-compose is called, 
but it doesn't forward into the docker environment.
   
   I believe what you instead want is to pass them into docker-compose below, 
something like `docker-compose -e ARROW_DATASET={{ arrow_dataset|default("ON") 
}} -e ARROW_PARQUET={{ arrow_parquet|default("ON") }} -e ARROW_S3={{ 
arrow_s3|default("ON") }} r`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-04 Thread GitBox


jonkeane commented on pull request #9579:
URL: https://github.com/apache/arrow/pull/9579#issuecomment-791014063


   @github-actions crossbow submit test-r-install-macos



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


nealrichardson commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587897840



##
File path: dev/tasks/r/azure.linux.yml
##
@@ -47,6 +47,9 @@ jobs:
 export R_ORG={{ r_org }}
 export R_IMAGE={{ r_image }}
 export R_TAG={{ r_tag }}
+export ARROW_DATASET={{ arrow_dataset|default("ON") }}
+export ARROW_PARQUET={{ arrow_parquet|default("ON") }}

Review comment:
   Unfortunately this didn't pass through all the way: 
https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=1721=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=6c939d89-0d1a-51f2-8b30-091a7a82e98c=240
   
   I believe what you instead want is to pass them into docker-compose below, 
something like `docker-compose -e ARROW_DATASET={{ arrow_dataset|default("ON") 
}} -e ARROW_PARQUET={{ arrow_parquet|default("ON") }} -e ARROW_S3={{ 
arrow_s3|default("ON") }} r`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587896214



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")
+endif()
+message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}")
+message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}")
+
+# --
+# Get VCPKG_TARGET_TRIPLET
+
+if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET)
+  set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}")
+endif()
+# Explicitly set manifest mode on if it is not set and vcpkg.json exists
+if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS 
"${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json")
+  set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest")
+  message(STATUS "vcpkg.json manifest found. Using VCPKG_MANIFEST_MODE: ON")
+endif()
+# vcpkg can install packages in two different places
+set(_INST_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg_installed") # try here 
first

Review comment:
   Whoops, this is supposed to be `CMAKE_CURRENT_BINARY_DIR` not 
`CMAKE_CURRENT_BUILD_DIR`


[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587889678



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment:
   I am leery of changing any of these `STRING`s to `FILEPATH`s because 
vcpkg expects everything to be strings





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587887231



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")
+endif()
+message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}")
+message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}")
+
+# --
+# Get VCPKG_TARGET_TRIPLET
+
+if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET)
+  set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}")
+endif()
+# Explicitly set manifest mode on if it is not set and vcpkg.json exists
+if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS 
"${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json")
+  set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest")
+  message(STATUS "vcpkg.json manifest found. Using VCPKG_MANIFEST_MODE: ON")
+endif()
+# vcpkg can install packages in two different places
+set(_INST_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg_installed") # try here 
first

Review comment:
   Implemented this change in aeda9041d161ca3fdb3c46e6d1c8b5b99847c3a2





[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587884449



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")
+endif()
+message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}")
+message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}")
+
+# --
+# Get VCPKG_TARGET_TRIPLET
+
+if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET)
+  set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}")
+endif()
+# Explicitly set manifest mode on if it is not set and vcpkg.json exists
+if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS 
"${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json")
+  set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest")
+  message(STATUS "vcpkg.json manifest found. Using VCPKG_MANIFEST_MODE: ON")
+endif()
+# vcpkg can install packages in two different places
+set(_INST_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg_installed") # try here 
first

Review comment:
   Thank you, this comment was very helpful because it made me realize that 
this script 

[GitHub] [arrow] nealrichardson commented on pull request #9621: ARROW-11591: [C++][Compute] Grouped aggregation

2021-03-04 Thread GitBox


nealrichardson commented on pull request #9621:
URL: https://github.com/apache/arrow/pull/9621#issuecomment-790998553


   With an assist from @bkietz, I've written a very basic R wrapper that 
exercises this in 
https://github.com/apache/arrow/commit/aa530cb586462bee98390d129575fe1622ffb222.
 It's enough to expose some issues to address, to say nothing of the interface 
questions.
   
   ```r
   library(arrow)
   library(dplyr)
   
   # The commit uses this option to switch to use the group_by compute function
   options(arrow.summarize = TRUE)
   # If the Arrow aggregation function isn't implemented, or if the Arrow call 
errors,
   # it falls back to pulling the data in R and evaluating in R.
   
   # mtcars is a standard dataset that ships with R
   mt <- Table$create(mtcars)
   mt %>%
 group_by(cyl) %>%
 summarize(total_hp = sum(hp))
   # Warning: Error : NotImplemented: Key of typedouble
   # ../src/arrow/compute/function.cc:178  kernel_ctx.status()
   # ; pulling data into R
   # # A tibble: 3 x 2
   # cyl total_hp
   # * 
   # 1 4  909
   # 2 6  856
   # 3 8 2929
   
   # That's unfortunate. R blurs the distinction for users between integer and 
double,
   # so it's not uncommon to have integer data stored as a float.
   # (Also, the error message is missing some whitespace.)
   
   # We can cast that to an integer and try again
   
   mt$cyl <- mt$cyl$cast(int32())
   unique(mt$cyl)
   # Array
   # 
   # [
   #   6,
   #   4,
   #   8
   # ]
   
   mt %>%
 group_by(cyl) %>%
 summarize(total_hp = sum(hp))
   # StructArray
   # >
   # -- is_valid: all not null
   # -- child 0 type: double
   #   [
   # 856,
   # 909,
   # 2929
   #   ]
   # -- child 1 type: int64
   #   [
   # 17179869190,
   # 8,
   # 0
   #   ]
   
   # Alright, it computed and got the same numbers, but the StructArray
   # is not valid. Type says int32 but data says int64 and we have misplaced 
bits
   
   # Let's try a different stock dataset
   ir <- Table$create(iris)
   ir %>%
 group_by(Species) %>%
 summarize(total_length = sum(Sepal.Length))
   # Warning: Error : NotImplemented: Key of typedictionary
   # ../src/arrow/compute/function.cc:178  kernel_ctx.status()
   # ; pulling data into R
   # # A tibble: 3 x 2
   #   Speciestotal_length
   # *  
   # 1 setosa 250.
   # 2 versicolor 297.
   # 3 virginica  329.
   
   # Hmm. dictionary types really need to be supported.
   # Let's work around and cast it to string
   
   ir$Species <- ir$Species$cast(utf8())
   unique(ir$Species)
   # Array
   # 
   # [
   #   "setosa",
   #   "versicolor",
   #   "virginica"
   # ]
   ir %>%
 group_by(Species) %>%
 summarize(total_length = sum(Sepal.Length))
   # Warning: Error : Invalid: Negative buffer resize: -219443965
   # ../src/arrow/buffer.cc:262  buffer->Resize(size)
   # ../src/arrow/compute/kernels/aggregate_basic.cc:1005  
(_error_or_value9).status()
   # ../src/arrow/compute/function.cc:193  
executor->Execute(implicitly_cast_args, listener.get())
   # ; pulling data into R
   # # A tibble: 3 x 2
   #   Speciestotal_length
   # *  
   # 1 setosa 250.
   # 2 versicolor 297.
   # 3 virginica  329.
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587872742



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")
+endif()
+message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}")
+message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}")
+
+# --
+# Get VCPKG_TARGET_TRIPLET
+
+if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET)
+  set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}")
+endif()
+# Explicitly set manifest mode on if it is not set and vcpkg.json exists
+if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS 
"${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json")
+  set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest")

Review comment:
   Yes `CACHE` is required here. Without it, vcpkg does not see the 
variable.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For 

[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r587870368



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment:
   `CACHE` is necessary here and and in many of the other places where I 
used it in this file. I initially attempted to cache none of the variables, but 
that caused numerous errors in the CMake scripts, in vcpkg, and in the build. I 
experimented with using `PARENT_SCOPE` but that did not solve most of the 
problems, so I used `CACHE` for all the variables whose scope of interaction 
extends beyond this one script.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] yordan-pavlov edited a comment on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


yordan-pavlov edited a comment on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-790979386


   @Dandandan I would be happy to collaborate on this;
   I have been using MS Visual Studio for profiling DataFusion and Arrow, and 
most of the time it works fairly well and gives useful insight.
   From what I have observed, after my change to push filers down to parquet 
(by filtering out entire row groups), now about half the time is spent in 
`ComplexObjectArrayReader::next_batch`; then inside this method
   * about 16% of total runtime is spent on 
`data_buffer.resize_with(batch_size, T::T::default);` where in the case of 
`StringArray`, `data_bufer` is `Vec` 
   * about 8% of total time is spent on `data_buffer.truncate(num_read);`
   * about 10% of total time is spent in 
`data_buffer.into_iter().zip(self.def_levels_buffer.as_ref().unwrap().iter()).map(...).collect()`
   * another 10% of total time is spent in `let mut array = 
self.converter.convert(data)?;` - this is where the `Utf8ArrayConverter` is used
   * more time is also spent at the very end of the `next_batch` function, I 
imagine to deallocate the large number of `ByteArray` objects created
   
   I have managed to achieve about 10-15% improvement by replacing 
`data_buffer.truncate(num_read)` with using an iterator instead, but there is a 
lot more work to do. As I commented above, there is a number of improvements 
that could be done:
   * not converting into intermediate `ByteArray` objects on the way into a 
`StringArray` should result in a significant improvement
   * also using iterators to fetch data loaded from parquet, instead of writing 
values in pre-allocated arrays should avoid a lot of unnecessary allocation
   * for optimal performance, ideally not do any intermediate conversion at 
all, and just copy bytes slices from an iterator (over parquet data pages) into 
an arrow array
   
   All of these changes, I think, will propagate all the way to the low-level 
decoders and will take time, but when done should put us in a good position to 
transition to `async` using `Stream`s (my understanding is that a `Stream` is 
effectively an `async Iterator`).
   I am still figuring out how to make this work exactly and how to split the 
work into smaller pieces which could be done over time. I hope to be able to 
make more progress on this in the next few days.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] yordan-pavlov edited a comment on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


yordan-pavlov edited a comment on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-790979386


   @Dandandan I would be happy to collaborate on this;
   I have been using MS Visual Studio for profiling DataFusion and Arrow, and 
most of the time it works fairly well and gives useful insight.
   From what I have observed, after my change to push filers down to parquet 
(by filtering out entire row groups), now about half the time is spent in 
`ComplexObjectArrayReader::next_batch`; then inside this method
   * about 16% of total runtime is spent on 
`data_buffer.resize_with(batch_size, T::T::default);` where in the case of 
`StringArray`, `data_bufer` is `Vec` 
   * about 8% of total time is spent on `data_buffer.truncate(num_read);`
   * about 10% of total time is spent in 
`data_buffer.into_iter().zip(self.def_levels_buffer.as_ref().unwrap().iter()).map(...).collect()`
   * another 10% of total time is spent in `let mut array = 
self.converter.convert(data)?;` - this is where the `Utf8ArrayConverter` is used
   * more time is also spent at the very end of the `next_batch` function, I 
imagine to deallocate the large number of `ByteArray` objects created
   
   I have managed to achieve about 10-15% improvement by replacing 
`data_buffer.truncate(num_read)` with using an iterator instead, but there is a 
lot more work to do. As I commented above, there is a number of improvements 
that could be done:
   * not converting into intermediate `ByteArray` objects on the way into a 
`StringArray` should result in a significant improvement
   * also using iterators to fetch data loaded from parquet, instead of writing 
values in pre-allocated arrays should avoid a lot of unnecessary allocation
   * for optimal performance, ideally not do any intermediate conversion at 
all, and just copy bytes slices from an iterator (over parquet data pages) into 
an arrow array
   
   All of these changes, I think, will propagate all the way to the low-level 
decoders and will take time, but when done should put us in a good position to 
transition to async using Streams (my understanding is that a Stream is 
effectively an async Iterator).
   I am still figuring out how to make this work exactly and how to split the 
work into smaller pieces which could be done over time. I hope to be able to 
make more progress on this in the next few days.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao commented on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


jorgecarleitao commented on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-790980785


   fwiw, I strongly agree with you, @yordan-pavlov on _all_ those marks. I 
recently went through the `arrow/` module of the parquet crate and concluded 
exactly the same (including the "takes time" part).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] yordan-pavlov commented on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


yordan-pavlov commented on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-790979386


   @Dandandan I would be happy to collaborate on this;
   I have been using MS Visual Studio for profiling DataFusion and Arrow, and 
most of the time it works fairly well and gives useful insight.
   From what I have observed, after my change to push filers down to parquet 
(by filtering out entire row groups), now about half the time is spent in 
`ComplexObjectArrayReader::next_batch`; then inside this method
   * about 16% of total runtime is spent on 
`data_buffer.resize_with(batch_size, T::T::default);` where in the case of 
`StringArray`, `data_bufer` is `Vec` 
   * about 8% of total time is spent on `data_buffer.truncate(num_read);`
   * about 10% of total time is spent in 
`data_buffer.into_iter().zip(self.def_levels_buffer.as_ref().unwrap().iter()).map(...).collect()`
   * another 10% of total time is spent in `let mut array = 
self.converter.convert(data)?;` - this is where the `Utf8ArrayConverter` is used
   * more time is also spent at the very end of the `next_batch` function, I 
imagine to deallocate the large number of `ByteArray` objects created
   
   I have managed to achieve about 10-15% improvement by replacing 
`data_buffer.truncate(num_read)` with using an iterator instead, but there is a 
lot more work to do. As I commented above, there is a number of improvements 
that could be done:
   * not converting into intermediate `ByteArray` objects on the way into a 
`StringArray` should result in a significant improvement
   * also using iterators to fetch data loaded from parquet, instead of writing 
values in pre-allocated arrays should avoid a lot of unnecessary allocation
   * for optimal performance, ideally not do any intermediate conversion at 
all, and just copy bytes slices from an iterator (over parquet data pages) into 
an arrow array
   
   All of these changes, I think, will propagate all the way to the low-level 
decoders and will take time, but when done should put us in a good position to 
transition to async using Streams (my understanding is that a Stream is 
effectively an async Iterator).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao commented on pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes

2021-03-04 Thread GitBox


jorgecarleitao commented on pull request #9624:
URL: https://github.com/apache/arrow/pull/9624#issuecomment-790976766


   fwiw, I think that there is a different way of approaching this.
   
   The only reason we are implementing to_isize/to_usize on NativeTpe is 
because we have a function to represent an array (for `Display`) that accepts a 
generic physical type T, and then tries to convert it to a `isize` depending on 
a logical type (`DataType::Date`). However, there is already a Many to one 
relationship between logical and physical types.
   
   Thus, a solution for this is to have the display function branch off 
depending on the (logical) datatype, implementing the custom string 
representation depending on it, instead of having a loop of native type T and 
then branching off according to the DataType inside the loop.
   
   I.e. instead of 
   
   ```rust
   for i in ... {
  match data_type {
DataType::Date32 => represent array[i] as date
DataType::Int32 => represent array[i] as int
  }
   }
   ```
   
   imo we should have 
   
   ```rust
   match data_type {
DataType::Date32 => for i in ... {represent array[i] as date}
DataType::Int32 => for i in ... {represent array[i] as int}
   }
   ```
   
   i.e. treat the `Display` as any other "kernel", where behavior is logical, 
not physical, type-dependent.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] Dandandan edited a comment on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


Dandandan edited a comment on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-790962611


   @yordan-pavlov 
   Would be very interested in a faster parquet reader.
   Was doing some benchmarking with some queries in DataFusion with Parquet, 
looks like there is a lot to win in Parquet when looking at profiling results. 
For example, query 5 showing about 50% of instruction fetches are for reading 
parquet:
   
   
![query5](https://user-images.githubusercontent.com/163737/110033545-d54af780-7d39-11eb-945c-3614e3d85c8c.png)
   
   And reading all the parquet files of the benchmark to memory (and running 
some queries) (left inclusive % instruction fetch, right self %).:
   
   
![unknown](https://user-images.githubusercontent.com/163737/110033770-1a6f2980-7d3a-11eb-9c54-1968fe35b88a.png)
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on a change in pull request #9232: ARROW-10818: [Rust] Implement DecimalType

2021-03-04 Thread GitBox


alamb commented on a change in pull request #9232:
URL: https://github.com/apache/arrow/pull/9232#discussion_r587851545



##
File path: rust/datafusion/src/physical_plan/group_scalar.rs
##
@@ -22,10 +22,12 @@ use std::convert::{From, TryFrom};
 
 use crate::error::{DataFusionError, Result};
 use crate::scalar::ScalarValue;
+use arrow::datatypes::Decimal128Type;
 
 /// Enumeration of types that can be used in a GROUP BY expression
 #[derive(Debug, PartialEq, Eq, Hash, Clone)]
 pub(crate) enum GroupByScalar {
+Decimal128(i128, usize, usize),

Review comment:
   Awesome -- thank you  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9634: ARROW-11864: [R] Document arrow.int64_downcast option

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9634:
URL: https://github.com/apache/arrow/pull/9634#issuecomment-790968132


   https://issues.apache.org/jira/browse/ARROW-11864



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on a change in pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


alamb commented on a change in pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#discussion_r587849026



##
File path: rust/arrow/src/array/array_binary.rs
##
@@ -258,6 +258,8 @@ where
 }
 }
 
+// calculate actual data_len, which may be different from the 
iterator's upper bound
+let data_len = offsets.len() - 1;

Review comment:
   https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint  
says
   
   > It is not enforced that an iterator implementation yields the declared 
number of elements. A buggy iterator may yield less than the lower bound or 
more than the upper bound of elements.
   
   > size_hint() is primarily intended to be used for optimizations such as 
reserving space for the elements of the iterator, but must not be trusted to 
e.g., omit bounds checks in unsafe code. An incorrect implementation of 
size_hint() should not lead to memory safety violations.
   
   I figured i would paste that into this PR as I had looked it up to satisfy 
my own curiosity





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ovr commented on a change in pull request #9232: ARROW-10818: [Rust] Implement DecimalType

2021-03-04 Thread GitBox


ovr commented on a change in pull request #9232:
URL: https://github.com/apache/arrow/pull/9232#discussion_r587848244



##
File path: rust/datafusion/src/physical_plan/group_scalar.rs
##
@@ -22,10 +22,12 @@ use std::convert::{From, TryFrom};
 
 use crate::error::{DataFusionError, Result};
 use crate::scalar::ScalarValue;
+use arrow::datatypes::Decimal128Type;
 
 /// Enumeration of types that can be used in a GROUP BY expression
 #[derive(Debug, PartialEq, Eq, Hash, Clone)]
 pub(crate) enum GroupByScalar {
+Decimal128(i128, usize, usize),

Review comment:
   > your goal is to implement the Decimal128 and Decimal256 types
   
   right. In this PR only `Decimal128 `, `Decimal256 ` will be in next PR

##
File path: rust/datafusion/src/physical_plan/group_scalar.rs
##
@@ -22,10 +22,12 @@ use std::convert::{From, TryFrom};
 
 use crate::error::{DataFusionError, Result};
 use crate::scalar::ScalarValue;
+use arrow::datatypes::Decimal128Type;
 
 /// Enumeration of types that can be used in a GROUP BY expression
 #[derive(Debug, PartialEq, Eq, Hash, Clone)]
 pub(crate) enum GroupByScalar {
+Decimal128(i128, usize, usize),

Review comment:
   > your goal is to implement the Decimal128 and Decimal256 types
   
   right. In this PR only `Decimal128 `, `Decimal256 ` will be in the next PR





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on a change in pull request #9232: ARROW-10818: [Rust] Implement DecimalType

2021-03-04 Thread GitBox


alamb commented on a change in pull request #9232:
URL: https://github.com/apache/arrow/pull/9232#discussion_r587847557



##
File path: rust/datafusion/src/physical_plan/group_scalar.rs
##
@@ -22,10 +22,12 @@ use std::convert::{From, TryFrom};
 
 use crate::error::{DataFusionError, Result};
 use crate::scalar::ScalarValue;
+use arrow::datatypes::Decimal128Type;
 
 /// Enumeration of types that can be used in a GROUP BY expression
 #[derive(Debug, PartialEq, Eq, Hash, Clone)]
 pub(crate) enum GroupByScalar {
+Decimal128(i128, usize, usize),

Review comment:
   I am not sure @ovr  -- I have sort of lost the context of this PR in my 
head. When you are ready let me know and I can find time to review this again.
   
   Just to be super clear, your goal is to implement the Decimal128 and 
Decimal256 types that are defined in the Arrow spec, right? Not an extension?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-790966208


   Revision: 6f70c9911262d5d956c7f4ded49647269af848d9
   
   Submitted crossbow builds: [ursacomputing/crossbow @ 
actions-175](https://github.com/ursacomputing/crossbow/branches/all?query=actions-175)
   
   |Task|Status|
   ||--|
   
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-175-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-175-azure-test-r-minimal-build)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9634: ARROW-11864: [R] Document arrow.int64_downcast option

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9634:
URL: https://github.com/apache/arrow/pull/9634#issuecomment-790966162


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] Dandandan edited a comment on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


Dandandan edited a comment on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-790962611


   @yordan-pavlov 
   Would be very interested in a faster parquet reader.
   Was doing some benchmarking with some queries in DataFusion with Parquet, 
looks like there is a lot to win in Parquet when looking at profiling results. 
For example, query 5:
   
   
![query5](https://user-images.githubusercontent.com/163737/110033545-d54af780-7d39-11eb-945c-3614e3d85c8c.png)
   
   And reading all the parquet files of the benchmark to memory (and running 
some queries) (left inclusive % instruction fetch, right self %).:
   
   
![unknown](https://user-images.githubusercontent.com/163737/110033770-1a6f2980-7d3a-11eb-9c54-1968fe35b88a.png)
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


kou commented on pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#issuecomment-790965856


   Oh, sorry for my late review...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


kou commented on a change in pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#discussion_r584365689



##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+if(IS_DIRECTORY "${DIR}/${child_}")
+  list(APPEND subdirs_ ${child_})
+endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL 
"vcpkg.cmake")
+get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" 
DIRECTORY)
+get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg toolchain file not found at path specified in 
-DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+# Get it from the CMake variable VCPKG_ROOT
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(FATAL_ERROR "vcpkg not found in directory specified in 
-DVCPKG_ROOT")
+endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+# Get it from the environment variable VCPKG_ROOT
+set(VCPKG_ROOT ENV{VCPKG_ROOT})
+find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+if(NOT _VCPKG_BIN)
+  message(
+FATAL_ERROR "vcpkg not found in directory in environment variable 
VCPKG_ROOT")
+endif()
+  else()
+# Get it from the file vcpkg.path.txt
+find_program(_VCPKG_BIN vcpkg)
+if(_VCPKG_BIN)
+  get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+  get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+else()
+  if(CMAKE_HOST_WIN32)
+set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+  else()
+set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+  endif()
+  if(EXISTS "${_VCPKG_PATH_TXT}")
+file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+  else()
+message(
+  FATAL_ERROR
+"vcpkg not found. Install vcpkg if not installed, "
+"then run vcpkg integrate install or set environment variable 
VCPKG_ROOT.")
+  endif()
+  find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+  if(NOT _VCPKG_BIN)
+message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+"or set environment variable VCPKG_ROOT.")
+  endif()
+endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+  "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+  CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment:
   ```suggestion
 CACHE FILEPATH "Path to vcpkg CMake toolchain file" FORCE)
   ```
   
   Do we need `CACHE ...` here? `set(CMAKE_TOOLCHAIN_FILE "${VCPKG_ROOT}/...")` 
doesn't work?
   (Sorry. I didn't confirm it.)

##
File path: cpp/cmake_modules/Usevcpkg.cmake
##
@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT 

[GitHub] [arrow] Dandandan edited a comment on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


Dandandan edited a comment on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-790962611


   @yordan-pavlov 
   Would be very interested in a faster parquet reader.
   Was doing some benchmarking with some queries in DataFusion with Parquet, 
looks like there is a lot to win in Parquet when looking at profiling results. 
For example, query 5:
   
   
![query5](https://user-images.githubusercontent.com/163737/110033545-d54af780-7d39-11eb-945c-3614e3d85c8c.png)
   
   And reading all the parquet files of the benchmark to memory (left inclusive 
% instruction fetch, right self %).:
   
   
![unknown](https://user-images.githubusercontent.com/163737/110033770-1a6f2980-7d3a-11eb-9c54-1968fe35b88a.png)
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG

2021-03-04 Thread GitBox


ianmcook commented on pull request #9553:
URL: https://github.com/apache/arrow/pull/9553#issuecomment-790963128


   @kou let me know if there is anything I can do to help you complete your 
review. Thank you!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] Dandandan commented on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator

2021-03-04 Thread GitBox


Dandandan commented on pull request #9588:
URL: https://github.com/apache/arrow/pull/9588#issuecomment-790962611


   @yordan-pavlov 
   Would be very interested in a faster parquet reader.
   Was doing some benchmarking with some queries in DataFusion with Parquet, 
looks like there is a lot to win in Parquet when looking at profiling results. 
For example, query 5:
   
   
![query5](https://user-images.githubusercontent.com/163737/110033545-d54af780-7d39-11eb-945c-3614e3d85c8c.png)
   
   And reading all the parquet files to memory (left inclusive % instruction 
fetch, right self %).:
   
   
![unknown](https://user-images.githubusercontent.com/163737/110033770-1a6f2980-7d3a-11eb-9c54-1968fe35b88a.png)
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


nealrichardson commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587841807



##
File path: dev/tasks/tasks.yml
##
@@ -1789,6 +1789,18 @@ tasks:
   r_image: r-base
   r_tag: 3.6-opensuse42
   not_cran: "TRUE"
+  
+  test-r-minimal-build:

Review comment:
   But it is minimal in the sense that the cpp minimal build in here is. I 
think it's fine.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


nealrichardson commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-790954184


   > @nealrichardson do you want `LIBARROW_MINIMAL` to toggle Dataset and 
Parquet? I assume no.
   
   Correct, we don't want to change that meaning (yet).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] trucnguyenlam commented on pull request #9489: ARROW-11497: [Python] Provide parquet enable compliant nested type flag for python binding

2021-03-04 Thread GitBox


trucnguyenlam commented on pull request #9489:
URL: https://github.com/apache/arrow/pull/9489#issuecomment-790932818


   @jorisvandenbossche could you please have a look at the PR? cheers



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on a change in pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#discussion_r587812949



##
File path: dev/tasks/tasks.yml
##
@@ -1789,6 +1789,18 @@ tasks:
   r_image: r-base
   r_tag: 3.6-opensuse42
   not_cran: "TRUE"
+  
+  test-r-minimal-build:

Review comment:
   TODO: Use a different word because "minimal" already has a specific 
meaning (`LIBARROW_MINIMAL`) that's not this





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-790929111


   @nealrichardson do you want `LIBARROW_MINIMAL` to toggle Dataset and 
Parquet? I assume no.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] msummersgill opened a new pull request #9634: Update R Package Arrow.Rmd vignette to document option(arrow.int64_downcast)

2021-03-04 Thread GitBox


msummersgill opened a new pull request #9634:
URL: https://github.com/apache/arrow/pull/9634


   Per discussion on https://issues.apache.org/jira/browse/ARROW-9083



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-790925316


   @github-actions crossbow submit test-r-minimal-build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9633: ARROW-11560: [C++][FlightRPC] fix mutex error on SIGINT

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9633:
URL: https://github.com/apache/arrow/pull/9633#issuecomment-790919414


   https://issues.apache.org/jira/browse/ARROW-11560



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rok commented on a change in pull request #9606: ARROW-10405: [C++] IsIn kernel should be able to lookup dictionary in string

2021-03-04 Thread GitBox


rok commented on a change in pull request #9606:
URL: https://github.com/apache/arrow/pull/9606#discussion_r587790806



##
File path: cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc
##
@@ -231,6 +252,25 @@ TEST_F(TestIsInKernel, Decimal) {
 /*skip_nulls=*/true);
 }
 
+TEST_F(TestIsInKernel, DictionaryArray) {
+  for (auto index_ty : all_dictionary_index_types()) {
+CheckIsInDictionary(/*type=*/utf8(),
+/*index_type=*/index_ty,
+/*input_dictionary_json=*/R"(["A", "B", "C", "D"])",
+/*input_index_json=*/"[1, 2, null, 0]",
+/*value_set_json=*/R"(["A", "B", "C"])",
+/*expected_json=*/"[true, true, false, true]",
+/*skip_nulls=*/false);
+CheckIsInDictionary(/*type=*/float32(),

Review comment:
   Added.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rok commented on a change in pull request #9606: ARROW-10405: [C++] IsIn kernel should be able to lookup dictionary in string

2021-03-04 Thread GitBox


rok commented on a change in pull request #9606:
URL: https://github.com/apache/arrow/pull/9606#discussion_r587790691



##
File path: cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc
##
@@ -72,6 +72,27 @@ void CheckIsInChunked(const std::shared_ptr& 
input,
   AssertChunkedEqual(*expected, *actual);
 }
 
+void CheckIsInDictionary(const std::shared_ptr& type,
+ const std::shared_ptr& index_type,
+ const std::string& input_dictionary_json,
+ const std::string& input_index_json,
+ const std::string& value_set_json,
+ const std::string& expected_json, bool skip_nulls = 
false) {
+  auto dict_type = dictionary(index_type, type);
+  auto indices = ArrayFromJSON(index_type, input_index_json);
+  auto dict = ArrayFromJSON(type, value_set_json);

Review comment:
   Ugh, thanks for catching that. Fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #9630: ARROW-11860: [Rust] [DataFusion] Add DataFusion logos

2021-03-04 Thread GitBox


github-actions[bot] commented on pull request #9630:
URL: https://github.com/apache/arrow/pull/9630#issuecomment-790895297


   https://issues.apache.org/jira/browse/ARROW-11860



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ericwburden commented on pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes

2021-03-04 Thread GitBox


ericwburden commented on pull request #9624:
URL: https://github.com/apache/arrow/pull/9624#issuecomment-790893609


   Sorry, just learned that the "Close" button on my phone didn't mean the 
window...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ericwburden closed pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes

2021-03-04 Thread GitBox


ericwburden closed pull request #9624:
URL: https://github.com/apache/arrow/pull/9624


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] alamb commented on a change in pull request #9494: ARROW-11626: [Rust][DataFusion] Move [DataFusion] examples to own project

2021-03-04 Thread GitBox


alamb commented on a change in pull request #9494:
URL: https://github.com/apache/arrow/pull/9494#discussion_r58078



##
File path: rust/datafusion-examples/Cargo.toml
##
@@ -0,0 +1,38 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+[package]
+name = "datafusion-examples"

Review comment:
   FYI @andygrove  -- I am not sure what you think about possibly releasing 
the examples as a crate of their own. I personally think it would be necessary. 
However perhaps it would be the right way to to eventually to somehow link to 
the example code from the docs.rs page 
https://docs.rs/datafusion/3.0.0/datafusion/
   
   I filed this JIRA https://issues.apache.org/jira/browse/ARROW-11863 to track 
that

##
File path: rust/datafusion/Cargo.toml
##
@@ -71,9 +71,6 @@ unicode-segmentation = "^1.7.1"
 rand = "0.8"
 criterion = "0.3"
 tempfile = "3"
-prost = "0.7"

Review comment:
    





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-04 Thread GitBox


ianmcook commented on pull request #9610:
URL: https://github.com/apache/arrow/pull/9610#issuecomment-790887608


   @github-actions crossbow submit -g r



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   >