[GitHub] [arrow] mathyingzhou commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support
mathyingzhou commented on pull request #8648: URL: https://github.com/apache/arrow/pull/8648#issuecomment-791209166 @pitrou Yup, I found your changes to Random 6 days ago in ARROW-11662. Things did break after that. Now I’m trying to figure out whether it is my ORC writer or the Decimal128 generator that needs to be fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4
liyafan82 commented on a change in pull request #8949: URL: https://github.com/apache/arrow/pull/8949#discussion_r588067931 ## File path: java/compression/pom.xml ## @@ -0,0 +1,56 @@ + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + 4.0.0 + +org.apache.arrow +arrow-java-root +4.0.0-SNAPSHOT + + arrow-compression + Arrow Compression + (Experimental/Contrib) A library for working with the compression/decompression of Arrow data. + + + + org.apache.arrow + arrow-format + ${project.version} + + + org.apache.arrow + arrow-vector + ${project.version} + ${arrow.vector.classifier} + + + org.apache.arrow + arrow-memory-core + ${project.version} + + + org.apache.arrow + arrow-memory-unsafe + ${project.version} + test + + + org.apache.commons + commons-compress + 1.20 + + + io.netty Review comment: The `vector` module requires this library. If we do not add it to the dependency list, we get an error when building with Maven: ``` Used undeclared dependencies found: io.netty:netty-common:jar:4.1.48.Final:compile ```
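The fix implied by the review comment is to declare the transitively used artifact explicitly in `java/compression/pom.xml`, so Maven's dependency analysis no longer reports it as used-but-undeclared. A sketch of the declaration (version taken from the error message above; in practice the version would likely be managed by a shared property):

```xml
<!-- Used directly by code in this module (via the vector dependency),
     so it must be declared here rather than inherited transitively. -->
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-common</artifactId>
  <version>4.1.48.Final</version>
</dependency>
```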
[GitHub] [arrow] liyafan82 commented on pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4
liyafan82 commented on pull request #8949: URL: https://github.com/apache/arrow/pull/8949#issuecomment-791193697 > @liyafan82 nice work. Left a few comments about API and structure; let me know what you think. @emkornfield Thanks a lot for your comments. I will resolve them one by one ASAP.
[GitHub] [arrow] emkornfield commented on issue #9628: write_feather incorrectly deletes files
emkornfield commented on issue #9628: URL: https://github.com/apache/arrow/issues/9628#issuecomment-791183396 First one seems reasonable to me. Second one, I'm not sure about it. This seems intentional: https://github.com/apache/arrow/commit/96f3d6176d8c95717f4ff45e4226161de3168b05 but this could be revisited (maybe discuss on the dev@ mailing list before investing time to fix). Please open JIRAs for each when you get a chance.
[GitHub] [arrow] emkornfield commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4
emkornfield commented on a change in pull request #8949: URL: https://github.com/apache/arrow/pull/8949#discussion_r588044636 ## File path: java/compression/src/main/java/org/apache/arrow/compression/CommonsCompressionFactory.java ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.compression; + +import org.apache.arrow.flatbuf.CompressionType; +import org.apache.arrow.vector.compression.CompressionCodec; +import org.apache.arrow.vector.compression.NoCompressionCodec; + +/** + * A factory implementation based on Apache Commons library. + */ +public class CommonsCompressionFactory implements CompressionCodec.Factory { + + public static final CommonsCompressionFactory INSTANCE = new CommonsCompressionFactory(); + + @Override + public CompressionCodec createCodec(byte codecType) { Review comment: It seems like byte isn't the right interface here. Can we decouple from the flatbuf enum for the factory? ## File path: java/compression/src/main/java/org/apache/arrow/compression/Lz4CompressionCodec.java ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. 
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; + +import org.apache.arrow.flatbuf.CompressionType; +import org.apache.arrow.memory.ArrowBuf; +import org.apache.arrow.memory.BufferAllocator; +import org.apache.arrow.memory.util.MemoryUtil; +import org.apache.arrow.util.Preconditions; +import org.apache.arrow.vector.compression.CompressionCodec; +import org.apache.arrow.vector.compression.CompressionUtil; +import org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorInputStream; +import org.apache.commons.compress.compressors.lz4.FramedLZ4CompressorOutputStream; +import org.apache.commons.compress.utils.IOUtils; + +import io.netty.util.internal.PlatformDependent; + +/** + * Compression codec for the LZ4 algorithm. 
+ */ +public class Lz4CompressionCodec implements CompressionCodec { + + @Override + public ArrowBuf compress(BufferAllocator allocator, ArrowBuf uncompressedBuffer) { +Preconditions.checkArgument(uncompressedBuffer.writerIndex() <= Integer.MAX_VALUE, +"The uncompressed buffer size exceeds the integer limit"); + +if (uncompressedBuffer.writerIndex() == 0L) { + // shortcut for empty buffer + ArrowBuf compressedBuffer = allocator.buffer(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH); + compressedBuffer.setLong(0, 0); + compressedBuffer.writerIndex(CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH); + uncompressedBuffer.close(); + return compressedBuffer; +} + +try { + ArrowBuf compressedBuffer = doCompress(allocator, uncompressedBuffer); + long compressedLength = compressedBuffer.writerIndex() - CompressionUtil.SIZE_OF_UNCOMPRESSED_LENGTH; + if (compressedLength > uncompressedBuffer.writerIndex()) { +// compressed buffer is larger, send the raw buffer +compressedBuffer.close(); +compressedBuffer = CompressionUtil.compressRawBuffer(allocator, uncompressedBuffer); + } + + uncompressedBuffer.close(); + return compressedBuffer; +} catch (IOException e) { + throw new RuntimeException(e); +} + } + + private ArrowBuf doCompress(BufferAllocator allocator,
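The reviewer's first point is that `createCodec(byte codecType)` couples the factory to the flatbuf wire-format constants. One way to decouple it, sketched here in Python with hypothetical names (the real Java API is `CompressionCodec.Factory`), is to key the factory on a library-level enum and translate flatbuf bytes to that enum only once, at the IPC read boundary:

```python
from enum import Enum

class CodecType(Enum):
    # Library-level enum, independent of the flatbuf CompressionType bytes.
    NO_COMPRESSION = "none"
    LZ4_FRAME = "lz4_frame"

class NoCompressionCodec:
    name = "none"

class Lz4Codec:
    name = "lz4_frame"

class CodecFactory:
    """Maps the library-level enum to codec implementations."""
    _registry = {
        CodecType.NO_COMPRESSION: NoCompressionCodec,
        CodecType.LZ4_FRAME: Lz4Codec,
    }

    @classmethod
    def create_codec(cls, codec_type):
        # Callers deal only in CodecType; converting a raw flatbuf byte
        # into a CodecType happens once, at the deserialization boundary.
        return cls._registry[codec_type]()
```

This keeps user-facing code free of flatbuf details while still allowing the IPC layer to map wire bytes to codecs.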
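The `compress` method quoted above follows a common pattern: write an uncompressed-length prefix, compress the body, and fall back to shipping the raw bytes when compression would grow the data. A rough Python sketch of that pattern, with `zlib` standing in for framed LZ4 and a `-1` length sentinel standing in for the codec's "raw buffer" marker (the exact layout here is illustrative, not the Arrow IPC format):

```python
import struct
import zlib

SIZE_OF_UNCOMPRESSED_LENGTH = 8  # bytes in the little-endian length prefix
RAW_BODY_SENTINEL = -1           # length value meaning "body is uncompressed"

def compress_buffer(data: bytes) -> bytes:
    """Compress `data`, falling back to the raw bytes if that is smaller."""
    if not data:
        # Shortcut for an empty buffer: emit only a zero length prefix.
        return struct.pack("<q", 0)
    compressed = zlib.compress(data)
    if len(compressed) > len(data):
        # Compression grew the data; ship it raw, flagged by the sentinel.
        return struct.pack("<q", RAW_BODY_SENTINEL) + data
    # Normal case: uncompressed length prefix, then the compressed body.
    return struct.pack("<q", len(data)) + compressed
```

The fallback matters for incompressible inputs (already-compressed or random data), where any real codec's framing overhead makes the "compressed" output strictly larger than the input.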
[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791158165 Revision: 6eda5896c638ca1c7af577c02ec56bd802a2fe42 Submitted crossbow builds: [ursacomputing/crossbow @ actions-186](https://github.com/ursacomputing/crossbow/branches/all?query=actions-186)

|Task|Status|
|----|------|
|test-r-without-dataset-parquet-s3|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-186-azure-test-r-without-dataset-parquet-s3)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-186-azure-test-r-without-dataset-parquet-s3)|
[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791158004 Revision: 6eda5896c638ca1c7af577c02ec56bd802a2fe42 Submitted crossbow builds: [ursacomputing/crossbow @ actions-185](https://github.com/ursacomputing/crossbow/branches/all?query=actions-185)

|Task|Status|
|----|------|
|conda-linux-gcc-py36-cpu-r36|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-linux-gcc-py36-cpu-r36)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-linux-gcc-py36-cpu-r36)|
|conda-linux-gcc-py37-cpu-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-linux-gcc-py37-cpu-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-linux-gcc-py37-cpu-r40)|
|conda-osx-clang-py36-r36|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-osx-clang-py36-r36)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-osx-clang-py36-r36)|
|conda-osx-clang-py37-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-osx-clang-py37-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-osx-clang-py37-r40)|
|conda-win-vs2017-py36-r36|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-win-vs2017-py36-r36)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-win-vs2017-py36-r36)|
|conda-win-vs2017-py37-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-conda-win-vs2017-py37-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-conda-win-vs2017-py37-r40)|
|homebrew-r-autobrew|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-185-github-homebrew-r-autobrew)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-185-github-homebrew-r-autobrew)|
|test-r-linux-as-cran|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-185-github-test-r-linux-as-cran)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-185-github-test-r-linux-as-cran)|
|test-r-rhub-ubuntu-gcc-release|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rhub-ubuntu-gcc-release)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rhub-ubuntu-gcc-release)|
|test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rocker-r-base-latest)|
|test-r-rstudio-r-base-3.6-bionic|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rstudio-r-base-3.6-bionic)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rstudio-r-base-3.6-bionic)|
|test-r-rstudio-r-base-3.6-centos7-devtoolset-8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)|
|test-r-rstudio-r-base-3.6-centos8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rstudio-r-base-3.6-centos8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rstudio-r-base-3.6-centos8)|
|test-r-rstudio-r-base-3.6-opensuse15|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-185-azure-test-r-rstudio-r-base-3.6-opensuse15)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-185-azure-test-r-rstudio-r-base-3.6-opensuse15)|
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791155609 @github-actions crossbow submit test-r-without-dataset-parquet-s3
[GitHub] [arrow] ianmcook removed a comment on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook removed a comment on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791154080 @github-actions crossbow submit test-r-minimal-build
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791154662 @github-actions crossbow submit -g r
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791154080 @github-actions crossbow submit test-r-minimal-build
[GitHub] [arrow] michalursa commented on pull request #9621: ARROW-11591: [C++][Compute] Grouped aggregation
michalursa commented on pull request #9621: URL: https://github.com/apache/arrow/pull/9621#issuecomment-791153213 > Before digging into the details too much, my main issue with what I see is that I don't agree with making hash aggregation a callable function through `CallFunction`. > > In the context of a query engine, the interface for this operator looks something like: > > ``` > class ExecNode { > public: > virtual void Push(int node_index, const ExecBatch& batch) = 0; > virtual void PushDone(int node_index) = 0; > }; > > class HashAggregationNode : public ExecNode { > ... > }; > ``` > > (some query engines use a "pull"-based model, in which the data flow is inverted — there are pros and cons to both approaches, see https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf) > > If you want a simple one-shot version of the algorithm (rather than a general "streaming" one like the above), then you can break the input data into record batches of the desired size (e.g. 4K - 64K rows, depending on heuristics) and then push the chunks into the node that you create (note that the HashAggregationNode should push its result into a terminal "OutputNode" when you invoke `hash_agg_node->PushDone(0)`). > > The API can be completely crude / preliminary, but would it be possible to use the query-engine-type approach for this? I think it would be best to start taking some strides in this direction rather than bolting this onto the array function execution machinery which doesn't make sense in a query processing context (because aggregation is fundamentally a streaming algorithm) > > On the hash aggregation functions themselves, perhaps it makes sense to add a `HASH_AGGREGATE` function type and define the kernel interface for these functions, then look up these functions using the general dispatch machinery? I like these points. Let me break into subtopics what I think we are facing now and what we should think about for the future. 
I read the comment above as advocating support for two concepts: a) relational operators (the building blocks of a query execution pipeline or a more general query execution plan) and b) pipelines / streaming processing.

**1. How we got here**

Part of the problem is that we don't have a problem just yet. We didn't bite a big enough chunk of work to force us to do the appropriate refactoring. And the problem I am referring to is that of pipelining, which we do not have support for yet. From what I understand, currently we cannot bind together multiple operations into a single processing pipeline executed using a single Arrow function call, computing expressions on the fly without persisting their results for an entire set of rows. Related to the pipelining is the fact that in this PR we do not stream the group by output. That way, at some level of abstraction (squinting eyes) we can treat it as a scalar aggregate (we output a single item which happens to be an entire collection of arrays; variants make it possible). But to me the bigger problem here is that we are mixing together two separate worlds: scalar operators and relational operators, and that muddies the general picture. That mixing happens as a consequence of treating the group by as a modified scalar aggregate in the code.

**2. Scalar operators vs relational operators**

One way to think about it is that scalar expressions (compositions of scalar operators) are like C++ lambdas (e.g. comparison of two values) provided in a call to C++ STL algorithms / containers (e.g. std::sort), while relational operators correspond to these algorithms / containers (except that they additionally have support for composability - creating execution pipelines / trees / DAGs).
The point of confusion may come from the fact that once you vectorize scalar expressions (which you probably want to do, for performance reasons), their interfaces start looking very similar (if not the same) as would some special relational operators (namely: filter, project and scalar aggregate). I claim that the current kernel executor classes are relational operators in disguise - project (aka compute scalar) and reduce (aka scalar aggregate).

**Relational operators inside current group by**

Interestingly, the group by present in this PR can itself be treated as a DAG of multiple relational operators with some kind of pipelines connecting them. The input batch is first processed by the code that assigns an integer group id to every row based on key columns. The output of that is then processed by zero, one or more group by aggregate operators that update aggregate-related accumulators in their internal arrays. At the end of the input, the output array from the group id mapping component is concatenated with the output arrays of the individual group by aggregate operators to produce the output.
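The push-based `ExecNode` interface quoted at the top of this thread can be made concrete with a small sketch. This is a hypothetical Python stand-in for the proposed C++ API (names mirror the `Push`/`PushDone` sketch; the batch representation is simplified to lists of key/value rows), showing a hash aggregation node that accumulates grouped sums across batches and pushes one result batch downstream when its input ends:

```python
from collections import defaultdict

class ExecNode:
    """Push-based operator: upstream nodes call push()/push_done()."""
    def push(self, node_index, batch):
        raise NotImplementedError
    def push_done(self, node_index):
        raise NotImplementedError

class OutputNode(ExecNode):
    """Terminal node that just collects whatever is pushed into it."""
    def __init__(self):
        self.batches = []
        self.done = False
    def push(self, node_index, batch):
        self.batches.append(batch)
    def push_done(self, node_index):
        self.done = True

class HashAggregationNode(ExecNode):
    """Groups rows by key and sums values; emits once the input ends."""
    def __init__(self, output):
        self.output = output
        self.sums = defaultdict(int)
    def push(self, node_index, batch):
        # `batch` is a list of (key, value) rows standing in for an ExecBatch.
        for key, value in batch:
            self.sums[key] += value
    def push_done(self, node_index):
        # Aggregation is blocking: results only exist after all input arrives.
        self.output.push(0, dict(self.sums))
        self.output.push_done(0)
```

The one-shot mode described above then falls out naturally: split the input into record batches of the desired size, call `push` per batch, then `push_done` once.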
[GitHub] [arrow] xuanqing94 commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?
xuanqing94 commented on issue #9325: URL: https://github.com/apache/arrow/issues/9325#issuecomment-791152745 @joeyac Sorry to bring it up again... but I am deeply confused about this line of code: `auto column_struct = std::static_pointer_cast(table->GetColumnByName("column")->chunk(0));` since GetColumnByName returns a shared_ptr to ChunkedArray, would accessing only chunk(0) return incomplete data? E.g. if the data file is huge and truncated (from the document I couldn't really get the idea). Could you say a bit more about this? Thanks a lot!
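The concern in the question is real: `chunk(0)` returns only the first chunk, and a column read from a large Parquet file may be split across several chunks, so complete data requires visiting all of them (or concatenating the column first). A toy Python sketch of the idea, with plain lists standing in for Arrow arrays and method names loosely mirroring the C++ `ChunkedArray` API:

```python
class ChunkedArray:
    """Toy stand-in for arrow::ChunkedArray: one column split into pieces."""
    def __init__(self, chunks):
        self._chunks = chunks
    def num_chunks(self):
        return len(self._chunks)
    def chunk(self, i):
        # Returns only piece i, NOT the whole column.
        return self._chunks[i]

def to_vector(column):
    """Concatenate every chunk; reading only chunk(0) would drop the rest."""
    out = []
    for i in range(column.num_chunks()):
        out.extend(column.chunk(i))
    return out
```

Casting `chunk(0)` alone is only safe when the column is known to have exactly one chunk (e.g. after combining chunks); otherwise iterate as above.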
[GitHub] [arrow] cyb70289 commented on pull request #9633: ARROW-11560: [C++][FlightRPC] fix mutex error on SIGINT
cyb70289 commented on pull request #9633: URL: https://github.com/apache/arrow/pull/9633#issuecomment-791150699 Also update PR comment to match latest code? `semaphore` -> `pipe`
[GitHub] [arrow] ianmcook removed a comment on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook removed a comment on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791139873 @github-actions crossbow submit -g r
[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588021512 ## File path: dev/tasks/r/azure.linux.yml ## @@ -47,6 +47,9 @@ jobs: export R_ORG={{ r_org }} export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} +export ARROW_DATASET={{ arrow_dataset|default("ON") }} +export ARROW_PARQUET={{ arrow_parquet|default("ON") }} Review comment: The build looks good after 9479c6c: https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-184-azure-test-r-minimal-build
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791139873 @github-actions crossbow submit -g r
[GitHub] [arrow] kou closed pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
kou closed pull request #9553: URL: https://github.com/apache/arrow/pull/9553
[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791117825 Revision: 9479c6ce079c7230b0e290dcb5658436d1d62fde Submitted crossbow builds: [ursacomputing/crossbow @ actions-184](https://github.com/ursacomputing/crossbow/branches/all?query=actions-184)

|Task|Status|
|----|------|
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-184-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-184-azure-test-r-minimal-build)|
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791117362 @github-actions crossbow submit test-r-minimal-build
[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587998671 ## File path: dev/tasks/r/azure.linux.yml ## @@ -47,6 +47,9 @@ jobs: export R_ORG={{ r_org }} export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} +export ARROW_DATASET={{ arrow_dataset|default("ON") }} +export ARROW_PARQUET={{ arrow_parquet|default("ON") }} Review comment: Ok, thanks. I tried a few different things just for kicks, then used `-e` per your suggestion in 9479c6ce079c7230b0e290dcb5658436d1d62fde
[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install
github-actions[bot] commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791110809 Revision: 4068f471efc55584dbfea5df454af19dfcb1492d Submitted crossbow builds: [ursacomputing/crossbow @ actions-183](https://github.com/ursacomputing/crossbow/branches/all?query=actions-183)

|Task|Status|
|----|------|
|test-r-install-macos|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-183-github-test-r-install-macos)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-183-github-test-r-install-macos)|
[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791106222 @github-actions crossbow submit test-r-install-macos
[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791104150 Revision: a810b3fce373c7f5090e367e5a6afc70db60414a Submitted crossbow builds: [ursacomputing/crossbow @ actions-182](https://github.com/ursacomputing/crossbow/branches/all?query=actions-182)

|Task|Status|
|----|------|
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-182-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-182-azure-test-r-minimal-build)|
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791103746 @github-actions crossbow submit test-r-minimal-build
[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791095801 Revision: 27c514cb1b14e7125e8ba50714c6480c93dc231a Submitted crossbow builds: [ursacomputing/crossbow @ actions-181](https://github.com/ursacomputing/crossbow/branches/all?query=actions-181)

|Task|Status|
|----|------|
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-181-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-181-azure-test-r-minimal-build)|
[GitHub] [arrow] ianmcook removed a comment on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook removed a comment on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-790887608 @github-actions crossbow submit -g r
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791095464 @github-actions crossbow submit test-r-minimal-build
[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
nealrichardson commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587961901 ## File path: dev/tasks/r/azure.linux.yml ## @@ -47,6 +47,9 @@ jobs: export R_ORG={{ r_org }} export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} +export ARROW_DATASET={{ arrow_dataset|default("ON") }} +export ARROW_PARQUET={{ arrow_parquet|default("ON") }} Review comment: Ah yeah, I think it's the same problem--note (for comparison) that `R_ORG` et al. are in the .env file, and they're environment variables for docker-compose, not passed into the docker container. The fact that `ARROW_R_DEV` is in the env file, and then passed through in docker-compose.yml to the build environments, may just be due to my ignorance of all of this at the time I added it.
[GitHub] [arrow] github-actions[bot] commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
github-actions[bot] commented on pull request #9553: URL: https://github.com/apache/arrow/pull/9553#issuecomment-791077751 Revision: 2e6905a7262a6139c52c4007fd95f58b378e71ca Submitted crossbow builds: [ursacomputing/crossbow @ actions-180](https://github.com/ursacomputing/crossbow/branches/all?query=actions-180) |Task|Status| |---|---| |test-build-vcpkg-win|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-180-github-test-build-vcpkg-win)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-180-github-test-build-vcpkg-win)| |wheel-manylinux2010-cp36m|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-180-github-wheel-manylinux2010-cp36m)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-180-github-wheel-manylinux2010-cp36m)| |wheel-windows-cp36m|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-180-github-wheel-windows-cp36m)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-180-github-wheel-windows-cp36m)|
[GitHub] [arrow] nealrichardson closed pull request #9442: ARROW-6818: [DOC] Remove reference to Apache Drill design docs
nealrichardson closed pull request #9442: URL: https://github.com/apache/arrow/pull/9442
[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587956206 ## File path: dev/tasks/r/azure.linux.yml ## @@ -47,6 +47,9 @@ jobs: export R_ORG={{ r_org }} export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} +export ARROW_DATASET={{ arrow_dataset|default("ON") }} +export ARROW_PARQUET={{ arrow_parquet|default("ON") }} Review comment: Ok, if folks might be running these commands locally then I can see how that would cause damage. Looks like it didn't work anyhow. I will investigate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587955204

## File path: cpp/cmake_modules/Usevcpkg.cmake ##

@@ -0,0 +1,214 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Using vcpkg to find dependencies")
+
+# --
+# Define macros
+
+# macro to list subdirectories (non-recursive)
+macro(list_subdirs SUBDIRS DIR)
+  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
+  set(subdirs_ "")
+  foreach(child_ ${children_})
+    if(IS_DIRECTORY "${DIR}/${child_}")
+      list(APPEND subdirs_ ${child_})
+    endif()
+  endforeach()
+  set("${SUBDIRS}" ${subdirs_})
+  unset(children_)
+  unset(subdirs_)
+endmacro()
+
+# --
+# Get VCPKG_ROOT
+
+if(DEFINED CMAKE_TOOLCHAIN_FILE)
+  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
+  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
+  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL "vcpkg.cmake")
+    get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" DIRECTORY)
+    get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." ABSOLUTE)
+  else()
+    message(
+      FATAL_ERROR
+        "vcpkg toolchain file not found at path specified in -DCMAKE_TOOLCHAIN_FILE")
+  endif()
+else()
+  if(DEFINED VCPKG_ROOT)
+    # Get it from the CMake variable VCPKG_ROOT
+    find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+    if(NOT _VCPKG_BIN)
+      message(FATAL_ERROR "vcpkg not found in directory specified in -DVCPKG_ROOT")
+    endif()
+  elseif(DEFINED ENV{VCPKG_ROOT})
+    # Get it from the environment variable VCPKG_ROOT
+    set(VCPKG_ROOT "$ENV{VCPKG_ROOT}")
+    find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+    if(NOT _VCPKG_BIN)
+      message(
+        FATAL_ERROR "vcpkg not found in directory in environment variable VCPKG_ROOT")
+    endif()
+  else()
+    # Get it from the file vcpkg.path.txt
+    find_program(_VCPKG_BIN vcpkg)
+    if(_VCPKG_BIN)
+      get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
+      get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
+    else()
+      if(CMAKE_HOST_WIN32)
+        set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
+      else()
+        set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
+      endif()
+      if(EXISTS "${_VCPKG_PATH_TXT}")
+        file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
+      else()
+        message(
+          FATAL_ERROR
+            "vcpkg not found. Install vcpkg if not installed, "
+            "then run vcpkg integrate install or set environment variable VCPKG_ROOT.")
+      endif()
+      find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
+      if(NOT _VCPKG_BIN)
+        message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
+                            "or set environment variable VCPKG_ROOT.")
+      endif()
+    endif()
+  endif()
+  set(CMAKE_TOOLCHAIN_FILE
+      "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+      CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment: Changed to `FILEPATH` in 32295895be14b799eba9f3150c4c7b973e14dee3
[GitHub] [arrow] ianmcook commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on pull request #9553: URL: https://github.com/apache/arrow/pull/9553#issuecomment-791072177 @github-actions crossbow submit test-build-vcpkg-win wheel-manylinux2010-cp36m wheel-windows-cp36m
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587952686

## File path: cpp/cmake_modules/Usevcpkg.cmake ##

[diff context identical to the Usevcpkg.cmake hunk quoted above; the lines under review:]
+  set(CMAKE_TOOLCHAIN_FILE
+      "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+      CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment: Oh, I see, thanks. I'll make that change to `FILEPATH`
[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install
github-actions[bot] commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791061098 Revision: 9856d8aa91f4f334897e00333b560150fbe54296 Submitted crossbow builds: [ursacomputing/crossbow @ actions-179](https://github.com/ursacomputing/crossbow/branches/all?query=actions-179) |Task|Status| |---|---| |test-r-install-macos|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-179-github-test-r-install-macos)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-179-github-test-r-install-macos)|
[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791044239 @github-actions crossbow submit test-r-install-macos
[GitHub] [arrow] nealrichardson commented on pull request #9442: ARROW-6818: [DOC] Remove reference to Apache Drill design docs
nealrichardson commented on pull request #9442: URL: https://github.com/apache/arrow/pull/9442#issuecomment-791043943 @pitrou can you give this a review/merge please?
[GitHub] [arrow] martinblostein edited a comment on pull request #9442: ARROW-6818: [DOC] Remove reference to Apache Drill design docs
martinblostein edited a comment on pull request #9442: URL: https://github.com/apache/arrow/pull/9442#issuecomment-791042920 Is there anything I can do here? Seems like a straightforward change.
[GitHub] [arrow] martinblostein commented on pull request #9442: ARROW-6818: [DOC] Remove reference to Apache Drill design docs
martinblostein commented on pull request #9442: URL: https://github.com/apache/arrow/pull/9442#issuecomment-791042920 Is there anything I can do here? Seems like low-hanging fruit.
[GitHub] [arrow] kou commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
kou commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587924707

## File path: cpp/cmake_modules/Usevcpkg.cmake ##

[diff context identical to the Usevcpkg.cmake hunk quoted above, continuing with:]
+endif()
+message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}")
+message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}")
+
+# --
+# Get VCPKG_TARGET_TRIPLET
+
+if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET)
+  set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}")
+endif()
+# Explicitly set manifest mode on if it is not set and vcpkg.json exists
+if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json")
+  set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest")
+  message(STATUS "vcpkg.json manifest found. Using VCPKG_MANIFEST_MODE: ON")
+endif()
+# vcpkg can install packages in two different places
+set(_INST_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg_installed") # try here first

Review comment: Oh, sorry.
[GitHub] [arrow] kou commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
kou commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587924544

## File path: cpp/cmake_modules/Usevcpkg.cmake ##

[diff context identical to the Usevcpkg.cmake hunk quoted above; the lines under review:]
+  set(CMAKE_TOOLCHAIN_FILE
+      "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
+      CACHE STRING "Path to vcpkg CMake toolchain file")

Review comment: Thanks. I understand. For `FILEPATH`: `CMAKE_TOOLCHAIN_FILE` is for CMake, not vcpkg. It's defined as `FILEPATH` in `Modules/CMakeDetermineSystem.cmake`: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/CMakeDetermineSystem.cmake#L126-131
[GitHub] [arrow] nealrichardson commented on pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets
nealrichardson commented on pull request #9586: URL: https://github.com/apache/arrow/pull/9586#issuecomment-791039254 I'll add a couple more tests, particularly around error handling, but I'd like to move on to other issues and get this merged so that others can test it out more widely. cc @ianmcook @jonkeane
[GitHub] [arrow] github-actions[bot] commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
github-actions[bot] commented on pull request #9553: URL: https://github.com/apache/arrow/pull/9553#issuecomment-791037231 Revision: d7f0e2236b55ce0ecd2815ccd07e7235288f385b Submitted crossbow builds: [ursacomputing/crossbow @ actions-178](https://github.com/ursacomputing/crossbow/branches/all?query=actions-178) |Task|Status| |---|---| |test-build-vcpkg-win|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-178-github-test-build-vcpkg-win)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-178-github-test-build-vcpkg-win)| |wheel-manylinux2010-cp36m|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-178-github-wheel-manylinux2010-cp36m)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-178-github-wheel-manylinux2010-cp36m)| |wheel-windows-cp36m|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-178-github-wheel-windows-cp36m)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-178-github-wheel-windows-cp36m)|
[GitHub] [arrow] ianmcook commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on pull request #9553: URL: https://github.com/apache/arrow/pull/9553#issuecomment-791036882 @github-actions crossbow submit test-build-vcpkg-win wheel-manylinux2010-cp36m wheel-windows-cp36m
[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
nealrichardson commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587918983 ## File path: dev/tasks/r/azure.linux.yml ## @@ -47,6 +47,9 @@ jobs: export R_ORG={{ r_org }} export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} +export ARROW_DATASET={{ arrow_dataset|default("ON") }} +export ARROW_PARQUET={{ arrow_parquet|default("ON") }} Review comment: I think the only danger of appending to the .env file is that, if you ran this locally, wouldn't it modify the file locally?
[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791036468 Revision: 318371fbaa07f1b6fcdf71f58d7d5460a931da13 Submitted crossbow builds: [ursacomputing/crossbow @ actions-177](https://github.com/ursacomputing/crossbow/branches/all?query=actions-177) |Task|Status| |---|---| |test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-177-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-177-azure-test-r-minimal-build)|
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791036107 @github-actions crossbow submit test-r-minimal-build
[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587917624 ## File path: dev/tasks/r/azure.linux.yml ## @@ -47,6 +47,9 @@ jobs: export R_ORG={{ r_org }} export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} +export ARROW_DATASET={{ arrow_dataset|default("ON") }} +export ARROW_PARQUET={{ arrow_parquet|default("ON") }} Review comment: Oops, need to append not overwrite. Fixed in 318371fbaa07f1b6fcdf71f58d7d5460a931da13
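The append-versus-overwrite mistake discussed in this thread can be sketched in shell. The file contents and variable values below are illustrative only, not the actual crossbow configuration:

```shell
# Work in a throwaway directory so we don't touch a real .env file.
workdir="$(mktemp -d)"
cd "$workdir"

# docker-compose reads variables for interpolation from a .env file:
printf 'R_ORG=rhub\nR_IMAGE=ubuntu-gcc-release\n' > .env

# The bug: writing with '>' truncates the file, discarding R_ORG and R_IMAGE:
#   printf 'ARROW_DATASET=OFF\n' > .env

# The fix: appending with '>>' keeps the existing variables:
printf 'ARROW_DATASET=OFF\n' >> .env
printf 'ARROW_PARQUET=OFF\n' >> .env

cat .env
# R_ORG=rhub
# R_IMAGE=ubuntu-gcc-release
# ARROW_DATASET=OFF
# ARROW_PARQUET=OFF
```

Running this locally (as nealrichardson notes above) would still mutate a real .env file if it were executed in the repository checkout, which is why the commands here operate on a temporary copy.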
[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587917115 ## File path: dev/tasks/r/azure.linux.yml ## @@ -47,6 +47,9 @@ jobs: export R_ORG={{ r_org }} export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} +export ARROW_DATASET={{ arrow_dataset|default("ON") }} +export ARROW_PARQUET={{ arrow_parquet|default("ON") }} Review comment: Got it, thanks. Let's see if it works to add them to `.env` (066ce065124cf7dcaa08225cae83177594da3965) and if not then I'll do it with `-e`.
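[Editor's sketch] The `.env` approach mentioned above amounts to appending the extra variables to the compose env file rather than redirecting over it. A minimal shell illustration (the variable names match the diff; the initial `R_ORG` value is a stand-in):

```shell
# Append (not overwrite) extra build flags to a docker-compose .env file.
# Using '>' for the second write would clobber the earlier entries;
# '>>' appends to whatever is already in the file.
cd "$(mktemp -d)"
printf 'R_ORG=rhub\n' > .env
printf 'ARROW_DATASET=ON\nARROW_PARQUET=ON\n' >> .env
cat .env
# R_ORG=rhub
# ARROW_DATASET=ON
# ARROW_PARQUET=ON
```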
[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install
github-actions[bot] commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791031621 Revision: d1f7611e496c99c7c8d9f32c4b0d35fe4fee0b38 Submitted crossbow builds: [ursacomputing/crossbow @ actions-176](https://github.com/ursacomputing/crossbow/branches/all?query=actions-176)

|Task|Status|
|----|------|
|test-r-install-macos|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-176-github-test-r-install-macos)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-176-github-test-r-install-macos)|
[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791031149 @github-actions crossbow submit test-r-install-macos
[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791027678 @github-actions crossbow submit test-r-install-macos
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587899030 ## File path: run-cmake-format.py ## @@ -52,6 +52,7 @@ 'cpp/cmake_modules/FindRapidJSONAlt.cmake', 'cpp/cmake_modules/FindSnappyAlt.cmake', 'cpp/cmake_modules/FindThrift.cmake', +'cpp/cmake_modules/Usevcpkg.cmake', Review comment: Done in d7f0e2236b55ce0ecd2815ccd07e7235288f385b
[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791019223 @github-actions crossbow submit test-r-install-macos
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587898227 ## File path: cpp/cmake_modules/Usevcpkg.cmake ## @@ -0,0 +1,214 @@

```cmake
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

message(STATUS "Using vcpkg to find dependencies")

# --
# Define macros

# macro to list subdirectories (non-recursive)
macro(list_subdirs SUBDIRS DIR)
  file(GLOB children_ RELATIVE ${DIR} ${DIR}/*)
  set(subdirs_ "")
  foreach(child_ ${children_})
    if(IS_DIRECTORY "${DIR}/${child_}")
      list(APPEND subdirs_ ${child_})
    endif()
  endforeach()
  set("${SUBDIRS}" ${subdirs_})
  unset(children_)
  unset(subdirs_)
endmacro()

# --
# Get VCPKG_ROOT

if(DEFINED CMAKE_TOOLCHAIN_FILE)
  # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE
  get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME)
  if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL "vcpkg.cmake")
    get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" DIRECTORY)
    get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." ABSOLUTE)
  else()
    message(
      FATAL_ERROR
        "vcpkg toolchain file not found at path specified in -DCMAKE_TOOLCHAIN_FILE")
  endif()
else()
  if(DEFINED VCPKG_ROOT)
    # Get it from the CMake variable VCPKG_ROOT
    find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
    if(NOT _VCPKG_BIN)
      message(FATAL_ERROR "vcpkg not found in directory specified in -DVCPKG_ROOT")
    endif()
  elseif(DEFINED ENV{VCPKG_ROOT})
    # Get it from the environment variable VCPKG_ROOT
    set(VCPKG_ROOT ENV{VCPKG_ROOT})
    find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
    if(NOT _VCPKG_BIN)
      message(
        FATAL_ERROR "vcpkg not found in directory in environment variable VCPKG_ROOT")
    endif()
  else()
    # Get it from the file vcpkg.path.txt
    find_program(_VCPKG_BIN vcpkg)
    if(_VCPKG_BIN)
      get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH)
      get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY)
    else()
      if(CMAKE_HOST_WIN32)
        set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt")
      else()
        set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt")
      endif()
      if(EXISTS "${_VCPKG_PATH_TXT}")
        file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT)
      else()
        message(
          FATAL_ERROR
            "vcpkg not found. Install vcpkg if not installed, "
            "then run vcpkg integrate install or set environment variable VCPKG_ROOT.")
      endif()
      find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH)
      if(NOT _VCPKG_BIN)
        message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install "
                            "or set environment variable VCPKG_ROOT.")
      endif()
    endif()
  endif()
  set(CMAKE_TOOLCHAIN_FILE
      "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake"
      CACHE STRING "Path to vcpkg CMake toolchain file")
endif()
message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}")
message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}")

# --
# Get VCPKG_TARGET_TRIPLET

if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET)
  set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}")
endif()
# Explicitly set manifest mode on if it is not set and vcpkg.json exists
if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json")
  set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest")
  message(STATUS "vcpkg.json manifest found. Using VCPKG_MANIFEST_MODE: ON")
endif()
# vcpkg can install packages in two different places
set(_INST_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg_installed") # try here first
```

Review comment: Fixed in 8543407dd614a38cd0a577063e8d790cc07e044f
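[Editor's sketch] As a side note on the lookup order in the module above: when neither `CMAKE_TOOLCHAIN_FILE` nor `VCPKG_ROOT` is given, it falls back to reading `vcpkg.path.txt`, which `vcpkg integrate install` writes under `~/.vcpkg` (or `%LOCALAPPDATA%/vcpkg` on Windows). A rough shell illustration of that fallback, using a temporary directory and a made-up `/opt/vcpkg` path in place of a real vcpkg checkout:

```shell
# Simulate the vcpkg.path.txt fallback: the file holds a single line, the
# vcpkg root directory, which the CMake module reads via file(STRINGS).
tmp=$(mktemp -d)
printf '/opt/vcpkg\n' > "$tmp/vcpkg.path.txt"   # '/opt/vcpkg' is a stand-in path
VCPKG_ROOT=$(head -n 1 "$tmp/vcpkg.path.txt")
echo "VCPKG_ROOT=$VCPKG_ROOT"
# From VCPKG_ROOT the module then derives the toolchain file it caches:
echo "CMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake"
# VCPKG_ROOT=/opt/vcpkg
# CMAKE_TOOLCHAIN_FILE=/opt/vcpkg/scripts/buildsystems/vcpkg.cmake
```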
[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
nealrichardson commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587897840 ## File path: dev/tasks/r/azure.linux.yml ## @@ -47,6 +47,9 @@ jobs: export R_ORG={{ r_org }} export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} +export ARROW_DATASET={{ arrow_dataset|default("ON") }} +export ARROW_PARQUET={{ arrow_parquet|default("ON") }} Review comment: Unfortunately this didn't pass through all the way: https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=1721=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=6c939d89-0d1a-51f2-8b30-091a7a82e98c=240 Setting them here affects the environment in which docker-compose is called, but it doesn't forward into the docker environment. I believe what you instead want is to pass them into docker-compose below, something like `docker-compose -e ARROW_DATASET={{ arrow_dataset|default("ON") }} -e ARROW_PARQUET={{ arrow_parquet|default("ON") }} -e ARROW_S3={{ arrow_s3|default("ON") }} r`
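[Editor's sketch] The underlying issue is general: `export` only affects the docker-compose client process and its children, and nothing reaches a container's environment unless it is forwarded explicitly (via `-e`, an `environment:` entry, or an env file). A quick illustration, with a scrubbed child environment standing in for a container that gets no explicit forwarding:

```shell
export ARROW_DATASET=OFF
# A normal child process inherits the exported variable:
/bin/sh -c 'echo "child sees: ${ARROW_DATASET:-unset}"'
# prints: child sees: OFF

# A child started with a clean environment (analogous to a container
# without explicit env forwarding) does not:
env -i /bin/sh -c 'echo "container-like child sees: ${ARROW_DATASET:-unset}"'
# prints: container-like child sees: unset
```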
[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791014063 @github-actions crossbow submit test-r-install-macos
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587896214 ## File path: cpp/cmake_modules/Usevcpkg.cmake ## Review comment: Whoops, this is supposed to be `CMAKE_CURRENT_BINARY_DIR` not `CMAKE_CURRENT_BUILD_DIR`
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587889678 ## File path: cpp/cmake_modules/Usevcpkg.cmake ## Review comment: I am leery of changing any of these `STRING`s to `FILEPATH`s because vcpkg expects everything to be strings
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587887231 ## File path: cpp/cmake_modules/Usevcpkg.cmake ## Review comment: Implemented this change in aeda9041d161ca3fdb3c46e6d1c8b5b99847c3a2
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587884449 ## File path: cpp/cmake_modules/Usevcpkg.cmake ## Review comment: Thank you, this comment was very helpful because it made me realize that this script
[GitHub] [arrow] nealrichardson commented on pull request #9621: ARROW-11591: [C++][Compute] Grouped aggregation
nealrichardson commented on pull request #9621: URL: https://github.com/apache/arrow/pull/9621#issuecomment-790998553 With an assist from @bkietz, I've written a very basic R wrapper that exercises this in https://github.com/apache/arrow/commit/aa530cb586462bee98390d129575fe1622ffb222. It's enough to expose some issues to address, to say nothing of the interface questions.

```r
library(arrow)
library(dplyr)

# The commit uses this option to switch to use the group_by compute function
options(arrow.summarize = TRUE)

# If the Arrow aggregation function isn't implemented, or if the Arrow call
# errors, it falls back to pulling the data into R and evaluating in R.

# mtcars is a standard dataset that ships with R
mt <- Table$create(mtcars)
mt %>% group_by(cyl) %>% summarize(total_hp = sum(hp))
# Warning: Error : NotImplemented: Key of type<double>
# ../src/arrow/compute/function.cc:178 kernel_ctx.status()
# ; pulling data into R
# # A tibble: 3 x 2
#     cyl total_hp
#   *
# 1     4      909
# 2     6      856
# 3     8     2929

# That's unfortunate. R blurs the distinction for users between integer and
# double, so it's not uncommon to have integer data stored as a float.
# (Also, the error message is missing some whitespace.)
# We can cast that to an integer and try again
mt$cyl <- mt$cyl$cast(int32())
unique(mt$cyl)
# Array
# [
#   6,
#   4,
#   8
# ]

mt %>% group_by(cyl) %>% summarize(total_hp = sum(hp))
# StructArray
# -- is_valid: all not null
# -- child 0 type: double
# [
#   856,
#   909,
#   2929
# ]
# -- child 1 type: int64
# [
#   17179869190,
#   8,
#   0
# ]

# Alright, it computed and got the same numbers, but the StructArray
# is not valid. Type says int32 but data says int64 and we have misplaced bits

# Let's try a different stock dataset
ir <- Table$create(iris)
ir %>% group_by(Species) %>% summarize(total_length = sum(Sepal.Length))
# Warning: Error : NotImplemented: Key of type<dictionary>
# ../src/arrow/compute/function.cc:178 kernel_ctx.status()
# ; pulling data into R
# # A tibble: 3 x 2
#   Species    total_length
#   *
# 1 setosa             250.
# 2 versicolor         297.
# 3 virginica          329.

# Hmm. dictionary types really need to be supported.
# Let's work around and cast it to string
ir$Species <- ir$Species$cast(utf8())
unique(ir$Species)
# Array
# [
#   "setosa",
#   "versicolor",
#   "virginica"
# ]

ir %>% group_by(Species) %>% summarize(total_length = sum(Sepal.Length))
# Warning: Error : Invalid: Negative buffer resize: -219443965
# ../src/arrow/buffer.cc:262 buffer->Resize(size)
# ../src/arrow/compute/kernels/aggregate_basic.cc:1005 (_error_or_value9).status()
# ../src/arrow/compute/function.cc:193 executor->Execute(implicitly_cast_args, listener.get())
# ; pulling data into R
# # A tibble: 3 x 2
#   Species    total_length
#   *
# 1 setosa             250.
# 2 versicolor         297.
# 3 virginica          329.
```
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587872742 ## File path: cpp/cmake_modules/Usevcpkg.cmake ## @@ -0,0 +1,214 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +message(STATUS "Using vcpkg to find dependencies") + +# -- +# Define macros + +# macro to list subdirectirectories (non-recursive) +macro(list_subdirs SUBDIRS DIR) + file(GLOB children_ RELATIVE ${DIR} ${DIR}/*) + set(subdirs_ "") + foreach(child_ ${children_}) +if(IS_DIRECTORY "${DIR}/${child_}") + list(APPEND subdirs_ ${child_}) +endif() + endforeach() + set("${SUBDIRS}" ${subdirs_}) + unset(children_) + unset(subdirs_) +endmacro() + +# -- +# Get VCPKG_ROOT + +if(DEFINED CMAKE_TOOLCHAIN_FILE) + # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE + get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME) + if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL "vcpkg.cmake") +get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" DIRECTORY) +get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE) + else() +message( + FATAL_ERROR +"vcpkg toolchain file not found at path specified in -DCMAKE_TOOLCHAIN_FILE") + endif() +else() + if(DEFINED VCPKG_ROOT) +# Get it from the CMake variable VCPKG_ROOT +find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) +if(NOT _VCPKG_BIN) + message(FATAL_ERROR "vcpkg not found in directory specified in -DVCPKG_ROOT") +endif() + elseif(DEFINED ENV{VCPKG_ROOT}) +# Get it from the environment variable VCPKG_ROOT +set(VCPKG_ROOT ENV{VCPKG_ROOT}) +find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) +if(NOT _VCPKG_BIN) + message( +FATAL_ERROR "vcpkg not found in directory in environment variable VCPKG_ROOT") +endif() + else() +# Get it from the file vcpkg.path.txt +find_program(_VCPKG_BIN vcpkg) +if(_VCPKG_BIN) + get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH) + get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY) +else() + if(CMAKE_HOST_WIN32) +set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt") + else() +set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt") + endif() + if(EXISTS "${_VCPKG_PATH_TXT}") +file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT) + else() +message( + FATAL_ERROR +"vcpkg not found. Install vcpkg if not installed, " +"then run vcpkg integrate install or set environment variable VCPKG_ROOT.") + endif() + find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) + if(NOT _VCPKG_BIN) +message(FATAL_ERROR "vcpkg not found. 
Re-run vcpkg integrate install " +"or set environment variable VCPKG_ROOT.") + endif() +endif() + endif() + set(CMAKE_TOOLCHAIN_FILE + "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" + CACHE STRING "Path to vcpkg CMake toolchain file") +endif() +message(STATUS "Using CMAKE_TOOLCHAIN_FILE: ${CMAKE_TOOLCHAIN_FILE}") +message(STATUS "Using VCPKG_ROOT: ${VCPKG_ROOT}") + +# -- +# Get VCPKG_TARGET_TRIPLET + +if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET} AND NOT DEFINED VCPKG_TARGET_TRIPLET) + set(VCPKG_TARGET_TRIPLET "$ENV{VCPKG_DEFAULT_TRIPLET}") +endif() +# Explicitly set manifest mode on if it is not set and vcpkg.json exists +if(NOT DEFINED VCPKG_MANIFEST_MODE AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json") + set(VCPKG_MANIFEST_MODE ON CACHE BOOL "Use vcpkg.json manifest") Review comment: Yes `CACHE` is required here. Without it, vcpkg does not see the variable.
[GitHub] [arrow] ianmcook commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r587870368 ## File path: cpp/cmake_modules/Usevcpkg.cmake ## @@ -0,0 +1,214 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +message(STATUS "Using vcpkg to find dependencies") + +# -- +# Define macros + +# macro to list subdirectirectories (non-recursive) +macro(list_subdirs SUBDIRS DIR) + file(GLOB children_ RELATIVE ${DIR} ${DIR}/*) + set(subdirs_ "") + foreach(child_ ${children_}) +if(IS_DIRECTORY "${DIR}/${child_}") + list(APPEND subdirs_ ${child_}) +endif() + endforeach() + set("${SUBDIRS}" ${subdirs_}) + unset(children_) + unset(subdirs_) +endmacro() + +# -- +# Get VCPKG_ROOT + +if(DEFINED CMAKE_TOOLCHAIN_FILE) + # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE + get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME) + if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL "vcpkg.cmake") +get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" DIRECTORY) +get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE) + else() +message( + FATAL_ERROR +"vcpkg toolchain file not found at path specified in -DCMAKE_TOOLCHAIN_FILE") + endif() +else() + if(DEFINED VCPKG_ROOT) +# Get it from the CMake variable VCPKG_ROOT +find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) +if(NOT _VCPKG_BIN) + message(FATAL_ERROR "vcpkg not found in directory specified in -DVCPKG_ROOT") +endif() + elseif(DEFINED ENV{VCPKG_ROOT}) +# Get it from the environment variable VCPKG_ROOT +set(VCPKG_ROOT ENV{VCPKG_ROOT}) +find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) +if(NOT _VCPKG_BIN) + message( +FATAL_ERROR "vcpkg not found in directory in environment variable VCPKG_ROOT") +endif() + else() +# Get it from the file vcpkg.path.txt +find_program(_VCPKG_BIN vcpkg) +if(_VCPKG_BIN) + get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH) + get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY) +else() + if(CMAKE_HOST_WIN32) +set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt") + else() +set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt") + endif() + if(EXISTS "${_VCPKG_PATH_TXT}") +file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT) + else() +message( + FATAL_ERROR +"vcpkg not found. Install vcpkg if not installed, " +"then run vcpkg integrate install or set environment variable VCPKG_ROOT.") + endif() + find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) + if(NOT _VCPKG_BIN) +message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install " +"or set environment variable VCPKG_ROOT.") + endif() +endif() + endif() + set(CMAKE_TOOLCHAIN_FILE + "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" + CACHE STRING "Path to vcpkg CMake toolchain file") Review comment: `CACHE` is necessary here and and in many of the other places where I used it in this file. I initially attempted to cache none of the variables, but that caused numerous errors in the CMake scripts, in vcpkg, and in the build. 
I experimented with using `PARENT_SCOPE` but that did not solve most of the problems, so I used `CACHE` for all the variables whose scope of interaction extends beyond this one script.
[GitHub] [arrow] yordan-pavlov edited a comment on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator
yordan-pavlov edited a comment on pull request #9588: URL: https://github.com/apache/arrow/pull/9588#issuecomment-790979386 @Dandandan I would be happy to collaborate on this; I have been using MS Visual Studio for profiling DataFusion and Arrow, and most of the time it works fairly well and gives useful insight. From what I have observed, after my change to push filters down to parquet (by filtering out entire row groups), about half the time is now spent in `ComplexObjectArrayReader::next_batch`; then inside this method:
* about 16% of total runtime is spent on `data_buffer.resize_with(batch_size, T::T::default);` where in the case of `StringArray`, `data_buffer` is `Vec`
* about 8% of total time is spent on `data_buffer.truncate(num_read);`
* about 10% of total time is spent in `data_buffer.into_iter().zip(self.def_levels_buffer.as_ref().unwrap().iter()).map(...).collect()`
* another 10% of total time is spent in `let mut array = self.converter.convert(data)?;` - this is where the `Utf8ArrayConverter` is used
* more time is also spent at the very end of the `next_batch` function, I imagine to deallocate the large number of `ByteArray` objects created

I have managed to achieve about a 10-15% improvement by replacing `data_buffer.truncate(num_read)` with an iterator instead, but there is a lot more work to do.
As I commented above, there are a number of improvements that could be made:
* not converting into intermediate `ByteArray` objects on the way into a `StringArray` should result in a significant improvement
* also using iterators to fetch data loaded from parquet, instead of writing values into pre-allocated arrays, should avoid a lot of unnecessary allocation
* for optimal performance, ideally not do any intermediate conversion at all, and just copy byte slices from an iterator (over parquet data pages) into an arrow array

All of these changes, I think, will propagate all the way to the low-level decoders and will take time, but when done should put us in a good position to transition to `async` using `Stream`s (my understanding is that a `Stream` is effectively an `async Iterator`). I am still figuring out how to make this work exactly and how to split the work into smaller pieces which could be done over time. I hope to be able to make more progress on this in the next few days.
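The truncate-versus-iterator change discussed above can be sketched in plain Rust. This is a toy model under assumed names (`read_preallocated`, `read_via_iterator`), not the parquet crate's actual `ComplexObjectArrayReader` code:

```rust
// Toy model of the two strategies: pre-allocate a full batch and then
// truncate, vs. collect exactly `num_read` values from an iterator.
fn read_preallocated(page: &[u64], num_read: usize) -> Vec<u64> {
    let batch_size = 1024;
    let mut data_buffer = Vec::new();
    // allocate (and default-initialize) a full batch up front...
    data_buffer.resize_with(batch_size, u64::default);
    data_buffer[..num_read].copy_from_slice(&page[..num_read]);
    // ...then throw away the unused tail
    data_buffer.truncate(num_read);
    data_buffer
}

fn read_via_iterator(page: &[u64], num_read: usize) -> Vec<u64> {
    // allocates once and writes each value exactly once
    page.iter().take(num_read).copied().collect()
}

fn main() {
    let page: Vec<u64> = (0..2048).collect();
    assert_eq!(read_preallocated(&page, 3), vec![0, 1, 2]);
    assert_eq!(read_via_iterator(&page, 3), vec![0, 1, 2]);
}
```

The iterator version avoids the default-initialization of `batch_size` elements that are immediately overwritten or truncated, which is where the 16% and 8% figures above were being spent.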
[GitHub] [arrow] jorgecarleitao commented on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator
jorgecarleitao commented on pull request #9588: URL: https://github.com/apache/arrow/pull/9588#issuecomment-790980785 fwiw, I strongly agree with you, @yordan-pavlov on _all_ those marks. I recently went through the `arrow/` module of the parquet crate and concluded exactly the same (including the "takes time" part).
[GitHub] [arrow] jorgecarleitao commented on pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes
jorgecarleitao commented on pull request #9624: URL: https://github.com/apache/arrow/pull/9624#issuecomment-790976766 fwiw, I think that there is a different way of approaching this. The only reason we are implementing to_isize/to_usize on NativeType is that we have a function to represent an array (for `Display`) that accepts a generic physical type T, and then tries to convert it to an `isize` depending on a logical type (`DataType::Date`). However, there is already a many-to-one relationship between logical and physical types. Thus, a solution for this is to have the display function branch off depending on the (logical) datatype, implementing the custom string representation depending on it, instead of having a loop over native type T and then branching off according to the DataType inside the loop. I.e. instead of

```rust
for i in ... {
    match data_type {
        DataType::Date32 => represent array[i] as date,
        DataType::Int32 => represent array[i] as int,
    }
}
```

imo we should have

```rust
match data_type {
    DataType::Date32 => for i in ... { represent array[i] as date },
    DataType::Int32 => for i in ... { represent array[i] as int },
}
```

i.e. treat the `Display` as any other "kernel", where behavior depends on the logical, not physical, type.
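A runnable sketch of that suggestion, with a toy `DataType` enum standing in for arrow's and invented formatting logic: branch on the logical type once, then run a tight per-variant loop:

```rust
// Toy stand-in for arrow's DataType enum; only two variants for illustration.
enum DataType {
    Date32,
    Int32,
}

// Match on the logical type once, outside the loop, instead of on every
// element. Each arm is then a tight loop over the physical values.
fn display_values(values: &[i32], data_type: &DataType) -> Vec<String> {
    match data_type {
        DataType::Date32 => values.iter().map(|v| format!("date32({})", v)).collect(),
        DataType::Int32 => values.iter().map(|v| v.to_string()).collect(),
    }
}

fn main() {
    assert_eq!(display_values(&[1, 2], &DataType::Int32), vec!["1", "2"]);
    assert_eq!(display_values(&[18000], &DataType::Date32), vec!["date32(18000)"]);
}
```

With this shape, no `to_isize`/`to_usize` conversion on the native type is needed at all: each arm already knows both the logical and physical type it is formatting.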
[GitHub] [arrow] Dandandan edited a comment on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator
Dandandan edited a comment on pull request #9588: URL: https://github.com/apache/arrow/pull/9588#issuecomment-790962611 @yordan-pavlov Would be very interested in a faster parquet reader. Was doing some benchmarking with some queries in DataFusion with Parquet; looks like there is a lot to win in Parquet when looking at profiling results. For example, query 5, showing about 50% of instruction fetches are for reading parquet: ![query5](https://user-images.githubusercontent.com/163737/110033545-d54af780-7d39-11eb-945c-3614e3d85c8c.png) And reading all the parquet files of the benchmark to memory (and running some queries) (left inclusive % instruction fetch, right self %): ![unknown](https://user-images.githubusercontent.com/163737/110033770-1a6f2980-7d3a-11eb-9c54-1968fe35b88a.png)
[GitHub] [arrow] alamb commented on a change in pull request #9232: ARROW-10818: [Rust] Implement DecimalType
alamb commented on a change in pull request #9232: URL: https://github.com/apache/arrow/pull/9232#discussion_r587851545 ## File path: rust/datafusion/src/physical_plan/group_scalar.rs ## @@ -22,10 +22,12 @@ use std::convert::{From, TryFrom}; use crate::error::{DataFusionError, Result}; use crate::scalar::ScalarValue; +use arrow::datatypes::Decimal128Type; /// Enumeration of types that can be used in a GROUP BY expression #[derive(Debug, PartialEq, Eq, Hash, Clone)] pub(crate) enum GroupByScalar { +Decimal128(i128, usize, usize), Review comment: Awesome -- thank you
[GitHub] [arrow] github-actions[bot] commented on pull request #9634: ARROW-11864: [R] Document arrow.int64_downcast option
github-actions[bot] commented on pull request #9634: URL: https://github.com/apache/arrow/pull/9634#issuecomment-790968132 https://issues.apache.org/jira/browse/ARROW-11864
[GitHub] [arrow] alamb commented on a change in pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator
alamb commented on a change in pull request #9588: URL: https://github.com/apache/arrow/pull/9588#discussion_r587849026 ## File path: rust/arrow/src/array/array_binary.rs ## @@ -258,6 +258,8 @@ where } } +// calculate actual data_len, which may be different from the iterator's upper bound +let data_len = offsets.len() - 1; Review comment: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint says

> It is not enforced that an iterator implementation yields the declared number of elements. A buggy iterator may yield less than the lower bound or more than the upper bound of elements.

> size_hint() is primarily intended to be used for optimizations such as reserving space for the elements of the iterator, but must not be trusted to e.g., omit bounds checks in unsafe code. An incorrect implementation of size_hint() should not lead to memory safety violations.

I figured I would paste that into this PR as I had looked it up to satisfy my own curiosity.
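As an illustration of why the actual length must be recomputed rather than taken from the hint, here is a self-contained (hypothetical) iterator whose `size_hint` over-claims its upper bound; `collect` still produces only the values actually yielded:

```rust
// An iterator that yields 3 items but reports an upper bound of 100.
// This is legal per the Iterator contract: size_hint is a hint, not a promise.
struct OverClaiming {
    remaining: usize,
}

impl Iterator for OverClaiming {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(7)
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (0, Some(100)) // wildly over-claimed upper bound
    }
}

fn main() {
    let v: Vec<u32> = OverClaiming { remaining: 3 }.collect();
    // Only the actually-yielded elements survive; code that trusted the hint
    // for the final length (rather than just for pre-allocation) would be wrong.
    assert_eq!(v.len(), 3);
}
```

This mirrors the fix in the diff above: the hint may guide the initial reservation, but the final `data_len` has to come from what was actually written.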
[GitHub] [arrow] ovr commented on a change in pull request #9232: ARROW-10818: [Rust] Implement DecimalType
ovr commented on a change in pull request #9232: URL: https://github.com/apache/arrow/pull/9232#discussion_r587848244 ## File path: rust/datafusion/src/physical_plan/group_scalar.rs ## @@ -22,10 +22,12 @@ use std::convert::{From, TryFrom}; use crate::error::{DataFusionError, Result}; use crate::scalar::ScalarValue; +use arrow::datatypes::Decimal128Type; /// Enumeration of types that can be used in a GROUP BY expression #[derive(Debug, PartialEq, Eq, Hash, Clone)] pub(crate) enum GroupByScalar { +Decimal128(i128, usize, usize), Review comment: > your goal is to implement the Decimal128 and Decimal256 types right. In this PR only `Decimal128`; `Decimal256` will be in the next PR
[GitHub] [arrow] alamb commented on a change in pull request #9232: ARROW-10818: [Rust] Implement DecimalType
alamb commented on a change in pull request #9232: URL: https://github.com/apache/arrow/pull/9232#discussion_r587847557 ## File path: rust/datafusion/src/physical_plan/group_scalar.rs ## @@ -22,10 +22,12 @@ use std::convert::{From, TryFrom}; use crate::error::{DataFusionError, Result}; use crate::scalar::ScalarValue; +use arrow::datatypes::Decimal128Type; /// Enumeration of types that can be used in a GROUP BY expression #[derive(Debug, PartialEq, Eq, Hash, Clone)] pub(crate) enum GroupByScalar { +Decimal128(i128, usize, usize), Review comment: I am not sure @ovr -- I have sort of lost the context of this PR in my head. When you are ready let me know and I can find time to review this again. Just to be super clear, your goal is to implement the Decimal128 and Decimal256 types that are defined in the Arrow spec, right? Not an extension?
[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-790966208 Revision: 6f70c9911262d5d956c7f4ded49647269af848d9 Submitted crossbow builds: [ursacomputing/crossbow @ actions-175](https://github.com/ursacomputing/crossbow/branches/all?query=actions-175)

|Task|Status|
|--|--|
|test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-175-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-175-azure-test-r-minimal-build)|
[GitHub] [arrow] github-actions[bot] commented on pull request #9634: ARROW-11864: [R] Document arrow.int64_downcast option
github-actions[bot] commented on pull request #9634: URL: https://github.com/apache/arrow/pull/9634#issuecomment-790966162 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you also rename pull request title in the following format? ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY} See also:
* [Other pull requests](https://github.com/apache/arrow/pulls/)
* [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
[GitHub] [arrow] kou commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
kou commented on pull request #9553: URL: https://github.com/apache/arrow/pull/9553#issuecomment-790965856 Oh, sorry for my late review...
[GitHub] [arrow] kou commented on a change in pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
kou commented on a change in pull request #9553: URL: https://github.com/apache/arrow/pull/9553#discussion_r584365689 ## File path: cpp/cmake_modules/Usevcpkg.cmake ## @@ -0,0 +1,214 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +message(STATUS "Using vcpkg to find dependencies") + +# -- +# Define macros + +# macro to list subdirectirectories (non-recursive) +macro(list_subdirs SUBDIRS DIR) + file(GLOB children_ RELATIVE ${DIR} ${DIR}/*) + set(subdirs_ "") + foreach(child_ ${children_}) +if(IS_DIRECTORY "${DIR}/${child_}") + list(APPEND subdirs_ ${child_}) +endif() + endforeach() + set("${SUBDIRS}" ${subdirs_}) + unset(children_) + unset(subdirs_) +endmacro() + +# -- +# Get VCPKG_ROOT + +if(DEFINED CMAKE_TOOLCHAIN_FILE) + # Get it from the CMake variable CMAKE_TOOLCHAIN_FILE + get_filename_component(_VCPKG_DOT_CMAKE "${CMAKE_TOOLCHAIN_FILE}" NAME) + if(EXISTS "${CMAKE_TOOLCHAIN_FILE}" AND _VCPKG_DOT_CMAKE STREQUAL "vcpkg.cmake") +get_filename_component(_VCPKG_BUILDSYSTEMS_DIR "${CMAKE_TOOLCHAIN_FILE}" DIRECTORY) +get_filename_component(VCPKG_ROOT "${_VCPKG_BUILDSYSTEMS_DIR}/../.." 
ABSOLUTE) + else() +message( + FATAL_ERROR +"vcpkg toolchain file not found at path specified in -DCMAKE_TOOLCHAIN_FILE") + endif() +else() + if(DEFINED VCPKG_ROOT) +# Get it from the CMake variable VCPKG_ROOT +find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) +if(NOT _VCPKG_BIN) + message(FATAL_ERROR "vcpkg not found in directory specified in -DVCPKG_ROOT") +endif() + elseif(DEFINED ENV{VCPKG_ROOT}) +# Get it from the environment variable VCPKG_ROOT +set(VCPKG_ROOT ENV{VCPKG_ROOT}) +find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) +if(NOT _VCPKG_BIN) + message( +FATAL_ERROR "vcpkg not found in directory in environment variable VCPKG_ROOT") +endif() + else() +# Get it from the file vcpkg.path.txt +find_program(_VCPKG_BIN vcpkg) +if(_VCPKG_BIN) + get_filename_component(_VCPKG_REAL_BIN "${_VCPKG_BIN}" REALPATH) + get_filename_component(VCPKG_ROOT "${_VCPKG_REAL_BIN}" DIRECTORY) +else() + if(CMAKE_HOST_WIN32) +set(_VCPKG_PATH_TXT "$ENV{LOCALAPPDATA}/vcpkg/vcpkg.path.txt") + else() +set(_VCPKG_PATH_TXT "$ENV{HOME}/.vcpkg/vcpkg.path.txt") + endif() + if(EXISTS "${_VCPKG_PATH_TXT}") +file(STRINGS "${_VCPKG_PATH_TXT}" VCPKG_ROOT) + else() +message( + FATAL_ERROR +"vcpkg not found. Install vcpkg if not installed, " +"then run vcpkg integrate install or set environment variable VCPKG_ROOT.") + endif() + find_program(_VCPKG_BIN vcpkg PATHS "${VCPKG_ROOT}" NO_DEFAULT_PATH) + if(NOT _VCPKG_BIN) +message(FATAL_ERROR "vcpkg not found. Re-run vcpkg integrate install " +"or set environment variable VCPKG_ROOT.") + endif() +endif() + endif() + set(CMAKE_TOOLCHAIN_FILE + "${VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" + CACHE STRING "Path to vcpkg CMake toolchain file") Review comment: ```suggestion CACHE FILEPATH "Path to vcpkg CMake toolchain file" FORCE) ``` Do we need `CACHE ...` here? `set(CMAKE_TOOLCHAIN_FILE "${VCPKG_ROOT}/...")` doesn't work? (Sorry. I didn't confirm it.) 
[GitHub] [arrow] Dandandan edited a comment on pull request #9588: ARROW-11799: [Rust] fix len of string and binary arrays created from unbound iterator
Dandandan edited a comment on pull request #9588: URL: https://github.com/apache/arrow/pull/9588#issuecomment-790962611

@yordan-pavlov I would be very interested in a faster Parquet reader. I was benchmarking some DataFusion queries over Parquet, and judging by the profiling results there is a lot to win in the Parquet reader. For example, query 5:

![query5](https://user-images.githubusercontent.com/163737/110033545-d54af780-7d39-11eb-945c-3614e3d85c8c.png)

And reading all the parquet files of the benchmark to memory (left: inclusive % instruction fetch; right: self %):

![unknown](https://user-images.githubusercontent.com/163737/110033770-1a6f2980-7d3a-11eb-9c54-1968fe35b88a.png)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ianmcook commented on pull request #9553: ARROW-11580: [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ianmcook commented on pull request #9553: URL: https://github.com/apache/arrow/pull/9553#issuecomment-790963128 @kou let me know if there is anything I can do to help you complete your review. Thank you!
[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
nealrichardson commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587841807

## File path: dev/tasks/tasks.yml

```diff
@@ -1789,6 +1789,18 @@ tasks:
       r_image: r-base
       r_tag: 3.6-opensuse42
       not_cran: "TRUE"
+
+  test-r-minimal-build:
```

Review comment: But it is minimal in the sense that the cpp minimal build in here is. I think it's fine.
[GitHub] [arrow] nealrichardson commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
nealrichardson commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-790954184

> @nealrichardson do you want `LIBARROW_MINIMAL` to toggle Dataset and Parquet? I assume no.

Correct, we don't want to change that meaning (yet).
[GitHub] [arrow] trucnguyenlam commented on pull request #9489: ARROW-11497: [Python] Provide parquet enable compliant nested type flag for python binding
trucnguyenlam commented on pull request #9489: URL: https://github.com/apache/arrow/pull/9489#issuecomment-790932818 @jorisvandenbossche could you please have a look at the PR? cheers
[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r587812949

## File path: dev/tasks/tasks.yml

```diff
@@ -1789,6 +1789,18 @@ tasks:
       r_image: r-base
       r_tag: 3.6-opensuse42
       not_cran: "TRUE"
+
+  test-r-minimal-build:
```

Review comment: TODO: Use a different word, because "minimal" already has a specific meaning (`LIBARROW_MINIMAL`) that's not this.
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-790929111 @nealrichardson do you want `LIBARROW_MINIMAL` to toggle Dataset and Parquet? I assume no.
[GitHub] [arrow] msummersgill opened a new pull request #9634: Update R Package Arrow.Rmd vignette to document option(arrow.int64_downcast)
msummersgill opened a new pull request #9634: URL: https://github.com/apache/arrow/pull/9634 Per discussion on https://issues.apache.org/jira/browse/ARROW-9083
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-790925316 @github-actions crossbow submit test-r-minimal-build
[GitHub] [arrow] github-actions[bot] commented on pull request #9633: ARROW-11560: [C++][FlightRPC] fix mutex error on SIGINT
github-actions[bot] commented on pull request #9633: URL: https://github.com/apache/arrow/pull/9633#issuecomment-790919414 https://issues.apache.org/jira/browse/ARROW-11560
[GitHub] [arrow] rok commented on a change in pull request #9606: ARROW-10405: [C++] IsIn kernel should be able to lookup dictionary in string
rok commented on a change in pull request #9606: URL: https://github.com/apache/arrow/pull/9606#discussion_r587790806

## File path: cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc

```cpp
@@ -231,6 +252,25 @@ TEST_F(TestIsInKernel, Decimal) {
                       /*skip_nulls=*/true);
 }

+TEST_F(TestIsInKernel, DictionaryArray) {
+  for (auto index_ty : all_dictionary_index_types()) {
+    CheckIsInDictionary(/*type=*/utf8(),
+                        /*index_type=*/index_ty,
+                        /*input_dictionary_json=*/R"(["A", "B", "C", "D"])",
+                        /*input_index_json=*/"[1, 2, null, 0]",
+                        /*value_set_json=*/R"(["A", "B", "C"])",
+                        /*expected_json=*/"[true, true, false, true]",
+                        /*skip_nulls=*/false);
+    CheckIsInDictionary(/*type=*/float32(),
```

Review comment: Added.
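The dictionary-aware lookup this test exercises (decode each index against the dictionary, then test membership of the decoded value in the value set) can be modeled in a few lines of Python. This is an illustrative sketch with ad-hoc names, not Arrow's implementation:

```python
def is_in_dictionary(indices, dictionary, value_set):
    """Model of IsIn over a dictionary array: each non-null index is decoded
    against the dictionary, then tested for membership in the value set.
    A null (None) index yields False, matching the expected_json above."""
    members = set(value_set)
    return [i is not None and dictionary[i] in members for i in indices]
```

With the inputs from the quoted test, `is_in_dictionary([1, 2, None, 0], ["A", "B", "C", "D"], ["A", "B", "C"])` yields `[True, True, False, True]`, mirroring the expected `[true, true, false, true]`.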
[GitHub] [arrow] rok commented on a change in pull request #9606: ARROW-10405: [C++] IsIn kernel should be able to lookup dictionary in string
rok commented on a change in pull request #9606: URL: https://github.com/apache/arrow/pull/9606#discussion_r587790691

## File path: cpp/src/arrow/compute/kernels/scalar_set_lookup_test.cc

```cpp
@@ -72,6 +72,27 @@ void CheckIsInChunked(const std::shared_ptr<ChunkedArray>& input,
   AssertChunkedEqual(*expected, *actual);
 }

+void CheckIsInDictionary(const std::shared_ptr<DataType>& type,
+                         const std::shared_ptr<DataType>& index_type,
+                         const std::string& input_dictionary_json,
+                         const std::string& input_index_json,
+                         const std::string& value_set_json,
+                         const std::string& expected_json, bool skip_nulls = false) {
+  auto dict_type = dictionary(index_type, type);
+  auto indices = ArrayFromJSON(index_type, input_index_json);
+  auto dict = ArrayFromJSON(type, value_set_json);
```

Review comment: Ugh, thanks for catching that. Fixed.
[GitHub] [arrow] github-actions[bot] commented on pull request #9630: ARROW-11860: [Rust] [DataFusion] Add DataFusion logos
github-actions[bot] commented on pull request #9630: URL: https://github.com/apache/arrow/pull/9630#issuecomment-790895297 https://issues.apache.org/jira/browse/ARROW-11860
[GitHub] [arrow] ericwburden commented on pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes
ericwburden commented on pull request #9624: URL: https://github.com/apache/arrow/pull/9624#issuecomment-790893609 Sorry, just learned that the "Close" button on my phone didn't mean the window...
[GitHub] [arrow] ericwburden closed pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes
ericwburden closed pull request #9624: URL: https://github.com/apache/arrow/pull/9624
[GitHub] [arrow] alamb commented on a change in pull request #9494: ARROW-11626: [Rust][DataFusion] Move [DataFusion] examples to own project
alamb commented on a change in pull request #9494: URL: https://github.com/apache/arrow/pull/9494#discussion_r58078

## File path: rust/datafusion-examples/Cargo.toml

```toml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[package]
name = "datafusion-examples"
```

Review comment: FYI @andygrove -- I am not sure what you think about possibly releasing the examples as a crate of their own. I personally don't think it would be necessary. However, perhaps the right way would be to eventually link to the example code somehow from the docs.rs page https://docs.rs/datafusion/3.0.0/datafusion/. I filed this JIRA https://issues.apache.org/jira/browse/ARROW-11863 to track that.

## File path: rust/datafusion/Cargo.toml

```diff
@@ -71,9 +71,6 @@
 unicode-segmentation = "^1.7.1"
 rand = "0.8"
 criterion = "0.3"
 tempfile = "3"
-prost = "0.7"
```

Review comment:
[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-790887608 @github-actions crossbow submit -g r