[GitHub] [arrow] kou commented on a change in pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS
kou commented on a change in pull request #7565: URL: https://github.com/apache/arrow/pull/7565#discussion_r446609892 ## File path: cpp/CMakeLists.txt ## @@ -472,7 +472,7 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARROW_CXXFLAGS}") # For any C code, use the same flags. These flags don't contain # C++ specific flags. -set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARROW_CXX_FLAGS} ${CXX_COMMON_FLAGS}") +set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARROW_CXXFLAGS} ${CXX_COMMON_FLAGS}") Review comment: Good catch! Could you use the order `${CXX_COMMON_FLAGS} ${ARROW_CXXFLAGS}` so that `${ARROW_CXXFLAGS}` can override `${CXX_COMMON_FLAGS}`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
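The requested reordering matters because GCC and Clang generally let the last of several conflicting options win (for example, when several `-O` levels are given, the last one takes effect). With the suggested order the line would read `set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${CXX_COMMON_FLAGS} ${ARROW_CXXFLAGS}")`. The C++ sketch below models only that last-wins rule; `EffectiveOptFlag` and `ComposeCFlags` are hypothetical helpers, not part of Arrow's build system.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: return the last "-O..." token in a space-separated
// flag string, modelling the compiler rule that the last -O option wins.
// Returns an empty string when no -O flag is present.
std::string EffectiveOptFlag(const std::string& flags) {
  std::istringstream in(flags);
  std::string token;
  std::string last;
  while (in >> token) {
    if (token.rfind("-O", 0) == 0) {  // token starts with "-O"
      last = token;
    }
  }
  return last;
}

// Hypothetical helper: concatenate flag groups in the suggested order so that
// user-supplied ARROW_CXXFLAGS come last and can override the common flags.
std::string ComposeCFlags(const std::string& cmake_c_flags,
                          const std::string& cxx_common_flags,
                          const std::string& arrow_cxxflags) {
  return cmake_c_flags + " " + cxx_common_flags + " " + arrow_cxxflags;
}
```

For instance, with common flags `-O2 -Wall` and user flags `-O3`, the composed string ends in `-O3`, so the user's setting takes effect; with the order in the diff, the later `-O2` would shadow it.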
[GitHub] [arrow] github-actions[bot] commented on pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS
github-actions[bot] commented on pull request #7565: URL: https://github.com/apache/arrow/pull/7565#issuecomment-650706362 https://issues.apache.org/jira/browse/ARROW-9256
[GitHub] [arrow] Ktakuya332C opened a new pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS
Ktakuya332C opened a new pull request #7565: URL: https://github.com/apache/arrow/pull/7565
[GitHub] [arrow] kou closed pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7
kou closed pull request #7564: URL: https://github.com/apache/arrow/pull/7564
[GitHub] [arrow] kou commented on pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7
kou commented on pull request #7564: URL: https://github.com/apache/arrow/pull/7564#issuecomment-650695347 +1
[GitHub] [arrow] tianchen92 commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices
tianchen92 commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-650685609 > Does this impact IPC? It seems not: IPC uses getFieldBuffers, which has the right buffer order. This PR is going to replace getFieldBuffers with getBuffers (getBuffers has the wrong buffer order, which will break Dremio tests).
[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory
liyafan82 commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-650685726 @rymurr Thanks for your effort. I will make another pass today.
[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r446595068 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,107 @@ package org.apache.arrow.memory.rounding; -import java.lang.reflect.Field; - -import org.apache.arrow.memory.NettyAllocationManager; import org.apache.arrow.memory.util.CommonUtil; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import io.netty.util.internal.SystemPropertyUtil; /** * The default rounding policy. That is, if the requested size is within the chunk size, * the rounded size will be the next power of two. Otherwise, the rounded size * will be identical to the requested size. */ public class DefaultRoundingPolicy implements RoundingPolicy { - + private static final Logger logger = LoggerFactory.getLogger(DefaultRoundingPolicy.class); public final long chunkSize; /** - * The singleton instance. + * The variables here and the static block calculates teh DEFAULT_CHUNK_SIZE. Review comment: teh -> the
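The Javadoc in this diff states the rounding rule: a request smaller than the chunk size is rounded up to the next power of two, and anything else is kept as requested. A minimal C++ sketch of that rule follows. `RoundRequestedSize` and `NextPowerOfTwo` are hypothetical names; "within the chunk size" is assumed to mean strictly smaller than it, and a zero-byte request is assumed to stay zero.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= n. Requires 1 <= n <= 2^63 so the
// shift below cannot overflow.
uint64_t NextPowerOfTwo(uint64_t n) {
  assert(n >= 1 && n <= (uint64_t{1} << 63));
  uint64_t p = 1;
  while (p < n) {
    p <<= 1;
  }
  return p;
}

// Hypothetical analogue of the documented DefaultRoundingPolicy rule.
// Assumption: "within the chunk size" means requested < chunk_size.
uint64_t RoundRequestedSize(uint64_t requested, uint64_t chunk_size) {
  if (requested == 0) {
    return 0;  // assumption: empty requests are not rounded up
  }
  return requested < chunk_size ? NextPowerOfTwo(requested) : requested;
}
```

Passing large requests through unchanged avoids the nearly 2x overhead that power-of-two rounding would cause for a buffer just above a power of two.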
[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r446594755 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -227,13 +207,28 @@ public ArrowBuf slice(long index, long length) { return newBuf; } + /** + * Make an nio byte buffery from this arrowbuf. Review comment: nit: an -> a
[GitHub] [arrow] github-actions[bot] commented on pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7
github-actions[bot] commented on pull request #7564: URL: https://github.com/apache/arrow/pull/7564#issuecomment-650682223 Revision: a86f3649bdce9c5b2f58174488615725883b1f5b Submitted crossbow builds: [ursa-labs/crossbow @ actions-363](https://github.com/ursa-labs/crossbow/branches/all?query=actions-363)

|Task|Build|
|---|---|
|centos-6-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-centos-6-amd64)|
|centos-7-aarch64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|centos-7-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-centos-7-amd64)|
|centos-8-aarch64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|centos-8-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-centos-8-amd64)|
|debian-buster-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-debian-buster-amd64)|
|debian-buster-arm64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|debian-stretch-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-debian-stretch-amd64)|
|debian-stretch-arm64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|ubuntu-bionic-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-ubuntu-bionic-amd64)|
|ubuntu-bionic-arm64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|ubuntu-eoan-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-ubuntu-eoan-amd64)|
|ubuntu-eoan-arm64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|ubuntu-focal-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-ubuntu-focal-amd64)|
|ubuntu-focal-arm64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|ubuntu-xenial-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-363-github-ubuntu-xenial-amd64)|
|ubuntu-xenial-arm64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|wheel-manylinux1-cp35m|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-363-azure-wheel-manylinux1-cp35m)|
|wheel-manylinux1-cp36m|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-363-azure-wheel-manylinux1-cp36m)|
|wheel-manylinux1-cp37m|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-363-azure-wheel-manylinux1-cp37m)|
|wheel-manylinu
[GitHub] [arrow] kou commented on pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7
kou commented on pull request #7564: URL: https://github.com/apache/arrow/pull/7564#issuecomment-650681862 @github-actions crossbow submit -g linux -g wheel
[GitHub] [arrow] github-actions[bot] commented on pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7
github-actions[bot] commented on pull request #7564: URL: https://github.com/apache/arrow/pull/7564#issuecomment-650676307 https://issues.apache.org/jira/browse/ARROW-9255
[GitHub] [arrow] kou opened a new pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7
kou opened a new pull request #7564: URL: https://github.com/apache/arrow/pull/7564
[GitHub] [arrow] wesm commented on pull request #7315: ARROW-7605: [C++] Bundle jemalloc into static libarrow
wesm commented on pull request #7315: URL: https://github.com/apache/arrow/pull/7315#issuecomment-650658746 I'm going to close this for now and attempt to pursue the static library splicing solution for 1.0.0
[GitHub] [arrow] wesm closed pull request #7315: ARROW-7605: [C++] Bundle jemalloc into static libarrow
wesm closed pull request #7315: URL: https://github.com/apache/arrow/pull/7315
[GitHub] [arrow] github-actions[bot] commented on pull request #7563: ARROW-8888: [Python] Do not use thread pool when converting pandas columns that are definitely zero-copyable
github-actions[bot] commented on pull request #7563: URL: https://github.com/apache/arrow/pull/7563#issuecomment-650656331 https://issues.apache.org/jira/browse/ARROW-
[GitHub] [arrow] wesm opened a new pull request #7563: ARROW-8888: [Python] Do not use thread pool when converting pandas columns that are definitely zero-copyable
wesm opened a new pull request #7563: URL: https://github.com/apache/arrow/pull/7563 The ThreadPoolExecutor has a good amount of per-column overhead
[GitHub] [arrow] github-actions[bot] commented on pull request #7562: ARROW-7273: [Python][C++][Parquet] Do not permit constructing a non-nullable null field in Python, catch this case in Arrow->Parqu
github-actions[bot] commented on pull request #7562: URL: https://github.com/apache/arrow/pull/7562#issuecomment-650652746 https://issues.apache.org/jira/browse/ARROW-7273
[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files
wesm commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-650650589 It looks like the int64 tests must be removed from the "gold" corpus, as the JSON files can no longer be parsed.
[GitHub] [arrow] wesm opened a new pull request #7562: ARROW-7273: [Python][C++][Parquet] Do not permit constructing a non-nullable null field in Python, catch this case in Arrow->Parquet schema conve
wesm opened a new pull request #7562: URL: https://github.com/apache/arrow/pull/7562 This was the simplest triage I could think of.
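The rule in this PR's title comes down to one check: a field of the null type can only ever hold nulls, so declaring it non-nullable is contradictory and can be rejected before any schema conversion runs. The sketch below expresses that check in C++ against a hypothetical, stripped-down field description (`TypeId`, `ValidateNullability`); it is not Arrow's `Field` API.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified type tags; not Arrow's Type::type enum.
enum class TypeId { kNull, kInt32, kUtf8 };

// Returns an error message, or an empty string if the field is acceptable.
// A null-typed field can only ever hold nulls, so it must be nullable.
std::string ValidateNullability(const std::string& name, TypeId type,
                                bool nullable) {
  if (type == TypeId::kNull && !nullable) {
    return "Field '" + name + "': null type fields must be nullable";
  }
  return "";
}
```

A converter could run such a check on every field and surface the message instead of emitting an invalid schema.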
[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650648681 @nealrichardson I figure this might impact the R packages also
[GitHub] [arrow] github-actions[bot] commented on pull request #7561: ARROW-9254: [C++] Split out CastNumberToNumberUnsafe function from scalar_cast_numeric, add data()/mutable_data() functions for ac
github-actions[bot] commented on pull request #7561: URL: https://github.com/apache/arrow/pull/7561#issuecomment-650647251 https://issues.apache.org/jira/browse/ARROW-9254
[GitHub] [arrow] wesm opened a new pull request #7561: ARROW-9254: [C++] Split out CastNumberToNumberUnsafe function from scalar_cast_numeric, add data()/mutable_data() functions for accessing primiti
wesm opened a new pull request #7561: URL: https://github.com/apache/arrow/pull/7561 This is some preparatory work for ARROW-9196. I also addressed some prior uncleanliness related to unboxing temporal scalars based on C types. By adding these `data()` and `mutable_data()` functions, we can obtain a pointer to, e.g., the `int64_t` stored in the scalar. Previously I was resorting to some slightly hacky inheritance tricks -- this seems better.
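The point of `data()` and `mutable_data()` is that generic code can obtain a pointer to the stored C value (for example the `int64_t` inside a timestamp scalar) without knowing the concrete scalar class. The sketch below shows that accessor pattern on a hypothetical, simplified scalar hierarchy (`Scalar`, `PrimitiveScalar`, `TimestampScalar`, `UnboxInt64`); the names echo Arrow's, but these classes are illustrative and not Arrow's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified scalar base class.
struct Scalar {
  virtual ~Scalar() = default;
  // Pointer to the raw stored value.
  virtual const void* data() const = 0;
  virtual void* mutable_data() = 0;
};

// Scalar backed by a single C value.
template <typename CType>
struct PrimitiveScalar : Scalar {
  explicit PrimitiveScalar(CType v) : value(v) {}
  const void* data() const override { return &value; }
  void* mutable_data() override { return &value; }
  CType value;
};

// A temporal scalar that happens to store an int64_t.
struct TimestampScalar : PrimitiveScalar<int64_t> {
  using PrimitiveScalar<int64_t>::PrimitiveScalar;
};

// Generic unboxing through the accessor; the caller must know the scalar
// stores an int64_t (as timestamps do in this sketch).
int64_t UnboxInt64(const Scalar& s) {
  return *static_cast<const int64_t*>(s.data());
}
```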
[GitHub] [arrow] github-actions[bot] commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files
github-actions[bot] commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-650643241 https://issues.apache.org/jira/browse/ARROW-9252
[GitHub] [arrow] wesm opened a new pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files
wesm opened a new pull request #7560: URL: https://github.com/apache/arrow/pull/7560
[GitHub] [arrow] github-actions[bot] commented on pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray
github-actions[bot] commented on pull request #7559: URL: https://github.com/apache/arrow/pull/7559#issuecomment-650639293 https://issues.apache.org/jira/browse/ARROW-9247
[GitHub] [arrow] wesm commented on pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray
wesm commented on pull request #7559: URL: https://github.com/apache/arrow/pull/7559#issuecomment-650638222 cc @brills
[GitHub] [arrow] wesm opened a new pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray
wesm opened a new pull request #7559: URL: https://github.com/apache/arrow/pull/7559
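For binary-like arrays, the total length of all values follows from the offsets buffer alone: it is the offset just past the last value of the (possibly sliced) array minus the offset of its first value. A C++ sketch of that arithmetic on a hypothetical `BinaryArrayView` follows; `int32_t` offsets stand in for the `BinaryArray` case and `int64_t` for `LargeBinaryArray`. It illustrates the calculation only and is not Arrow's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a binary array. `offsets` holds one more
// entry than the number of values in the underlying buffer; `offset` and
// `length` select a (possibly sliced) window of values.
template <typename OffsetType>  // int32_t: binary, int64_t: large binary
struct BinaryArrayView {
  std::vector<OffsetType> offsets;
  int64_t offset;
  int64_t length;

  // Number of value bytes covered by the window: end offset minus start.
  OffsetType total_values_length() const {
    assert(offset >= 0 && length >= 0 &&
           offset + length < static_cast<int64_t>(offsets.size()));
    return offsets[offset + length] - offsets[offset];
  }
};
```

For offsets `{0, 3, 3, 8}` (values "abc", "", "hello"), the full array covers 8 bytes, and the slice with `offset = 1, length = 2` covers 8 - 3 = 5 bytes, without reading the data buffer.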
[GitHub] [arrow] github-actions[bot] commented on pull request #7558: ARROW-9250: [C++] Instantiate fewer templates in IsIn, Match kernel implementations
github-actions[bot] commented on pull request #7558: URL: https://github.com/apache/arrow/pull/7558#issuecomment-650637397 https://issues.apache.org/jira/browse/ARROW-9250
[GitHub] [arrow] wesm commented on a change in pull request #7558: ARROW-9250: [C++] Instantiate fewer templates in IsIn, Match kernel implementations
wesm commented on a change in pull request #7558: URL: https://github.com/apache/arrow/pull/7558#discussion_r446571732 ## File path: cpp/src/arrow/type.h ## @@ -900,7 +902,7 @@ class ARROW_EXPORT LargeStringType : public LargeBinaryType { public: static constexpr Type::type type_id = Type::LARGE_STRING; static constexpr bool is_utf8 = true; - using EquivalentBinaryType = LargeBinaryType; + using PhysicalType = LargeBinaryType; Review comment: These changes are for consistency with the `PhysicalType` attributes added to the CType-based types
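A `PhysicalType` alias maps a logical type onto the type that shares its memory layout (here `LargeStringType` onto `LargeBinaryType`), so generic code can forward to a routine templated on the layout rather than on each logical type. The sketch below illustrates that mapping with hypothetical, stripped-down type classes; `PhysicalTypeOf`, `DescribeLayout` and `LayoutName` are illustrative helpers, not Arrow functions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical, stripped-down type classes; not Arrow's real definitions.
struct BinaryType {
  using PhysicalType = BinaryType;
  static std::string name() { return "binary"; }
};
struct LargeBinaryType {
  using PhysicalType = LargeBinaryType;
  static std::string name() { return "large_binary"; }
};
// Logical string types share the memory layout of the binary types.
struct StringType {
  using PhysicalType = BinaryType;
};
struct LargeStringType {
  using PhysicalType = LargeBinaryType;
};

// Resolve any type to the type that defines its memory layout.
template <typename T>
using PhysicalTypeOf = typename T::PhysicalType;

static_assert(std::is_same<PhysicalTypeOf<LargeStringType>,
                           LargeBinaryType>::value,
              "large_string is stored like large_binary");

// Core routine: instantiated once per physical layout.
template <typename Physical>
std::string DescribeLayout() {
  return Physical::name();
}

// Thin front end for any logical type; forwards to the physical routine,
// so StringType and BinaryType share DescribeLayout<BinaryType>.
template <typename T>
std::string LayoutName() {
  return DescribeLayout<PhysicalTypeOf<T>>();
}
```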
[GitHub] [arrow] wesm opened a new pull request #7558: ARROW-9250: [C++] Instantiate fewer templates in IsIn, Match kernel implementations
wesm opened a new pull request #7558: URL: https://github.com/apache/arrow/pull/7558 This yields a 150KB reduction in code size for me on Linux. Since this may become a common pattern (using, e.g., a single `uint32_t`-based function to process both int32 and uint32), some of this may be factored out to make writing such kernel implementations simpler in the future.
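The pattern described here, one `uint32_t`-based routine serving both int32 and uint32 inputs, works because integer equality on two's-complement targets is equality of bit patterns, and C++ permits accessing a signed integer object through the corresponding unsigned type. The sketch below applies it to a simple set-membership check in the spirit of `IsIn`; `IsInImpl` and `IsIn` are hypothetical functions over `std::vector`, not Arrow's kernel API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <unordered_set>
#include <vector>

// Core routine, instantiated once per unsigned width rather than once per
// signed/unsigned type.
template <typename UInt>
std::vector<bool> IsInImpl(const UInt* values, std::size_t n,
                           const UInt* value_set, std::size_t m) {
  std::unordered_set<UInt> lookup(value_set, value_set + m);
  std::vector<bool> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = lookup.count(values[i]) > 0;
  }
  return out;
}

// Thin front end: view signed inputs as their unsigned counterpart. The
// aliasing rules allow access through the corresponding unsigned type, so
// no copy is needed.
template <typename T>
std::vector<bool> IsIn(const std::vector<T>& values,
                       const std::vector<T>& value_set) {
  static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                "sketch handles non-bool integer types only");
  using U = typename std::make_unsigned<T>::type;
  return IsInImpl(reinterpret_cast<const U*>(values.data()), values.size(),
                  reinterpret_cast<const U*>(value_set.data()),
                  value_set.size());
}
```

With this split, `IsIn<int32_t>` and `IsIn<uint32_t>` are tiny wrappers that both call the single `IsInImpl<uint32_t>` instantiation, which is where a code-size saving of this kind would come from.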
[GitHub] [arrow] github-actions[bot] commented on pull request #7557: ARROW-9251: [C++] Relocate integration testing JSON code implementation to src/arrow/testing
github-actions[bot] commented on pull request #7557: URL: https://github.com/apache/arrow/pull/7557#issuecomment-650634874 https://issues.apache.org/jira/browse/ARROW-9251
[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650633720 It appears that the Brotli shared libraries are in the manylinux1 image even though Brotli is built there with `-DBUILD_SHARED_LIBS=OFF`: https://github.com/apache/arrow/blob/master/python/manylinux1/scripts/build_brotli.sh#L29
[GitHub] [arrow] wesm opened a new pull request #7557: ARROW-9251: [C++] Relocate integration testing JSON code implementation to src/arrow/testing
wesm opened a new pull request #7557: URL: https://github.com/apache/arrow/pull/7557 While this code is not shipped in any packages, I think it would be better for it to live in the testing directory so that its purpose is clear. I think there may be some value in exposing `ArrayFromJSON` (in ipc/json_simple.h) in bindings at some point, so I have left that code where it is, though it might be better to move it to arrow/json.
[GitHub] [arrow] github-actions[bot] commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
github-actions[bot] commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650627708 Revision: f675cd913b83c56bdbbe24ecc074059dfb382fd0 Submitted crossbow builds: [ursa-labs/crossbow @ actions-362](https://github.com/ursa-labs/crossbow/branches/all?query=actions-362)

|Task|Build|
|---|---|
|centos-6-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-362-github-centos-6-amd64)|
|centos-7-aarch64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|centos-7-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-362-github-centos-7-amd64)|
|centos-8-aarch64|[TravisCI](https://travis-ci.org/ursa-labs/crossbow/branches)|
|centos-8-amd64|[Github Actions](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-362-github-centos-8-amd64)|
|conda-clean|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-clean)|
|conda-linux-gcc-py36-cpu|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-linux-gcc-py36-cpu)|
|conda-linux-gcc-py36-cuda|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-linux-gcc-py36-cuda)|
|conda-linux-gcc-py37-cpu|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-linux-gcc-py37-cpu)|
|conda-linux-gcc-py37-cuda|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-linux-gcc-py37-cuda)|
|conda-linux-gcc-py38-cpu|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-linux-gcc-py38-cpu)|
|conda-linux-gcc-py38-cuda|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-linux-gcc-py38-cuda)|
|conda-osx-clang-py36|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-osx-clang-py36)|
|conda-osx-clang-py37|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-osx-clang-py37)|
|conda-osx-clang-py38|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-osx-clang-py38)|
|conda-win-vs2015-py36|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-win-vs2015-py36)|
|conda-win-vs2015-py37|[Azure](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1&branchName=actions-362-azure-conda-win-vs2015-py37)|
|conda-win-vs2015-py
[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650627218 @github-actions crossbow submit -g linux -g wheel -g conda
[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650618145 Thanks, will look into this. I'm guessing these changes will break some of the Python wheel builds, so we may need a flag to indicate a preference for shared vs. static libraries.
[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-650616427 OK, thanks, that's much appreciated.
[GitHub] [arrow] andygrove closed pull request #7494: ARROW-9184: [Rust][Datafusion] table scan without projection should return all columns
andygrove closed pull request #7494: URL: https://github.com/apache/arrow/pull/7494
[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-650606456 @wesm I can also take this since you already have quite a bit on your plate.
[GitHub] [arrow] pitrou edited a comment on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower
pitrou edited a comment on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-650606456 @wesm I can also take this since you already have quite a bit on your plate for 1.0.
[GitHub] [arrow] kiszk commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
kiszk commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650588895 Looks good except for one minor comment. LZ4 and ZSTD also use the dynamic library first, if it is available.
[GitHub] [arrow] kiszk commented on a change in pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
kiszk commented on a change in pull request #7556: URL: https://github.com/apache/arrow/pull/7556#discussion_r446547216 ## File path: cpp/cmake_modules/FindBrotli.cmake ## @@ -17,29 +17,29 @@ # # find_package(Brotli) -# Favour static libraries over dynamic libraries, and handle various spellings +# Favor shared libraries over dynamic libraries, and handle various spellings Review comment: nit: dynamic -> static ?
[GitHub] [arrow] github-actions[bot] commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
github-actions[bot] commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650583053 https://issues.apache.org/jira/browse/ARROW-9188
[GitHub] [arrow] wesm opened a new pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available
wesm opened a new pull request #7556: URL: https://github.com/apache/arrow/pull/7556 If both shared and static Brotli libraries are available, the static ones were being selected, causing ~750KB of code to be statically linked into libarrow.so on Linux. This is not consistent with our handling of other toolchain libraries. We should use the shared library if it is available.
[GitHub] [arrow] wesm closed pull request #7551: ARROW-9132: [C++] Support Unique and ValueCounts on dictionary data with non-changing dictionaries, add ChunkedArray::Make validating constructor
wesm closed pull request #7551: URL: https://github.com/apache/arrow/pull/7551
[GitHub] [arrow] wesm commented on pull request #7551: ARROW-9132: [C++] Support Unique and ValueCounts on dictionary data with non-changing dictionaries, add ChunkedArray::Make validating constructor
wesm commented on pull request #7551: URL: https://github.com/apache/arrow/pull/7551#issuecomment-650580242 +1. If anyone desires refinements of `ChunkedArray::Make`, please let me know and I will make them.
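For readers wondering why "non-changing dictionaries" matters for `Unique`/`ValueCounts`: if every chunk shares one dictionary, a kernel can count the integer indices and decode them to values only once at the end. The sketch below illustrates that idea in plain C++, assuming a hypothetical `DictChunk` struct and `DictionaryValueCounts` function in place of Arrow's `DictionaryArray`; it is not the implementation from this PR.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one chunk of dictionary-encoded data:
// each value is an index into the chunk's dictionary.
struct DictChunk {
  std::vector<std::string> dictionary;
  std::vector<int32_t> indices;
};

// Counts occurrences of each distinct value across all chunks. Because the
// dictionary may not change between chunks, the loop only counts integer
// indices; values are decoded once, at the end.
std::map<std::string, int64_t> DictionaryValueCounts(const std::vector<DictChunk>& chunks) {
  std::map<std::string, int64_t> result;
  if (chunks.empty()) return result;

  const std::vector<std::string>& dict = chunks[0].dictionary;
  std::vector<int64_t> counts(dict.size(), 0);
  for (const DictChunk& chunk : chunks) {
    if (chunk.dictionary != dict) {
      throw std::invalid_argument("dictionary changes between chunks");
    }
    for (int32_t index : chunk.indices) {
      // at() throws std::out_of_range for an invalid index.
      counts.at(static_cast<std::size_t>(index)) += 1;
    }
  }
  for (std::size_t i = 0; i < dict.size(); ++i) {
    if (counts[i] > 0) result[dict[i]] = counts[i];
  }
  return result;
}
```

Rejecting a changed dictionary up front keeps the hot loop to plain integer counting; supporting changing dictionaries would need an extra unification or remapping step before counting.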
[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-650579924 @maartenbreddels let me know if I can help with anything to get this merge-ready. I want to make the utf8proc-dependent code optional, so I will need to make a small refactor after this lands.
[GitHub] [arrow] wesm commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower
wesm commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446541086 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +158,121 @@ struct AsciiLength { } }; +template <typename Type, template <typename> class Derived> +struct Utf8Transform { + using offset_type = typename Type::offset_type; + using DerivedClass = Derived<Type>; + using ArrayType = typename TypeTraits<Type>::ArrayType; + + static offset_type Transform(const uint8_t* input, offset_type input_string_ncodeunits, + uint8_t* output) { +uint8_t* dest = output; +utf8_transform(input, input + input_string_ncodeunits, dest, + DerivedClass::TransformCodepoint); +return (offset_type)(dest - output); + } + + static void Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) { +if (batch[0].kind() == Datum::ARRAY) { + std::call_once(flag_case_luts, []() { +lut_upper_codepoint.reserve(MAX_CODEPOINT_LUT + 1); +lut_lower_codepoint.reserve(MAX_CODEPOINT_LUT + 1); +for (int i = 0; i <= MAX_CODEPOINT_LUT; i++) { + lut_upper_codepoint.push_back(utf8proc_toupper(i)); + lut_lower_codepoint.push_back(utf8proc_tolower(i)); +} + }); + const ArrayData& input = *batch[0].array(); + ArrayType input_boxed(batch[0].array()); + ArrayData* output = out->mutable_array(); + + offset_type const* input_string_offsets = input.GetValues<offset_type>(1); + utf8proc_uint8_t const* input_str = + input.buffers[2]->data() + input_boxed.value_offset(0); + offset_type input_ncodeunits = input_boxed.total_values_length(); + offset_type input_nstrings = (offset_type)input.length; + + // Section 5.18 of the Unicode spec claim that the number of codepoints for case + // mapping can grow by a factor of 3. This means grow by a factor of 3 in bytes + // However, since we don't support all casings (SpecialCasing.txt) the growth + // is actually only at max 3/2 (as covered by the unittest). + // Note that rounding down the 3/2 is ok, since only codepoints encoded by + // two code units (even) can grow to 3 code units. 
+ + int64_t output_ncodeunits_max = ((int64_t)input_ncodeunits) * 3 / 2; + if (output_ncodeunits_max > std::numeric_limits<offset_type>::max()) { +ctx->SetStatus(Status::CapacityError( +"Result might not fit in a 32bit utf8 array, convert to large_utf8")); +return; + } + + KERNEL_RETURN_IF_ERROR( + ctx, ctx->Allocate(output_ncodeunits_max).Value(&output->buffers[2])); + // We could reuse the buffer if it is all ascii, benchmarking showed this not to + // matter + // output->buffers[1] = input.buffers[1]; + KERNEL_RETURN_IF_ERROR(ctx, + ctx->Allocate((input_nstrings + 1) * sizeof(offset_type)) + .Value(&output->buffers[1])); + utf8proc_uint8_t* output_str = output->buffers[2]->mutable_data(); + offset_type* output_string_offsets = output->GetMutableValues<offset_type>(1); + offset_type output_ncodeunits = 0; + + offset_type output_string_offset = 0; + *output_string_offsets = output_string_offset; + offset_type input_string_first_offset = input_string_offsets[0]; + for (int64_t i = 0; i < input_nstrings; i++) { +offset_type input_string_offset = +input_string_offsets[i] - input_string_first_offset; +offset_type input_string_end = +input_string_offsets[i + 1] - input_string_first_offset; +offset_type input_string_ncodeunits = input_string_end - input_string_offset; +offset_type encoded_nbytes = DerivedClass::Transform( +input_str + input_string_offset, input_string_ncodeunits, +output_str + output_ncodeunits); +output_ncodeunits += encoded_nbytes; +output_string_offsets[i + 1] = output_ncodeunits; + } + + // trim the codepoint buffer, since we allocated too much + KERNEL_RETURN_IF_ERROR( + ctx, + output->buffers[2]->CopySlice(0, output_ncodeunits).Value(&output->buffers[2])); Review comment: Yes, we can change that.
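The kernel under review builds its case-mapping tables once with `std::call_once`, rebases each string's offsets against the first offset (so sliced arrays work), sizes the output with the 3/2 growth bound, and reports a capacity error if that bound cannot fit in 32-bit offsets. A minimal sketch of the same flow, assuming an ASCII-only mapping through `std::toupper` in place of utf8proc and plain `std::vector`/`std::string` buffers in place of Arrow's `ArrayData`; `UpperLut` and `AsciiUpper` are hypothetical names:

```cpp
#include <cctype>
#include <cstdint>
#include <limits>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Lazily built lookup table, mirroring the std::call_once pattern in the
// kernel. Hypothetical: covers ASCII only; the real kernel uses utf8proc.
static std::once_flag upper_lut_flag;
static std::vector<uint8_t> upper_lut;

const std::vector<uint8_t>& UpperLut() {
  std::call_once(upper_lut_flag, [] {
    upper_lut.reserve(128);
    for (int c = 0; c < 128; ++c) {
      upper_lut.push_back(static_cast<uint8_t>(std::toupper(c)));
    }
  });
  return upper_lut;
}

// Upper-cases every string in an Arrow-style (offsets, data) pair with
// 32-bit offsets. Offsets may start above zero (a sliced array), so they are
// rebased against offsets[0]. Assumes in_offsets has at least one entry.
void AsciiUpper(const std::vector<int32_t>& in_offsets, const std::string& in_data,
                std::vector<int32_t>* out_offsets, std::string* out_data) {
  const int64_t nstrings = static_cast<int64_t>(in_offsets.size()) - 1;
  const int32_t first = in_offsets[0];
  const char* base = in_data.data() + first;
  const int64_t ncodeunits = in_offsets[nstrings] - first;

  // Same worst-case bound as the kernel (3/2 growth); ASCII never grows,
  // but the check is kept to show where the capacity error would arise.
  const int64_t max_out = ncodeunits * 3 / 2;
  if (max_out > std::numeric_limits<int32_t>::max()) {
    throw std::length_error("result might not fit in 32-bit offsets");
  }

  const std::vector<uint8_t>& lut = UpperLut();
  out_data->clear();
  out_data->reserve(static_cast<std::size_t>(max_out));
  out_offsets->assign(1, 0);
  for (int64_t i = 0; i < nstrings; ++i) {
    const int32_t begin = in_offsets[i] - first;
    const int32_t end = in_offsets[i + 1] - first;
    for (int32_t j = begin; j < end; ++j) {
      const auto byte = static_cast<uint8_t>(base[j]);
      // Non-ASCII bytes are copied through unchanged in this sketch.
      out_data->push_back(static_cast<char>(byte < 128 ? lut[byte] : byte));
    }
    out_offsets->push_back(static_cast<int32_t>(out_data->size()));
  }
}
```

Calling `AsciiUpper({2, 5, 7}, "xxabcde", ...)` treats the input as a slice holding "abc" and "de" and produces data "ABCDE" with offsets starting at 0, which is the effect of the offset rebasing in the kernel's loop.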
[GitHub] [arrow] wesm closed pull request #7321: ARROW-8985: [Format] Add Decimal::bitWidth field with default value of 128 for forward compatibility
wesm closed pull request #7321: URL: https://github.com/apache/arrow/pull/7321
[GitHub] [arrow] wesm commented on pull request #7321: ARROW-8985: [Format] Add Decimal::bitWidth field with default value of 128 for forward compatibility
wesm commented on pull request #7321: URL: https://github.com/apache/arrow/pull/7321#issuecomment-650575218 +1
[GitHub] [arrow] github-actions[bot] commented on pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
github-actions[bot] commented on pull request #7555: URL: https://github.com/apache/arrow/pull/7555#issuecomment-650570060 https://issues.apache.org/jira/browse/ARROW-9238
[GitHub] [arrow] kiszk opened a new pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
kiszk opened a new pull request #7555: URL: https://github.com/apache/arrow/pull/7555 This PR increases test coverage of round-robin under IPC and Flight. Before this PR, round-robin tests for primitive data under IPC used only int32 (and boolean in some cases). This PR adds the other primitive types (int8, uint8, int16, uint16, uint32, int64, uint64, float32, and float64).
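The idea in this PR is to run the same write/read check over every primitive type instead of only int32. A minimal sketch of that pattern, assuming a hypothetical memcpy-based `Serialize`/`Deserialize` pair in place of Arrow IPC or Flight; a C++17 fold expression applies the check to each type in the list (GoogleTest typed tests are a common alternative in a real suite):

```cpp
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical stand-in for an IPC write: copies the raw bytes of a vector
// of fixed-width values (no endianness handling).
template <typename T>
std::vector<uint8_t> Serialize(const std::vector<T>& values) {
  std::vector<uint8_t> bytes(values.size() * sizeof(T));
  if (!bytes.empty()) std::memcpy(bytes.data(), values.data(), bytes.size());
  return bytes;
}

// Hypothetical stand-in for an IPC read.
template <typename T>
std::vector<T> Deserialize(const std::vector<uint8_t>& bytes) {
  std::vector<T> values(bytes.size() / sizeof(T));
  if (!values.empty()) std::memcpy(values.data(), bytes.data(), values.size() * sizeof(T));
  return values;
}

// Round-trips the extreme values and zero for a single type.
template <typename T>
bool RoundTrips() {
  const std::vector<T> original = {std::numeric_limits<T>::lowest(), T(0),
                                   std::numeric_limits<T>::max()};
  return Deserialize<T>(Serialize(original)) == original;
}

// Applies the same check to every type in the pack (C++17 fold expression).
template <typename... Ts>
bool AllRoundTrip() {
  return (RoundTrips<Ts>() && ...);
}

// int32 plus the types listed above (float32/float64 map to float/double).
bool PrimitiveTypesRoundTrip() {
  return AllRoundTrip<int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t,
                      uint64_t, float, double>();
}
```

Testing `lowest()` and `max()` per type is a cheap way to catch width or signedness mistakes that an int32-only test would miss.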
[GitHub] [arrow] Demetrio92 commented on issue #1688: Possible to read categoricals back into Pandas from Parquet using Pyarrow?
Demetrio92 commented on issue #1688: URL: https://github.com/apache/arrow/issues/1688#issuecomment-650559676 @wesm yeah, sorry, guys, you're awesome. I thought this was the pandas repo...
[GitHub] [arrow] github-actions[bot] commented on pull request #7554: ARROW-9236: [Rust] CSV WriterBuilder never writes header
github-actions[bot] commented on pull request #7554: URL: https://github.com/apache/arrow/pull/7554#issuecomment-650521333 https://issues.apache.org/jira/browse/ARROW-9236
[GitHub] [arrow] ritchie46 opened a new pull request #7554: ARROW-9236: [Rust] CSV WriterBuilder never writes header
ritchie46 opened a new pull request #7554: URL: https://github.com/apache/arrow/pull/7554
[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r446497720 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java ## @@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuffer(int startIndex, int length, BaseVariab */ private void splitAndTransferValidityBuffer(int startIndex, int length, BaseVariableWidthVector target) { -int firstByteSource = BitVectorHelper.byteIndex(startIndex); -int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1); -int byteSizeTarget = getValidityBufferSizeFromCount(length); -int offset = startIndex % 8; +if (length <= 0) { + return; +} -if (length > 0) { - if (offset == 0) { -// slice -if (target.validityBuffer != null) { - target.validityBuffer.getReferenceManager().release(); -} -target.validityBuffer = validityBuffer.slice(firstByteSource, byteSizeTarget); -target.validityBuffer.getReferenceManager().retain(); +final int firstByteSource = BitVectorHelper.byteIndex(startIndex); +final int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1); +final int byteSizeTarget = getValidityBufferSizeFromCount(length); +final int offset = startIndex % 8; + +if (offset == 0) { + // slice + if (target.validityBuffer != null) { +target.validityBuffer.getReferenceManager().release(); + } + final ArrowBuf slicedValidityBuffer = validityBuffer.slice(firstByteSource, byteSizeTarget); + target.validityBuffer = transferBuffer(slicedValidityBuffer, target.allocator); Review comment: Done in 076e9964740f663a813829a7c436439f6604123f
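`splitAndTransferValidityBuffer` distinguishes two cases: when `startIndex % 8 == 0` the target's validity bits begin on a byte boundary, so the source buffer can simply be sliced; otherwise each target byte has to combine bits from two adjacent source bytes. A C++ sketch of that general technique on an LSB-first bitmap (Arrow's bit order), assuming a plain `std::vector<uint8_t>` in place of `ArrowBuf` and copying where the Java code shares the sliced buffer; `GetBit` and `SplitValidity` are hypothetical names, not part of the Java API:

```cpp
#include <cstdint>
#include <vector>

// Reads bit i of an LSB-first validity bitmap (Arrow's bit order).
inline bool GetBit(const std::vector<uint8_t>& bitmap, int64_t i) {
  return (bitmap[static_cast<std::size_t>(i / 8)] >> (i % 8)) & 1;
}

// Returns a new bitmap holding bits [start_index, start_index + length) of
// `src`, shifted so that the first copied bit becomes bit 0.
std::vector<uint8_t> SplitValidity(const std::vector<uint8_t>& src, int64_t start_index,
                                   int64_t length) {
  if (length <= 0) return {};  // early return, as in the revised Java code
  const int64_t out_bytes = (length + 7) / 8;
  const int64_t first_byte = start_index / 8;
  const int offset = static_cast<int>(start_index % 8);
  std::vector<uint8_t> out(static_cast<std::size_t>(out_bytes), 0);

  for (int64_t b = 0; b < out_bytes; ++b) {
    const int64_t cur = first_byte + b;
    if (offset == 0) {
      // Byte-aligned: target bytes equal a slice of the source
      // (the Java code shares the buffer here instead of copying).
      out[b] = src[cur];
    } else {
      // Unaligned: low bits come from the current byte, high bits from the next.
      const auto lo = static_cast<uint8_t>(src[cur] >> offset);
      const auto hi = (cur + 1 < static_cast<int64_t>(src.size()))
                          ? static_cast<uint8_t>(src[cur + 1] << (8 - offset))
                          : static_cast<uint8_t>(0);
      out[b] = static_cast<uint8_t>(lo | hi);
    }
  }
  // Clear padding bits beyond `length` in the last byte.
  const int tail = static_cast<int>(length % 8);
  if (tail != 0) out.back() &= static_cast<uint8_t>((1u << tail) - 1);
  return out;
}
```

For example, with source bytes `{0xB4, 0x03}`, splitting 8 bits from index 2 takes `0xB4 >> 2` for the low six bits and `0x03 << 6` for the high two, giving `0xED`.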