[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-06-27 Thread GitBox
scampi commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r446497720 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java ## @@ -751,55 +757,57 @@ private void splitAndTransferOffsetBuf

[GitHub] [arrow] ritchie46 opened a new pull request #7554: ARROW-9236: [Rust] CSV WriterBuilder never writes header

2020-06-27 Thread GitBox
ritchie46 opened a new pull request #7554: URL: https://github.com/apache/arrow/pull/7554 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [arrow] github-actions[bot] commented on pull request #7554: ARROW-9236: [Rust] CSV WriterBuilder never writes header

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7554: URL: https://github.com/apache/arrow/pull/7554#issuecomment-650521333 https://issues.apache.org/jira/browse/ARROW-9236

[GitHub] [arrow] Demetrio92 commented on issue #1688: Possible to read categoricals back into Pandas from Parquet using Pyarrow?

2020-06-27 Thread GitBox
Demetrio92 commented on issue #1688: URL: https://github.com/apache/arrow/issues/1688#issuecomment-650559676 @wesm yeah, sorry, guys, you're awesome. I thought this was pandas repo...

[GitHub] [arrow] kiszk opened a new pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-06-27 Thread GitBox
kiszk opened a new pull request #7555: URL: https://github.com/apache/arrow/pull/7555 This PR increases test coverage of round-robin under IPC and Flight. Before this PR, round-robin tests for primitive data under IPC used only int32 (and boolean in some cases). This PR adds other primitive

[GitHub] [arrow] github-actions[bot] commented on pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7555: URL: https://github.com/apache/arrow/pull/7555#issuecomment-650570060 https://issues.apache.org/jira/browse/ARROW-9238

[GitHub] [arrow] wesm commented on pull request #7321: ARROW-8985: [Format] Add Decimal::bitWidth field with default value of 128 for forward compatibility

2020-06-27 Thread GitBox
wesm commented on pull request #7321: URL: https://github.com/apache/arrow/pull/7321#issuecomment-650575218 +1

[GitHub] [arrow] wesm closed pull request #7321: ARROW-8985: [Format] Add Decimal::bitWidth field with default value of 128 for forward compatibility

2020-06-27 Thread GitBox
wesm closed pull request #7321: URL: https://github.com/apache/arrow/pull/7321

[GitHub] [arrow] wesm commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-27 Thread GitBox
wesm commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446541086 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +158,121 @@ struct AsciiLength { } }; +template class Derived> +struct Utf8Tran

[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-27 Thread GitBox
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-650579924 @maartenbreddels let me know if I can help with anything to get this merge-ready -- I want to make the utf8proc-depending code optional so I will need to make a small refactor after

[GitHub] [arrow] wesm commented on pull request #7551: ARROW-9132: [C++] Support Unique and ValueCounts on dictionary data with non-changing dictionaries, add ChunkedArray::Make validating constructor

2020-06-27 Thread GitBox
wesm commented on pull request #7551: URL: https://github.com/apache/arrow/pull/7551#issuecomment-650580242 +1. If anyone desires refinements of `ChunkedArray::Make` please let me know and I will make them

[GitHub] [arrow] wesm closed pull request #7551: ARROW-9132: [C++] Support Unique and ValueCounts on dictionary data with non-changing dictionaries, add ChunkedArray::Make validating constructor

2020-06-27 Thread GitBox
wesm closed pull request #7551: URL: https://github.com/apache/arrow/pull/7551

[GitHub] [arrow] wesm opened a new pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
wesm opened a new pull request #7556: URL: https://github.com/apache/arrow/pull/7556 If both shared and static Brotli libraries are available, the static ones were being selected, causing ~750KB of code to be statically linked into libarrow.so on Linux. This is not consistent with our hand

[GitHub] [arrow] github-actions[bot] commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650583053 https://issues.apache.org/jira/browse/ARROW-9188

[GitHub] [arrow] kiszk commented on a change in pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
kiszk commented on a change in pull request #7556: URL: https://github.com/apache/arrow/pull/7556#discussion_r446547216 ## File path: cpp/cmake_modules/FindBrotli.cmake ## @@ -17,29 +17,29 @@ # # find_package(Brotli) -# Favour static libraries over dynamic libraries, and h

[GitHub] [arrow] kiszk commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
kiszk commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650588895 Looks good except one minor comment. LZ4 and ZSTD also use the dynamic library at first if available.

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-27 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-650606456 @wesm I can also take this since you already have quite a bit on your plate.

[GitHub] [arrow] pitrou edited a comment on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-27 Thread GitBox
pitrou edited a comment on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-650606456 @wesm I can also take this since you already have quite a bit on your plate for 1.0.

[GitHub] [arrow] andygrove closed pull request #7494: ARROW-9184: [Rust][Datafusion] table scan without projection should return all columns

2020-06-27 Thread GitBox
andygrove closed pull request #7494: URL: https://github.com/apache/arrow/pull/7494

[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-27 Thread GitBox
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-650616427 Ok thanks, that's much appreciated

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650618145 Thanks, will look into this. I'm guessing these changes will break some of the Python wheel builds so we may need a flag to indicate a preference of shared vs static --

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650627218 @github-actions crossbow submit -g linux -g wheel -g conda

[GitHub] [arrow] github-actions[bot] commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650627708 Revision: f675cd913b83c56bdbbe24ecc074059dfb382fd0 Submitted crossbow builds: [ursa-labs/crossbow @ actions-362](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] wesm opened a new pull request #7557: ARROW-9251: [C++] Relocate integration testing JSON code implementation to src/arrow/testing

2020-06-27 Thread GitBox
wesm opened a new pull request #7557: URL: https://github.com/apache/arrow/pull/7557 While this code is not being shipped in any packages, I think it would be better for it to live in the testing directory so that its purpose is clear. I think there may be potentially some value in ex

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650633720 It appears that the Brotli shared libraries are in the manylinux1 image even though `-DBUILD_SHARED_LIBS=OFF` https://github.com/apache/arrow/blob/master/python/manylinux1/sc

[GitHub] [arrow] github-actions[bot] commented on pull request #7557: ARROW-9251: [C++] Relocate integration testing JSON code implementation to src/arrow/testing

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7557: URL: https://github.com/apache/arrow/pull/7557#issuecomment-650634874 https://issues.apache.org/jira/browse/ARROW-9251

[GitHub] [arrow] wesm opened a new pull request #7558: ARROW-9250: [C++] Instantiate fewer templates in IsIn, Match kernel implementations

2020-06-27 Thread GitBox
wesm opened a new pull request #7558: URL: https://github.com/apache/arrow/pull/7558 This yields a 150KB reduction in code for me on Linux. Since this may become a common pattern (using e.g. a single `uint32_t`-based function to process both int32/uint32), some of this may be factore

[GitHub] [arrow] wesm commented on a change in pull request #7558: ARROW-9250: [C++] Instantiate fewer templates in IsIn, Match kernel implementations

2020-06-27 Thread GitBox
wesm commented on a change in pull request #7558: URL: https://github.com/apache/arrow/pull/7558#discussion_r446571732 ## File path: cpp/src/arrow/type.h ## @@ -900,7 +902,7 @@ class ARROW_EXPORT LargeStringType : public LargeBinaryType { public: static constexpr Type::ty

[GitHub] [arrow] github-actions[bot] commented on pull request #7558: ARROW-9250: [C++] Instantiate fewer templates in IsIn, Match kernel implementations

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7558: URL: https://github.com/apache/arrow/pull/7558#issuecomment-650637397 https://issues.apache.org/jira/browse/ARROW-9250

[GitHub] [arrow] wesm opened a new pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray

2020-06-27 Thread GitBox
wesm opened a new pull request #7559: URL: https://github.com/apache/arrow/pull/7559

[GitHub] [arrow] wesm commented on pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray

2020-06-27 Thread GitBox
wesm commented on pull request #7559: URL: https://github.com/apache/arrow/pull/7559#issuecomment-650638222 cc @brills

[GitHub] [arrow] github-actions[bot] commented on pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7559: URL: https://github.com/apache/arrow/pull/7559#issuecomment-650639293 https://issues.apache.org/jira/browse/ARROW-9247

[GitHub] [arrow] wesm opened a new pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-27 Thread GitBox
wesm opened a new pull request #7560: URL: https://github.com/apache/arrow/pull/7560

[GitHub] [arrow] github-actions[bot] commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-650643241 https://issues.apache.org/jira/browse/ARROW-9252

[GitHub] [arrow] wesm opened a new pull request #7561: ARROW-9254: [C++] Split out CastNumberToNumberUnsafe function from scalar_cast_numeric, add data()/mutable_data() functions for accessing primiti

2020-06-27 Thread GitBox
wesm opened a new pull request #7561: URL: https://github.com/apache/arrow/pull/7561 This is some preparatory work for ARROW-9196. I also addressed some prior uncleanliness related to unboxing temporal scalars based on C types. By adding these `data()` and `mutable_data()` functions we can

[GitHub] [arrow] github-actions[bot] commented on pull request #7561: ARROW-9254: [C++] Split out CastNumberToNumberUnsafe function from scalar_cast_numeric, add data()/mutable_data() functions for ac

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7561: URL: https://github.com/apache/arrow/pull/7561#issuecomment-650647251 https://issues.apache.org/jira/browse/ARROW-9254

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-27 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-650648681 @nealrichardson I figure this might impact the R packages also

[GitHub] [arrow] wesm opened a new pull request #7562: ARROW-7273: [Python][C++][Parquet] Do not permit constructing a non-nullable null field in Python, catch this case in Arrow->Parquet schema conve

2020-06-27 Thread GitBox
wesm opened a new pull request #7562: URL: https://github.com/apache/arrow/pull/7562 This was the simplest triage I could think of.

[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-27 Thread GitBox
wesm commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-650650589 Looks like the int64 tests must be removed from the "gold" corpus as the JSON files cannot be parsed anymore

[GitHub] [arrow] github-actions[bot] commented on pull request #7562: ARROW-7273: [Python][C++][Parquet] Do not permit constructing a non-nullable null field in Python, catch this case in Arrow->Parqu

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7562: URL: https://github.com/apache/arrow/pull/7562#issuecomment-650652746 https://issues.apache.org/jira/browse/ARROW-7273

[GitHub] [arrow] wesm opened a new pull request #7563: ARROW-8888: [Python] Do not use thread pool when converting pandas columns that are definitely zero-copyable

2020-06-27 Thread GitBox
wesm opened a new pull request #7563: URL: https://github.com/apache/arrow/pull/7563 The ThreadPoolExecutor has a good amount of per-column overhead

[GitHub] [arrow] github-actions[bot] commented on pull request #7563: ARROW-8888: [Python] Do not use thread pool when converting pandas columns that are definitely zero-copyable

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7563: URL: https://github.com/apache/arrow/pull/7563#issuecomment-650656331 https://issues.apache.org/jira/browse/ARROW-

[GitHub] [arrow] wesm closed pull request #7315: ARROW-7605: [C++] Bundle jemalloc into static libarrow

2020-06-27 Thread GitBox
wesm closed pull request #7315: URL: https://github.com/apache/arrow/pull/7315

[GitHub] [arrow] wesm commented on pull request #7315: ARROW-7605: [C++] Bundle jemalloc into static libarrow

2020-06-27 Thread GitBox
wesm commented on pull request #7315: URL: https://github.com/apache/arrow/pull/7315#issuecomment-650658746 I'm going to close this for now and attempt to pursue the static library splicing solution for 1.0.0

[GitHub] [arrow] kou opened a new pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7

2020-06-27 Thread GitBox
kou opened a new pull request #7564: URL: https://github.com/apache/arrow/pull/7564

[GitHub] [arrow] github-actions[bot] commented on pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7564: URL: https://github.com/apache/arrow/pull/7564#issuecomment-650676307 https://issues.apache.org/jira/browse/ARROW-9255

[GitHub] [arrow] kou commented on pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7

2020-06-27 Thread GitBox
kou commented on pull request #7564: URL: https://github.com/apache/arrow/pull/7564#issuecomment-650681862 @github-actions crossbow submit -g linux -g wheel

[GitHub] [arrow] github-actions[bot] commented on pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7564: URL: https://github.com/apache/arrow/pull/7564#issuecomment-650682223 Revision: a86f3649bdce9c5b2f58174488615725883b1f5b Submitted crossbow builds: [ursa-labs/crossbow @ actions-363](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-27 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r446594755 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -227,13 +207,28 @@ public ArrowBuf slice(long index, long length) {

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-27 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r446595068 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,107 @@ package org.apache.arrow.m

[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-27 Thread GitBox
liyafan82 commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-650685726 @rymurr Thanks for your effort. I will make another pass today.

[GitHub] [arrow] tianchen92 commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-06-27 Thread GitBox
tianchen92 commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-650685609 > Does this impact IPC? seems not, IPC used getFieldBuffers which has the right buffer order, this PR is going to replace getFieldBuffers with getBuffers (getBuffers has

[GitHub] [arrow] kou commented on pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7

2020-06-27 Thread GitBox
kou commented on pull request #7564: URL: https://github.com/apache/arrow/pull/7564#issuecomment-650695347 +1

[GitHub] [arrow] kou closed pull request #7564: ARROW-9255: [C++] Use CMake to build bundled Protobuf with CMake >= 3.7

2020-06-27 Thread GitBox
kou closed pull request #7564: URL: https://github.com/apache/arrow/pull/7564

[GitHub] [arrow] Ktakuya332C opened a new pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS

2020-06-27 Thread GitBox
Ktakuya332C opened a new pull request #7565: URL: https://github.com/apache/arrow/pull/7565

[GitHub] [arrow] github-actions[bot] commented on pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS

2020-06-27 Thread GitBox
github-actions[bot] commented on pull request #7565: URL: https://github.com/apache/arrow/pull/7565#issuecomment-650706362 https://issues.apache.org/jira/browse/ARROW-9256

[GitHub] [arrow] kou commented on a change in pull request #7565: ARROW-9256: [C++] Incorrect variable name ARROW_CXX_FLAGS

2020-06-27 Thread GitBox
kou commented on a change in pull request #7565: URL: https://github.com/apache/arrow/pull/7565#discussion_r446609892 ## File path: cpp/CMakeLists.txt ## @@ -472,7 +472,7 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARROW_CXXFLAGS}") # For any C code, use the same flags. The