[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-651423838 Will merge this if the build passes with the arrow-testing changes This is an automated message from the Apache Git

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651456329 @wesm Is it better to work on benchmarking in another pull request?

[GitHub] [arrow] github-actions[bot] commented on pull request #7585: ARROW-3520: [C++] Add "list_flatten" vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7585: URL: https://github.com/apache/arrow/pull/7585#issuecomment-651460741 https://issues.apache.org/jira/browse/ARROW-3520

[GitHub] [arrow] wesm commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
wesm commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651460507 Hm not so fast. The macOS py35 failure seems legitimate https://travis-ci.org/github/ursa-labs/crossbow/builds/703242650#L10060

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651466752 @github-actions crossbow submit -g linux -g wheel -g conda

[GitHub] [arrow] github-actions[bot] commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651467252 Revision: 821f30a834dab99cdc757100e51986384f0a391c Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-651474250 @rymurr Thanks for your work. A few typos. I think it would be ready for merge.

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363600 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -227,13 +207,25 @@ public ArrowBuf slice(long index, long length) {

[GitHub] [arrow] nealrichardson edited a comment on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson edited a comment on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [x] Support nested types (requires adapting the data structure

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651464683 Apparently the `-DBUILD_SHARED_LIBS=OFF` option for Brotli doesn't do anything. I'll add some code to scrub the shared libs from the manylinux images

[GitHub] [arrow] zeevm opened a new pull request #7586: Calculate page and column statistics

2020-06-29 Thread GitBox
zeevm opened a new pull request #7586: URL: https://github.com/apache/arrow/pull/7586 1. Calculate page and column statistics. 2. Use pre-calculated statistics when available to speed up writing data from other formats like ORC.

[GitHub] [arrow] zhztheplayer commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-29 Thread GitBox
zhztheplayer commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-651422684 Thanks for the comments! I've got some other things to deal with these days. Will address them as soon as possible.

[GitHub] [arrow] emkornfield commented on a change in pull request #7535: ARROW-9222: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-29 Thread GitBox
emkornfield commented on a change in pull request #7535: URL: https://github.com/apache/arrow/pull/7535#discussion_r447396488 ## File path: docs/source/format/Columnar.rst ## @@ -688,11 +687,10 @@ will have the following layout: ::

[GitHub] [arrow] wesm opened a new pull request #7585: ARROW-3520: [C++] WIP Add vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
wesm opened a new pull request #7585: URL: https://github.com/apache/arrow/pull/7585 I'm testing a JIRA webhook. I'll close this PR and then reopen it when the patch is done.

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651466442 Actually, that's crazy. I'm taking the same approach as ZSTD and adding a CMake toggle between shared and static Brotli (with default being shared)

[GitHub] [arrow] nealrichardson commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651484901 Ok, this isn't necessarily pretty but I think it's done, or done enough for here. I'll add some more tests, probably some docs for the format, and poke around a bit more

[GitHub] [arrow] emkornfield commented on a change in pull request #7535: ARROW-9222: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-29 Thread GitBox
emkornfield commented on a change in pull request #7535: URL: https://github.com/apache/arrow/pull/7535#discussion_r447396655 ## File path: docs/source/format/Columnar.rst ## @@ -566,33 +572,28 @@ having the values: ``[{f=1.2}, null, {f=3.4}, {i=5}]`` :: * Length: 4,

[GitHub] [arrow] wesm closed pull request #7585: ARROW-3520: [C++] WIP Add vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
wesm closed pull request #7585: URL: https://github.com/apache/arrow/pull/7585

[GitHub] [arrow] wesm commented on pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-29 Thread GitBox
wesm commented on pull request #7569: URL: https://github.com/apache/arrow/pull/7569#issuecomment-651457980 +1, this is a bit dry so would rather reviewers reserve their time for other PRs

[GitHub] [arrow] wesm closed pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-29 Thread GitBox
wesm closed pull request #7569: URL: https://github.com/apache/arrow/pull/7569

[GitHub] [arrow] wesm edited a comment on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm edited a comment on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651457062 @mrkn it's up to you, it's fine with me if you work on performance tuning (or at least measurement) in another PR or this one

[GitHub] [arrow] wesm commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651457062 @mrkn it's up to you, it's fine with me if you work on performance tuning in another PR or this one

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363293 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,103 @@ package

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363481 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,107 @@ package

[GitHub] [arrow] liyafan82 commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
liyafan82 commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651492666 In addition to the problem of top level validity buffer, I think there is another problem to discuss: Java is using the ordinal of the minor type as the type id, and this is

[GitHub] [arrow] liyafan82 edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
liyafan82 edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651492666 In addition to the problem of top level validity buffer, I think there is another problem to discuss: Java is using the ordinal of the minor type as the type id, and

[GitHub] [arrow] emkornfield commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-06-29 Thread GitBox
emkornfield commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-651509291 Can we leave the old method in place, mark it as deprecated, and remove it in a later release?

[GitHub] [arrow] emkornfield commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
emkornfield commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651515749 @liyafan82 I think that is a good point. If it supports both modes I think that is a reasonable compromise for now as long as @jacques-n is OK with it. But we can maybe

[GitHub] [arrow] wesm closed pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm closed pull request #7560: URL: https://github.com/apache/arrow/pull/7560

[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-651458622 Yahtzee

[GitHub] [arrow] mrkn edited a comment on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn edited a comment on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651458470 @wesm OK. I'll continue to work on benchmarking in this pull request. If I need more time to tune etc., I'll split the issue.

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651458470 OK. I'll continue to work on benchmarking in this pull request. If I need more time to tune etc., I'll split the issue.

[GitHub] [arrow] github-actions[bot] commented on pull request #7586: Calculate page and column statistics

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7586: URL: https://github.com/apache/arrow/pull/7586#issuecomment-651495743 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] kszucs opened a new pull request #7584: ARROW-9272: [C++][Python] Reduce complexity in python to arrow conversion

2020-06-29 Thread GitBox
kszucs opened a new pull request #7584: URL: https://github.com/apache/arrow/pull/7584 The original motivation for this patch was to reuse the same conversions path for both the scalars and arrays. In my recent patch the scalars are converted from a single element list to a single

[GitHub] [arrow] github-actions[bot] commented on pull request #7584: ARROW-9272: [C++][Python] Reduce complexity in python to arrow conversion

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7584: URL: https://github.com/apache/arrow/pull/7584#issuecomment-651404016 https://issues.apache.org/jira/browse/ARROW-9272

[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651368104 Indeed, toolchain incompatibilities only affect C++ code

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447259489 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447267538 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651355763 > > This means there also needs to be a PKGBUILD > > Why? `libutf8proc` is installed. The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9.

[GitHub] [arrow] kou commented on a change in pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
kou commented on a change in pull request #7581: URL: https://github.com/apache/arrow/pull/7581#discussion_r447247927 ## File path: cpp/src/arrow/config.h ## @@ -0,0 +1,47 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651366993 > The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9 What difference does it make? This is plain C.

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651369436 > > The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9 > > What difference does it make? This is plain C. :shrug: then I'll leave it to you

[GitHub] [arrow] wesm commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447264419 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447268862 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-650967222 @wesm Could you please review this?

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-29 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r446816192 ## File path: r/src/array_from_vector.cpp ## @@ -918,6 +923,97 @@ class Time64Converter : public TimeConverter { } }; +template +class

[GitHub] [arrow] github-actions[bot] commented on pull request #7579: ARROW-9242: [Java] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7579: URL: https://github.com/apache/arrow/pull/7579#issuecomment-650985250 https://issues.apache.org/jira/browse/ARROW-9242

[GitHub] [arrow] tianchen92 opened a new pull request #7579: ARROW-9242: [Java] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
tianchen92 opened a new pull request #7579: URL: https://github.com/apache/arrow/pull/7579

[GitHub] [arrow] tianchen92 commented on a change in pull request #7568: ARROW-9241: [C++] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
tianchen92 commented on a change in pull request #7568: URL: https://github.com/apache/arrow/pull/7568#discussion_r446812811 ## File path: cpp/src/arrow/ipc/metadata_internal.cc ## @@ -257,6 +259,9 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void*

[GitHub] [arrow] tianchen92 commented on pull request #7579: ARROW-9242: [Java] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
tianchen92 commented on pull request #7579: URL: https://github.com/apache/arrow/pull/7579#issuecomment-650997334 cc @wesm

[GitHub] [arrow] rymurr commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
rymurr commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r446920091 ## File path: java/memory/src/main/java/org/apache/arrow/memory/util/MemoryUtil.java ## @@ -95,4 +143,26 @@ public static long

[GitHub] [arrow] rymurr commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
rymurr commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-651078278 Thanks @liyafan82 I have updated based on your comments and the integration tests have passed.

[GitHub] [arrow] rymurr commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
rymurr commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-651081660 looks like an ongoing github incident is causing build failures. Will rebase and rebuild once Github is back to normal

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446943865 ## File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc ## @@ -68,6 +76,64 @@ TYPED_TEST(TestStringKernels, AsciiLower) {

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446964243 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +158,121 @@ struct AsciiLength { } }; +template class Derived>

[GitHub] [arrow] pitrou opened a new pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
pitrou opened a new pull request #7580: URL: https://github.com/apache/arrow/pull/7580 The AWS SDK on manylinux packages uses a custom-compiled libcurl that is configured for the certificates of the build system. However, there's no standard location for CA certificates on Linux, so we

[GitHub] [arrow] pitrou commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
pitrou commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651139396 @github-actions crossbow submit -g wheel

[GitHub] [arrow] github-actions[bot] commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651144126 https://issues.apache.org/jira/browse/ARROW-9261

[GitHub] [arrow] pitrou commented on a change in pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7571: URL: https://github.com/apache/arrow/pull/7571#discussion_r447002338 ## File path: cpp/src/arrow/ipc/options.h ## @@ -66,6 +67,10 @@ struct ARROW_EXPORT IpcWriteOptions { /// like compression bool use_threads = true;

[GitHub] [arrow] wesm commented on a change in pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
wesm commented on a change in pull request #7571: URL: https://github.com/apache/arrow/pull/7571#discussion_r447002951 ## File path: cpp/src/arrow/ipc/options.h ## @@ -66,6 +67,10 @@ struct ARROW_EXPORT IpcWriteOptions { /// like compression bool use_threads = true; +

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446952433 ## File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc ## @@ -41,7 +42,9 @@ static void UnaryStringBenchmark(benchmark::State&

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446952269 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +158,121 @@ struct AsciiLength { } }; +template class Derived>

[GitHub] [arrow] github-actions[bot] commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651141065 Revision: b0dd66f17e94fbf61da0f1f616ccd5e627a79145 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] wesm commented on a change in pull request #7568: ARROW-9241: [C++] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
wesm commented on a change in pull request #7568: URL: https://github.com/apache/arrow/pull/7568#discussion_r446993778 ## File path: cpp/src/arrow/ipc/metadata_internal.cc ## @@ -257,6 +259,9 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,

[GitHub] [arrow] pitrou commented on a change in pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7571: URL: https://github.com/apache/arrow/pull/7571#discussion_r447003525 ## File path: cpp/src/arrow/ipc/options.h ## @@ -66,6 +67,10 @@ struct ARROW_EXPORT IpcWriteOptions { /// like compression bool use_threads = true;

[GitHub] [arrow] rymurr commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
rymurr commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r446920388 ## File path: java/memory/src/main/java/org/apache/arrow/memory/util/MemoryUtil.java ## @@ -78,6 +77,55 @@ public Object run() { Field addressField

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446934057 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -30,6 +31,124 @@ namespace internal { namespace { +// lookup tables

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446988586 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +158,121 @@ struct AsciiLength { } }; +template class Derived>

[GitHub] [arrow] wesm commented on pull request #7568: ARROW-9241: [C++] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
wesm commented on pull request #7568: URL: https://github.com/apache/arrow/pull/7568#issuecomment-651142403 +1

[GitHub] [arrow] wesm commented on a change in pull request #7568: ARROW-9241: [C++] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
wesm commented on a change in pull request #7568: URL: https://github.com/apache/arrow/pull/7568#discussion_r446994593 ## File path: cpp/src/arrow/ipc/metadata_internal.cc ## @@ -257,6 +259,9 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,

[GitHub] [arrow] pitrou commented on pull request #7575: ARROW-8671: [C++][FOLLOWUP] Fix ASAN/UBSAN bug found with IPC fuzz testing files

2020-06-29 Thread GitBox
pitrou commented on pull request #7575: URL: https://github.com/apache/arrow/pull/7575#issuecomment-651147499 Did you add the regression file to the testing repository? (in `data/arrow-ipc-file` or `data/arrow-ipc-stream`, depending on the fuzzer which found it)

[GitHub] [arrow] wesm edited a comment on pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-29 Thread GitBox
wesm edited a comment on pull request #7569: URL: https://github.com/apache/arrow/pull/7569#issuecomment-650803803 Benchmarks on gcc-8 ``` $ archery benchmark diff --cc=gcc-8 --cxx=g++-8 --benchmark-filter=FilterString benchmark

[GitHub] [arrow] wesm commented on pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
wesm commented on pull request #7571: URL: https://github.com/apache/arrow/pull/7571#issuecomment-651146389 @pitrou this is already merged (by accident actually, mistyped the PR number on the command line and went too fast), but let me know if you see anything concerning from a fuzz

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446938183 ## File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc ## @@ -68,6 +76,64 @@ TYPED_TEST(TestStringKernels, AsciiLower) {

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446946562 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -30,6 +31,124 @@ namespace internal { namespace { +// lookup tables +std::vector

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446958717 ## File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc ## @@ -81,5 +147,40 @@ TYPED_TEST(TestStringKernels,

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446990168 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -30,6 +31,124 @@ namespace internal { namespace { +// lookup tables

[GitHub] [arrow] wesm commented on pull request #7575: ARROW-8671: [C++][FOLLOWUP] Fix ASAN/UBSAN bug found with IPC fuzz testing files

2020-06-29 Thread GitBox
wesm commented on pull request #7575: URL: https://github.com/apache/arrow/pull/7575#issuecomment-651148456 @pitrou the issue was caught by one of the existing fuzz files -- I had accidentally merged the PR without seeing the ASAN failure

[GitHub] [arrow] rymurr commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
rymurr commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r446921213 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,107 @@ package

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446934218 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -30,6 +31,124 @@ namespace internal { namespace { +// lookup tables

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r446934815 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +158,121 @@ struct AsciiLength { } }; +template class Derived>

[GitHub] [arrow] wjones1 commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-06-29 Thread GitBox
wjones1 commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-651140872 Actually @jorisvandenbossche, I agree we should probably just add in the batch_size argument (with a sensible default) to those other methods. Took me a while to understand what

[GitHub] [arrow] wesm closed pull request #7568: ARROW-9241: [C++] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
wesm closed pull request #7568: URL: https://github.com/apache/arrow/pull/7568

[GitHub] [arrow] wesm commented on a change in pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
wesm commented on a change in pull request #7571: URL: https://github.com/apache/arrow/pull/7571#discussion_r447004437 ## File path: cpp/src/arrow/ipc/options.h ## @@ -66,6 +67,10 @@ struct ARROW_EXPORT IpcWriteOptions { /// like compression bool use_threads = true; +

[GitHub] [arrow] xhochy commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
xhochy commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651162033 > The manylinux import test

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651165626 @pitrou many thanks for the review. I've implemented all your suggestions except: * Raising an error on invalid utf8 data (see comment) * Having a benchmark run on

[GitHub] [arrow] rjzamora commented on pull request #7546: ARROW-8733: [C++][Dataset][Python] Expose RowGroupInfo statistics values

2020-06-29 Thread GitBox
rjzamora commented on pull request #7546: URL: https://github.com/apache/arrow/pull/7546#issuecomment-651177576 Thanks for the great work here @bkietz ! This is wonderful - Dask uses the min/max statistics to calculate `divisions`, so this functionality is definitely necessary.

[GitHub] [arrow] wesm closed pull request #7579: ARROW-9242: [Java] Add forward compatibility check for Decimal bit width

2020-06-29 Thread GitBox
wesm closed pull request #7579: URL: https://github.com/apache/arrow/pull/7579

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651186700 If you're installing Brotli in any of the packaging setups, there may be a scenario where there is both the shared AND static library -- in that case there would be an issue. We

[GitHub] [arrow] kszucs closed pull request #7376: ARROW-9043: [Go][FOLLOWUP] Move license file copy to correct location

2020-06-29 Thread GitBox
kszucs closed pull request #7376: URL: https://github.com/apache/arrow/pull/7376

[GitHub] [arrow] nealrichardson commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651182567 @wesm how can I help/what should I look for?

[GitHub] [arrow] nealrichardson commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651188772 IIUC we're ok: * Windows: no brotli: https://github.com/apache/arrow/blob/master/ci/scripts/PKGBUILD * macOS: no brotli:

[GitHub] [arrow] jba commented on pull request #7376: ARROW-9043: [Go][FOLLOWUP] Move license file copy to correct location

2020-06-29 Thread GitBox
jba commented on pull request #7376: URL: https://github.com/apache/arrow/pull/7376#issuecomment-651164902 There's no feasible way to test this, unfortunately. I verified both my suggested changes by forking the repo, making the edits, and running an internal tool to verify that they

[GitHub] [arrow] pitrou commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
pitrou commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651159316 The manylinux import test

[GitHub] [arrow] kszucs commented on pull request #7376: ARROW-9043: [Go][FOLLOWUP] Move license file copy to correct location

2020-06-29 Thread GitBox
kszucs commented on pull request #7376: URL: https://github.com/apache/arrow/pull/7376#issuecomment-651170232 Thanks @jba for the update. Merging it then.

[GitHub] [arrow] pitrou commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
pitrou commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651181336 @xhochy In https://github.com/apache/arrow/blob/master/dev/tasks/python-wheels/manylinux-test.sh#L33 . I'm sure there's a better way to do that (e.g. spawn a minimal Python docker

[GitHub] [arrow] wesm commented on pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray

2020-06-29 Thread GitBox
wesm commented on pull request #7559: URL: https://github.com/apache/arrow/pull/7559#issuecomment-651187351 cc @pitrou or @jorisvandenbossche for review

[GitHub] [arrow] wesm commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447030010 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class
