[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651352858 Also, I'll let others add `-DARROW_PACKAGE_KIND=...` in other places. This is an automated message from the Apache

[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651352568 > It would be also nice to store the enabled features. Agreed, but that can be done in a separate PR. > How about adding int BuildInfo::version for ARROW_VERSION too?

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651353338 > This means there also needs to be a PKGBUILD Why? `libutf8proc` is installed.

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651355763 > > This means there also needs to be a PKGBUILD > > Why? `libutf8proc` is installed. The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9. Most

[GitHub] [arrow] kou commented on a change in pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
kou commented on a change in pull request #7581: URL: https://github.com/apache/arrow/pull/7581#discussion_r447247927 ## File path: cpp/src/arrow/config.h ## @@ -0,0 +1,47 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651366993 > The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9 What difference does it make? This is plain C.

[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651368104 Indeed, toolchain incompatibilities only affect C++ code

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651369436 > > The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9 > > What difference does it make? This is plain C. :shrug: then I'll leave it to you to

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447259489 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class SparseCSFTe

[GitHub] [arrow] wesm commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447264419 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class SparseCSFTe

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447267538 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class SparseCSFTe

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447268862 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class SparseCSFTe

[GitHub] [arrow] nealrichardson edited a comment on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson edited a comment on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [x] Support nested types (requires adapting the data structure and

[GitHub] [arrow] kszucs opened a new pull request #7584: ARROW-9272: [C++][Python] Reduce complexity in python to arrow conversion

2020-06-29 Thread GitBox
kszucs opened a new pull request #7584: URL: https://github.com/apache/arrow/pull/7584 The original motivation for this patch was to reuse the same conversions path for both the scalars and arrays. In my recent patch the scalars are converted from a single element list to a single

[GitHub] [arrow] github-actions[bot] commented on pull request #7584: ARROW-9272: [C++][Python] Reduce complexity in python to arrow conversion

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7584: URL: https://github.com/apache/arrow/pull/7584#issuecomment-651404016 https://issues.apache.org/jira/browse/ARROW-9272

[GitHub] [arrow] zhztheplayer commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-29 Thread GitBox
zhztheplayer commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-651422684 Thanks for the comments! I've got some stuffs to deal with these days. Will address as soon as possible.

[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-651423838 Will merge this if the build passes with the arrow-testing changes

[GitHub] [arrow] wesm opened a new pull request #7585: ARROW-3520: [C++] WIP Add vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
wesm opened a new pull request #7585: URL: https://github.com/apache/arrow/pull/7585 I'm testing a JIRA webhook, I'll close this PR and then reopen it when the patch is done

[GitHub] [arrow] wesm closed pull request #7585: ARROW-3520: [C++] WIP Add vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
wesm closed pull request #7585: URL: https://github.com/apache/arrow/pull/7585

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651456329 @wesm Is it better to work for benchmarking in other pull-request?

[GitHub] [arrow] wesm edited a comment on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm edited a comment on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651457062 @mrkn it's up to you, it's fine with me if you work on performance tuning (or at least measurement) in another PR or this one

[GitHub] [arrow] wesm commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651457062 @mrkn it's up to you, it's fine with me if you work on performance tuning in another PR or this one

[GitHub] [arrow] wesm commented on pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-29 Thread GitBox
wesm commented on pull request #7569: URL: https://github.com/apache/arrow/pull/7569#issuecomment-651457980 +1, this is a bit dry so would rather reviewers reserve their time for other PRs

[GitHub] [arrow] wesm closed pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-29 Thread GitBox
wesm closed pull request #7569: URL: https://github.com/apache/arrow/pull/7569

[GitHub] [arrow] mrkn edited a comment on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn edited a comment on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651458470 @wesm OK. I continue to work for benchmarking in this pull-request. If I need more time to tune etc., I'll split the issue.

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651458470 OK. I continue to work for benchmarking in this pull-request. If I need more time to tune etc., I'll split the issue.

[GitHub] [arrow] wesm closed pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm closed pull request #7560: URL: https://github.com/apache/arrow/pull/7560

[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-651458622 Yahtzee

[GitHub] [arrow] wesm commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
wesm commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651460507 Hm not so fast. The macOS py35 failure seems legitimate https://travis-ci.org/github/ursa-labs/crossbow/builds/703242650#L10060

[GitHub] [arrow] github-actions[bot] commented on pull request #7585: ARROW-3520: [C++] Add "list_flatten" vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7585: URL: https://github.com/apache/arrow/pull/7585#issuecomment-651460741 https://issues.apache.org/jira/browse/ARROW-3520

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651464683 Apparently the `-DBUILD_SHARED_LIBS=OFF` option for Brotli doesn't do anything. I'll add some code to scrub the shared libs from the manylinux images

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651466442 Actually, that's crazy. I'm taking the same approach as ZSTD and adding a CMake toggle between shared and static Brotli (with default being shared)

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651466752 @github-actions crossbow submit -g linux -g wheel -g conda

[GitHub] [arrow] github-actions[bot] commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651467252 Revision: 821f30a834dab99cdc757100e51986384f0a391c Submitted crossbow builds: [ursa-labs/crossbow @ actions-367](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363293 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,103 @@ package org.apache.arrow.m

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363481 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,107 @@ package org.apache.arrow.m

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363600 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -227,13 +207,25 @@ public ArrowBuf slice(long index, long length) {

[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-651474250 @rymurr Thanks for your work. A few typos. I think it would be ready for merge.

[GitHub] [arrow] nealrichardson commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651484901 Ok, this isn't necessarily pretty but I think it's done, or done enough for here. I'll add some more tests, probably some docs for the format, and poke around a bit more wh

[GitHub] [arrow] zeevm opened a new pull request #7586: Calculate page and column statistics

2020-06-29 Thread GitBox
zeevm opened a new pull request #7586: URL: https://github.com/apache/arrow/pull/7586 1. Calculate page and column statistics 2. Use pre-calculated statistics when available to speed-up when writing data from other formats like ORC.

[GitHub] [arrow] liyafan82 commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
liyafan82 commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651492666 In addition to the problem of top level validity buffer, I think there is another problem to discuss: Java is using the ordinal of the minor type as the type id, and this is not

[GitHub] [arrow] liyafan82 edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
liyafan82 edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651492666 In addition to the problem of top level validity buffer, I think there is another problem to discuss: Java is using the ordinal of the minor type as the type id, and this

[GitHub] [arrow] github-actions[bot] commented on pull request #7586: Calculate page and column statistics

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7586: URL: https://github.com/apache/arrow/pull/7586#issuecomment-651495743 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] emkornfield commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-06-29 Thread GitBox
emkornfield commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-651509291 Can we leave the old method in place and mark it as deprecated and remove in a later release?

[GitHub] [arrow] emkornfield commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
emkornfield commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651515749 @liyafan82 I think that is a good point. If it supports both modes I think that is a reasonable compromise for now as long as @jacques-n is OK with it. But we can maybe disc

[GitHub] [arrow] emkornfield commented on a change in pull request #7535: ARROW-9222: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-29 Thread GitBox
emkornfield commented on a change in pull request #7535: URL: https://github.com/apache/arrow/pull/7535#discussion_r447396488 ## File path: docs/source/format/Columnar.rst ## @@ -688,11 +687,10 @@ will have the following layout: :: ||---

[GitHub] [arrow] emkornfield commented on a change in pull request #7535: ARROW-9222: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-29 Thread GitBox
emkornfield commented on a change in pull request #7535: URL: https://github.com/apache/arrow/pull/7535#discussion_r447396655 ## File path: docs/source/format/Columnar.rst ## @@ -566,33 +572,28 @@ having the values: ``[{f=1.2}, null, {f=3.4}, {i=5}]`` :: * Length: 4, Nu

[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7326: URL: https://github.com/apache/arrow/pull/7326#discussion_r447441667 ## File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtility.java ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache So

[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7326: URL: https://github.com/apache/arrow/pull/7326#discussion_r447442441 ## File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Soft
