[GitHub] [arrow] emkornfield commented on a change in pull request #7535: ARROW-9222: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-29 Thread GitBox
emkornfield commented on a change in pull request #7535: URL: https://github.com/apache/arrow/pull/7535#discussion_r447396655 ## File path: docs/source/format/Columnar.rst ## @@ -566,33 +572,28 @@ having the values: ``[{f=1.2}, null, {f=3.4}, {i=5}]`` :: * Length: 4,

[GitHub] [arrow] emkornfield commented on a change in pull request #7535: ARROW-9222: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-29 Thread GitBox
emkornfield commented on a change in pull request #7535: URL: https://github.com/apache/arrow/pull/7535#discussion_r447396488 ## File path: docs/source/format/Columnar.rst ## @@ -688,11 +687,10 @@ will have the following layout: ::

[GitHub] [arrow] emkornfield commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
emkornfield commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651515749 @liyafan82 I think that is a good point. If it supports both modes I think that is a reasonable compromise for now as long as @jacques-n is OK with it. But we can maybe

[GitHub] [arrow] emkornfield commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-06-29 Thread GitBox
emkornfield commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-651509291 Can we leave the old method in place and mark it as deprecated and remove in a later release? This is an
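
The suggestion above (keep the old method, mark it deprecated, remove it in a later release) can be sketched as follows. This is only an illustration in Python rather than Java; `FieldVector` here is a hypothetical stand-in, and `get_buffers` and `writer_index` are assumed names, not the Arrow Java API.

```python
import warnings


class FieldVector:
    """Hypothetical stand-in for a vector type; not the Arrow Java class."""

    def __init__(self, buffers):
        self._buffers = list(buffers)
        self.writer_index = 0  # assumed piece of mutable state

    def get_buffers(self):
        # New accessor (hypothetical name): returns the buffers, no side effects.
        return list(self._buffers)

    def get_field_buffers(self):
        # Old accessor kept for compatibility: it still has its side effect,
        # but warns callers that it will be removed in a later release.
        warnings.warn(
            "get_field_buffers() is deprecated; use get_buffers()",
            DeprecationWarning,
            stacklevel=2,
        )
        self.writer_index = sum(len(b) for b in self._buffers)
        return list(self._buffers)
```

Existing callers keep working and see a warning, while new code can move to the side-effect-free accessor before the old one is dropped.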

[GitHub] [arrow] github-actions[bot] commented on pull request #7586: Calculate page and column statistics

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7586: URL: https://github.com/apache/arrow/pull/7586#issuecomment-651495743 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] liyafan82 edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
liyafan82 edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651492666 In addition to the problem of top level validity buffer, I think there is another problem to discuss: Java is using the ordinal of the minor type as the type id, and

[GitHub] [arrow] liyafan82 commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-29 Thread GitBox
liyafan82 commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-651492666 In addition to the problem of top level validity buffer, I think there is another problem to discuss: Java is using the ordinal of the minor type as the type id, and this is

[GitHub] [arrow] zeevm opened a new pull request #7586: Calculate page and column statistics

2020-06-29 Thread GitBox
zeevm opened a new pull request #7586: URL: https://github.com/apache/arrow/pull/7586 1. Calculate page and column statistics 2. Use pre-calculated statistics when available to speed-up when writing data from other formats like ORC.
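
The two points in this description can be sketched roughly as below. This is a minimal Python illustration, not the code from the pull request; the `Stats` fields (min, max, null count) and the function names `page_stats` and `column_stats` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass
class Stats:
    min: Optional[int]
    max: Optional[int]
    null_count: int


def page_stats(values: Sequence[Optional[int]]) -> Stats:
    """Compute statistics for one page; None marks a null value."""
    present = [v for v in values if v is not None]
    return Stats(
        min=min(present) if present else None,
        max=max(present) if present else None,
        null_count=len(values) - len(present),
    )


def column_stats(pages: Iterable[Stats],
                 precomputed: Optional[Stats] = None) -> Stats:
    """Combine page statistics into column statistics.

    If the caller already has statistics (for example, carried over from
    another file format), reuse them instead of recomputing.
    """
    if precomputed is not None:
        return precomputed
    mins, maxs, nulls = [], [], 0
    for s in pages:
        nulls += s.null_count
        if s.min is not None:
            mins.append(s.min)
        if s.max is not None:
            maxs.append(s.max)
    return Stats(
        min=min(mins) if mins else None,
        max=max(maxs) if maxs else None,
        null_count=nulls,
    )
```

For example, `column_stats([page_stats([3, None, 7]), page_stats([1, 5])])` folds the per-page values into a single minimum, maximum and null count for the column.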

[GitHub] [arrow] nealrichardson commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651484901 Ok, this isn't necessarily pretty but I think it's done, or done enough for here. I'll add some more tests, probably some docs for the format, and poke around a bit more

[GitHub] [arrow] nealrichardson edited a comment on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson edited a comment on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [x] Support nested types (requires adapting the data structure

[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-651474250 @rymurr Thanks for your work. A few typos. I think it would be ready for merge. This is an automated

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363600 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -227,13 +207,25 @@ public ArrowBuf slice(long index, long length) {

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363481 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,107 @@ package

[GitHub] [arrow] liyafan82 commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-29 Thread GitBox
liyafan82 commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r447363293 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -17,33 +17,103 @@ package

[GitHub] [arrow] github-actions[bot] commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651467252 Revision: 821f30a834dab99cdc757100e51986384f0a391c Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651466752 @github-actions crossbow submit -g linux -g wheel -g conda This is an automated message from the Apache Git Service.

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651466442 Actually, that's crazy. I'm taking the same approach as ZSTD and adding a CMake toggle between shared and static Brotli (with default being shared)

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651464683 Apparently the `-DBUILD_SHARED_LIBS=OFF` option for Brotli doesn't do anything. I'll add some code to scrub the shared libs from the manylinux images

[GitHub] [arrow] github-actions[bot] commented on pull request #7585: ARROW-3520: [C++] Add "list_flatten" vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7585: URL: https://github.com/apache/arrow/pull/7585#issuecomment-651460741 https://issues.apache.org/jira/browse/ARROW-3520

[GitHub] [arrow] wesm commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
wesm commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651460507 Hm not so fast. The macOS py35 failure seems legitimate https://travis-ci.org/github/ursa-labs/crossbow/builds/703242650#L10060

[GitHub] [arrow] wesm closed pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm closed pull request #7560: URL: https://github.com/apache/arrow/pull/7560

[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-651458622 Yahtzee

[GitHub] [arrow] mrkn edited a comment on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn edited a comment on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651458470 @wesm OK. I'll continue working on the benchmarking in this pull request. If I need more time for tuning etc., I'll split the issue.

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651458470 OK. I'll continue working on the benchmarking in this pull request. If I need more time for tuning etc., I'll split the issue.

[GitHub] [arrow] wesm closed pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-29 Thread GitBox
wesm closed pull request #7569: URL: https://github.com/apache/arrow/pull/7569

[GitHub] [arrow] wesm commented on pull request #7569: ARROW-9152: [C++] Specialized implementation of filtering Binary/LargeBinary-based types

2020-06-29 Thread GitBox
wesm commented on pull request #7569: URL: https://github.com/apache/arrow/pull/7569#issuecomment-651457980 +1, this is a bit dry so would rather reviewers reserve their time for other PRs

[GitHub] [arrow] wesm edited a comment on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm edited a comment on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651457062 @mrkn it's up to you, it's fine with me if you work on performance tuning (or at least measurement) in another PR or this one

[GitHub] [arrow] wesm commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651457062 @mrkn it's up to you, it's fine with me if you work on performance tuning in another PR or this one

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-651456329 @wesm Would it be better to do the benchmarking in another pull request?

[GitHub] [arrow] nealrichardson edited a comment on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson edited a comment on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [x] Support nested types (requires adapting the data structure

[GitHub] [arrow] wesm closed pull request #7585: ARROW-3520: [C++] WIP Add vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
wesm closed pull request #7585: URL: https://github.com/apache/arrow/pull/7585

[GitHub] [arrow] wesm opened a new pull request #7585: ARROW-3520: [C++] WIP Add vector kernel wrapper for Flatten method of ListArray types

2020-06-29 Thread GitBox
wesm opened a new pull request #7585: URL: https://github.com/apache/arrow/pull/7585 I'm testing a JIRA webhook, I'll close this PR and then reopen it when the patch is done

[GitHub] [arrow] wesm commented on pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
wesm commented on pull request #7560: URL: https://github.com/apache/arrow/pull/7560#issuecomment-651423838 Will merge this if the build passes with the arrow-testing changes

[GitHub] [arrow] zhztheplayer commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-29 Thread GitBox
zhztheplayer commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-651422684 Thanks for the comments! I have a few other things to deal with these days. Will address them as soon as possible.

[GitHub] [arrow] github-actions[bot] commented on pull request #7584: ARROW-9272: [C++][Python] Reduce complexity in python to arrow conversion

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7584: URL: https://github.com/apache/arrow/pull/7584#issuecomment-651404016 https://issues.apache.org/jira/browse/ARROW-9272

[GitHub] [arrow] kszucs opened a new pull request #7584: ARROW-9272: [C++][Python] Reduce complexity in python to arrow conversion

2020-06-29 Thread GitBox
kszucs opened a new pull request #7584: URL: https://github.com/apache/arrow/pull/7584 The original motivation for this patch was to reuse the same conversions path for both the scalars and arrays. In my recent patch the scalars are converted from a single element list to a single

[GitHub] [arrow] nealrichardson edited a comment on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson edited a comment on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [x] Support nested types (requires adapting the data structure

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447268862 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447267538 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class

[GitHub] [arrow] wesm commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
wesm commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447264419 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class

[GitHub] [arrow] mrkn commented on a change in pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-06-29 Thread GitBox
mrkn commented on a change in pull request #7539: URL: https://github.com/apache/arrow/pull/7539#discussion_r447259489 ## File path: cpp/src/arrow/tensor/csf_converter.cc ## @@ -57,73 +57,86 @@ inline void IncrementIndex(std::vector& coord, const std::vector -class

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651369436 > > The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9 > > What difference does it make? This is plain C. :shrug: then I'll leave it to you

[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651368104 Indeed, toolchain incompatibilities only affect C++ code

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651366993 > The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9 What difference does it make? This is plain C.

[GitHub] [arrow] kou commented on a change in pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
kou commented on a change in pull request #7581: URL: https://github.com/apache/arrow/pull/7581#discussion_r447247927 ## File path: cpp/src/arrow/config.h ## @@ -0,0 +1,47 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651355763 > > This means there also needs to be a PKGBUILD > > Why? `libutf8proc` is installed. The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9.

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651353338 > This means there also needs to be a PKGBUILD Why? `libutf8proc` is installed.

[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651352568 > It would also be nice to store the enabled features. Agreed, but that can be done in a separate PR. > How about adding int BuildInfo::version for ARROW_VERSION too?

[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651352858 Also, I'll let others add `-DARROW_PACKAGE_KIND=...` in other places.

[GitHub] [arrow] wesm commented on pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
wesm commented on pull request #7571: URL: https://github.com/apache/arrow/pull/7571#issuecomment-651351599 I'll close this for now. Please leave any review comments and I can address them later

[GitHub] [arrow] kou merged pull request #7583: [Doc][C++] Follow docker-compose service name change for lint

2020-06-29 Thread GitBox
kou merged pull request #7583: URL: https://github.com/apache/arrow/pull/7583

[GitHub] [arrow] wesm closed pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
wesm closed pull request #7571: URL: https://github.com/apache/arrow/pull/7571

[GitHub] [arrow] wesm closed pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-29 Thread GitBox
wesm closed pull request #7576: URL: https://github.com/apache/arrow/pull/7576

[GitHub] [arrow] wesm commented on pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-29 Thread GitBox
wesm commented on pull request #7576: URL: https://github.com/apache/arrow/pull/7576#issuecomment-651350872 +1

[GitHub] [arrow] wesm commented on pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-29 Thread GitBox
wesm commented on pull request #7576: URL: https://github.com/apache/arrow/pull/7576#issuecomment-651350708 I think both the 1/1000 and 1/1 cases have something interesting to show perf wise, but in any case using 1M as the length in this benchmark seems OK.

[GitHub] [arrow] nealrichardson edited a comment on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson edited a comment on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [x] Support nested types (requires adapting the data structure

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651342350 > @xhochy Could you help on the utf8proc issue on RTools 3.5? > See here: https://github.com/apache/arrow/pull/7449/checks?check_run_id=819772618#step:10:169 This

[GitHub] [arrow] kszucs commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
kszucs commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651339349 It would also be nice to store the enabled features.

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651338264 @xhochy Could you help on the utf8proc issue on RTools 3.5? See here: https://github.com/apache/arrow/pull/7449/checks?check_run_id=819772618#step:10:169 It seems that

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651322087 I just concluded the same :)

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651316656 I pushed a commit that raises an error on invalid UTF8. It does not seem to make the benchmarks slower.
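
The behaviour discussed here, raising an error on invalid UTF-8 instead of passing it through, can be illustrated with a small Python sketch. It is not the C++ kernel from the pull request: `utf8_upper` below is a hypothetical pure-Python function that takes a list of `bytes` values (with `None` for nulls) and relies on Python's strict UTF-8 decoding.

```python
def utf8_upper(values):
    """Upper-case each UTF-8 encoded value; None entries stay null.

    Raises ValueError if any value is not valid UTF-8.
    """
    out = []
    for i, raw in enumerate(values):
        if raw is None:
            out.append(None)
            continue
        try:
            text = raw.decode("utf-8")  # strict decoding is the default
        except UnicodeDecodeError as exc:
            raise ValueError(f"invalid UTF-8 at index {i}") from exc
        out.append(text.upper().encode("utf-8"))
    return out
```

Failing fast keeps the output array valid UTF-8 whenever the call succeeds, which matches the argument in the thread that a string array must be valid UTF-8 in the first place.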

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651289874 @pitrou your size commit made the benchmark go from `52->60 M/s`  > Yes, too. The main point of this state-machine-based decoder is that it's branchless, and so

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447171303 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -133,23 +134,23 @@ struct Utf8Transform { output_string_offsets[i + 1] =

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447170380 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -133,23 +134,23 @@ struct Utf8Transform { output_string_offsets[i + 1]

[GitHub] [arrow] pitrou edited a comment on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou edited a comment on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651282959 Main point remaining is whether we raise an error on invalid UTF8 input. I see no reason not to (an Arrow string array has to be valid UTF8 as per the spec, just like a

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651282959 Main point remaining is whether we raise an error on invalid UTF8 input. I see no reason not to (an Arrow string array has to be valid UTF8 as per the spec, just like a Python

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651282415 > Having a benchmark run on non-ascii codepoints (I think we want to do this separate from this PR, but important point). Yes, I think we can defer that to a separate PR.

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447161925 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +158,121 @@ struct AsciiLength { } }; +template class Derived> +struct

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447161391 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -30,6 +31,124 @@ namespace internal { namespace { +// lookup tables +std::vector

[GitHub] [arrow] sbinet closed pull request #7483: ARROW-9174: [Go] Fix table panic on 386

2020-06-29 Thread GitBox
sbinet closed pull request #7483: URL: https://github.com/apache/arrow/pull/7483

[GitHub] [arrow] sbinet commented on pull request #7483: ARROW-9174: [Go] Fix table panic on 386

2020-06-29 Thread GitBox
sbinet commented on pull request #7483: URL: https://github.com/apache/arrow/pull/7483#issuecomment-651277610 Apologies for the delay. I must admit I don't have many free cycles for apache-arrow these days. LGTM though.

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447154836 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -15,13 +15,15 @@ // specific language governing permissions and limitations //

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447155149 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +73,103 @@ struct AsciiLength { } }; +template +struct

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447149548 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -30,6 +31,124 @@ namespace internal { namespace { +// lookup tables

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447143530 ## File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc ## @@ -81,5 +147,40 @@ TYPED_TEST(TestStringKernels,

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447142388 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +73,103 @@ struct AsciiLength { } }; +template +struct Utf8Transform { +

[GitHub] [arrow] nealrichardson commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [ ] Support nested types (requires adapting the data structure and

[GitHub] [arrow] pitrou closed pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray

2020-06-29 Thread GitBox
pitrou closed pull request #7559: URL: https://github.com/apache/arrow/pull/7559

[GitHub] [arrow] github-actions[bot] commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651261802 Revision: 989cd4023a59159b44f69a6d5f530acc815a2407 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] github-actions[bot] commented on pull request #7583: [Doc][C++] docker compose lint -> ubuntu-link

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7583: URL: https://github.com/apache/arrow/pull/7583#issuecomment-651261350 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651261388 The R Windows builds will fail until either utf8proc is not required by default (https://issues.apache.org/jira/browse/ARROW-9220) or until libutf8proc is added as a

[GitHub] [arrow] maartenbreddels opened a new pull request #7583: [Doc][C++] docker compose lint -> ubuntu-link

2020-06-29 Thread GitBox
maartenbreddels opened a new pull request #7583: URL: https://github.com/apache/arrow/pull/7583 I guess the name changed.

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651257793 We still have 2 failures, one might need a restart (travis / no output), the other is still a linker error:

[GitHub] [arrow] pitrou commented on a change in pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7560: URL: https://github.com/apache/arrow/pull/7560#discussion_r447134548 ## File path: ci/scripts/integration_arrow.sh ## @@ -0,0 +1,29 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +#

[GitHub] [arrow] pitrou commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
pitrou commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651256468 @github-actions crossbow submit -g wheel

[GitHub] [arrow] kylebrandt commented on pull request #7483: ARROW-9174: [Go] Fix table panic on 386

2020-06-29 Thread GitBox
kylebrandt commented on pull request #7483: URL: https://github.com/apache/arrow/pull/7483#issuecomment-651255643 Hi @sbinet , new to contributing here (and see your name all over the Go code :-) ). Anything I need to do on my end for this to get merged? Thank you for all your work

[GitHub] [arrow] pitrou commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
pitrou commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651252764 Amusingly, even a minimal Debian or Ubuntu Docker image has `liblz4` and `liblzma`. I think we can simply change the script not to remove the zlib.

[GitHub] [arrow] pitrou removed a comment on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou removed a comment on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651252311 Amusingly, even a minimal Debian or Ubuntu Docker image has `liblz4` and `liblzma`.

[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651252311 Amusingly, even a minimal Debian or Ubuntu Docker image has `liblz4` and `liblzma`.

[GitHub] [arrow] github-actions[bot] commented on pull request #7582: ARROW-8190: [FlightRPC][C++] Expose IPC options

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7582: URL: https://github.com/apache/arrow/pull/7582#issuecomment-651251968 https://issues.apache.org/jira/browse/ARROW-8190

[GitHub] [arrow] github-actions[bot] commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651248175 https://issues.apache.org/jira/browse/ARROW-6521

[GitHub] [arrow] lidavidm opened a new pull request #7582: ARROW-8190: [FlightRPC][C++] Expose IPC options

2020-06-29 Thread GitBox
lidavidm opened a new pull request #7582: URL: https://github.com/apache/arrow/pull/7582 - Python is not covered as I'm not sure how best to expose these structs to Python. - Java is not covered as it doesn't use IpcOption at all currently; I'd rather hold off and see how the metadata

[GitHub] [arrow] pitrou commented on pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-29 Thread GitBox
pitrou commented on pull request #7576: URL: https://github.com/apache/arrow/pull/7576#issuecomment-651243430 Wouldn't it be more realistic to simply use 0.1% instead of 0.01%?

[GitHub] [arrow] pitrou opened a new pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou opened a new pull request #7581: URL: https://github.com/apache/arrow/pull/7581 Also add build options and preprocessor constants to represent git identification and package kind (e.g. "manylinux1").

[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651242409 @kou and @xhochy your advice would be welcome.

[GitHub] [arrow] nealrichardson commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651188772 IIUC we're ok: * Windows: no brotli: https://github.com/apache/arrow/blob/master/ci/scripts/PKGBUILD * macOS: no brotli:

[GitHub] [arrow] wesm commented on pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray

2020-06-29 Thread GitBox
wesm commented on pull request #7559: URL: https://github.com/apache/arrow/pull/7559#issuecomment-651187351 cc @pitrou or @jorisvandenbossche for review

[GitHub] [arrow] wesm commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
wesm commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651186700 If you're installing Brotli in any of the packaging setups, there may be a scenario where there is both the shared AND static library -- in that case there would be an issue. We

[GitHub] [arrow] nealrichardson commented on pull request #7556: ARROW-9188: [C++] Use Brotli shared libraries if they are available

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7556: URL: https://github.com/apache/arrow/pull/7556#issuecomment-651182567 @wesm how can I help/what should I look for?
