[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651242409 @kou and @xhochy your advice would be welcome. This is an automated message from the Apache Git Service. To

[GitHub] [arrow] pitrou opened a new pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou opened a new pull request #7581: URL: https://github.com/apache/arrow/pull/7581 Also add build options and preprocessor constants to represent git identification and package kind (e.g. "manylinux1").

[GitHub] [arrow] lidavidm opened a new pull request #7582: ARROW-8190: [FlightRPC][C++] Expose IPC options

2020-06-29 Thread GitBox
lidavidm opened a new pull request #7582: URL: https://github.com/apache/arrow/pull/7582 - Python is not covered as I'm not sure how best to expose these structs to Python. - Java is not covered as it doesn't use IpcOption at all currently; I'd rather hold off and see how the metadata

[GitHub] [arrow] github-actions[bot] commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651248175 https://issues.apache.org/jira/browse/ARROW-6521

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651257793 We still have 2 failures, one might need a restart (travis / no output), the other is still a linker error: ```

[GitHub] [arrow] pitrou commented on a change in pull request #7560: ARROW-9252: [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7560: URL: https://github.com/apache/arrow/pull/7560#discussion_r447134548 ## File path: ci/scripts/integration_arrow.sh ## @@ -0,0 +1,29 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +#

[GitHub] [arrow] maartenbreddels opened a new pull request #7583: [Doc][C++] docker compose lint -> ubuntu-link

2020-06-29 Thread GitBox
maartenbreddels opened a new pull request #7583: URL: https://github.com/apache/arrow/pull/7583 I guess the name changed.

[GitHub] [arrow] sbinet closed pull request #7483: ARROW-9174: [Go] Fix table panic on 386

2020-06-29 Thread GitBox
sbinet closed pull request #7483: URL: https://github.com/apache/arrow/pull/7483

[GitHub] [arrow] pitrou commented on pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-29 Thread GitBox
pitrou commented on pull request #7576: URL: https://github.com/apache/arrow/pull/7576#issuecomment-651243430 Wouldn't it be more realistic to simply use 0.1% instead of 0.01%?

[GitHub] [arrow] pitrou removed a comment on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou removed a comment on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651252311 Amusingly, even a minimal Debian or Ubuntu Docker image has `liblz4` and `liblzma`.

[GitHub] [arrow] pitrou commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
pitrou commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651252764 Amusingly, even a minimal Debian or Ubuntu Docker image has `liblz4` and `liblzma`. I think we can simply change the script not to remove the zlib.

[GitHub] [arrow] github-actions[bot] commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651261802 Revision: 989cd4023a59159b44f69a6d5f530acc815a2407 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] sbinet commented on pull request #7483: ARROW-9174: [Go] Fix table panic on 386

2020-06-29 Thread GitBox
sbinet commented on pull request #7483: URL: https://github.com/apache/arrow/pull/7483#issuecomment-651277610 apologies for the delay. I must admit I don't free many cycles for apache-arrow these days. LGTM though.

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447161925 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +158,121 @@ struct AsciiLength { } }; +template class Derived> +struct

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447161391 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -30,6 +31,124 @@ namespace internal { namespace { +// lookup tables +std::vector

[GitHub] [arrow] pitrou edited a comment on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou edited a comment on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651282959 Main point remaining is whether we raise an error on invalid UTF8 input. I see no reason not to (an Arrow string array has to be valid UTF8 as per the spec, just like a

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447171303 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -133,23 +134,23 @@ struct Utf8Transform { output_string_offsets[i + 1] =

[GitHub] [arrow] pitrou commented on pull request #7580: ARROW-9261: [Python] Fix CA certificate lookup with S3 filesystem on manylinux

2020-06-29 Thread GitBox
pitrou commented on pull request #7580: URL: https://github.com/apache/arrow/pull/7580#issuecomment-651256468 @github-actions crossbow submit -g wheel

[GitHub] [arrow] github-actions[bot] commented on pull request #7583: [Doc][C++] docker compose lint -> ubuntu-link

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7583: URL: https://github.com/apache/arrow/pull/7583#issuecomment-651261350 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651261388 The R Windows builds will fail until either utf8proc is not required by default (https://issues.apache.org/jira/browse/ARROW-9220) or until libutf8proc is added as a

[GitHub] [arrow] nealrichardson commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [ ] Support nested types (requires adapting the data structure and

[GitHub] [arrow] pitrou closed pull request #7559: ARROW-9247: [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray

2020-06-29 Thread GitBox
pitrou closed pull request #7559: URL: https://github.com/apache/arrow/pull/7559

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447155149 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +73,103 @@ struct AsciiLength { } }; +template +struct

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447154836 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -15,13 +15,15 @@ // specific language governing permissions and limitations //

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651282415 > Having a benchmark run on non-ascii codepoints (I think we want to do this separate from this PR, but important point). Yes, I think we can defer that to a separate PR.

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651282959 Main point remaining is whether we raise an error on invalid UTF8 input. I see no reason not to (an Arrow string array has to be valid UTF8 as per the spec, just like a Python

[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651252311 Amusingly, even a minimal Debian or Ubuntu Docker image has `liblz4` and `liblzma`.

[GitHub] [arrow] github-actions[bot] commented on pull request #7582: ARROW-8190: [FlightRPC][C++] Expose IPC options

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #7582: URL: https://github.com/apache/arrow/pull/7582#issuecomment-651251968 https://issues.apache.org/jira/browse/ARROW-8190

[GitHub] [arrow] kylebrandt commented on pull request #7483: ARROW-9174: [Go] Fix table panic on 386

2020-06-29 Thread GitBox
kylebrandt commented on pull request #7483: URL: https://github.com/apache/arrow/pull/7483#issuecomment-651255643 Hi @sbinet , new to contributing here (and see your name all over the Go code :-) ). Anything I need to do on my end for this to get merged? Thank you for all your work

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447143530 ## File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc ## @@ -81,5 +147,40 @@ TYPED_TEST(TestStringKernels,

[GitHub] [arrow] pitrou commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447142388 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -39,6 +73,103 @@ struct AsciiLength { } }; +template +struct Utf8Transform { +

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447149548 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -30,6 +31,124 @@ namespace internal { namespace { +// lookup tables

[GitHub] [arrow] maartenbreddels commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on a change in pull request #7449: URL: https://github.com/apache/arrow/pull/7449#discussion_r447170380 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -133,23 +134,23 @@ struct Utf8Transform { output_string_offsets[i + 1]

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651289874 @pitrou your size commit made the benchmark go from `52->60 M/s`  > Yes, too. The main point of this state-machine-based decoder is that it's branchless, and so

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651322087 I just concluded the same :)

[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651352858 Also, I'll let others add `-DARROW_PACKAGE_KIND=...` in other places.

[GitHub] [arrow] pitrou commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
pitrou commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651352568 > It would be also nice to store the enabled features. Agreed, but that can be done in a separate PR. > How about adding int BuildInfo::version for ARROW_VERSION too?

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651316656 I pushed a commit that raises an error on invalid UTF8. It does not seem to make the benchmarks slower.

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651338264 @xhochy Could you help on the utf8proc issue on RTools 3.5? See here: https://github.com/apache/arrow/pull/7449/checks?check_run_id=819772618#step:10:169 It seems that

[GitHub] [arrow] kszucs commented on pull request #7581: ARROW-6521: [C++] Add an API to query runtime build info

2020-06-29 Thread GitBox
kszucs commented on pull request #7581: URL: https://github.com/apache/arrow/pull/7581#issuecomment-651339349 It would be also nice to store the enabled features.

[GitHub] [arrow] nealrichardson edited a comment on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-29 Thread GitBox
nealrichardson edited a comment on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-651263042 I'm taking this over. Outstanding TODOs: - [x] Add tests - [x] Support record batches - [x] Support nested types (requires adapting the data structure

[GitHub] [arrow] wesm commented on pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
wesm commented on pull request #7571: URL: https://github.com/apache/arrow/pull/7571#issuecomment-651351599 I'll close this for now. Please leave any review comments and I can address them later

[GitHub] [arrow] kou merged pull request #7583: [Doc][C++] Follow docker-compose service name change for lint

2020-06-29 Thread GitBox
kou merged pull request #7583: URL: https://github.com/apache/arrow/pull/7583

[GitHub] [arrow] wesm closed pull request #7571: ARROW-8671: [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata

2020-06-29 Thread GitBox
wesm closed pull request #7571: URL: https://github.com/apache/arrow/pull/7571

[GitHub] [arrow] pitrou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
pitrou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651353338 > This means there also needs to be a PKGBUILD Why? `libutf8proc` is installed.

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-29 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-651342350 > @xhochy Could you help on the utf8proc issue on RTools 3.5? > See here: https://github.com/apache/arrow/pull/7449/checks?check_run_id=819772618#step:10:169 This

[GitHub] [arrow] wesm commented on pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-29 Thread GitBox
wesm commented on pull request #7576: URL: https://github.com/apache/arrow/pull/7576#issuecomment-651350708 I think both the 1/1000 and 1/1 cases have something interesting to show perf wise, but in any case using 1M as the length in this benchmark seems OK.

[GitHub] [arrow] wesm closed pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-29 Thread GitBox
wesm closed pull request #7576: URL: https://github.com/apache/arrow/pull/7576

[GitHub] [arrow] wesm commented on pull request #7576: ARROW-9263: [C++] Promote compute aggregate benchmark size to 1M.

2020-06-29 Thread GitBox
wesm commented on pull request #7576: URL: https://github.com/apache/arrow/pull/7576#issuecomment-651350872 +1
