[GitHub] [arrow] jianxind edited a comment on pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind edited a comment on pull request #7531: URL: https://github.com/apache/arrow/pull/7531#issuecomment-649198815 Here are the results: https://ci.ursalabs.org/#/builders/73/builds/90/steps/3/logs/result (not sure why they were not pasted here). I see positive results for all the 0.01% cases.

[GitHub] [arrow] jianxind commented on pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind commented on pull request #7531: URL: https://github.com/apache/arrow/pull/7531#issuecomment-649198815 Here are the results: https://ci.ursalabs.org/#/builders/73/builds/90/steps/3/logs/result (not sure why they were not pasted here). I see positive results for all the 0.01% cases.

[GitHub] [arrow] jianxind commented on pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind commented on pull request #7531: URL: https://github.com/apache/arrow/pull/7531#issuecomment-649197066 @ursabot benchmark --suite-filter=parquet-encoding-benchmark --benchmark-filter=BM_Plain This is an automated

[GitHub] [arrow] kiszk commented on a change in pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-24 Thread GitBox
kiszk commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r445289548 ## File path: cpp/src/arrow/ipc/read_write_test.cc ## @@ -427,6 +469,14 @@ class TestIpcRoundTrip : public ::testing::TestWithParam, void TearDown() {
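
ARROW-8797 is about receiving a RecordBatch that was written on a machine with the opposite byte order. A minimal sketch of the core operation such a reader needs for fixed-width data, assuming int32 values and not based on Arrow's actual IPC code, is reversing the bytes of every value. `ByteSwap32`, `HostIsLittleEndian` and `SwapInt32Values` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Reverse the byte order of a 32-bit value (e.g. big-endian -> little-endian).
inline uint32_t ByteSwap32(uint32_t v) {
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// True when the host stores the least significant byte first.
inline bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Hypothetical helper: convert int32 values that were written with the
// opposite byte order into host order, in place. memcpy moves the bits
// between int32_t and uint32_t without aliasing concerns.
inline void SwapInt32Values(std::vector<int32_t>& values) {
  for (int32_t& v : values) {
    uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    bits = ByteSwap32(bits);
    std::memcpy(&v, &bits, sizeof(bits));
  }
}
```

A reader would only call `SwapInt32Values` when the sender's byte order differs from `HostIsLittleEndian()`; swapping twice restores the original values.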

[GitHub] [arrow] wesm commented on pull request #7439: ARROW-4309: [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled

2020-06-24 Thread GitBox
wesm commented on pull request #7439: URL: https://github.com/apache/arrow/pull/7439#issuecomment-649195633 The Java docs don't build at all ``` + mvn -B -DskipTests -Drat.skip=true -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn

[GitHub] [arrow] wesm closed pull request #7439: ARROW-4309: [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled

2020-06-24 Thread GitBox
wesm closed pull request #7439: URL: https://github.com/apache/arrow/pull/7439

[GitHub] [arrow] wesm commented on a change in pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
wesm commented on a change in pull request #7531: URL: https://github.com/apache/arrow/pull/7531#discussion_r445281773 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,199 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license
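
As the PR title describes it, the idea behind ARROW-9216 is to scan the validity bitmap in blocks rather than one bit at a time. The following is a minimal standalone sketch of that technique, not the code in `cpp/src/arrow/util/spaced.h`: it counts the set bits in each 64-bit word, takes a bulk-copy fast path when a block is all valid, and skips the block when it is all null. `LoadValidityWord`, `ExpandSpaced` and `CompressSpaced` are hypothetical names, and the bitmap is assumed to start at bit offset 0.

```cpp
#include <algorithm>
#include <bitset>
#include <cstdint>

// Load `nbits` (<= 64) validity bits starting at bit `start`, a multiple of
// 64. Bits are LSB-first within each byte, as in Arrow bitmaps. Building the
// word byte by byte is independent of host endianness and never reads past
// the last byte that is needed.
inline uint64_t LoadValidityWord(const uint8_t* bitmap, int64_t start, int64_t nbits) {
  uint64_t word = 0;
  const int64_t nbytes = (nbits + 7) / 8;
  for (int64_t k = 0; k < nbytes; ++k) {
    word |= static_cast<uint64_t>(bitmap[start / 8 + k]) << (8 * k);
  }
  if (nbits < 64) word &= (uint64_t{1} << nbits) - 1;  // drop bits past the end
  return word;
}

// Decoding direction: scatter `dense` (the non-null values, in order) into
// the valid slots of `out`. Null slots are left untouched. Returns the
// number of dense values consumed.
template <typename T>
int64_t ExpandSpaced(const T* dense, const uint8_t* valid_bits, int64_t length, T* out) {
  int64_t consumed = 0;
  for (int64_t start = 0; start < length; start += 64) {
    const int64_t nbits = std::min<int64_t>(64, length - start);
    const uint64_t word = LoadValidityWord(valid_bits, start, nbits);
    const int64_t popcount = static_cast<int64_t>(std::bitset<64>(word).count());
    if (popcount == nbits) {
      // All valid: one contiguous copy, no per-bit tests.
      std::copy(dense + consumed, dense + consumed + nbits, out + start);
      consumed += nbits;
    } else if (popcount > 0) {
      // Mixed block: place values one bit at a time.
      for (int64_t i = 0; i < nbits; ++i) {
        if ((word >> i) & 1) out[start + i] = dense[consumed++];
      }
    }  // popcount == 0: the whole block is null, nothing to copy.
  }
  return consumed;
}

// Encoding direction: gather the valid slots of `spaced` into `dense_out`.
// Returns the number of values written.
template <typename T>
int64_t CompressSpaced(const T* spaced, const uint8_t* valid_bits, int64_t length, T* dense_out) {
  int64_t written = 0;
  for (int64_t start = 0; start < length; start += 64) {
    const int64_t nbits = std::min<int64_t>(64, length - start);
    const uint64_t word = LoadValidityWord(valid_bits, start, nbits);
    const int64_t popcount = static_cast<int64_t>(std::bitset<64>(word).count());
    if (popcount == nbits) {
      std::copy(spaced + start, spaced + start + nbits, dense_out + written);
      written += nbits;
    } else if (popcount > 0) {
      for (int64_t i = 0; i < nbits; ++i) {
        if ((word >> i) & 1) dense_out[written++] = spaced[start + i];
      }
    }
  }
  return written;
}
```

The payoff depends on the null pattern: with very few nulls almost every block takes the bulk-copy branch, while with many scattered nulls almost every block falls back to the per-bit loop.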

[GitHub] [arrow] wesm closed pull request #7538: ARROW-7925: [C++][Docs] Better document use of IWYU, including new 'match' option

2020-06-24 Thread GitBox
wesm closed pull request #7538: URL: https://github.com/apache/arrow/pull/7538

[GitHub] [arrow] wesm commented on pull request #7538: ARROW-7925: [C++][Docs] Better document use of IWYU, including new 'match' option

2020-06-24 Thread GitBox
wesm commented on pull request #7538: URL: https://github.com/apache/arrow/pull/7538#issuecomment-649186060 +1

[GitHub] [arrow] bkietz commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-06-24 Thread GitBox
bkietz commented on pull request #7536: URL: https://github.com/apache/arrow/pull/7536#issuecomment-649154064 Actually, on reflection: I'm not sure it's worthwhile to check the count of unique values at all. In any given batch a virtual column would be materialized with a single-item
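
bkietz's point is that, within one batch, a partition (virtual) column holds a single value. Assuming a dictionary-encoded string representation, a minimal sketch of what that looks like follows; `DictStringColumn` and `MaterializePartitionColumn` are hypothetical names, not the Dataset API. Under this representation every batch's dictionary has exactly one entry, whatever the number of distinct partition values across the whole dataset.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded string column: `dictionary` holds the
// distinct values and `indices` holds one position into it per row.
struct DictStringColumn {
  std::vector<std::string> dictionary;
  std::vector<int8_t> indices;
};

// All rows of a batch read from one partition share the same partition
// value, so the per-batch dictionary needs exactly one entry and every
// index is 0, however many rows the batch has.
inline DictStringColumn MaterializePartitionColumn(const std::string& value, int64_t num_rows) {
  DictStringColumn column;
  column.dictionary.push_back(value);
  column.indices.assign(static_cast<std::size_t>(num_rows), 0);
  return column;
}
```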

[GitHub] [arrow] jianxind commented on a change in pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind commented on a change in pull request #7531: URL: https://github.com/apache/arrow/pull/7531#discussion_r445249689 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,199 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] jianxind commented on a change in pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind commented on a change in pull request #7531: URL: https://github.com/apache/arrow/pull/7531#discussion_r445249248 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,199 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] jianxind commented on a change in pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind commented on a change in pull request #7531: URL: https://github.com/apache/arrow/pull/7531#discussion_r445249183 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,199 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] bkietz commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-06-24 Thread GitBox
bkietz commented on pull request #7536: URL: https://github.com/apache/arrow/pull/7536#issuecomment-649148829 I think there's value in finding the smallest index type possible; we expect partition fields to have few unique values in most cases.
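
Finding "the smallest index type possible" amounts to picking the narrowest signed integer that can hold the largest index, `num_unique - 1`. A sketch with hypothetical names, assuming signed indices (which the Arrow columnar format recommends over unsigned ones for dictionary indices):

```cpp
#include <cstdint>
#include <limits>

enum class IndexType { kInt8, kInt16, kInt32, kInt64 };

// Hypothetical helper: the narrowest signed integer type that can index a
// dictionary with `num_unique` entries (the largest index is num_unique - 1).
inline IndexType SmallestIndexType(int64_t num_unique) {
  const int64_t max_index = num_unique > 0 ? num_unique - 1 : 0;
  if (max_index <= std::numeric_limits<int8_t>::max()) return IndexType::kInt8;
  if (max_index <= std::numeric_limits<int16_t>::max()) return IndexType::kInt16;
  if (max_index <= std::numeric_limits<int32_t>::max()) return IndexType::kInt32;
  return IndexType::kInt64;
}
```

The trade-off against always using `int32()` indices: a fixed type does not require knowing the unique count up front, while the narrowest type spends 1 byte per row instead of 4 when there are at most 128 distinct values.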

[GitHub] [arrow] jianxind commented on pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind commented on pull request #7531: URL: https://github.com/apache/arrow/pull/7531#issuecomment-649147883 @ursabot benchmark --suite-filter=parquet-encoding-benchmark

[GitHub] [arrow] jianxind commented on a change in pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind commented on a change in pull request #7531: URL: https://github.com/apache/arrow/pull/7531#discussion_r445245990 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,199 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] wesm commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-06-24 Thread GitBox
wesm commented on pull request #7536: URL: https://github.com/apache/arrow/pull/7536#issuecomment-649145717 We could just use int32() dictionary indices and call it a day?

[GitHub] [arrow] github-actions[bot] commented on pull request #7538: ARROW-7925: [C++][Docs] Better document use of IWYU, including new 'match' option

2020-06-24 Thread GitBox
github-actions[bot] commented on pull request #7538: URL: https://github.com/apache/arrow/pull/7538#issuecomment-649144542 https://issues.apache.org/jira/browse/ARROW-7925

[GitHub] [arrow] wesm opened a new pull request #7538: ARROW-7925: [C++][Docs] Better document use of IWYU, including new 'match' option

2020-06-24 Thread GitBox
wesm opened a new pull request #7538: URL: https://github.com/apache/arrow/pull/7538

[GitHub] [arrow] wesm commented on pull request #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-06-24 Thread GitBox
wesm commented on pull request #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-649124737 You should be able to just rebase and the problem will go away.

[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True

2020-06-24 Thread GitBox
wesm commented on a change in pull request #7537: URL: https://github.com/apache/arrow/pull/7537#discussion_r445221159 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -254,14 +255,45 @@ bool PyFloat_IsNaN(PyObject* obj) { return PyFloat_Check(obj) &&

[GitHub] [arrow] wesm opened a new pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True

2020-06-24 Thread GitBox
wesm opened a new pull request #7537: URL: https://github.com/apache/arrow/pull/7537 This has been the root cause of a number of bugs. I'm unclear if there's a race condition with tearing down a `static OwnedRef` so we might need some other approach to managing symbols imported from

[GitHub] [arrow] github-actions[bot] commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-06-24 Thread GitBox
github-actions[bot] commented on pull request #7536: URL: https://github.com/apache/arrow/pull/7536#issuecomment-649112307 https://issues.apache.org/jira/browse/ARROW-8647

[GitHub] [arrow] github-actions[bot] commented on pull request #7535: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-24 Thread GitBox
github-actions[bot] commented on pull request #7535: URL: https://github.com/apache/arrow/pull/7535#issuecomment-649112294 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] bkietz opened a new pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-06-24 Thread GitBox
bkietz opened a new pull request #7536: URL: https://github.com/apache/arrow/pull/7536

[GitHub] [arrow] wesm opened a new pull request #7535: [Format][DONOTMERGE] Columnar.rst changes for removing validity bitmap from union types

2020-06-24 Thread GitBox
wesm opened a new pull request #7535: URL: https://github.com/apache/arrow/pull/7535 See mailing list discussion.

[GitHub] [arrow] BryanCutler commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-06-24 Thread GitBox
BryanCutler commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-649103303 I mean the current process for integration tests with the master branch is to build Spark with Arrow Java master, then run Java and Python tests. That process is good for

[GitHub] [arrow] wesm commented on pull request #7396: ARROW-9092: [C++][TRIAGE] Do not enable TestRoundFunctions when using LLVM 9 until gandiva-decimal-test is fixed

2020-06-24 Thread GitBox
wesm commented on pull request #7396: URL: https://github.com/apache/arrow/pull/7396#issuecomment-649102716 I used perf to record some data about the hung function ``` + 83.37% 0.00% gandiva-decimal [unknown] [.] 0x + 65.62%

[GitHub] [arrow] wesm commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-24 Thread GitBox
wesm commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-649098754 Once the utf8_lower/utf8_upper patch lands I am going to make utf8proc not mandatory. See ARROW-9220.
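
If utf8proc becomes optional, one plausible arrangement (an assumption, not necessarily what ARROW-9220 does) is an ASCII fast path that needs no Unicode tables, with a fallback for everything else. For pure ASCII input each character stays one byte wide, so a string array's offsets would not change; outside ASCII that no longer holds (for example, dotless ı, U+0131, two bytes in UTF-8, uppercases to the one-byte `I`). `TryAsciiUpper` is a hypothetical name.

```cpp
#include <string>
#include <utility>

// Hypothetical ASCII fast path for an upper-casing kernel. If every byte is
// 7-bit ASCII, map 'a'..'z' to 'A'..'Z', store the result in `out`, and
// return true. Otherwise return false and leave `out` untouched, so the
// caller can fall back to a full Unicode case-mapping path (e.g. one built
// on utf8proc).
inline bool TryAsciiUpper(const std::string& in, std::string* out) {
  for (unsigned char c : in) {
    if (c >= 0x80) return false;
  }
  std::string result = in;
  for (char& c : result) {
    if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
  }
  *out = std::move(result);
  return true;
}
```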

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9153: [C++][Python] Refactor scalar bindings

2020-06-24 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r445194648 ## File path: python/pyarrow/tests/test_misc.py ## @@ -120,7 +120,6 @@ def test_cpu_count(): pa.LargeListValue, pa.MapValue,

[GitHub] [arrow] kou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox
kou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-649090497 Rebased.

[GitHub] [arrow] kou closed pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-24 Thread GitBox
kou closed pull request #7452: URL: https://github.com/apache/arrow/pull/7452

[GitHub] [arrow] kou commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-24 Thread GitBox
kou commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-649085709 +1

[GitHub] [arrow] kou commented on a change in pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-24 Thread GitBox
kou commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r445184425 ## File path: cpp/src/arrow/ipc/read_write_test.cc ## @@ -427,6 +469,14 @@ class TestIpcRoundTrip : public ::testing::TestWithParam, void TearDown() {

[GitHub] [arrow] xrl commented on pull request #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-06-24 Thread GitBox
xrl commented on pull request #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-649081934 @wesm I'm the original author and I'd love to wrap this up. I can probably figure out how to debug some ruby for that release script bug.

[GitHub] [arrow] wesm commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-24 Thread GitBox
wesm commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-649079161 There's a small rebase conflict here. Need help from Java folks here, @rymurr can you help?

[GitHub] [arrow] wesm closed pull request #7014: ARROW-8563: [Go] Minor change to make newBuilder public

2020-06-24 Thread GitBox
wesm closed pull request #7014: URL: https://github.com/apache/arrow/pull/7014

[GitHub] [arrow] wesm commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-06-24 Thread GitBox
wesm commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-649077394 @sonthonaxrk I would recommend opening a new PR.

[GitHub] [arrow] wesm commented on pull request #6725: ARROW-8226: [Go] Implement 64 bit offsets binary builder

2020-06-24 Thread GitBox
wesm commented on pull request #6725: URL: https://github.com/apache/arrow/pull/6725#issuecomment-649076794 @sbinet could you assist with this?

[GitHub] [arrow] wesm commented on pull request #6725: ARROW-8226: [Go] Implement 64 bit offsets binary builder

2020-06-24 Thread GitBox
wesm commented on pull request #6725: URL: https://github.com/apache/arrow/pull/6725#issuecomment-649076694 Hm, at a high level it seems like it might be better to have a separate set of LargeBinary types rather than try to pack both 32-bit and 64-bit into the same types. This means some
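
Arrow's LargeBinary type uses 64-bit offsets where Binary uses 32-bit ones. One way to keep them as separate types without duplicating code, sketched here in C++ for consistency with the rest of this page rather than in Go, is a single template instantiated twice. The class names are hypothetical and are not the Go or C++ Arrow builders.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical variable-length binary builder parameterized on offset width.
// Offsets are cumulative byte positions into `data_`; value i spans
// [offsets_[i], offsets_[i + 1]).
template <typename OffsetType>
class BasicBinaryBuilderSketch {
 public:
  BasicBinaryBuilderSketch() : offsets_{0} {}

  void Append(const std::string& value) {
    const int64_t end = static_cast<int64_t>(data_.size() + value.size());
    if (end > static_cast<int64_t>(std::numeric_limits<OffsetType>::max())) {
      throw std::overflow_error("offsets overflow; use the 64-bit-offset type");
    }
    data_.insert(data_.end(), value.begin(), value.end());
    offsets_.push_back(static_cast<OffsetType>(end));
  }

  int64_t length() const { return static_cast<int64_t>(offsets_.size()) - 1; }

  std::string Value(int64_t i) const {
    return std::string(data_.begin() + offsets_[i], data_.begin() + offsets_[i + 1]);
  }

 private:
  std::vector<OffsetType> offsets_;
  std::vector<char> data_;
};

// Two distinct types that share one implementation.
using BinaryBuilderSketch = BasicBinaryBuilderSketch<int32_t>;       // 32-bit offsets
using LargeBinaryBuilderSketch = BasicBinaryBuilderSketch<int64_t>;  // 64-bit offsets
```

Because the two aliases are different types, code that only handles 32-bit offsets cannot accidentally be handed 64-bit ones, which is the separation the comment argues for.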

[GitHub] [arrow] wesm commented on pull request #4140: ARROW-5123: [Rust] Parquet derive for simple structs

2020-06-24 Thread GitBox
wesm commented on pull request #4140: URL: https://github.com/apache/arrow/pull/4140#issuecomment-649075783 Any hope of rehabilitating this for 1.0.0?

[GitHub] [arrow] wesm commented on pull request #7374: ARROW-9064: [APT] Use --no-install-recommends

2020-06-24 Thread GitBox
wesm commented on pull request #7374: URL: https://github.com/apache/arrow/pull/7374#issuecomment-649075242 I'm closing this.

[GitHub] [arrow] wesm closed pull request #7374: ARROW-9064: [APT] Use --no-install-recommends

2020-06-24 Thread GitBox
wesm closed pull request #7374: URL: https://github.com/apache/arrow/pull/7374

[GitHub] [arrow] wesm commented on pull request #7376: ARROW-9043: [Go][FOLLOWUP] Move license file copy to correct location

2020-06-24 Thread GitBox
wesm commented on pull request #7376: URL: https://github.com/apache/arrow/pull/7376#issuecomment-649074875 @kszucs can you take this over?

[GitHub] [arrow] wesm closed pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-24 Thread GitBox
wesm closed pull request #7520: URL: https://github.com/apache/arrow/pull/7520

[GitHub] [arrow] wesm closed pull request #7533: ARROW-7375: [Python] Expose C++ MakeArrayOfNull

2020-06-24 Thread GitBox
wesm closed pull request #7533: URL: https://github.com/apache/arrow/pull/7533

[GitHub] [arrow] wesm commented on pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
wesm commented on pull request #7531: URL: https://github.com/apache/arrow/pull/7531#issuecomment-649067717 Benchmark results ``` $ archery benchmark diff --cc=gcc-8 --cxx=g++-8 jianxind/BitBlockSpaced master --suite-filter=parquet-encoding

[GitHub] [arrow] kou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox
kou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-649065240 Oh, sorry. It seems I was looking at the wrong CI jobs. The link problem has been fixed by the workaround. I'll cherry pick the workaround to

[GitHub] [arrow] kszucs commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-06-24 Thread GitBox
kszucs commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-649043272 > @kszucs I submitted a patch to fix Java compilation with Spark master and branch-3.0, and tested locally with the latest pyarrow so Spark integration tests should pass for these

[GitHub] [arrow] wesm commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-24 Thread GitBox
wesm commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r445127367 ## File path: r/src/array_from_vector.cpp ## @@ -915,6 +924,39 @@ class Time64Converter : public TimeConverter { } }; +class BinaryVectorConverter :

[GitHub] [arrow] wesm commented on a change in pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
wesm commented on a change in pull request #7531: URL: https://github.com/apache/arrow/pull/7531#discussion_r445122603 ## File path: cpp/src/arrow/util/spaced.h ## @@ -0,0 +1,199 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] bkietz closed pull request #7493: ARROW-9183: [C++] Fix build with clang & old libstdc++.

2020-06-24 Thread GitBox
bkietz closed pull request #7493: URL: https://github.com/apache/arrow/pull/7493

[GitHub] [arrow] github-actions[bot] commented on pull request #7534: ARROW-8729: [C++][Dataset] Ensure non-empty batches when only virtual columns are projected

2020-06-24 Thread GitBox
github-actions[bot] commented on pull request #7534: URL: https://github.com/apache/arrow/pull/7534#issuecomment-649022873 https://issues.apache.org/jira/browse/ARROW-8729

[GitHub] [arrow] bkietz opened a new pull request #7534: ARROW-8729: [C++][Dataset] Ensure non-empty batches when only virtual columns are projected

2020-06-24 Thread GitBox
bkietz opened a new pull request #7534: URL: https://github.com/apache/arrow/pull/7534 This bug is inherited from `parquet::arrow::RowGroupRecordBatchReader`, which yielded empty record batches when no columns were projected because no field readers were available from which to derive
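
When no physical columns are projected there are no column readers left to report how many rows each batch holds. One way to avoid empty batches in that case (an assumption for illustration, not necessarily the fix in this PR) is to take row counts from row-group metadata and split them by the batch size. `BatchLengthsFromMetadata` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: batch lengths for a scan that projects no physical
// columns. Row counts come from row-group metadata; each row group is split
// into batches of at most `batch_size` rows, so virtual columns can still be
// materialized with the right number of rows.
inline std::vector<int64_t> BatchLengthsFromMetadata(
    const std::vector<int64_t>& row_group_num_rows, int64_t batch_size) {
  std::vector<int64_t> lengths;
  if (batch_size <= 0) return lengths;  // guard against an infinite loop
  for (int64_t rows : row_group_num_rows) {
    for (int64_t done = 0; done < rows; done += batch_size) {
      lengths.push_back(std::min(batch_size, rows - done));
    }
  }
  return lengths;
}
```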

[GitHub] [arrow] wesm closed pull request #7498: ARROW-9091: [C++][Compute] Add default FunctionOptions

2020-06-24 Thread GitBox
wesm closed pull request #7498: URL: https://github.com/apache/arrow/pull/7498

[GitHub] [arrow] nealrichardson commented on pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-24 Thread GitBox
nealrichardson commented on pull request #7527: URL: https://github.com/apache/arrow/pull/7527#issuecomment-649000526 @romainfrancois this is ready for (and seriously needs) your review. Tests should be passing now.

[GitHub] [arrow] nealrichardson commented on a change in pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-24 Thread GitBox
nealrichardson commented on a change in pull request #7527: URL: https://github.com/apache/arrow/pull/7527#discussion_r445098056 ## File path: r/R/schema.R ## @@ -83,16 +83,21 @@ Schema <- R6Class("Schema", } ), active = list( -names = function()

[GitHub] [arrow] alexbaden commented on pull request #7263: ARROW-8927: [C++] Support dictionary memo in CUDA IPC ReadRecordBatch functions

2020-06-24 Thread GitBox
alexbaden commented on pull request #7263: URL: https://github.com/apache/arrow/pull/7263#issuecomment-648988459 Maybe with this PR that is possible, I'll have to explore a bit once this is merged. The concern is more around getting the order of the dictionary, etc right in the message

[GitHub] [arrow] BryanCutler commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-06-24 Thread GitBox
BryanCutler commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-648982266 @kszucs I submitted a patch to fix Java compilation with Spark master and branch-3.0, and tested locally with the latest pyarrow so Spark integration tests should pass for

[GitHub] [arrow] wesm commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-24 Thread GitBox
wesm commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r445042115 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator { ArrayType
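
For reference, the semantics of these aggregates on booleans, assuming the usual convention that true counts as 1 and false as 0 and that nulls are skipped: sum is the number of true values, mean is the fraction of non-null values that are true, min is a logical AND and max is a logical OR. A standalone sketch with hypothetical names, not the `MinMaxImpl` kernel under review:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Aggregates over a nullable boolean column, treating true as 1, false as 0.
struct BoolAggregates {
  int64_t count = 0;  // number of non-null values
  int64_t sum = 0;    // number of true values
  bool min = true;    // AND of the values (identity: true)
  bool max = false;   // OR of the values (identity: false)
  // Fraction of non-null values that are true. Undefined when count == 0;
  // a real kernel would presumably return null in that case.
  double mean() const {
    assert(count > 0);
    return static_cast<double>(sum) / static_cast<double>(count);
  }
};

inline BoolAggregates AggregateBooleans(const std::vector<bool>& values,
                                        const std::vector<bool>& is_valid) {
  BoolAggregates agg;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!is_valid[i]) continue;  // nulls do not participate
    const bool v = values[i];
    ++agg.count;
    agg.sum += v ? 1 : 0;
    agg.min = agg.min && v;
    agg.max = agg.max || v;
  }
  return agg;
}
```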

[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-24 Thread GitBox
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r445007543 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -397,24 +452,26 @@ struct MinMaxImpl : public ScalarAggregator { ArrayType

[GitHub] [arrow] kszucs commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-24 Thread GitBox
kszucs commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r445006111 ## File path: cpp/src/arrow/testing/gtest_util.h ## @@ -137,6 +137,8 @@ namespace arrow { //

[GitHub] [arrow] nealrichardson closed pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-24 Thread GitBox
nealrichardson closed pull request #7297: URL: https://github.com/apache/arrow/pull/7297

[GitHub] [arrow] wesm commented on a change in pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-24 Thread GitBox
wesm commented on a change in pull request #7478: URL: https://github.com/apache/arrow/pull/7478#discussion_r444996273 ## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -399,15 +434,59 @@ class TestNumericMinMaxKernel : public ::testing::Test { };

[GitHub] [arrow] nevi-me commented on pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-24 Thread GitBox
nevi-me commented on pull request #7297: URL: https://github.com/apache/arrow/pull/7297#issuecomment-648901870 > Ok there was one other test needing to be skipped, which I've done, and now the tests "pass". Should we merge this and progressively unskip tests as you can? Yes please

[GitHub] [arrow] andygrove commented on pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-24 Thread GitBox
andygrove commented on pull request #7297: URL: https://github.com/apache/arrow/pull/7297#issuecomment-648901377 Yes, that would be great. Thanks! On Wed, Jun 24, 2020 at 9:43 AM Neal Richardson wrote: > Ok there was one other test needing to be skipped, which I've done,

[GitHub] [arrow] nealrichardson commented on pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-24 Thread GitBox
nealrichardson commented on pull request #7297: URL: https://github.com/apache/arrow/pull/7297#issuecomment-648900443 Ok there was one other test needing to be skipped, which I've done, and now the tests "pass". Should we merge this and progressively unskip tests as you can?

[GitHub] [arrow] nealrichardson commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox
nealrichardson commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-648889866 @xhochy if you're trying to add R Windows dependencies, see the discussion on https://issues.apache.org/jira/browse/ARROW-6960 for pointers

[GitHub] [arrow] nealrichardson commented on pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-06-24 Thread GitBox
nealrichardson commented on pull request #7275: URL: https://github.com/apache/arrow/pull/7275#issuecomment-648886550 Is this good to merge now? @BryanCutler are you still planning to review this? Would like to get this in 1.0.

[GitHub] [arrow] wesm commented on pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-24 Thread GitBox
wesm commented on pull request #7478: URL: https://github.com/apache/arrow/pull/7478#issuecomment-648879878 Looking

[GitHub] [arrow] wesm commented on pull request #7532: ARROW-9217: [C++] Cover 0.01% null for the plain spaced benchmark

2020-06-24 Thread GitBox
wesm commented on pull request #7532: URL: https://github.com/apache/arrow/pull/7532#issuecomment-648877226 Here's 0.17.1 with the benchmark changes backported: https://github.com/wesm/arrow/tree/BitBlockSpacedBM-0.17.1

[GitHub] [arrow] wesm closed pull request #7532: ARROW-9217: [C++] Cover 0.01% null for the plain spaced benchmark

2020-06-24 Thread GitBox
wesm closed pull request #7532: URL: https://github.com/apache/arrow/pull/7532

[GitHub] [arrow] wesm commented on pull request #7532: ARROW-9217: [C++] Cover 0.01% null for the plain spaced benchmark

2020-06-24 Thread GitBox
wesm commented on pull request #7532: URL: https://github.com/apache/arrow/pull/7532#issuecomment-648875010 For interest, I benchmarked 0.17.1 versus master with these new benchmarks (gcc-8 on i9-9960X, with SSE4.2): ``` benchmark
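
One way to see why a 0.01% null rate is an interesting case for a block-based fast path: if nulls are placed independently with probability p (an assumption about the benchmark data), a block of 64 slots contains no nulls with probability (1 - p)^64. A small helper with a hypothetical name:

```cpp
#include <cassert>
#include <cmath>

// Probability that a block of `block_bits` slots has no nulls, assuming each
// slot is null independently with probability `null_probability`:
// P(all valid) = (1 - p)^block_bits.
inline double ProbabilityBlockAllValid(double null_probability, int block_bits) {
  assert(null_probability >= 0.0 && null_probability <= 1.0);
  return std::pow(1.0 - null_probability, block_bits);
}
```

Under that assumption, p = 0.0001 gives about 0.9936, so roughly 99 out of 100 blocks can be copied in bulk, whereas p = 0.5 gives a value below 1e-19, so essentially every block needs per-bit handling.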

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-24 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648873558 I sent an e-mail to dev@ -- let's discuss there.

[GitHub] [arrow] alippai edited a comment on pull request #7533: ARROW-7375: [Python] Expose C++ MakeArrayOfNull

2020-06-24 Thread GitBox
alippai edited a comment on pull request #7533: URL: https://github.com/apache/arrow/pull/7533#issuecomment-648752119 Can this be extended to support any scalar value? Creating a column with a single value is a common step for me (before concatenating tables, so the fragments a are
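
The difference between an all-null array and the requested "repeat one scalar" array comes down to what goes into the value buffer and the validity bitmap. A minimal standalone sketch of both, using a hypothetical `SimpleColumn` rather than any Arrow class (neither function is the C++ `MakeArrayOfNull` API or a proposed API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal nullable column: a value buffer plus an LSB-first
// validity bitmap (bit i set means slot i is valid).
template <typename T>
struct SimpleColumn {
  std::vector<T> values;
  std::vector<uint8_t> validity;
  int64_t null_count = 0;
};

// Conceptually, an all-null column: every validity bit cleared.
template <typename T>
SimpleColumn<T> MakeAllNullColumn(int64_t length) {
  SimpleColumn<T> column;
  column.values.assign(static_cast<std::size_t>(length), T{});
  column.validity.assign(static_cast<std::size_t>((length + 7) / 8), 0);
  column.null_count = length;
  return column;
}

// The requested generalization: one non-null value repeated `length` times.
template <typename T>
SimpleColumn<T> MakeRepeatedColumn(const T& value, int64_t length) {
  SimpleColumn<T> column;
  column.values.assign(static_cast<std::size_t>(length), value);
  column.validity.assign(static_cast<std::size_t>((length + 7) / 8), 0xFF);
  if (length % 8 != 0) {
    // Leave the padding bits past the last slot cleared.
    column.validity.back() = static_cast<uint8_t>((1u << (length % 8)) - 1);
  }
  column.null_count = 0;
  return column;
}
```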

[GitHub] [arrow] kszucs commented on pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-06-24 Thread GitBox
kszucs commented on pull request #7478: URL: https://github.com/apache/arrow/pull/7478#issuecomment-648856803 ping @wesm

[GitHub] [arrow] kszucs commented on pull request #6592: ARROW-8089: [C++] Port the toolchain build from Appveyor to Github Actions

2020-06-24 Thread GitBox
kszucs commented on pull request #6592: URL: https://github.com/apache/arrow/pull/6592#issuecomment-648856487 This PR was outdated; I will keep working on the Windows Docker containers instead.

[GitHub] [arrow] romainfrancois commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-24 Thread GitBox
romainfrancois commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-648848097 Added support for record batches. Toying with the idea of a print method for the metadata, to make it less opaque: ``` r library(arrow) #> #>

[GitHub] [arrow] kszucs commented on a change in pull request #7533: ARROW-7375: [Python] Expose C++ MakeArrayOfNull

2020-06-24 Thread GitBox
kszucs commented on a change in pull request #7533: URL: https://github.com/apache/arrow/pull/7533#discussion_r444924523 ## File path: python/pyarrow/__init__.py ## @@ -90,7 +90,7 @@ def parse_git(root, **kwargs): schema,

[GitHub] [arrow] bkietz commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-24 Thread GitBox
bkietz commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r444914459 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; } -Result

[GitHub] [arrow] wesm commented on a change in pull request #7533: ARROW-7375: [Python] Expose C++ MakeArrayOfNull

2020-06-24 Thread GitBox
wesm commented on a change in pull request #7533: URL: https://github.com/apache/arrow/pull/7533#discussion_r444890495 ## File path: python/pyarrow/__init__.py ## @@ -90,7 +90,7 @@ def parse_git(root, **kwargs): schema,

[GitHub] [arrow] wesm commented on pull request #7533: ARROW-7375: [Python] Expose C++ MakeArrayOfNull

2020-06-24 Thread GitBox
wesm commented on pull request #7533: URL: https://github.com/apache/arrow/pull/7533#issuecomment-648817609 @alippai that is doable but would need to get done in a separate PR

[GitHub] [arrow] rymurr commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-24 Thread GitBox
rymurr commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648810719 Thanks @wesm and @jacques-n for the review. I will leave this up until consensus is reached on the format change. Please let me know if I can help with the C++ patch, would be happy

[GitHub] [arrow] alippai commented on pull request #7533: ARROW-7375: [Python] Expose C++ MakeArrayOfNull

2020-06-24 Thread GitBox
alippai commented on pull request #7533: URL: https://github.com/apache/arrow/pull/7533#issuecomment-648752119 Can this be extended to support any scalar value? Creating a column with a single value is a common step for us (before concatenating tables, so the fragments are differentiated

[GitHub] [arrow] github-actions[bot] commented on pull request #7533: ARROW-7375: [Python] Expose C++ MakeArrayOfNull

2020-06-24 Thread GitBox
github-actions[bot] commented on pull request #7533: URL: https://github.com/apache/arrow/pull/7533#issuecomment-648694810 https://issues.apache.org/jira/browse/ARROW-7375

[GitHub] [arrow] kszucs opened a new pull request #7533: ARROW-7375: [Python] Expose C++ MakeArrayOfNull

2020-06-24 Thread GitBox
kszucs opened a new pull request #7533: URL: https://github.com/apache/arrow/pull/7533

[GitHub] [arrow] jianxind commented on pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind commented on pull request #7531: URL: https://github.com/apache/arrow/pull/7531#issuecomment-648656865 This PR https://github.com/apache/arrow/pull/7532 adds the 0.01% benchmark case; I can trigger a benchmark action once 7532 gets merged. Below are the results for 0.01% on my

[GitHub] [arrow] github-actions[bot] commented on pull request #7532: ARROW-9217: [C++] Cover 0.01% null for the plain spaced benchmark

2020-06-24 Thread GitBox
github-actions[bot] commented on pull request #7532: URL: https://github.com/apache/arrow/pull/7532#issuecomment-648655465 https://issues.apache.org/jira/browse/ARROW-9217

[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox
xhochy commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-648653322 > The R ones probably? For these, we need to add `utf8proc` to rtools40 and rtools35 and add them to the linker line of the R build.

[GitHub] [arrow] jianxind opened a new pull request #7532: ARROW-9217: [C++] Cover 0.01% null for the plain spaced benchmark

2020-06-24 Thread GitBox
jianxind opened a new pull request #7532: URL: https://github.com/apache/arrow/pull/7532 Add a 0.01% null probability, which represents data in which almost all values are valid (non-null). Signed-off-by: Frank Du
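To make the 0.01% case concrete: it describes benchmark input where roughly one slot in ten thousand is null. A minimal sketch of producing such a validity bitmap follows. It assumes an LSB-first bitmap stored as 64-bit words and a `std::mt19937` source; `MakeValidityBitmap` and `CountNulls` are hypothetical helpers, not the generator the Arrow benchmark actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: an LSB-first validity bitmap of n slots in which each
// slot is independently null with probability `null_probability`
// (0.0001 corresponds to the 0.01% case). Bit i lives in words[i / 64].
std::vector<uint64_t> MakeValidityBitmap(size_t n, double null_probability, uint32_t seed) {
  assert(null_probability >= 0.0 && null_probability <= 1.0);
  std::vector<uint64_t> words((n + 63) / 64, 0);
  std::mt19937 rng(seed);  // fixed seed so repeated runs see the same data
  std::bernoulli_distribution is_null(null_probability);
  for (size_t i = 0; i < n; ++i) {
    // A set bit means the slot is valid; a cleared bit means null.
    if (!is_null(rng)) words[i / 64] |= uint64_t{1} << (i % 64);
  }
  return words;
}

// Hypothetical helper: counts cleared (null) bits among the first n slots.
size_t CountNulls(const std::vector<uint64_t>& words, size_t n) {
  size_t nulls = 0;
  for (size_t i = 0; i < n; ++i) {
    if (((words[i / 64] >> (i % 64)) & 1u) == 0) ++nulls;
  }
  return nulls;
}
```

With a probability this low, almost every 64-slot word comes out fully set, which is the situation a block-wise fast path is meant to exploit.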

[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox
xhochy commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-648649038 The R ones probably?

[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-24 Thread GitBox
xhochy commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-648648745 @kou What is the problematic CI job that shows your problem? The MinGW ones seem fine.

[GitHub] [arrow] github-actions[bot] commented on pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
github-actions[bot] commented on pull request #7531: URL: https://github.com/apache/arrow/pull/7531#issuecomment-648648342 https://issues.apache.org/jira/browse/ARROW-9216

[GitHub] [arrow] jianxind opened a new pull request #7531: ARROW-9216: [C++] Use BitBlockCounter for plain spaced encoding/decoding

2020-06-24 Thread GitBox
jianxind opened a new pull request #7531: URL: https://github.com/apache/arrow/pull/7531 Speed up the typical use case in which most values are valid (non-null); also add a null-probability test case. Signed-off-by: Frank Du
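The speedup in #7531 targets data where most values are valid. A minimal sketch of the general block-wise idea follows: count the valid bits in each 64-slot block, bulk-copy a block that is entirely valid, skip one that is entirely null, and test bits individually only for mixed blocks. It assumes an LSB-first bitmap stored as 64-bit words; `BlockWord`, `CompressSpaced` and `ExpandSpaced` are hypothetical names, and this is neither Arrow's `BitBlockCounter` API nor the PR's actual code.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: validity bits for one 64-slot block, with bits past
// `len` masked off. Bit i of the bitmap lives in validity[i / 64] at i % 64.
inline uint64_t BlockWord(const std::vector<uint64_t>& validity, size_t block, size_t len) {
  uint64_t word = validity[block];
  if (len < 64) word &= (uint64_t{1} << len) - 1;
  return word;
}

// Encode side: keep only the valid values, producing a dense output.
template <typename T>
std::vector<T> CompressSpaced(const T* values, size_t n, const std::vector<uint64_t>& validity) {
  assert(validity.size() * 64 >= n);
  std::vector<T> out;
  out.reserve(n);
  for (size_t start = 0; start < n; start += 64) {
    const size_t len = std::min<size_t>(64, n - start);
    const uint64_t word = BlockWord(validity, start / 64, len);
    const size_t set = std::bitset<64>(word).count();
    if (set == len) {
      // Fast path: every slot in the block is valid, copy it in one go.
      out.insert(out.end(), values + start, values + start + len);
    } else if (set > 0) {
      // Mixed block: fall back to testing each bit.
      for (size_t j = 0; j < len; ++j) {
        if ((word >> j) & 1u) out.push_back(values[start + j]);
      }
    }
    // set == 0: all-null block, nothing to emit.
  }
  return out;
}

// Decode side: scatter dense values back to their valid positions.
template <typename T>
std::vector<T> ExpandSpaced(const std::vector<T>& dense, size_t n, const std::vector<uint64_t>& validity) {
  assert(validity.size() * 64 >= n);
  std::vector<T> out(n, T{});  // null slots stay value-initialized
  size_t k = 0;                // next unread index into `dense`
  for (size_t start = 0; start < n; start += 64) {
    const size_t len = std::min<size_t>(64, n - start);
    const uint64_t word = BlockWord(validity, start / 64, len);
    const size_t set = std::bitset<64>(word).count();
    if (set == len) {
      std::copy(dense.begin() + k, dense.begin() + k + len, out.begin() + start);
      k += len;
    } else if (set > 0) {
      for (size_t j = 0; j < len; ++j) {
        if ((word >> j) & 1u) out[start + j] = dense[k++];
      }
    }
  }
  return out;
}
```

When nulls are rare, nearly every block takes the bulk-copy branch, so the per-bit loop runs only for the few blocks that contain a null.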

[GitHub] [arrow] romainfrancois commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-24 Thread GitBox
romainfrancois commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444682932 ## File path: r/R/table.R ## @@ -202,7 +210,27 @@ Table$create <- function(..., schema = NULL) { #' @export as.data.frame.Table <- function(x,

[GitHub] [arrow] romainfrancois commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-24 Thread GitBox
romainfrancois commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444675449 ## File path: r/tests/testthat/test-Table.R ## @@ -334,5 +334,5 @@ test_that("Table metadata", { test_that("Table handles null type

[GitHub] [arrow] praveenbingo closed pull request #7402: ARROW-9099: [C++][Gandiva] Implement trim function for string

2020-06-24 Thread GitBox
praveenbingo closed pull request #7402: URL: https://github.com/apache/arrow/pull/7402