[GitHub] [arrow] winningsix commented on pull request #8229: ARROW-9579: [C++] Provide the plugin API to support customized compression codec for parquet

2020-09-22 Thread GitBox
winningsix commented on pull request #8229: URL: https://github.com/apache/arrow/pull/8229#issuecomment-697141258 @xieqi How about the on-disk path? How does user determine whether to use a customized codec for a given compression codec?

[GitHub] [arrow] github-actions[bot] commented on pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8244: URL: https://github.com/apache/arrow/pull/8244#issuecomment-697136388 https://issues.apache.org/jira/browse/ARROW-8355 This is an automated message from the Apache Git

[GitHub] [arrow] arw2019 opened a new pull request #8244: ARROW-8355: [Python] Reduce the number of pandas dependent test cases in test_feather

2020-09-22 Thread GitBox
arw2019 opened a new pull request #8244: URL: https://github.com/apache/arrow/pull/8244 xref https://github.com/apache/arrow/pull/6849#discussion_r404160096 This is a minor refactor. The changes are to replace uses of `pandas` dataframes with `pa.Table(...)` wherever possible.

[GitHub] [arrow] lwxown commented on issue #8137: Bind failed for pathname

2020-09-22 Thread GitBox
lwxown commented on issue #8137: URL: https://github.com/apache/arrow/issues/8137#issuecomment-697134880 I also hit the same problem. Who can solve it?

[GitHub] [arrow] xieqi commented on pull request #8229: ARROW-9579: [C++] Provide the plugin API to support customized compression codec for parquet

2020-09-22 Thread GitBox
xieqi commented on pull request #8229: URL: https://github.com/apache/arrow/pull/8229#issuecomment-697120706 @pitrou For Parquet write, the end-user still uses the standard GZip as the compression codec; we add a compression_plugin API in parquet WriterProperties Builder, the end-user

[GitHub] [arrow] emkornfield commented on a change in pull request #7214: ARROW-8842: [Java] fix ListVector's setValueCount to set inner vector's value count correctly

2020-09-22 Thread GitBox
emkornfield commented on a change in pull request #7214: URL: https://github.com/apache/arrow/pull/7214#discussion_r493188671 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java ## @@ -844,7 +844,7 @@ public void setValueCount(int

[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-22 Thread GitBox
liyafan82 commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-697114813 Thank you all for the fruitful discussion. One small reminder for @josiahyan : to cache the buffer capacity, it is sufficient to use an `int` instead of a `long`

[GitHub] [arrow] liyafan82 commented on pull request #8210: ARROW-10031: [CI][Java] Support Java benchmark in Ursabot

2020-09-22 Thread GitBox
liyafan82 commented on pull request #8210: URL: https://github.com/apache/arrow/pull/8210#issuecomment-697118857 @kiszk Thank you for doing this. Please note that when running the benchmarks, some flags should be configured properly. They can be set through environmental variables:

[GitHub] [arrow] kiszk edited a comment on pull request #8210: ARROW-10031: [CI][Java] Support Java benchmark in Ursabot

2020-09-22 Thread GitBox
kiszk edited a comment on pull request #8210: URL: https://github.com/apache/arrow/pull/8210#issuecomment-697119568 @liyafan82 Thank you for your comment. I will set these two properties as default for Java benchmarking.

[GitHub] [arrow] emkornfield closed pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-09-22 Thread GitBox
emkornfield closed pull request #7326: URL: https://github.com/apache/arrow/pull/7326

[GitHub] [arrow] andygrove commented on pull request #8242: ARROW-10065: [Rust] Simplify code (+500, -1k)

2020-09-22 Thread GitBox
andygrove commented on pull request #8242: URL: https://github.com/apache/arrow/pull/8242#issuecomment-697101767 This looks good to me conceptually at least. I wasn't too involved in this part of the codebase so I think it would be best to see if @nevi-me can review.

[GitHub] [arrow] kiszk commented on pull request #8210: ARROW-10031: [CI][Java] Support Java benchmark in Ursabot

2020-09-22 Thread GitBox
kiszk commented on pull request #8210: URL: https://github.com/apache/arrow/pull/8210#issuecomment-697119568 @liyafan82 Thank you for your comment. I will set these two properties as default.

[GitHub] [arrow] emkornfield commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-09-22 Thread GitBox
emkornfield commented on a change in pull request #7326: URL: https://github.com/apache/arrow/pull/7326#discussion_r493187630 ## File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-09-22 Thread GitBox
liyafan82 commented on a change in pull request #7326: URL: https://github.com/apache/arrow/pull/7326#discussion_r492506544 ## File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] kszucs commented on pull request #8238: ARROW-10063: [Archery][CI] Fetch main branch in archery build only when it is a pull request

2020-09-22 Thread GitBox
kszucs commented on pull request #8238: URL: https://github.com/apache/arrow/pull/8238#issuecomment-696697010

[GitHub] [arrow] sbinet commented on a change in pull request #8175: ARROW-8601: [Go][Flight] Implementations Flight RPC server and client

2020-09-22 Thread GitBox
sbinet commented on a change in pull request #8175: URL: https://github.com/apache/arrow/pull/8175#discussion_r491881927 ## File path: go/arrow/flight/client.go ## @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] github-actions[bot] commented on pull request #8242: ARROW-10065: [Rust] Simplify code (+500, -1k)

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8242: URL: https://github.com/apache/arrow/pull/8242#issuecomment-696821165 https://issues.apache.org/jira/browse/ARROW-10065

[GitHub] [arrow] wesm closed pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
wesm closed pull request #7789: URL: https://github.com/apache/arrow/pull/7789

[GitHub] [arrow] t829702 edited a comment on pull request #2035: ARROW-2116: [JS] implement IPC writers

2020-09-22 Thread GitBox
t829702 edited a comment on pull request #2035: URL: https://github.com/apache/arrow/pull/2035#issuecomment-696480501

[GitHub] [arrow] jacques-n commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-22 Thread GitBox
jacques-n commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696453930

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8236: ARROW-10060: [Rust] [DataFusion] Fixed error on which Err were discarded in MergeExec.

2020-09-22 Thread GitBox
jorgecarleitao commented on a change in pull request #8236: URL: https://github.com/apache/arrow/pull/8236#discussion_r492447876 ## File path: rust/datafusion/src/physical_plan/merge.rs ## @@ -111,9 +111,9 @@ impl ExecutionPlan for MergeExec { let

[GitHub] [arrow] ggershinsky commented on a change in pull request #8023: ARROW-9318: [C++] Parquet encryption key management

2020-09-22 Thread GitBox
ggershinsky commented on a change in pull request #8023: URL: https://github.com/apache/arrow/pull/8023#discussion_r492540735 ## File path: cpp/src/parquet/encryption/remote_kms_client.h ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] vertexclique commented on pull request #8237: ARROW-10062: [Rust] Fix for null elems at key position in dictionary arrays

2020-09-22 Thread GitBox
vertexclique commented on pull request #8237: URL: https://github.com/apache/arrow/pull/8237#issuecomment-696687042 @andygrove Can I get a review for this one too? Thanks.

[GitHub] [arrow] kszucs closed pull request #8228: ARROW-10049: [C++/Python] Sync conda recipe with conda-forge

2020-09-22 Thread GitBox
kszucs closed pull request #8228: URL: https://github.com/apache/arrow/pull/8228

[GitHub] [arrow] emkornfield commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
emkornfield commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492461706 ## File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc ## @@ -2360,6 +2361,49 @@ TEST(ArrowReadWrite, SingleColumnNullableStruct) { 3);

[GitHub] [arrow] xhochy closed pull request #8239: ARROW-10064: [C++] Resolve compile warnings on Apple Clang 12

2020-09-22 Thread GitBox
xhochy closed pull request #8239: URL: https://github.com/apache/arrow/pull/8239

[GitHub] [arrow] jhorstmann commented on pull request #8223: ARROW-10040: [Rust] Add slice that realigns Buffer

2020-09-22 Thread GitBox
jhorstmann commented on pull request #8223: URL: https://github.com/apache/arrow/pull/8223#issuecomment-696754781 @nevi-me can you point me to the part of the parquet code that you have in mind? I found the `BitReader` used by bit packed encoding but that seems to solve a more general

[GitHub] [arrow] wesm commented on pull request #8219: ARROW-9603: [C++] Fix parquet write

2020-09-22 Thread GitBox
wesm commented on pull request #8219: URL: https://github.com/apache/arrow/pull/8219#issuecomment-696368598 @xhochy might be the only one. I can do my best to provide some comments

[GitHub] [arrow] kszucs commented on pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on pull request #8088: URL: https://github.com/apache/arrow/pull/8088#issuecomment-696797135 @github-actions crossbow submit test-spark

[GitHub] [arrow] emkornfield commented on pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
emkornfield commented on pull request #8219: URL: https://github.com/apache/arrow/pull/8219#issuecomment-696503073 @xhochy did you want to review?

[GitHub] [arrow] xhochy commented on pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
xhochy commented on pull request #8219: URL: https://github.com/apache/arrow/pull/8219#issuecomment-696610675 I reserved myself an hour tomorrow to review this. I haven't touched this code for over a year but this is the code path that actually got me into Arrow/Parquet project, so I'm

[GitHub] [arrow] xhochy commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
xhochy commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492772505 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the [developer

[GitHub] [arrow] pitrou closed pull request #8196: ARROW-10013: [FlightRPC][C++] fix setting generic client options

2020-09-22 Thread GitBox
pitrou closed pull request #8196: URL: https://github.com/apache/arrow/pull/8196

[GitHub] [arrow] arw2019 commented on pull request #8145: ARROW-9967: [Python] Add compute module docs + expose more option classes

2020-09-22 Thread GitBox
arw2019 commented on pull request #8145: URL: https://github.com/apache/arrow/pull/8145#issuecomment-696882474 This is ready for re-review. I believe that I've addressed the feedback from previous reviews. I've also now exposed all the option classes so that all the kernels listed

[GitHub] [arrow] bkietz commented on a change in pull request #8240: ARROW-10038: [C++] Spawn thread pool threads lazily

2020-09-22 Thread GitBox
bkietz commented on a change in pull request #8240: URL: https://github.com/apache/arrow/pull/8240#discussion_r492829401 ## File path: cpp/src/arrow/util/thread_pool.cc ## @@ -168,9 +174,11 @@ Status ThreadPool::SetCapacity(int threads) { CollectFinishedWorkersUnlocked();

[GitHub] [arrow] github-actions[bot] commented on pull request #8241: ARROW-10054: [C++][Python] don't crash when slice offset > length

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8241: URL: https://github.com/apache/arrow/pull/8241#issuecomment-696821164 https://issues.apache.org/jira/browse/ARROW-10054

[GitHub] [arrow] github-actions[bot] commented on pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8235: URL: https://github.com/apache/arrow/pull/8235#issuecomment-696431010 https://issues.apache.org/jira/browse/ARROW-10059

[GitHub] [arrow] TheNeuralBit commented on a change in pull request #8216: ARROW-8394: [JS] Upgrade to TypeScript 4.0.2, fix typings for TS 3.9+

2020-09-22 Thread GitBox
TheNeuralBit commented on a change in pull request #8216: URL: https://github.com/apache/arrow/pull/8216#discussion_r492822710 ## File path: js/test/unit/ipc/helpers.ts ## @@ -54,13 +54,13 @@ export abstract class ArrowIOTestHelper { await testFn(await

[GitHub] [arrow] BatmanAoD commented on a change in pull request #3031: ARROW-3878: [Rust] Improve primitive types

2020-09-22 Thread GitBox
BatmanAoD commented on a change in pull request #3031: URL: https://github.com/apache/arrow/pull/3031#discussion_r492955163 ## File path: rust/src/lib.rs ## @@ -15,6 +15,8 @@ // specific language governing permissions and limitations // under the License.

[GitHub] [arrow] andygrove closed pull request #8237: ARROW-10062: [Rust] Fix for null elems at key position in dictionary arrays

2020-09-22 Thread GitBox
andygrove closed pull request #8237: URL: https://github.com/apache/arrow/pull/8237

[GitHub] [arrow] lidavidm commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-22 Thread GitBox
lidavidm commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696458726

[GitHub] [arrow] pitrou edited a comment on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
pitrou edited a comment on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-696630511 Need to add a test with the legacy file in https://github.com/apache/arrow-testing/pull/47 . Edit: done (the file was instead moved to parquet-testing).

[GitHub] [arrow] t829702 commented on pull request #2035: ARROW-2116: [JS] implement IPC writers

2020-09-22 Thread GitBox
t829702 commented on pull request #2035: URL: https://github.com/apache/arrow/pull/2035#issuecomment-696480501 > Providing a separate utility in Arrow to parse dates I didn't mean to duplicate JS parsing code, but a way to provide a special parser function to the constructor,

[GitHub] [arrow] cyb70289 commented on a change in pull request #8232: ARROW-10051: [C++][Compute] Make aggregate kernel state mutable

2020-09-22 Thread GitBox
cyb70289 commented on a change in pull request #8232: URL: https://github.com/apache/arrow/pull/8232#discussion_r492543154 ## File path: cpp/src/arrow/compute/kernel.h ## @@ -664,7 +664,7 @@ struct VectorKernel : public ArrayKernel { using ScalarAggregateConsume =

[GitHub] [arrow] cyb70289 commented on pull request #8232: ARROW-10051: [C++][Compute] Make aggregate kernel state mutable

2020-09-22 Thread GitBox
cyb70289 commented on pull request #8232: URL: https://github.com/apache/arrow/pull/8232#issuecomment-696485953 The CI failure is in a Flight test and looks unrelated. https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/35331179/job/i7l5oi2mxnwytd4q#L1788

[GitHub] [arrow] alamb commented on pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-09-22 Thread GitBox
alamb commented on pull request #8222: URL: https://github.com/apache/arrow/pull/8222#issuecomment-696842287 @drusso I think you are correct that we would need a separate group by operator for each count distinct and then combine them together: so `SELECT c1, COUNT(DISTINCT c2),

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492648007 ## File path: cpp/src/arrow/util/converter.h ## @@ -0,0 +1,353 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] kszucs commented on pull request #7797: ARROW-4189: [Rust] Added coverage report.

2020-09-22 Thread GitBox
kszucs commented on pull request #7797: URL: https://github.com/apache/arrow/pull/7797#issuecomment-696598247 @jorgecarleitao could you rebase on top of the master? I'm unable to push to your fork, but the build failures should be resolved after a rebase.

[GitHub] [arrow] pitrou closed pull request #8136: ARROW-9078: [C++] Parquet read / write extension type with nested storage type

2020-09-22 Thread GitBox
pitrou closed pull request #8136: URL: https://github.com/apache/arrow/pull/8136

[GitHub] [arrow] trxcllnt commented on a change in pull request #8216: ARROW-8394: [JS] Upgrade to TypeScript 4.0.2, fix typings for TS 3.9+

2020-09-22 Thread GitBox
trxcllnt commented on a change in pull request #8216: URL: https://github.com/apache/arrow/pull/8216#discussion_r493036448 ## File path: js/test/inference/column.ts ## @@ -33,33 +33,6 @@ const boolColumn = new Column(new Field('bool', boolType), [ expect(typeof

[GitHub] [arrow] andygrove closed pull request #8233: ARROW-10055: [Rust] DoubleEndedIterator implementation for NullableIter

2020-09-22 Thread GitBox
andygrove closed pull request #8233: URL: https://github.com/apache/arrow/pull/8233

[GitHub] [arrow] zeroshade commented on pull request #8175: ARROW-8601: [Go][Flight] Implementations Flight RPC server and client

2020-09-22 Thread GitBox
zeroshade commented on pull request #8175: URL: https://github.com/apache/arrow/pull/8175#issuecomment-696370510 @wesm as far as I can tell, the two checks that are failing are unrelated to this PR. I have ideas for further exploring / adding more functionality for the FlightRPC

[GitHub] [arrow] drusso commented on a change in pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-09-22 Thread GitBox
drusso commented on a change in pull request #8222: URL: https://github.com/apache/arrow/pull/8222#discussion_r492692158 ## File path: rust/datafusion/src/physical_plan/distinct_expressions.rs ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] jorisvandenbossche commented on pull request #8188: ARROW-9924: [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans

2020-09-22 Thread GitBox
jorisvandenbossche commented on pull request #8188: URL: https://github.com/apache/arrow/pull/8188#issuecomment-696688370 It seems the crashing test is: https://github.com/apache/arrow/blob/40d64756dc3b2c51489b48362d0f04ee3e2a7388/python/pyarrow/tests/test_parquet.py#L3389-L3414

[GitHub] [arrow] pitrou commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492627002 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] kszucs closed pull request #7797: ARROW-4189: [Rust] Added coverage report.

2020-09-22 Thread GitBox
kszucs closed pull request #7797: URL: https://github.com/apache/arrow/pull/7797

[GitHub] [arrow] lidavidm commented on pull request #8196: ARROW-10013: [FlightRPC][C++] fix setting generic client options

2020-09-22 Thread GitBox
lidavidm commented on pull request #8196: URL: https://github.com/apache/arrow/pull/8196#issuecomment-696675621 > Wow, did you report the `peer()` issue to gRPC? Not yet - I need to reproduce it in a VM first (probably with just base gRPC instead of trying to set up Arrow).

[GitHub] [arrow] github-actions[bot] commented on pull request #8236: ARROW-10060: [Rust] [DataFusion] Fixed error on which Err were discarded in MergeExec.

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8236: URL: https://github.com/apache/arrow/pull/8236#issuecomment-696484939 https://issues.apache.org/jira/browse/ARROW-10060

[GitHub] [arrow] nealrichardson commented on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
nealrichardson commented on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697054182 Ok, with that SEMICOLON change, the Ubuntu R job successfully compiles the C++ library, but the R package can't use it because it created a shared library for aws-sdk-cpp

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #8188: ARROW-9924: [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans

2020-09-22 Thread GitBox
jorisvandenbossche edited a comment on pull request #8188: URL: https://github.com/apache/arrow/pull/8188#issuecomment-696688370 It seems the crashing test is:

[GitHub] [arrow] wesm commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write

2020-09-22 Thread GitBox
wesm commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492407465 ## File path: cpp/src/parquet/arrow/arrow_reader_writer_test.cc ## @@ -2360,6 +2361,49 @@ TEST(ArrowReadWrite, SingleColumnNullableStruct) { 3); }

[GitHub] [arrow] pitrou commented on pull request #8196: ARROW-10013: [FlightRPC][C++] fix setting generic client options

2020-09-22 Thread GitBox
pitrou commented on pull request #8196: URL: https://github.com/apache/arrow/pull/8196#issuecomment-696561450 Wow, did you report the `peer()` issue to gRPC?

[GitHub] [arrow] praveenbingo commented on a change in pull request #8095: ARROW-9897: [C++][Gandiva] Added to_date function

2020-09-22 Thread GitBox
praveenbingo commented on a change in pull request #8095: URL: https://github.com/apache/arrow/pull/8095#discussion_r492681566 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -47,18 +47,23 @@ Status ToDateHolder::Make(const FunctionNode& node, } auto pattern =

[GitHub] [arrow] pitrou closed pull request #8234: ARROW-10035: [C++] Update vendored libraries

2020-09-22 Thread GitBox
pitrou closed pull request #8234: URL: https://github.com/apache/arrow/pull/8234

[GitHub] [arrow] github-actions[bot] commented on pull request #8237: ARROW-10062: [Rust] Fix for null elems at key position in dictionary arrays

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8237: URL: https://github.com/apache/arrow/pull/8237#issuecomment-696681860 https://issues.apache.org/jira/browse/ARROW-10062

[GitHub] [arrow] pitrou commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492558115 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] GPSnoopy commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
GPSnoopy commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492615499 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] nealrichardson commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
nealrichardson commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492820728 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the

[GitHub] [arrow] jacques-n edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-22 Thread GitBox
jacques-n edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696459730 > I think there are two opportunities here - simply optimizing setSafe, which can be done by either specializing for the power-of-two size where possible, or by caching

[GitHub] [arrow] andygrove commented on pull request #8204: ARROW-10016: [Rust] Implement is null / is not null kernels

2020-09-22 Thread GitBox
andygrove commented on pull request #8204: URL: https://github.com/apache/arrow/pull/8204#issuecomment-696744882 @jhorstmann Looks like there is a cargo fmt issue.

[GitHub] [arrow] pitrou commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492536697 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the [developer

[GitHub] [arrow] bkietz commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
bkietz commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492838013 ## File path: cpp/src/arrow/array/array_list_test.cc ## @@ -508,6 +534,8 @@ TYPED_TEST(TestListArray, ValidateOffsets) { this->TestValidateOffsets(); }

[GitHub] [arrow] alamb edited a comment on pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-09-22 Thread GitBox
alamb edited a comment on pull request #8222: URL: https://github.com/apache/arrow/pull/8222#issuecomment-696842287 @drusso I think you are correct that we would need a separate group by operator for each count distinct and then combine them together: so `SELECT c1, COUNT(DISTINCT

[GitHub] [arrow] nevi-me commented on pull request #8223: ARROW-10040: [Rust] Add slice that realigns Buffer

2020-09-22 Thread GitBox
nevi-me commented on pull request #8223: URL: https://github.com/apache/arrow/pull/8223#issuecomment-696669012 @jhorstmann can I close this PR, and rely on your implementation when ready? Also, do you think we'd be able to use your implementation in `parquet`, as we might need that for

[GitHub] [arrow] github-actions[bot] commented on pull request #8240: ARROW-10038: [C++] Spawn thread pool threads lazily

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8240: URL: https://github.com/apache/arrow/pull/8240#issuecomment-696730535 https://issues.apache.org/jira/browse/ARROW-10038

[GitHub] [arrow] kou commented on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
kou commented on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697040532

[GitHub] [arrow] github-actions[bot] commented on pull request #8239: ARROW-10064: [C++] Resolve compile warnings on Apple Clang 12

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8239: URL: https://github.com/apache/arrow/pull/8239#issuecomment-696702368 https://issues.apache.org/jira/browse/ARROW-10064

[GitHub] [arrow] github-actions[bot] commented on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697027877 https://issues.apache.org/jira/browse/ARROW-10068

[GitHub] [arrow] kou edited a comment on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
kou edited a comment on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697040532 For the aws-sdk headers, the following patch will fix them: ```diff diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake

[GitHub] [arrow] romainfrancois commented on pull request #8122: ARROW-9557: [R] Iterating over parquet columns is slow in R

2020-09-22 Thread GitBox
romainfrancois commented on pull request #8122: URL: https://github.com/apache/arrow/pull/8122#issuecomment-696593899 The methods of `ParquetFileReader` no longer use tidyselect, i.e. you can use `$ReadTable()` or `$ReadTable(column_indices)` with a 0-based integer vector so this does

[GitHub] [arrow] pitrou commented on pull request #8136: ARROW-9078: [C++] Parquet read / write extension type with nested storage type

2020-09-22 Thread GitBox
pitrou commented on pull request #8136: URL: https://github.com/apache/arrow/pull/8136#issuecomment-696686570 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] pitrou commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
pitrou commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-696630511 Need to add a test with the legacy file in https://github.com/apache/arrow-testing/pull/47 This is an automated

[GitHub] [arrow] andygrove closed pull request #8236: ARROW-10060: [Rust] [DataFusion] Fixed error on which Err were discarded in MergeExec.

2020-09-22 Thread GitBox
andygrove closed pull request #8236: URL: https://github.com/apache/arrow/pull/8236 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] zeroshade commented on a change in pull request #8175: ARROW-8601: [Go][Flight] Implementations Flight RPC server and client

2020-09-22 Thread GitBox
zeroshade commented on a change in pull request #8175: URL: https://github.com/apache/arrow/pull/8175#discussion_r492737156 ## File path: go/arrow/flight/client.go ## @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] wesm commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
wesm commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492965387 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] drusso commented on pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-09-22 Thread GitBox
drusso commented on pull request #8222: URL: https://github.com/apache/arrow/pull/8222#issuecomment-696689994 Thanks for the review/feedback all! @jorgecarleitao: > it may be worth take a look at #8172 , where we are trying to improve how to declare and run aggregate

[GitHub] [arrow] github-actions[bot] commented on pull request #8238: ARROW-10063: [Archery][CI] Fetch main branch in archery build only when it is a pull request

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8238: URL: https://github.com/apache/arrow/pull/8238#issuecomment-696690267 https://issues.apache.org/jira/browse/ARROW-10063 This is an automated message from the Apache Git

[GitHub] [arrow] kou commented on a change in pull request #8234: ARROW-10035: [C++] Update vendored libraries

2020-09-22 Thread GitBox
kou commented on a change in pull request #8234: URL: https://github.com/apache/arrow/pull/8234#discussion_r492378032 ## File path: LICENSE.txt ## @@ -849,9 +849,9 @@ THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

[GitHub] [arrow] nealrichardson commented on a change in pull request #8122: ARROW-9557: [R] Iterating over parquet columns is slow in R

2020-09-22 Thread GitBox
nealrichardson commented on a change in pull request #8122: URL: https://github.com/apache/arrow/pull/8122#discussion_r492850251 ## File path: r/R/parquet.R ## @@ -409,10 +420,20 @@ ParquetFileWriter$create <- function(schema, #' #' @section Methods: #' -#' -

[GitHub] [arrow] josiahyan commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-22 Thread GitBox
josiahyan commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] pitrou commented on a change in pull request #8232: ARROW-10051: [C++][Compute] Make aggregate kernel state mutable

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8232: URL: https://github.com/apache/arrow/pull/8232#discussion_r492540465 ## File path: cpp/src/arrow/compute/kernel.h ## @@ -664,7 +664,7 @@ struct VectorKernel : public ArrayKernel { using ScalarAggregateConsume =

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
jorisvandenbossche commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492666980 ## File path: python/pyarrow/array.pxi ## @@ -21,28 +21,28 @@ import warnings cdef _sequence_to_array(object sequence, object mask, object

[GitHub] [arrow] josiahyan edited a comment on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-22 Thread GitBox
josiahyan edited a comment on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-696381588 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [arrow] xieqi commented on pull request #8229: ARROW-9579: [C++] Provide the plugin API to support customized compression codec for parquet

2020-09-22 Thread GitBox
xieqi commented on pull request #8229: URL: https://github.com/apache/arrow/pull/8229#issuecomment-697120706 @pitrou For Parquet write, the end-user still uses the standard GZip as the compression codec; we add a compression_plugin API in the Parquet WriterProperties Builder, the end-user

[GitHub] [arrow] kiszk commented on pull request #8210: ARROW-10031: [CI][Java] Support Java benchmark in Ursabot

2020-09-22 Thread GitBox
kiszk commented on pull request #8210: URL: https://github.com/apache/arrow/pull/8210#issuecomment-697119568 @liyafan82 Thank you for your comment. I will set these two properties as default. This is an automated message

[GitHub] [arrow] liyafan82 commented on pull request #8210: ARROW-10031: [CI][Java] Support Java benchmark in Ursabot

2020-09-22 Thread GitBox
liyafan82 commented on pull request #8210: URL: https://github.com/apache/arrow/pull/8210#issuecomment-697118857 @kiszk Thank you for doing this. Please note that when running the benchmarks, some flags should be configured properly. They can be set through environment variables:

[GitHub] [arrow] cyb70289 commented on a change in pull request #8232: ARROW-10051: [C++][Compute] Make aggregate kernel state mutable

2020-09-22 Thread GitBox
cyb70289 commented on a change in pull request #8232: URL: https://github.com/apache/arrow/pull/8232#discussion_r493150346 ## File path: cpp/src/arrow/compute/kernel.h ## @@ -664,7 +664,7 @@ struct VectorKernel : public ArrayKernel { using ScalarAggregateConsume =

[GitHub] [arrow] nealrichardson commented on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
nealrichardson commented on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697054182 Ok, with that SEMICOLON change, the Ubuntu R job successfully compiles the C++ library, but the R package can't use it because it created a shared library for aws-sdk-cpp

[GitHub] [arrow] kou commented on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
kou commented on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697049024 Ah, we can use `$<SEMICOLON>` https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html#escaped-characters instead of `LIST_SEPARATOR` like TileDB does:

[GitHub] [arrow] kou edited a comment on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
kou edited a comment on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697040532 For the aws-sdk headers, the following patch will fix them: ```diff diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake
