[GitHub] [arrow] arw2019 commented on pull request #8145: ARROW-9967: [Python] Add compute module docs + expose more option classes

2020-09-22 Thread GitBox
arw2019 commented on pull request #8145: URL: https://github.com/apache/arrow/pull/8145#issuecomment-696882474 This is ready for re-review. I believe that I've addressed the feedback from previous reviews. I've also now exposed all the option classes so that all the kernels listed

[GitHub] [arrow] wesm closed pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
wesm closed pull request #7789: URL: https://github.com/apache/arrow/pull/7789 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492661346 ## File path: python/pyarrow/scalar.pxi ## @@ -610,12 +609,10 @@ cdef class StructScalar(Scalar, collections.abc.Mapping): def __getitem__(self, key):

[GitHub] [arrow] nevi-me commented on pull request #8223: ARROW-10040: [Rust] Add slice that realigns Buffer

2020-09-22 Thread GitBox
nevi-me commented on pull request #8223: URL: https://github.com/apache/arrow/pull/8223#issuecomment-696669012 @jhorstmann can I close this PR, and rely on your implementation when ready? Also, do you think we'd be able to use your implementation in `parquet`, as we might need that for

[GitHub] [arrow] GPSnoopy commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
GPSnoopy commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492732061 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] BatmanAoD commented on a change in pull request #3031: ARROW-3878: [Rust] Improve primitive types

2020-09-22 Thread GitBox
BatmanAoD commented on a change in pull request #3031: URL: https://github.com/apache/arrow/pull/3031#discussion_r492955163 ## File path: rust/src/lib.rs ## @@ -15,6 +15,8 @@ // specific language governing permissions and limitations // under the License.

[GitHub] [arrow] pitrou closed pull request #8136: ARROW-9078: [C++] Parquet read / write extension type with nested storage type

2020-09-22 Thread GitBox
pitrou closed pull request #8136: URL: https://github.com/apache/arrow/pull/8136

[GitHub] [arrow] pitrou commented on pull request #8136: ARROW-9078: [C++] Parquet read / write extension type with nested storage type

2020-09-22 Thread GitBox
pitrou commented on pull request #8136: URL: https://github.com/apache/arrow/pull/8136#issuecomment-696867693 Will merge.

[GitHub] [arrow] wesm commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
wesm commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492965387 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] jorgecarleitao commented on pull request #8215: ARROW-9977: [Rust] Added min/max of [Large]StringArray

2020-09-22 Thread GitBox
jorgecarleitao commented on pull request #8215: URL: https://github.com/apache/arrow/pull/8215#issuecomment-695812820 This was merged as part of #8172 and will thus be closed. I also marked the respective Jira issue as done.

[GitHub] [arrow] liyafan82 commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-09-22 Thread GitBox
liyafan82 commented on a change in pull request #7326: URL: https://github.com/apache/arrow/pull/7326#discussion_r492506544 ## File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] pitrou commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492536697 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the [developer

[GitHub] [arrow] cyb70289 commented on a change in pull request #8232: ARROW-10051: [C++][Compute] Make aggregate kernel state mutable

2020-09-22 Thread GitBox
cyb70289 commented on a change in pull request #8232: URL: https://github.com/apache/arrow/pull/8232#discussion_r492543154 ## File path: cpp/src/arrow/compute/kernel.h ## @@ -664,7 +664,7 @@ struct VectorKernel : public ArrayKernel { using ScalarAggregateConsume =

[GitHub] [arrow] romainfrancois commented on pull request #8122: ARROW-9557: [R] Iterating over parquet columns is slow in R

2020-09-22 Thread GitBox
romainfrancois commented on pull request #8122: URL: https://github.com/apache/arrow/pull/8122#issuecomment-696593899 The methods of `ParquetFileReader` no longer use tidyselect, i.e. you can use `$ReadTable()` or `$ReadTable(column_indices)` with a 0-based integer vector so this does

[GitHub] [arrow] pitrou commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492537265 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the [developer

[GitHub] [arrow] pitrou closed pull request #8234: ARROW-10035: [C++] Update vendored libraries

2020-09-22 Thread GitBox
pitrou closed pull request #8234: URL: https://github.com/apache/arrow/pull/8234

[GitHub] [arrow] xhochy commented on pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
xhochy commented on pull request #8219: URL: https://github.com/apache/arrow/pull/8219#issuecomment-696610675 I reserved myself an hour tomorrow to review this. I haven't touched this code for over a year, but this is the code path that actually got me into the Arrow/Parquet project, so I'm

[GitHub] [arrow] pitrou commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
pitrou commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-696630511 Need to add a test with the legacy file in https://github.com/apache/arrow-testing/pull/47

[GitHub] [arrow] pitrou commented on pull request #8196: ARROW-10013: [FlightRPC][C++] fix setting generic client options

2020-09-22 Thread GitBox
pitrou commented on pull request #8196: URL: https://github.com/apache/arrow/pull/8196#issuecomment-696561450 Wow, did you report the `peer()` issue to gRPC?

[GitHub] [arrow] pitrou closed pull request #8196: ARROW-10013: [FlightRPC][C++] fix setting generic client options

2020-09-22 Thread GitBox
pitrou closed pull request #8196: URL: https://github.com/apache/arrow/pull/8196

[GitHub] [arrow] ggershinsky commented on a change in pull request #8023: ARROW-9318: [C++] Parquet encryption key management

2020-09-22 Thread GitBox
ggershinsky commented on a change in pull request #8023: URL: https://github.com/apache/arrow/pull/8023#discussion_r492540735 ## File path: cpp/src/parquet/encryption/remote_kms_client.h ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] pitrou commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492558115 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] pitrou commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492563521 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1130,37 +1188,61 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] sbinet commented on a change in pull request #8175: ARROW-8601: [Go][Flight] Implementations Flight RPC server and client

2020-09-22 Thread GitBox
sbinet commented on a change in pull request #8175: URL: https://github.com/apache/arrow/pull/8175#discussion_r491881927 ## File path: go/arrow/flight/client.go ## @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow] pitrou commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492558836 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] pitrou commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492559005 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] pitrou commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492627002 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] pitrou commented on a change in pull request #8232: ARROW-10051: [C++][Compute] Make aggregate kernel state mutable

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8232: URL: https://github.com/apache/arrow/pull/8232#discussion_r492540465 ## File path: cpp/src/arrow/compute/kernel.h ## @@ -664,7 +664,7 @@ struct VectorKernel : public ArrayKernel { using ScalarAggregateConsume =

[GitHub] [arrow] pitrou commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492560172 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] pitrou commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492559517 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] kszucs closed pull request #8228: ARROW-10049: [C++/Python] Sync conda recipe with conda-forge

2020-09-22 Thread GitBox
kszucs closed pull request #8228: URL: https://github.com/apache/arrow/pull/8228

[GitHub] [arrow] GPSnoopy commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
GPSnoopy commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492615499 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] pitrou commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492560707 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public TypedColumnWriter<

[GitHub] [arrow] emkornfield edited a comment on pull request #8177: ARROW-8494: [C++][Parquet] Full support for reading mixed list and structs

2020-09-22 Thread GitBox
emkornfield edited a comment on pull request #8177: URL: https://github.com/apache/arrow/pull/8177#issuecomment-696205222 > Just for the record, apart from FixedSizeList, is there anything remaining for full nested Parquet -> Arrow reading? We need to support LargeList, and Map

[GitHub] [arrow] github-actions[bot] commented on pull request #8231: Arrow 10023: [C++][Gandiva] Implement split_part function in gandiva

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8231: URL: https://github.com/apache/arrow/pull/8231#issuecomment-695912228 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492662258 ## File path: python/pyarrow/array.pxi ## @@ -21,28 +21,28 @@ import warnings cdef _sequence_to_array(object sequence, object mask, object size,

[GitHub] [arrow] vertexclique opened a new pull request #8237: ARROW-10062 - Fix for null elems at key position in dictionary arrays

2020-09-22 Thread GitBox
vertexclique opened a new pull request #8237: URL: https://github.com/apache/arrow/pull/8237

[GitHub] [arrow] kszucs commented on pull request #8238: ARROW-10063: [Archery][CI] Fetch main branch in archery build only when it is a pull request

2020-09-22 Thread GitBox
kszucs commented on pull request #8238: URL: https://github.com/apache/arrow/pull/8238#issuecomment-696697010 Merging this to my fork's main branch to test it works properly.

[GitHub] [arrow] zeroshade commented on a change in pull request #8175: ARROW-8601: [Go][Flight] Implementations Flight RPC server and client

2020-09-22 Thread GitBox
zeroshade commented on a change in pull request #8175: URL: https://github.com/apache/arrow/pull/8175#discussion_r492737156 ## File path: go/arrow/flight/client.go ## @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] GPSnoopy commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
GPSnoopy commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492743655 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] andygrove closed pull request #8237: ARROW-10062: [Rust] Fix for null elems at key position in dictionary arrays

2020-09-22 Thread GitBox
andygrove closed pull request #8237: URL: https://github.com/apache/arrow/pull/8237

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492757133 ## File path: python/pyarrow/array.pxi ## @@ -21,28 +21,28 @@ import warnings cdef _sequence_to_array(object sequence, object mask, object size,

[GitHub] [arrow] andygrove commented on pull request #8204: ARROW-10016: [Rust] Implement is null / is not null kernels

2020-09-22 Thread GitBox
andygrove commented on pull request #8204: URL: https://github.com/apache/arrow/pull/8204#issuecomment-696744882 @jhorstmann Looks like there is a cargo fmt issue.

[GitHub] [arrow] jhorstmann commented on pull request #8223: ARROW-10040: [Rust] Add slice that realigns Buffer

2020-09-22 Thread GitBox
jhorstmann commented on pull request #8223: URL: https://github.com/apache/arrow/pull/8223#issuecomment-696754781 @nevi-me can you point me to the part of the parquet code that you have in mind? I found the `BitReader` used by bit packed encoding but that seems to solve a more general

[GitHub] [arrow] vertexclique commented on pull request #8237: ARROW-10062: [Rust] Fix for null elems at key position in dictionary arrays

2020-09-22 Thread GitBox
vertexclique commented on pull request #8237: URL: https://github.com/apache/arrow/pull/8237#issuecomment-696687042 @andygrove Can I get a review for this one too? Thanks.

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492693156 ## File path: python/pyarrow/array.pxi ## @@ -158,24 +158,44 @@ def array(object obj, type=None, mask=None, size=None, from_pandas=None, Notes

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #8188: ARROW-9924: [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans

2020-09-22 Thread GitBox
jorisvandenbossche edited a comment on pull request #8188: URL: https://github.com/apache/arrow/pull/8188#issuecomment-696688370 It seems the crashing test is:

[GitHub] [arrow] kszucs opened a new pull request #8239: ARROW-10064: [C++] Resolve compile warnings on Apple Clang 12

2020-09-22 Thread GitBox
kszucs opened a new pull request #8239: URL: https://github.com/apache/arrow/pull/8239

[GitHub] [arrow] drusso commented on a change in pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-09-22 Thread GitBox
drusso commented on a change in pull request #8222: URL: https://github.com/apache/arrow/pull/8222#discussion_r492692251 ## File path: rust/datafusion/src/physical_plan/distinct_expressions.rs ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] github-actions[bot] commented on pull request #8239: ARROW-10064: [C++] Resolve compile warnings on Apple Clang 12

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8239: URL: https://github.com/apache/arrow/pull/8239#issuecomment-696702368 https://issues.apache.org/jira/browse/ARROW-10064

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492756140 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -329,985 +302,602 @@ struct ValueConverter { default: return

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
jorisvandenbossche commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492667996 ## File path: python/pyarrow/tests/test_convert_builtin.py ## @@ -1513,6 +1519,108 @@ def test_struct_from_tuples():

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
jorisvandenbossche commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492666980 ## File path: python/pyarrow/array.pxi ## @@ -21,28 +21,28 @@ import warnings cdef _sequence_to_array(object sequence, object mask, object

[GitHub] [arrow] praveenbingo commented on a change in pull request #8095: ARROW-9897: [C++][Gandiva] Added to_date function

2020-09-22 Thread GitBox
praveenbingo commented on a change in pull request #8095: URL: https://github.com/apache/arrow/pull/8095#discussion_r492681566 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -47,18 +47,23 @@ Status ToDateHolder::Make(const FunctionNode& node, } auto pattern =

[GitHub] [arrow] pitrou commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492734835 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] pitrou opened a new pull request #8240: ARROW-10038: [C++] Spawn thread pool threads lazily

2020-09-22 Thread GitBox
pitrou opened a new pull request #8240: URL: https://github.com/apache/arrow/pull/8240 Thread pool threads are not spawned until necessary to execute a pending task.
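
The lazy-spawning idea described in this pull request can be illustrated with a toy Python sketch. This is not Arrow's C++ thread pool: the class name `LazyThreadPool`, its `submit`/`wait` methods, and the `spawned` counter are all hypothetical names chosen for illustration. The only behavior it tries to show is that a worker thread is started when a task is pending and no existing worker is free to take it, rather than up front.

```python
import threading
from collections import deque


class LazyThreadPool:
    """Toy pool (hypothetical, not Arrow's API) that spawns workers on demand."""

    def __init__(self, capacity):
        self.capacity = capacity          # upper bound on worker threads
        self.spawned = 0                  # workers started so far
        self.errors = []                  # exceptions raised by tasks
        self._lock = threading.Lock()
        self._work = threading.Condition(self._lock)  # signals new tasks
        self._done = threading.Condition(self._lock)  # signals finished tasks
        self._pending = deque()
        self._free = 0                    # workers not currently running a task
        self._unfinished = 0

    def submit(self, fn, *args):
        with self._lock:
            self._pending.append((fn, args))
            self._unfinished += 1
            # Spawn only if the queued tasks outnumber the free workers
            # and the pool is still below its capacity.
            if len(self._pending) > self._free and self.spawned < self.capacity:
                self.spawned += 1
                self._free += 1
                threading.Thread(target=self._worker, daemon=True).start()
            else:
                self._work.notify()

    def _worker(self):
        while True:
            with self._lock:
                while not self._pending:
                    self._work.wait()
                fn, args = self._pending.popleft()
                self._free -= 1
            try:
                fn(*args)
            except Exception as exc:      # keep the worker alive on task errors
                with self._lock:
                    self.errors.append(exc)
            finally:
                with self._lock:
                    self._free += 1
                    self._unfinished -= 1
                    self._done.notify_all()

    def wait(self):
        """Block until every submitted task has finished."""
        with self._lock:
            while self._unfinished:
                self._done.wait()
```

In this sketch a freshly constructed pool owns no threads at all, and two tasks submitted one after the other (with a `wait()` in between) reuse the same worker instead of starting a second one.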

[GitHub] [arrow] pitrou commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492734525 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] pitrou commented on a change in pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #7789: URL: https://github.com/apache/arrow/pull/7789#discussion_r492746151 ## File path: cpp/src/arrow/util/compression.h ## @@ -30,7 +30,18 @@ namespace arrow { struct Compression { /// \brief Compression algorithm - enum

[GitHub] [arrow] andygrove closed pull request #8236: ARROW-10060: [Rust] [DataFusion] Fixed error on which Err were discarded in MergeExec.

2020-09-22 Thread GitBox
andygrove closed pull request #8236: URL: https://github.com/apache/arrow/pull/8236

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492648007 ## File path: cpp/src/arrow/util/converter.h ## @@ -0,0 +1,353 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492663108 ## File path: python/pyarrow/tests/test_convert_builtin.py ## @@ -1513,6 +1519,108 @@ def test_struct_from_tuples(): pa.array([tup], type=ty)

[GitHub] [arrow] lidavidm commented on pull request #8196: ARROW-10013: [FlightRPC][C++] fix setting generic client options

2020-09-22 Thread GitBox
lidavidm commented on pull request #8196: URL: https://github.com/apache/arrow/pull/8196#issuecomment-696675621 > Wow, did you report the `peer()` issue to gRPC? Not yet - I need to reproduce it in a VM first (probably with just base gRPC instead of trying to set up Arrow).

[GitHub] [arrow] drusso commented on a change in pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-09-22 Thread GitBox
drusso commented on a change in pull request #8222: URL: https://github.com/apache/arrow/pull/8222#discussion_r492692158 ## File path: rust/datafusion/src/physical_plan/distinct_expressions.rs ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] drusso commented on a change in pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-09-22 Thread GitBox
drusso commented on a change in pull request #8222: URL: https://github.com/apache/arrow/pull/8222#discussion_r492692512 ## File path: rust/datafusion/src/physical_plan/distinct_expressions.rs ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] kszucs opened a new pull request #8238: ARROW-10063: [Archery][CI] Fetch main branch in archery build only when it is a pull request

2020-09-22 Thread GitBox
kszucs opened a new pull request #8238: URL: https://github.com/apache/arrow/pull/8238 Arrow's git data for the main branch is required to test the release curation scripts. While the build properly works from pull requests it is failing on the main branch since the requested

[GitHub] [arrow] jorisvandenbossche commented on pull request #8188: ARROW-9924: [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans

2020-09-22 Thread GitBox
jorisvandenbossche commented on pull request #8188: URL: https://github.com/apache/arrow/pull/8188#issuecomment-696688370 It seems the crashing test is: https://github.com/apache/arrow/blob/40d64756dc3b2c51489b48362d0f04ee3e2a7388/python/pyarrow/tests/test_parquet.py#L3389-L3414

[GitHub] [arrow] kszucs closed pull request #7797: ARROW-4189: [Rust] Added coverage report.

2020-09-22 Thread GitBox
kszucs closed pull request #7797: URL: https://github.com/apache/arrow/pull/7797

[GitHub] [arrow] github-actions[bot] commented on pull request #8240: ARROW-10038: [C++] Spawn thread pool threads lazily

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8240: URL: https://github.com/apache/arrow/pull/8240#issuecomment-696730535 https://issues.apache.org/jira/browse/ARROW-10038

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492695285 ## File path: python/pyarrow/tests/test_convert_builtin.py ## @@ -1513,6 +1519,108 @@ def test_struct_from_tuples(): pa.array([tup], type=ty)

[GitHub] [arrow] kszucs commented on pull request #8238: ARROW-10063: [Archery][CI] Fetch main branch in archery build only when it is a pull request

2020-09-22 Thread GitBox
kszucs commented on pull request #8238: URL: https://github.com/apache/arrow/pull/8238#issuecomment-696720299 It has passed on my fork so it should be good to go: https://github.com/kszucs/arrow/runs/1149604242

[GitHub] [arrow] xhochy commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
xhochy commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492772505 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the [developer

[GitHub] [arrow] github-actions[bot] commented on pull request #8237: ARROW-10062: [Rust] Fix for null elems at key position in dictionary arrays

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8237: URL: https://github.com/apache/arrow/pull/8237#issuecomment-696681860 https://issues.apache.org/jira/browse/ARROW-10062 This is an automated message from the Apache Git

[GitHub] [arrow] pitrou commented on pull request #8136: ARROW-9078: [C++] Parquet read / write extension type with nested storage type

2020-09-22 Thread GitBox
pitrou commented on pull request #8136: URL: https://github.com/apache/arrow/pull/8136#issuecomment-696686570 I rebased and addressed review comments. This is an automated message from the Apache Git Service. To respond to

[GitHub] [arrow] drusso commented on pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

2020-09-22 Thread GitBox
drusso commented on pull request #8222: URL: https://github.com/apache/arrow/pull/8222#issuecomment-696689994 Thanks for the review/feedback all! @jorgecarleitao: > it may be worth take a look at #8172 , where we are trying to improve how to declare and run aggregate

[GitHub] [arrow] github-actions[bot] commented on pull request #8238: ARROW-10063: [Archery][CI] Fetch main branch in archery build only when it is a pull request

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8238: URL: https://github.com/apache/arrow/pull/8238#issuecomment-696690267 https://issues.apache.org/jira/browse/ARROW-10063 This is an automated message from the Apache Git

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492694003 ## File path: python/pyarrow/array.pxi ## @@ -21,28 +21,28 @@ import warnings cdef _sequence_to_array(object sequence, object mask, object size,

[GitHub] [arrow] t829702 edited a comment on pull request #2035: ARROW-2116: [JS] implement IPC writers

2020-09-22 Thread GitBox
t829702 edited a comment on pull request #2035: URL: https://github.com/apache/arrow/pull/2035#issuecomment-696480501 >> Is there a better way to create RecordBatch than the static method arrow.RecordBatch.new? > No, this is the recommended way to construct a RecordBatch zero-copy.

[GitHub] [arrow] trxcllnt commented on a change in pull request #8216: ARROW-8394: [JS] Upgrade to TypeScript 4.0.2, fix typings for TS 3.9+

2020-09-22 Thread GitBox
trxcllnt commented on a change in pull request #8216: URL: https://github.com/apache/arrow/pull/8216#discussion_r493036448 ## File path: js/test/inference/column.ts ## @@ -33,33 +33,6 @@ const boolColumn = new Column(new Field('bool', boolType), [ expect(typeof

[GitHub] [arrow] github-actions[bot] commented on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
github-actions[bot] commented on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697027877 https://issues.apache.org/jira/browse/ARROW-10068 This is an automated message from the Apache Git

[GitHub] [arrow] nealrichardson opened a new pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
nealrichardson opened a new pull request #8243: URL: https://github.com/apache/arrow/pull/8243 I've tried enabling this in the R linux builds and have made some progress, but I'm hitting some issues that someone more experienced with cmake might be able to help with. * In the

[GitHub] [arrow] kou edited a comment on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
kou edited a comment on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697040532 For the aws-sdk headers, the following patch will fix them: ```diff diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake

[GitHub] [arrow] kou commented on pull request #8243: ARROW-10068: [C++] Add bundled external project for aws-sdk-cpp

2020-09-22 Thread GitBox
kou commented on pull request #8243: URL: https://github.com/apache/arrow/pull/8243#issuecomment-697040532 For the ubuntu R jobs, the following patch will fix them: ```diff diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake

[GitHub] [arrow] kszucs commented on a change in pull request #8088: ARROW-9992: [C++][Python] Refactor python to arrow conversions based on a reusable conversion API

2020-09-22 Thread GitBox
kszucs commented on a change in pull request #8088: URL: https://github.com/apache/arrow/pull/8088#discussion_r492790440 ## File path: python/pyarrow/tests/test_convert_builtin.py ## @@ -132,6 +133,10 @@ def _as_tuple(xs): return tuple(xs) +def _as_pairs(xs): Review

[GitHub] [arrow] xhochy closed pull request #8239: ARROW-10064: [C++] Resolve compile warnings on Apple Clang 12

2020-09-22 Thread GitBox
xhochy closed pull request #8239: URL: https://github.com/apache/arrow/pull/8239 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] lidavidm opened a new pull request #8241: ARROW-10054: [C++][Python] don't crash when slice offset > length

2020-09-22 Thread GitBox
lidavidm opened a new pull request #8241: URL: https://github.com/apache/arrow/pull/8241 Instead of crashing using a CHECK, adjust the offset so that the caller gets an empty array, matching expectations for Python users.
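
The behaviour described above (an out-of-range slice offset yields an empty array instead of a crash) can be sketched in plain Python. This is only an illustration under stated assumptions: `clamp_slice` is a hypothetical helper, not a pyarrow or Arrow C++ function, and the actual patch may handle the details differently.

```python
def clamp_slice(length, offset, slice_length=None):
    """Return an (offset, length) pair that never runs past the end.

    Hypothetical helper for illustration; not part of Arrow's API.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if slice_length is not None and slice_length < 0:
        raise ValueError("slice_length must be non-negative")
    # An offset past the end is pulled back to the end, so the
    # resulting slice is empty rather than an error.
    offset = min(offset, length)
    remaining = length - offset
    if slice_length is None:
        slice_length = remaining
    # The requested length is also limited to what is left.
    return offset, min(slice_length, remaining)


values = [1, 2, 3]
start, count = clamp_slice(len(values), offset=10)
print(values[start:start + count])  # prints []
```

Clamping mirrors how Python's own list slicing treats out-of-range bounds, which is the expectation the PR description refers to.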

[GitHub] [arrow] pitrou commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
pitrou commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492821834 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the [developer

[GitHub] [arrow] emkornfield commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
emkornfield commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492830650 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public

[GitHub] [arrow] emkornfield commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
emkornfield commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492834953 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1009,12 +1046,33 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public

[GitHub] [arrow] emkornfield commented on a change in pull request #8219: ARROW-9603: [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs

2020-09-22 Thread GitBox
emkornfield commented on a change in pull request #8219: URL: https://github.com/apache/arrow/pull/8219#discussion_r492835461 ## File path: cpp/src/parquet/column_writer.cc ## @@ -1130,37 +1188,61 @@ class TypedColumnWriterImpl : public ColumnWriterImpl, public

[GitHub] [arrow] nealrichardson commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
nealrichardson commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492820728 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the

[GitHub] [arrow] TheNeuralBit commented on a change in pull request #8216: ARROW-8394: [JS] Upgrade to TypeScript 4.0.2, fix typings for TS 3.9+

2020-09-22 Thread GitBox
TheNeuralBit commented on a change in pull request #8216: URL: https://github.com/apache/arrow/pull/8216#discussion_r492822710 ## File path: js/test/unit/ipc/helpers.ts ## @@ -54,13 +54,13 @@ export abstract class ArrowIOTestHelper { await testFn(await

[GitHub] [arrow] nealrichardson commented on a change in pull request #8235: ARROW-10059: [R][Doc] Give more advice on how to set up C++ build

2020-09-22 Thread GitBox
nealrichardson commented on a change in pull request #8235: URL: https://github.com/apache/arrow/pull/8235#discussion_r492822472 ## File path: r/README.md ## @@ -102,6 +102,43 @@ elsewhere, you’ll need to build it from source too. First, install the C++ library. See the
