[GitHub] [arrow] nevi-me commented on a change in pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns

2020-05-04 Thread GitBox
nevi-me commented on a change in pull request #6770: URL: https://github.com/apache/arrow/pull/6770#discussion_r419233839 ## File path: rust/parquet/src/arrow/array_reader.rs ## @@ -468,6 +491,391 @@ where } } +/// Implementation of list array reader. +pub struct

[GitHub] [arrow] hantusk commented on issue #7082: pyarrow 0.17 atexit handler causes a segmentation fault

2020-05-04 Thread GitBox
hantusk commented on issue #7082: URL: https://github.com/apache/arrow/issues/7082#issuecomment-623301347 Yes, macOS running python 3.7.5 or 3.7.7. I will try and reproduce and continue commenting in the JIRA issue. This is

[GitHub] [arrow] github-actions[bot] commented on pull request #7097: ARROW-8690: [Python] Clean-up dataset+parquet tests now order is deterministic

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7097: URL: https://github.com/apache/arrow/pull/7097#issuecomment-623457675 https://issues.apache.org/jira/browse/ARROW-8690 This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

2020-05-04 Thread GitBox
wesm commented on a change in pull request #7089: URL: https://github.com/apache/arrow/pull/7089#discussion_r419458260 ## File path: cpp/src/parquet/properties.h ## @@ -34,10 +34,14 @@ namespace parquet { +/// Control for data types in parquet. struct ParquetVersion {

[GitHub] [arrow] pitrou opened a new pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
pitrou opened a new pull request #7098: URL: https://github.com/apache/arrow/pull/7098 The AWS SDK creates an auto-growing StringStream by default, entailing multiple memory copies when transferring large data blocks (because of resizes). Instead, write directly into the target data area.
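
The idea in the description above can be illustrated with a small sketch. This is not the AWS SDK or Arrow code: it is a Python analogy, and the function names `download_via_growing_stream` and `download_into_target` are hypothetical. The first function accumulates chunks in an auto-growing stream and then copies the result out; the second assumes the total size is known up front and writes each chunk straight into a preallocated target buffer.

```python
import io


def download_via_growing_stream(chunks):
    """Accumulate chunks in an auto-growing stream, then copy the result out."""
    stream = io.BytesIO()
    for chunk in chunks:
        # The stream may have to reallocate and copy its contents as it grows.
        stream.write(chunk)
    # getvalue() produces one more full copy of the accumulated data.
    return stream.getvalue()


def download_into_target(chunks, total_size):
    """Write each chunk directly into a preallocated target buffer."""
    # Assumption: the total size is known before the transfer starts.
    target = bytearray(total_size)
    view = memoryview(target)
    offset = 0
    for chunk in chunks:
        # Each chunk is copied exactly once, into its final position.
        view[offset:offset + len(chunk)] = chunk
        offset += len(chunk)
    return target


if __name__ == "__main__":
    parts = [b"abc", b"defg", b"hi"]
    size = sum(len(p) for p in parts)
    print(bytes(download_into_target(parts, size)) == download_via_growing_stream(parts))
```

Both functions return the same bytes; the difference is only in how many intermediate copies are made, which is what matters for large data blocks.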

[GitHub] [arrow] wesm commented on a change in pull request #7088: ARROW-8111: [C++] User-defined timestamp parser option to CSV, new TimestampParser interface, and strptime-compatible impl

2020-05-04 Thread GitBox
wesm commented on a change in pull request #7088: URL: https://github.com/apache/arrow/pull/7088#discussion_r419459324 ## File path: cpp/src/arrow/csv/converter.cc ## @@ -381,32 +383,98 @@ class NumericConverter : public ConcreteConverter {

[GitHub] [arrow] pitrou edited a comment on pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
pitrou edited a comment on pull request #7098: URL: https://github.com/apache/arrow/pull/7098#issuecomment-623507548 @lidavidm It would be nice if you could run the benchmarks and post numbers on your setup (perhaps on S3 too?).

[GitHub] [arrow] pitrou commented on pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
pitrou commented on pull request #7098: URL: https://github.com/apache/arrow/pull/7098#issuecomment-623507548 @lidavidm It would be nice if you could run the benchmarks and post number on your setup (perhaps on S3 too?).

[GitHub] [arrow] github-actions[bot] removed a comment on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
github-actions[bot] removed a comment on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-618494095 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

[GitHub] [arrow] github-actions[bot] commented on pull request #7096: ARROW-8644: [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7096: URL: https://github.com/apache/arrow/pull/7096#issuecomment-623433630 Revision: 3e480a91833c7cd401fa120c520e5a51dad2d58a Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419401740 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,143 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419413778 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,143 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] github-actions[bot] commented on pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7098: URL: https://github.com/apache/arrow/pull/7098#issuecomment-623499876 https://issues.apache.org/jira/browse/ARROW-8692

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419541129 ## File path: docs/source/example1.dat ## @@ -0,0 +1 @@ +some data Review comment: Nope.

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7099: ARROW-8693: [Python] Insert implicit cast in Dataset.get_fragments with filter

2020-05-04 Thread GitBox
jorisvandenbossche opened a new pull request #7099: URL: https://github.com/apache/arrow/pull/7099

[GitHub] [arrow] nealrichardson commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
nealrichardson commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419518977 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,197 @@ +.. raw:: html + + + +Running Docker Builds += + +Most

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419528025 ## File path: .github/workflows/archery.yml ## @@ -51,10 +53,12 @@ jobs: python-version: '3.7' - name: Install working-directory:

[GitHub] [arrow] pitrou commented on pull request #7094: ARROW-8689: [C++] Fix linking S3FS benchmarks

2020-05-04 Thread GitBox
pitrou commented on pull request #7094: URL: https://github.com/apache/arrow/pull/7094#issuecomment-623469326 Ok, rebasing.

[GitHub] [arrow] wesm commented on pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

2020-05-04 Thread GitBox
wesm commented on pull request #7089: URL: https://github.com/apache/arrow/pull/7089#issuecomment-623480713 Sorry fat-fingered the review request. I will take a look at this

[GitHub] [arrow] nealrichardson commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
nealrichardson commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-623520253 @emkornfield that looks like the same R Windows 32-bit failure to me. I'm not sure I understand your other question. Are you saying you want to use the old (status

[GitHub] [arrow] github-actions[bot] commented on pull request #7099: ARROW-8693: [Python] Insert implicit cast in Dataset.get_fragments with filter

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7099: URL: https://github.com/apache/arrow/pull/7099#issuecomment-623534789 https://issues.apache.org/jira/browse/ARROW-8693

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419538949 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,197 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] pitrou commented on a change in pull request #7088: ARROW-8111: [C++] User-defined timestamp parser option to CSV, new TimestampParser interface, and strptime-compatible impl

2020-05-04 Thread GitBox
pitrou commented on a change in pull request #7088: URL: https://github.com/apache/arrow/pull/7088#discussion_r419336015 ## File path: cpp/src/arrow/util/value_parsing.cc ## @@ -79,5 +86,46 @@ bool StringToFloatConverter::StringToFloat(const char* s, size_t length, double*

[GitHub] [arrow] rymurr commented on pull request #7093: ARROW-8687: [Java] Remove references to io.netty.buffer.ArrowBuf

2020-05-04 Thread GitBox
rymurr commented on pull request #7093: URL: https://github.com/apache/arrow/pull/7093#issuecomment-623426298 Thanks both!

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419339195 ## File path: .github/workflows/java.yml ## @@ -38,6 +38,8 @@ on: env: DOCKER_BUILDKIT: 0 COMPOSE_DOCKER_CLI_BUILD: 1 + ARCHERY_DOCKER_USER: ${{

[GitHub] [arrow] github-actions[bot] commented on pull request #7094: ARROW-8689: [C++] Fix linking S3FS benchmarks

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7094: URL: https://github.com/apache/arrow/pull/7094#issuecomment-623422812 https://issues.apache.org/jira/browse/ARROW-8689

[GitHub] [arrow] github-actions[bot] commented on pull request #7095: ARROW-8664: [Java] Add flag to skip null check

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7095: URL: https://github.com/apache/arrow/pull/7095#issuecomment-623428933 https://issues.apache.org/jira/browse/ARROW-8664

[GitHub] [arrow] github-actions[bot] commented on pull request #7096: ARROW-8644: [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7096: URL: https://github.com/apache/arrow/pull/7096#issuecomment-623428932 https://issues.apache.org/jira/browse/ARROW-8644

[GitHub] [arrow] pitrou commented on pull request #7081: [CI] Cache docker volumes [WIP]

2020-05-04 Thread GitBox
pitrou commented on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-623367064 Did they increase the available cache size? Last I looked it was a fixed size for the entire repo.

[GitHub] [arrow] rymurr commented on pull request #7084: ARROW-8664: [Java] Add flag to skip null check

2020-05-04 Thread GitBox
rymurr commented on pull request #7084: URL: https://github.com/apache/arrow/pull/7084#issuecomment-623343842 build is dependent on #7093 and rebase

[GitHub] [arrow] mr-smidge commented on pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-04 Thread GitBox
mr-smidge commented on pull request #7032: URL: https://github.com/apache/arrow/pull/7032#issuecomment-623353035 Hi @zgramana (and @eerhardt). I was independently working on nullable array builder support (but have not been able to contribute just yet as my organisation needs to sign a

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7096: ARROW-8644: [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility

2020-05-04 Thread GitBox
jorisvandenbossche opened a new pull request #7096: URL: https://github.com/apache/arrow/pull/7096

[GitHub] [arrow] liyafan82 commented on pull request #6729: ARROW-8229: [Java] Move ArrowBuf into the Arrow package

2020-05-04 Thread GitBox
liyafan82 commented on pull request #6729: URL: https://github.com/apache/arrow/pull/6729#issuecomment-623427960 > It will be good to link the related issues in the PR description. @siddharthteotia Thanks a lot for your effort. I have updated the description.

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419400510 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,143 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419416691 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,143 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] jorisvandenbossche commented on pull request #7097: ARROW-8690: [Python] Clean-up dataset+parquet tests now order is deterministic

2020-05-04 Thread GitBox
jorisvandenbossche commented on pull request #7097: URL: https://github.com/apache/arrow/pull/7097#issuecomment-623445265 @github-actions crossbow submit -g python

[GitHub] [arrow] github-actions[bot] commented on pull request #7097: ARROW-8690: [Python] Clean-up dataset+parquet tests now order is deterministic

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7097: URL: https://github.com/apache/arrow/pull/7097#issuecomment-623445892 Revision: 065dc03fc971c34c7d008283ef399b88939f8e98 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] lidavidm commented on a change in pull request #7012: ARROW-8555: [FlightRPC][Java] implement DoExchange

2020-05-04 Thread GitBox
lidavidm commented on a change in pull request #7012: URL: https://github.com/apache/arrow/pull/7012#discussion_r419413303 ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightClient.java ## @@ -293,6 +292,76 @@ public void onCompleted() {

[GitHub] [arrow] pitrou commented on pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-05-04 Thread GitBox
pitrou commented on pull request #6959: URL: https://github.com/apache/arrow/pull/6959#issuecomment-623381960 @wesm Do you want to take a look at this?

[GitHub] [arrow] pitrou opened a new pull request #7094: ARROW-8689: [C++] Fix linking S3FS benchmarks

2020-05-04 Thread GitBox
pitrou opened a new pull request #7094: URL: https://github.com/apache/arrow/pull/7094

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419413899 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,143 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] github-actions[bot] commented on pull request #7093: ARROW-8687: [Java] Remove references to io.netty.buffer.ArrowBuf

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7093: URL: https://github.com/apache/arrow/pull/7093#issuecomment-623337581 https://issues.apache.org/jira/browse/ARROW-8687

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419372644 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,143 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] rymurr opened a new pull request #7095: ARROW-8664: [Java] Add flag to skip null check

2020-05-04 Thread GitBox
rymurr opened a new pull request #7095: URL: https://github.com/apache/arrow/pull/7095 All Vector containers should skip null check when null check flag is enabled

[GitHub] [arrow] pitrou commented on pull request #7094: ARROW-8689: [C++] Fix linking S3FS benchmarks

2020-05-04 Thread GitBox
pitrou commented on pull request #7094: URL: https://github.com/apache/arrow/pull/7094#issuecomment-623432600 Java issues on CI look unrelated. @kszucs can you confirm?

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419353296 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,143 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] lidavidm commented on a change in pull request #7012: ARROW-8555: [FlightRPC][Java] implement DoExchange

2020-05-04 Thread GitBox
lidavidm commented on a change in pull request #7012: URL: https://github.com/apache/arrow/pull/7012#discussion_r419412323 ## File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/TestDoExchange.java ## @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] wesm commented on pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-04 Thread GitBox
wesm commented on pull request #7032: URL: https://github.com/apache/arrow/pull/7032#issuecomment-623587071 @mr-smidge I don't think a CCLA is necessary for you to contribute. We've seen organizations require an ASF ICLA but extremely rarely a CCLA (in fact, many corporate attorneys will

[GitHub] [arrow] lidavidm commented on pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
lidavidm commented on pull request #7098: URL: https://github.com/apache/arrow/pull/7098#issuecomment-623645720 Ok, I ran the benchmarks against S3 several times, but performance is wildly inconsistent. Before: ```

[GitHub] [arrow] pitrou commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
pitrou commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r419602697 ## File path: cpp/src/arrow/util/bit_util.h ## @@ -610,6 +618,101 @@ class FirstTimeBitmapWriter { } } + /// Appends number_of_bits from word to

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419646876 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,197 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419646543 ## File path: docs/source/developers/integration.rst ## @@ -56,15 +56,19 @@ build mount is used for caching and sharing state between staged images. You

[GitHub] [arrow] jorisvandenbossche commented on pull request #7096: ARROW-8644: [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility

2020-05-04 Thread GitBox
jorisvandenbossche commented on pull request #7096: URL: https://github.com/apache/arrow/pull/7096#issuecomment-623635439 So the question comes up if we actually should also not revert the behaviour in case of `use_legacy_dataset=False` (the

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-05-04 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r419653824 ## File path: rust/arrow/src/array/union.rs ## @@ -0,0 +1,1174 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-05-04 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r419657778 ## File path: rust/arrow/src/array/union.rs ## @@ -0,0 +1,1174 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] pitrou commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
pitrou commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r419593319 ## File path: cpp/src/arrow/util/bit_util.h ## @@ -43,13 +43,18 @@ #if defined(_MSC_VER) #include +#include #pragma intrinsic(_BitScanReverse)

[GitHub] [arrow] lidavidm edited a comment on pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
lidavidm edited a comment on pull request #7098: URL: https://github.com/apache/arrow/pull/7098#issuecomment-623645720 Ok, I ran the benchmarks against S3 several times, but performance is wildly inconsistent. This is from an EC2 VM to S3 in the same region. Before: ```

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419646333 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,197 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419646070 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,197 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] paddyhoran commented on a change in pull request #7004: ARROW-3827: [Rust] Implement UnionArray Updated

2020-05-04 Thread GitBox
paddyhoran commented on a change in pull request #7004: URL: https://github.com/apache/arrow/pull/7004#discussion_r419658264 ## File path: rust/arrow/src/array/union.rs ## @@ -0,0 +1,1174 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7096: ARROW-8644: [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility

2020-05-04 Thread GitBox
jorisvandenbossche edited a comment on pull request #7096: URL: https://github.com/apache/arrow/pull/7096#issuecomment-623635439 So the question comes up if we actually should also not revert the behaviour in case of `use_legacy_dataset=False` (the `_ParquetDatasetV2` shim). For me,

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7096: ARROW-8644: [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility

2020-05-04 Thread GitBox
jorisvandenbossche edited a comment on pull request #7096: URL: https://github.com/apache/arrow/pull/7096#issuecomment-623635439 So the question comes up if we actually should have the same behaviour in case of `use_legacy_dataset=False` (the `_ParquetDatasetV2` shim). For me, that

[GitHub] [arrow] rymurr opened a new pull request #7101: [Java] ARROW-8695: Remove references to PlatformDependent in arrow-memory

2020-05-04 Thread GitBox
rymurr opened a new pull request #7101: URL: https://github.com/apache/arrow/pull/7101 As part of ARROW-8230 we are reducing the usages of Netty inside the arrow-memory module. This step simply removes Netty utils references

[GitHub] [arrow] kszucs commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419709631 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,197 @@ +.. raw:: html + + + +Running Docker Builds += + +Most of our

[GitHub] [arrow] kevinushey commented on pull request #7102: ARROW-8699: [R] Fix automatic r_to_py conversion

2020-05-04 Thread GitBox
kevinushey commented on pull request #7102: URL: https://github.com/apache/arrow/pull/7102#issuecomment-623743441 LGTM!

[GitHub] [arrow] nealrichardson opened a new pull request #7102: ARROW-8699: [R] Fix automatic r_to_py conversion

2020-05-04 Thread GitBox
nealrichardson opened a new pull request #7102: URL: https://github.com/apache/arrow/pull/7102 This appears to be the fix for https://github.com/rstudio/reticulate/issues/748 cc @kevinushey

[GitHub] [arrow] yordan-pavlov commented on pull request #7037: ARROW-6718: [DRAFT] [Rust] Remove packed_simd

2020-05-04 Thread GitBox
yordan-pavlov commented on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-623718293 hi @nevi-me , I have just published the filtering benchmark here: https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs

[GitHub] [arrow] wesm commented on pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
wesm commented on pull request #7098: URL: https://github.com/apache/arrow/pull/7098#issuecomment-623666418 S3 benchmarks run outside of EC2 aren't likely to be useful

[GitHub] [arrow] wesm edited a comment on pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
wesm edited a comment on pull request #7098: URL: https://github.com/apache/arrow/pull/7098#issuecomment-623666418 S3 benchmarks run outside of EC2 aren't likely to be (too) useful

[GitHub] [arrow] rymurr opened a new pull request #7100: [Java] ARROW-8696: Convert tests to maven failsafe

2020-05-04 Thread GitBox
rymurr opened a new pull request #7100: URL: https://github.com/apache/arrow/pull/7100 Some tests are run via main() and can be run as integration tests instead. This makes running as part of an automated job easier. This

[GitHub] [arrow] kszucs commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-623684032 > Some UX improvement I'd like to see: > > * I often fail to run `docker-compose` from the root of the sources leading to cryptic errors because it can't find the `.env`.

[GitHub] [arrow] kszucs commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-623695509 @github-actions crossbow submit -g test

[GitHub] [arrow] fsaintjacques commented on pull request #7098: ARROW-8692: [C++] Avoid memory copies when downloading from S3

2020-05-04 Thread GitBox
fsaintjacques commented on pull request #7098: URL: https://github.com/apache/arrow/pull/7098#issuecomment-623657503 Locally: ``` # Before $ time cpp/build/conda-release/release/dataset-parquet-scan-example 's3://123:12345678@nyc-tlc/parquet?scheme=http_override=localhost:9000'

[GitHub] [arrow] github-actions[bot] commented on pull request #7101: [Java] ARROW-8695: Remove references to PlatformDependent in arrow-memory

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7101: URL: https://github.com/apache/arrow/pull/7101#issuecomment-623675866 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] github-actions[bot] commented on pull request #7100: [Java] ARROW-8696: Convert tests to maven failsafe

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7100: URL: https://github.com/apache/arrow/pull/7100#issuecomment-623675865 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] github-actions[bot] commented on pull request #7101: ARROW-8695: [Java] Remove references to PlatformDependent in arrow-memory

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7101: URL: https://github.com/apache/arrow/pull/7101#issuecomment-623682441 https://issues.apache.org/jira/browse/ARROW-8695

[GitHub] [arrow] nealrichardson commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-04 Thread GitBox
nealrichardson commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r419722389 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,224 @@ +.. raw:: html + + + +Running Docker Builds += + +Most

[GitHub] [arrow] wesm commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-05-04 Thread GitBox
wesm commented on a change in pull request #6959: URL: https://github.com/apache/arrow/pull/6959#discussion_r419765783 ## File path: cpp/src/arrow/util/key_value_metadata.cc ## @@ -94,11 +94,42 @@ Result KeyValueMetadata::Get(const std::string& key) const { } } +Status

[GitHub] [arrow] wesm commented on pull request #7088: ARROW-8111: [C++] User-defined timestamp parser option to CSV, new TimestampParser interface, and strptime-compatible impl

2020-05-04 Thread GitBox
wesm commented on pull request #7088: URL: https://github.com/apache/arrow/pull/7088#issuecomment-623788547 Turns out Gandiva already had a `strptime` wrapper so wasn't that hard to adapt it. This is ready to merge pending CI

[GitHub] [arrow] wesm opened a new pull request #7103: ARROW-8694: [C++][Parquet] Relax string size limit when deserializing Thrift messages

2020-05-04 Thread GitBox
wesm opened a new pull request #7103: URL: https://github.com/apache/arrow/pull/7103 While it's not an ideal use case for Parquet, the 10MB limit for strings was causing a Thrift deserialization failure due to the large "pandas metadata" JSON blob written with the Schema when there are

[GitHub] [arrow] wesm commented on issue #7058: Offline installation on Linux

2020-05-04 Thread GitBox
wesm commented on issue #7058: URL: https://github.com/apache/arrow/issues/7058#issuecomment-623774020 Closing. Please feel free to follow up on JIRA

[GitHub] [arrow] wesm edited a comment on pull request #7088: ARROW-8111: [C++] User-defined timestamp parser option to CSV, new TimestampParser interface, and strptime-compatible impl

2020-05-04 Thread GitBox
wesm edited a comment on pull request #7088: URL: https://github.com/apache/arrow/pull/7088#issuecomment-623788547 Turns out Gandiva already had a `strptime` wrapper (that was ~10+x faster than vendored/datetime.h) so wasn't that hard to adapt it. This is ready to merge pending CI

[GitHub] [arrow] emkornfield commented on pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-05-04 Thread GitBox
emkornfield commented on pull request #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-623846486 Nothing more from me

[GitHub] [arrow] github-actions[bot] commented on pull request #7103: ARROW-8694: [C++][Parquet] Relax string size limit when deserializing Thrift messages

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7103: URL: https://github.com/apache/arrow/pull/7103#issuecomment-623757231 https://issues.apache.org/jira/browse/ARROW-8694

[GitHub] [arrow] wesm commented on pull request #7096: ARROW-8644: [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility

2020-05-04 Thread GitBox
wesm commented on pull request #7096: URL: https://github.com/apache/arrow/pull/7096#issuecomment-623764101 This is a regression? If so can you mark it with 0.17.1 (and 1.0.0)

[GitHub] [arrow] wesm commented on a change in pull request #7088: ARROW-8111: [C++] User-defined timestamp parser option to CSV, new TimestampParser interface, and strptime-compatible impl

2020-05-04 Thread GitBox
wesm commented on a change in pull request #7088: URL: https://github.com/apache/arrow/pull/7088#discussion_r419792382 ## File path: cpp/src/arrow/util/value_parsing.cc ## @@ -79,5 +86,46 @@ bool StringToFloatConverter::StringToFloat(const char* s, size_t length, double*

[GitHub] [arrow] wesm commented on a change in pull request #7088: ARROW-8111: [C++] User-defined timestamp parser option to CSV, new TimestampParser interface, and strptime-compatible impl

2020-05-04 Thread GitBox
wesm commented on a change in pull request #7088: URL: https://github.com/apache/arrow/pull/7088#discussion_r419799490 ## File path: cpp/src/arrow/csv/converter_benchmark.cc ## @@ -20,15 +20,67 @@ #include #include +#include "arrow/buffer.h" #include

[GitHub] [arrow] jianxind commented on pull request #7029: ARROW-8579 [C++] Add AVX512 SIMD for spaced decoding and encoding.

2020-05-04 Thread GitBox
jianxind commented on pull request #7029: URL: https://github.com/apache/arrow/pull/7029#issuecomment-623845147 > A general question: why is this limited to `sizeof(T) == 4` and `sizeof(T) == 8`? There are 8-bit and 16-bit types as well.

[GitHub] [arrow] emkornfield commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

2020-05-04 Thread GitBox
emkornfield commented on a change in pull request #7089: URL: https://github.com/apache/arrow/pull/7089#discussion_r419852845 ## File path: cpp/src/parquet/properties.h ## @@ -34,10 +34,14 @@ namespace parquet { +/// Control for data types in parquet. struct

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r419855634 ## File path: cpp/src/arrow/util/bit_util.h ## @@ -610,6 +618,103 @@ class FirstTimeBitmapWriter { } } + /// Appends number_of_bits from

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r419855848 ## File path: cpp/src/arrow/util/bit_util.h ## @@ -610,6 +618,71 @@ class FirstTimeBitmapWriter { } } + /// Appends number_of_bits from word

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r419855029 ## File path: cpp/src/arrow/util/bit_util_test.cc ## @@ -315,6 +318,115 @@ TEST(FirstTimeBitmapWriter, NormalOperation) { } } +std::string

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r419855248 ## File path: cpp/src/parquet/level_conversion.cc ## @@ -0,0 +1,170 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r419855459 ## File path: cpp/src/arrow/util/bit_util.h ## @@ -610,6 +618,103 @@ class FirstTimeBitmapWriter { } } + /// Appends number_of_bits from

[GitHub] [arrow] emkornfield commented on a change in pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-04 Thread GitBox
emkornfield commented on a change in pull request #6985: URL: https://github.com/apache/arrow/pull/6985#discussion_r419855100 ## File path: cpp/src/arrow/util/bit_util_test.cc ## @@ -315,6 +318,115 @@ TEST(FirstTimeBitmapWriter, NormalOperation) { } } +std::string

[GitHub] [arrow] emkornfield commented on a change in pull request #7103: ARROW-8694: [C++][Parquet] Relax string size limit when deserializing Thrift messages

2020-05-04 Thread GitBox
emkornfield commented on a change in pull request #7103: URL: https://github.com/apache/arrow/pull/7103#discussion_r419875325 ## File path: cpp/src/parquet/thrift_internal.h ## @@ -362,7 +362,7 @@ inline void DeserializeThriftUnencryptedMsg(const uint8_t* buf, uint32_t* len,

[GitHub] [arrow] kiszk commented on pull request #7101: ARROW-8695: [Java] Remove references to PlatformDependent in arrow-memory

2020-05-04 Thread GitBox
kiszk commented on pull request #7101: URL: https://github.com/apache/arrow/pull/7101#issuecomment-623868378 Looks good to me

[GitHub] [arrow] rymurr opened a new pull request #7093: [Java] ARROW-8687: Remove references to io.netty.buffer.ArrowBuf

2020-05-04 Thread GitBox
rymurr opened a new pull request #7093: URL: https://github.com/apache/arrow/pull/7093 Some references to `io.netty.buffer.ArrowBuf` were missed off in ARROW-8229. This cleans up the last remaining references.

[GitHub] [arrow] github-actions[bot] commented on pull request #7093: [Java] ARROW-8687: Remove references to io.netty.buffer.ArrowBuf

2020-05-04 Thread GitBox
github-actions[bot] commented on pull request #7093: URL: https://github.com/apache/arrow/pull/7093#issuecomment-623318389 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] rymurr commented on pull request #7093: [Java] ARROW-8687: Remove references to io.netty.buffer.ArrowBuf

2020-05-04 Thread GitBox
rymurr commented on pull request #7093: URL: https://github.com/apache/arrow/pull/7093#issuecomment-623318684 @liyafan82 just noticed a few entries of `io.netty.buffer.ArrowBuf` after the recent merge of your patch to move it to `org.apache.arrow.memory.ArrowBuf`
