[GitHub] [arrow] kszucs commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages

2020-05-01 Thread GitBox
kszucs commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622672970 @kou yes, we use ASF-provided credentials on GitHub Actions to upload the images. We need a user with write access to that repository with a custom Docker Hub token. Just granted

[GitHub] [arrow] kou edited a comment on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages

2020-05-01 Thread GitBox
kou edited a comment on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622586181 @kszucs I want to set `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` in https://travis-ci.org/github/ursa-labs/crossbow and https://github.com/ursa-labs/crossbow. Which user should we

[GitHub] [arrow] sunchao commented on pull request #7076: ARROW-8659: [Rust] ListBuilder allocate with_capacity

2020-05-01 Thread GitBox
sunchao commented on pull request #7076: URL: https://github.com/apache/arrow/pull/7076#issuecomment-622616144 Merged. Thanks @tustvold ! This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] tustvold commented on a change in pull request #7076: ARROW-8659: [Rust] ListBuilder allocate with_capacity

2020-05-01 Thread GitBox
tustvold commented on a change in pull request #7076: URL: https://github.com/apache/arrow/pull/7076#discussion_r418774678 ## File path: rust/parquet/src/arrow/converter.rs ## @@ -128,7 +128,10 @@ pub struct Utf8ArrayConverter {} impl Converter>, StringArray> for

[GitHub] [arrow] wesm edited a comment on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format

2020-05-01 Thread GitBox
wesm edited a comment on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622586463 Problems: * There aren't any unit tests in this patch so there is some work to do to get this merged * Code is duplicated from arrow/util/parsing.h I started

[GitHub] [arrow] wesm commented on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format

2020-05-01 Thread GitBox
wesm commented on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622586463 There aren't any unit tests in this patch so there is some work to do to get this merged

[GitHub] [arrow] kou commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages

2020-05-01 Thread GitBox
kou commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622586181 @kszucs I want to set `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` in https://travis-ci.org/github/ursa-labs/crossbow . Which user should we use for this? It seems that we use them for

[GitHub] [arrow] github-actions[bot] commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622585325 Revision: c73b2b65c373892f95dab8d65cf6fae06a39fe68 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kou commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages

2020-05-01 Thread GitBox
kou commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622584942 @github-actions crossbow submit -g linux

[GitHub] [arrow] mayuropensource commented on pull request #7022: ARROW-8562: [C++] IO: Parameterize I/O Coalescing using S3 metrics

2020-05-01 Thread GitBox
mayuropensource commented on pull request #7022: URL: https://github.com/apache/arrow/pull/7022#issuecomment-622581788 thank you @wesm

[GitHub] [arrow] pauldix commented on a change in pull request #7064: ARROW-6945: [Rust] WIP: Add initial skeleton for Rust integration tests

2020-05-01 Thread GitBox
pauldix commented on a change in pull request #7064: URL: https://github.com/apache/arrow/pull/7064#discussion_r418732315 ## File path: rust/arrow/Cargo.toml ## @@ -50,6 +50,7 @@ chrono = "0.4" flatbuffers = "0.6" hex = "0.4" arrow-flight = { path = "../arrow-flight",

[GitHub] [arrow] wesm commented on issue #7082: pyarrow 0.17 atexit handler causes a segmentation fault

2020-05-01 Thread GitBox
wesm commented on issue #7082: URL: https://github.com/apache/arrow/issues/7082#issuecomment-622562649 Please also indicate which exact version of Python you're using. From searching the internet, it seems that there was a bug fix in CPython that may affect older versions of Python 3.6 or

[GitHub] [arrow] kou commented on pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages

2020-05-01 Thread GitBox
kou commented on pull request #7085: URL: https://github.com/apache/arrow/pull/7085#issuecomment-622562008 @github-actions crossbow submit -g linux

[GitHub] [arrow] kou opened a new pull request #7085: ARROW-8668: [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages

2020-05-01 Thread GitBox
kou opened a new pull request #7085: URL: https://github.com/apache/arrow/pull/7085 If we use QEMU on GitHub Actions, it takes 6h+. If we use an ARM machine on Travis CI, it takes 30-40m. This change adds Docker image caching to

[GitHub] [arrow] tobim commented on pull request #7038: ARROW-8593: [C++][Parquet] Fix build with musl libc

2020-05-01 Thread GitBox
tobim commented on pull request #7038: URL: https://github.com/apache/arrow/pull/7038#issuecomment-622535149 @fsaintjacques @emkornfield sorry for the long silence, I updated the commit as you suggested.

[GitHub] [arrow] wesm commented on pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-05-01 Thread GitBox
wesm commented on pull request #6744: URL: https://github.com/apache/arrow/pull/6744#issuecomment-622529037 thanks @lidavidm! I'm confident we'll be able to devise some solutions to the resource allocation problem

[GitHub] [arrow] wesm commented on pull request #7077: ARROW-8660: [C++][Gandiva] Reduce usage of Boost in Gandiva codebase

2020-05-01 Thread GitBox
wesm commented on pull request #7077: URL: https://github.com/apache/arrow/pull/7077#issuecomment-622513296 +1

[GitHub] [arrow] fsaintjacques commented on pull request #7038: ARROW-8593: [C++][Parquet] Fix build with musl libc

2020-05-01 Thread GitBox
fsaintjacques commented on pull request #7038: URL: https://github.com/apache/arrow/pull/7038#issuecomment-622503258 @tobim, I do not have the rights to force-push on this branch. You can apply this locally: ``` diff --git a/cpp/src/parquet/file_serialize_test.cc

[GitHub] [arrow] wesm commented on issue #7082: pyarrow 0.17 atexit handler causes a segmentation fault

2020-05-01 Thread GitBox
wesm commented on issue #7082: URL: https://github.com/apache/arrow/issues/7082#issuecomment-622500836 Can you please open a JIRA issue and provide more information about your system configuration? We saw this error inside GitHub Actions but I haven't been able to reproduce it locally

[GitHub] [arrow] wesm edited a comment on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
wesm edited a comment on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622496448 Is this change necessary? I understand why we are using VS2017 in the conda package but why in the wheels? I'm sort of -0.5 on this unless there is a concrete reason why we

[GitHub] [arrow] wesm commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
wesm commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622496448 Is this change necessary? I understand why we are using VS2017 in the conda package but why in the wheels? This is

[GitHub] [arrow] nealrichardson commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
nealrichardson commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622495803 For what it's worth, R on Windows uses mingw, not msvc

[GitHub] [arrow] xhochy commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
xhochy commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622494222 > @xhochy to run the wheels, or build them? To run.

[GitHub] [arrow] wesm edited a comment on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
wesm edited a comment on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622493938 > I might lean towards macros around FMV for clang/GCC that could enable fallback to a slow version for MSVC FTR, it would seem unfortunate to do the work of

[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622493938 > I might lean towards macros around FMV for clang/GCC that could enable fallback to a slow version for MSVC FTR, it would seem unfortunate to do the work of SIMD-ifying code

[GitHub] [arrow] fsaintjacques commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
fsaintjacques commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622491245 @xhochy to run the wheels, or build them?

[GitHub] [arrow] tustvold commented on a change in pull request #7076: ARROW-8659: [Rust] ListBuilder allocate with_capacity

2020-05-01 Thread GitBox
tustvold commented on a change in pull request #7076: URL: https://github.com/apache/arrow/pull/7076#discussion_r418650048 ## File path: rust/arrow/src/array/builder.rs ## @@ -527,11 +527,18 @@ pub struct ListBuilder { impl ListBuilder { /// Creates a new

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-01 Thread GitBox
fsaintjacques commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r418599653 ## File path: .github/workflows/java.yml ## @@ -38,6 +38,8 @@ on: env: DOCKER_BUILDKIT: 0 COMPOSE_DOCKER_CLI_BUILD: 1 + ARCHERY_DOCKER_USER:

[GitHub] [arrow] nealrichardson commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-01 Thread GitBox
nealrichardson commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r418622281 ## File path: docs/source/developers/docker.rst ## @@ -0,0 +1,143 @@ +.. raw:: html + + + +Running Docker Builds += + +Most

[GitHub] [arrow] github-actions[bot] commented on pull request #7084: ARROW-8664: [Java] Add flag to skip null check

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7084: URL: https://github.com/apache/arrow/pull/7084#issuecomment-622465804 https://issues.apache.org/jira/browse/ARROW-8664

[GitHub] [arrow] eerhardt commented on pull request #6121: ARROW-6603: [C#] - Nullable Array Support

2020-05-01 Thread GitBox
eerhardt commented on pull request #6121: URL: https://github.com/apache/arrow/pull/6121#issuecomment-622464999 Thanks again for this work, @abbotware. I'm going to close this PR out in favor of #7032. @abbotware - can you check out the functionality added there?

[GitHub] [arrow] rymurr opened a new pull request #7084: ARROW-8664: [Java] Add flag to skip null check

2020-05-01 Thread GitBox
rymurr opened a new pull request #7084: URL: https://github.com/apache/arrow/pull/7084 All Vector containers should skip null check when null check flag is enabled

[GitHub] [arrow] nealrichardson commented on pull request #6425: ARROW-6111: [Java] Support LargeVarChar and LargeBinary types

2020-05-01 Thread GitBox
nealrichardson commented on pull request #6425: URL: https://github.com/apache/arrow/pull/6425#issuecomment-622458158 If you're doing integration tests as part of this patch, please remove this skip:

[GitHub] [arrow] nealrichardson commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
nealrichardson commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622453709 @emkornfield yeah everything is run twice, 32-bit and 64-bit, because Windows. Usually when there's a failure with no clear error, it means that the process

[GitHub] [arrow] wesm commented on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format

2020-05-01 Thread GitBox
wesm commented on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622447876 I can pick up this patch today and take it the last mile so it can be merged.

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-01 Thread GitBox
fsaintjacques commented on a change in pull request #7021: URL: https://github.com/apache/arrow/pull/7021#discussion_r418596433 ## File path: .github/workflows/archery.yml ## @@ -51,10 +53,12 @@ jobs: python-version: '3.7' - name: Install

[GitHub] [arrow] nealrichardson commented on pull request #6631: ARROW-8111: [C++][CSV] Support MM/DD/YYYY date format

2020-05-01 Thread GitBox
nealrichardson commented on pull request #6631: URL: https://github.com/apache/arrow/pull/6631#issuecomment-622437524 @github-actions autotune everything

[GitHub] [arrow] github-actions[bot] commented on pull request #7083: ARROW-8663: [Documentation] Small correction to building.rst

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7083: URL: https://github.com/apache/arrow/pull/7083#issuecomment-622435457 https://issues.apache.org/jira/browse/ARROW-8663

[GitHub] [arrow] crd477 opened a new pull request #7083: Update building.rst

2020-05-01 Thread GitBox
crd477 opened a new pull request #7083: URL: https://github.com/apache/arrow/pull/7083 simple typo: not -> note

[GitHub] [arrow] github-actions[bot] commented on pull request #7080: ARROW-8662: [CI] Consolidate appveyor scripts

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622423422 https://issues.apache.org/jira/browse/ARROW-8662

[GitHub] [arrow] kszucs edited a comment on pull request #7081: [CI] Cache docker volumes [WIP]

2020-05-01 Thread GitBox
kszucs edited a comment on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622417613 With a warmed-up cache, the build time has been reduced from 17m to 6m, which is promising. I'll need to do some gymnastics with the cache keys because the cache plugin

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622422074 @nealrichardson I'm not sure what the error is saying. Also, I didn't realize 32-bit R is still a thing :)

[GitHub] [arrow] kszucs commented on pull request #7080: [CI] Consolidate appveyor scripts [WIP]

2020-05-01 Thread GitBox
kszucs commented on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622418446 Checking that the cache properly works on my fork's master branch.

[GitHub] [arrow] kszucs removed a comment on pull request #7081: [CI] Cache docker volumes [WIP]

2020-05-01 Thread GitBox
kszucs removed a comment on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622418381 Checking that the cache properly works on my fork's master branch.

[GitHub] [arrow] kszucs removed a comment on pull request #7080: [CI] Consolidate appveyor scripts [WIP]

2020-05-01 Thread GitBox
kszucs removed a comment on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622408594 Checking that the cache properly works on my fork's master branch.

[GitHub] [arrow] kszucs commented on pull request #7081: [CI] Cache docker volumes [WIP]

2020-05-01 Thread GitBox
kszucs commented on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622418381 Checking that the cache properly works on my fork's master branch.

[GitHub] [arrow] kszucs commented on pull request #7081: [CI] Cache docker volumes [WIP]

2020-05-01 Thread GitBox
kszucs commented on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622417613 With a warmed-up cache, the build time has been reduced from 17m to 6m, which is promising.

[GitHub] [arrow] hantusk opened a new issue #7082: pyarrow 0.17 atexit handler causes a segmentation fault

2020-05-01 Thread GitBox
hantusk opened a new issue #7082: URL: https://github.com/apache/arrow/issues/7082 When running an ASGI webapp in python with uvicorn, I am getting the following error when shutting down. Solved by reverting back to pyarrow 0.16.0 ```python Error in atexit._run_exitfuncs:

[GitHub] [arrow] kszucs commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
kszucs commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622410968 I'd like to elaborate a bit more on the generic dataset class, regardless of what kind of wrappers we provide. - Do you plan to unify the filesystem classes into a single one which

[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622409707 The 32-bit R failure seems like it could be real

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
fsaintjacques commented on a change in pull request #7073: URL: https://github.com/apache/arrow/pull/7073#discussion_r418564299 ## File path: cpp/src/arrow/dataset/file_base.cc ## @@ -83,131 +83,67 @@ Result FileFragment::Scan(std::shared_ptr options

[GitHub] [arrow] kszucs commented on pull request #7080: [CI] Consolidate appveyor scripts [WIP]

2020-05-01 Thread GitBox
kszucs commented on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622408594 Checking that the cache properly works on my fork's master branch.

[GitHub] [arrow] cyb70289 commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-05-01 Thread GitBox
cyb70289 commented on a change in pull request #6954: URL: https://github.com/apache/arrow/pull/6954#discussion_r418558120 ## File path: cpp/src/arrow/util/simd.h ## @@ -17,6 +17,24 @@ #pragma once +#ifdef _MSC_VER +// MSVC x86_64/arm64 + +#if defined(_M_AMD64) ||

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
fsaintjacques commented on a change in pull request #7073: URL: https://github.com/apache/arrow/pull/7073#discussion_r418556583 ## File path: cpp/src/arrow/dataset/file_base.cc ## @@ -221,42 +157,34 @@ Result> FileSystemDataset::Write( filesystem = std::make_shared();

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
fsaintjacques commented on a change in pull request #7073: URL: https://github.com/apache/arrow/pull/7073#discussion_r418554202 ## File path: cpp/src/arrow/dataset/file_base.cc ## @@ -83,131 +83,67 @@ Result FileFragment::Scan(std::shared_ptr options

[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622397441 Revision: b65130bd5eae0e6fe79ace9d529a57f76869f621 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622396964 @github-actions crossbow submit wheel-win-*

[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418385072 ## File path: csharp/src/Apache.Arrow/Arrays/StringArray.cs ## @@ -71,6 +76,15 @@ public string GetString(int index, Encoding encoding = default)

[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r417711102 ## File path: csharp/src/Apache.Arrow/Arrays/StringArray.cs ## @@ -71,6 +76,15 @@ public string GetString(int index, Encoding encoding = default)

[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418383314 ## File path: csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs ## @@ -162,8 +188,8 @@ public TBuilder Swap(int i, int j) public TArray

[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r417709163 ## File path: csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs ## @@ -99,55 +105,75 @@ public abstract class PrimitiveArrayBuilder : IArrowArrayBu

[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418380212 ## File path: csharp/src/Apache.Arrow/Arrays/PrimitiveArrayBuilder.cs ## @@ -99,55 +105,75 @@ public abstract class PrimitiveArrayBuilder : IArrowArrayBu

[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418424144 ## File path: csharp/src/Apache.Arrow/Arrays/ListArray.cs ## @@ -135,6 +152,11 @@ public int GetValueOffset(int index) public int

[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418393347 ## File path: csharp/src/Apache.Arrow/Arrays/ListArray.cs ## @@ -135,6 +152,11 @@ public int GetValueOffset(int index) public int

[GitHub] [arrow] github-actions[bot] commented on pull request #7081: [CI] Cache docker volumes [WIP]

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7081: URL: https://github.com/apache/arrow/pull/7081#issuecomment-622378960 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r417708201 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -73,24 +76,34 @@ public TArray Build(MemoryAllocator allocator = default) {

[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r416049921 ## File path: csharp/src/Apache.Arrow/Arrays/ArrayData.cs ## @@ -84,7 +84,7 @@ public ArrayData Slice(int offset, int length) length =

[GitHub] [arrow] eerhardt commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
eerhardt commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r415075754 ## File path: csharp/src/Apache.Arrow/Arrays/ArrayData.cs ## @@ -84,7 +84,7 @@ public ArrayData Slice(int offset, int length) length =

[GitHub] [arrow] zgramana commented on a change in pull request #7032: ARROW-6603: [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support

2020-05-01 Thread GitBox
zgramana commented on a change in pull request #7032: URL: https://github.com/apache/arrow/pull/7032#discussion_r418423816 ## File path: csharp/src/Apache.Arrow/Arrays/ArrayData.cs ## @@ -22,6 +22,8 @@ namespace Apache.Arrow { public sealed class ArrayData : IDisposable

[GitHub] [arrow] kszucs opened a new pull request #7081: [CI] Cache docker volumes [WIP]

2020-05-01 Thread GitBox
kszucs opened a new pull request #7081: URL: https://github.com/apache/arrow/pull/7081

[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622373323 Take a look at how this is currently being handled in NumPy * https://numpy.org/neps/nep-0038-SIMD-optimizations.html * https://github.com/numpy/numpy/pull/13516 I

[GitHub] [arrow] wesm commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
wesm commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622371728 What I've seen other projects do (have to dig for some examples) is to have files like ``` functionality_nosimd.cc functionality_sse42.cc functionality_avx2.cc ```
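
The per-instruction-set file layout mentioned above pairs naturally with a one-time runtime dispatch step. What follows is only a minimal, language-neutral sketch of that idea, written in Python for brevity; it is not code from Arrow or from this pull request. The function names, the `IMPLEMENTATIONS` table, and the feature strings are all hypothetical stand-ins for what `functionality_nosimd.cc`, `functionality_sse42.cc`, and `functionality_avx2.cc` would each provide.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical stand-ins: in the layout described above, each of these would
# live in its own translation unit compiled with different flags.
def sum_nosimd(values: List[int]) -> int:
    """Portable fallback that works on every CPU."""
    total = 0
    for v in values:
        total += v
    return total

def sum_sse42(values: List[int]) -> int:
    """Placeholder for an SSE4.2-specialised version (same result)."""
    return sum(values)

def sum_avx2(values: List[int]) -> int:
    """Placeholder for an AVX2-specialised version (same result)."""
    return sum(values)

# Ordered from most to least capable; the fallback comes last.
IMPLEMENTATIONS: List[Tuple[str, Callable[[List[int]], int]]] = [
    ("avx2", sum_avx2),
    ("sse42", sum_sse42),
    ("nosimd", sum_nosimd),
]

def select_implementation(
    cpu_features: Set[str],
) -> Tuple[str, Callable[[List[int]], int]]:
    """Return the best implementation supported by the given feature set."""
    for name, impl in IMPLEMENTATIONS:
        if name == "nosimd" or name in cpu_features:
            return name, impl
    raise RuntimeError("unreachable: the nosimd fallback always matches")

# The feature set is passed in explicitly here; a real build would query
# the CPU once at startup and cache the chosen function.
chosen_name, chosen_impl = select_implementation({"sse42"})
print(chosen_name, chosen_impl([1, 2, 3]))
```

Keeping the fallback as the last table entry means a compiler or platform without SIMD support still gets a working, slower path.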

[GitHub] [arrow] bkietz commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
bkietz commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622364855 WRT convenient single format or single file system datasets, it would be straightforward (and possibly more useful) to provide accessories for subsets,

[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622357099 Revision: 8852e2f5f32402ca9c85877289c7948db141cca7 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622356800 @github-actions crossbow submit wheel-win-*

[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622354887 Do we need FileSystemDataset? Maybe not. Is it still useful? IMO, yes. As mentioned above, I personally find it convenient to know that my dataset has a single

[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622346215 Revision: 8852e2f5f32402ca9c85877289c7948db141cca7 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622345879 @github-actions crossbow submit wheel-win-cp38

[GitHub] [arrow] github-actions[bot] commented on pull request #7080: [CI] Consolidate appveyor scripts [WIP]

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7080: URL: https://github.com/apache/arrow/pull/7080#issuecomment-622345293 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] kszucs opened a new pull request #7080: [CI] Consolidate appveyor scripts [WIP]

2020-05-01 Thread GitBox
kszucs opened a new pull request #7080: URL: https://github.com/apache/arrow/pull/7080

[GitHub] [arrow] kszucs edited a comment on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
kszucs edited a comment on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-62291 > * Simplified FileSystemDataset to hold a FragmentVector. Each Fragment must be a FileFragment and is checked at `FileSystemDataset::Make`. Fragments are not required to

[GitHub] [arrow] kszucs edited a comment on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
kszucs edited a comment on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-62291 > Fragments are not required to use the same backing filesystem nor the same format. This makes me wonder, why do we need FileSystemDataset and/or UnionDataset at

[GitHub] [arrow] kszucs commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox
kszucs commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-62291 > * Simplified FileSystemDataset to hold a FragmentVector. Each Fragment must be a FileFragment and is checked at `FileSystemDataset::Make`. Fragments are not required to use the

[GitHub] [arrow] github-actions[bot] commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-622320787 Revision: f3fe79f4e89811a0e24b240d1b306315cdce95ee Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kszucs commented on pull request #7021: ARROW-8628: [Dev] Wrap docker-compose commands with archery

2020-05-01 Thread GitBox
kszucs commented on pull request #7021: URL: https://github.com/apache/arrow/pull/7021#issuecomment-622320430 @github-actions crossbow submit -g test

[GitHub] [arrow] github-actions[bot] commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622319822 Revision: ea4f1375f4dcbc5fe7e81dc16b2c5c239bc30d30 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kszucs commented on pull request #7074: ARROW-8656: [Python] Switch to VS2017 in the windows wheel builds

2020-05-01 Thread GitBox
kszucs commented on pull request #7074: URL: https://github.com/apache/arrow/pull/7074#issuecomment-622319449 @github-actions crossbow submit wheel-win-cp38

[GitHub] [arrow] mrkn commented on pull request #7079: ARROW-6501: [C++] Remove non_zero_length_ field from SparseIndex class

2020-05-01 Thread GitBox
mrkn commented on pull request #7079: URL: https://github.com/apache/arrow/pull/7079#issuecomment-622315057 @pitrou @rok Could you have a look at this?

[GitHub] [arrow] kszucs commented on pull request #7060: ARROW-8619: [C++] Use distinct enum values for MonthInterval, DayTimeInterval

2020-05-01 Thread GitBox
kszucs commented on pull request #7060: URL: https://github.com/apache/arrow/pull/7060#issuecomment-622312875 > The ursabot build failures are spurious Occasionally happens after a force push.

[GitHub] [arrow] kszucs commented on pull request #7060: ARROW-8619: [C++] Use distinct enum values for MonthInterval, DayTimeInterval

2020-05-01 Thread GitBox
kszucs commented on pull request #7060: URL: https://github.com/apache/arrow/pull/7060#issuecomment-622312610 @ursabot build

[GitHub] [arrow] sunchao commented on a change in pull request #7076: ARROW-8659: [Rust] ListBuilder allocate with_capacity

2020-05-01 Thread GitBox
sunchao commented on a change in pull request #7076: URL: https://github.com/apache/arrow/pull/7076#discussion_r418460308 ## File path: rust/parquet/src/arrow/converter.rs ## @@ -128,7 +128,10 @@ pub struct Utf8ArrayConverter {} impl Converter>, StringArray> for

[GitHub] [arrow] github-actions[bot] commented on pull request #7079: ARROW-6501: [C++] Remove non_zero_length_ field from SparseIndex class

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7079: URL: https://github.com/apache/arrow/pull/7079#issuecomment-622287657 https://issues.apache.org/jira/browse/ARROW-6501

[GitHub] [arrow] github-actions[bot] commented on pull request #7079: [ARROW-6501][C++] Remove non_zero_length_ field from SparseIndex class

2020-05-01 Thread GitBox
github-actions[bot] commented on pull request #7079: URL: https://github.com/apache/arrow/pull/7079#issuecomment-622283900 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] sunchao commented on a change in pull request #6898: ARROW-8399: [Rust] Extend memory alignments to include other architectures

2020-05-01 Thread GitBox
sunchao commented on a change in pull request #6898: URL: https://github.com/apache/arrow/pull/6898#discussion_r418443106 ## File path: rust/arrow/src/memory.rs ## @@ -21,7 +21,58 @@ use std::alloc::Layout; use std::mem::align_of; -pub const ALIGNMENT: usize = 64;

[GitHub] [arrow] mrkn opened a new pull request #7079: [ARROW-6501][C++] Remove non_zero_length_ field from SparseIndex class

2020-05-01 Thread GitBox
mrkn opened a new pull request #7079: URL: https://github.com/apache/arrow/pull/7079 This field is essentially unnecessary, and may be an obstacle to future improvements of sparse tensors, such as adding a value-insertion feature.

[GitHub] [arrow] sunchao commented on pull request #7037: ARROW-6718: [Rust] Remove packed_simd

2020-05-01 Thread GitBox
sunchao commented on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-622272292 This definitely looks great in terms of the amount of code removed :D, but yeah, it would be better if we can keep the perf loss to a minimum. > Also the future of packed_simd is unclear and

[GitHub] [arrow] sunchao commented on a change in pull request #7061: ARROW-8629: [Rust] Eliminate indirection of zero sized allocations

2020-05-01 Thread GitBox
sunchao commented on a change in pull request #7061: URL: https://github.com/apache/arrow/pull/7061#discussion_r418434103 ## File path: rust/arrow/src/util/bit_util.rs ## @@ -148,11 +148,17 @@ pub fn count_set_bits_offset(data: &[u8], offset: usize, length: usize) -> usize
