[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
wesm commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645113493 Not to beat a dead horse about ARROW-9155, but the turnaround time for simple benchmarks isn't great This is an

[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
wesm commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645113285 @ursabot benchmark --benchmark-filter=Greater 18e559b This is an automated message from the Apache Git Service. To

[GitHub] [arrow] dhirschfeld commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
dhirschfeld commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645141127 > Not to beat a dead horse about ARROW-9155 The bot is fine - I guess it links to whatever JIRA is listed in the title. That doesn't help if someone mentions a JIRA in

[GitHub] [arrow] houqp opened a new pull request #7464: ARROW-9157: [Rust][Datafusion] create_physical_plan should take self as immutable reference

2020-06-16 Thread GitBox
houqp opened a new pull request #7464: URL: https://github.com/apache/arrow/pull/7464 Since it's not mutating self, mutable reference is not necessary.

[GitHub] [arrow] emkornfield commented on pull request #7465: PARQUET-1877: [C++] Reconcile thrift limits

2020-06-16 Thread GitBox
emkornfield commented on pull request #7465: URL: https://github.com/apache/arrow/pull/7465#issuecomment-645150494 CC @wesm @pitrou

[GitHub] [arrow] emkornfield edited a comment on pull request #7465: PARQUET-1877: [C++] Reconcile thrift limits

2020-06-16 Thread GitBox
emkornfield edited a comment on pull request #7465: URL: https://github.com/apache/arrow/pull/7465#issuecomment-645150494 CC @wesm @pitrou I would assume 1MM elements is still sufficient for any reasonable parquet file, but let me know if you think differently.

[GitHub] [arrow] wesm commented on a change in pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
wesm commented on a change in pull request #7461: URL: https://github.com/apache/arrow/pull/7461#discussion_r441240680 ## File path: cpp/src/arrow/compute/kernels/codegen_internal.h ## @@ -121,18 +123,34 @@ struct ArrayIterator> { template struct ArrayIterator> { -

[GitHub] [arrow] emkornfield opened a new pull request #7465: PARQUET-1877: [C++] Reconcile thrift limits

2020-06-16 Thread GitBox
emkornfield opened a new pull request #7465: URL: https://github.com/apache/arrow/pull/7465 Sets container size limit to have an upper bound memory footprint at the same order of magnitude as string size limits.

[GitHub] [arrow] ursabot commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
ursabot commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645120596 [AMD64 Ubuntu 18.04 C++ Benchmark (#112729)](https://ci.ursalabs.org/#builders/73/builds/78) builder has been succeeded. Revision: 74caaae25e3bd95d57f3f6d9b835c2610639ab41

[GitHub] [arrow] dhirschfeld commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
dhirschfeld commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645115775 To help follow along it would be handy if references to JIRA could be autolinked:

[GitHub] [arrow] github-actions[bot] commented on pull request #7464: ARROW-9157: [Rust][Datafusion] create_physical_plan should take self as immutable reference

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7464: URL: https://github.com/apache/arrow/pull/7464#issuecomment-645143002 https://issues.apache.org/jira/browse/ARROW-9157

[GitHub] [arrow] wesm commented on pull request #7463: ARROW-9145: [C++] Implement BooleanArray::true_count and false_count, add Python bindings

2020-06-16 Thread GitBox
wesm commented on pull request #7463: URL: https://github.com/apache/arrow/pull/7463#issuecomment-645110679 FWIW `BooleanArray::true_count()` should be what's used for the `sum(boolean)` kernel
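
The remark above can be illustrated with a small sketch. It assumes a boolean array stored as two bit-packed integers, one for values and one for validity, which is a simplification of Arrow's bitmaps. The names `popcount`, `true_count`, and `sum_boolean` are hypothetical and are not Arrow's API. Under those assumptions, summing a boolean array gives the same result as counting the slots that are both valid and true:

```python
def popcount(bits: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(bits).count("1")


def true_count(values: int, validity: int, length: int) -> int:
    """Number of slots that are both valid and true (hypothetical helper)."""
    mask = (1 << length) - 1  # ignore any bits past the logical length
    return popcount(values & validity & mask)


def sum_boolean(values: int, validity: int, length: int) -> int:
    """Slot-by-slot sum that skips nulls; kept only as a cross-check."""
    total = 0
    for i in range(length):
        if (validity >> i) & 1 and (values >> i) & 1:
            total += 1
    return total
```

The bitwise version touches whole words at a time instead of looping over slots, which is the usual reason to prefer a count over a generic sum for booleans.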

[GitHub] [arrow] ursabot commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
ursabot commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645111207 [AMD64 Ubuntu 18.04 C++ Benchmark (#112703)](https://ci.ursalabs.org/#builders/73/builds/77) builder has been succeeded. Revision: 301ffa539e634f2c464ca072cd5c543f1407f1f7

[GitHub] [arrow] wesm opened a new pull request #7463: ARROW-9145: [C++] Implement BooleanArray::true_count and false_count, add Python bindings

2020-06-16 Thread GitBox
wesm opened a new pull request #7463: URL: https://github.com/apache/arrow/pull/7463 This seemed like a reasonable place to put this, and it seems like it may come in handy.

[GitHub] [arrow] github-actions[bot] commented on pull request #7463: ARROW-9145: [C++] Implement BooleanArray::true_count and false_count, add Python bindings

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7463: URL: https://github.com/apache/arrow/pull/7463#issuecomment-645116913 https://issues.apache.org/jira/browse/ARROW-9145

[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
wesm commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645116862 Well we have these bot comments, is it not sufficient? https://github.com/apache/arrow/pull/7461#issuecomment-645086851

[GitHub] [arrow] wesm commented on pull request #7462: ARROW-7068: [C++] Add ListArray::offsets and LargeListArray::offsets returning boxed version of offsets as Int32Array/Int64Array

2020-06-16 Thread GitBox
wesm commented on pull request #7462: URL: https://github.com/apache/arrow/pull/7462#issuecomment-645116463 I'm a bit stumped on the MinGW failure ``` [ 64%] Linking CXX executable ../../release/arrow-array-test.exe

[GitHub] [arrow] github-actions[bot] commented on pull request #7465: PARQUET-1877: [C++] Reconcile thrift limits

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7465: URL: https://github.com/apache/arrow/pull/7465#issuecomment-645152229 https://issues.apache.org/jira/browse/PARQUET-1877

[GitHub] [arrow] liyafan82 commented on pull request #7287: ARROW-8771: [C++] Add boost/process library to build support

2020-06-16 Thread GitBox
liyafan82 commented on pull request #7287: URL: https://github.com/apache/arrow/pull/7287#issuecomment-645103962 > @liyafan82 I rebuilt the boost bundle and uploaded to bintray. Can you re-run whichever tests you have that failed because of this before and see if they work now? If they're

[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
wesm commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645104527 @ursabot benchmark --benchmark-filter=Greater 18e559b

[GitHub] [arrow] jianxind commented on pull request #7314: ARROW-8996: [C++] runtime support for aggregate sum dense kernel

2020-06-16 Thread GitBox
jianxind commented on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-645130206 @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark

[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
wesm commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645129920 Well my theory about greater/less didn't hold. The other relevant change was moving things into the anonymous namespace. It's possible that anonymous namespaces impact inlining

[GitHub] [arrow] ursabot commented on pull request #7314: ARROW-8996: [C++] runtime support for aggregate sum dense kernel

2020-06-16 Thread GitBox
ursabot commented on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-645135876 [AMD64 Ubuntu 18.04 C++ Benchmark (#112762)](https://ci.ursalabs.org/#builders/73/builds/79) builder has been succeeded. Revision: 525caea882fe49c0248932fff77df6bcd3f2f477

[GitHub] [arrow] wesm commented on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
wesm commented on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645111941 Ah! It's because Greater is not implemented using Less. Let me switch things around

[GitHub] [arrow] wesm edited a comment on pull request #7461: ARROW-8969: [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators

2020-06-16 Thread GitBox
wesm edited a comment on pull request #7461: URL: https://github.com/apache/arrow/pull/7461#issuecomment-645111941 Ah! It's because Greater is now implemented using Less. Let me switch things around so things are based on Greater/GreaterEqual instead
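
The kernel reuse discussed in this thread rests on the identity `a < b` if and only if `b > a`. A minimal sketch of that idea follows, using plain Python lists rather than Arrow arrays; the function names are illustrative and are not taken from `scalar_compare.cc`. Only `greater` and `greater_equal` carry a real implementation, and the other two operators are obtained by swapping the operands:

```python
def greater(left, right):
    """Elementwise left > right."""
    return [a > b for a, b in zip(left, right)]


def greater_equal(left, right):
    """Elementwise left >= right."""
    return [a >= b for a, b in zip(left, right)]


def less(left, right):
    """a < b is the same as b > a, so reuse greater with swapped operands."""
    return greater(right, left)


def less_equal(left, right):
    """a <= b is the same as b >= a, so reuse greater_equal with swapped operands."""
    return greater_equal(right, left)
```

In a compiled library the same trick halves the number of comparison kernels that must be instantiated per type. The sketch says nothing about the benchmark difference being investigated in the comments.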

[GitHub] [arrow] liyafan82 commented on pull request #6729: ARROW-8229: [Java] Move ArrowBuf into the Arrow package

2020-06-16 Thread GitBox
liyafan82 commented on pull request #6729: URL: https://github.com/apache/arrow/pull/6729#issuecomment-644559038 > @liyafan82 OK. Could you open a JIRA issue for Spark to notify this to Spark developers? https://issues.apache.org/jira/browse/SPARK > FYI:

[GitHub] [arrow] emkornfield commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

2020-06-16 Thread GitBox
emkornfield commented on pull request #7231: URL: https://github.com/apache/arrow/pull/7231#issuecomment-644564824 Not familiar enough with the implications @kszucs @kou what are the correct steps here?

[GitHub] [arrow] emkornfield commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

2020-06-16 Thread GitBox
emkornfield commented on pull request #7231: URL: https://github.com/apache/arrow/pull/7231#issuecomment-644556734 @tianchen92 did you use the merge tool to merge this (if you haven't please read

[GitHub] [arrow] tianchen92 merged pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

2020-06-16 Thread GitBox
tianchen92 merged pull request #7231: URL: https://github.com/apache/arrow/pull/7231

[GitHub] [arrow] tianchen92 commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

2020-06-16 Thread GitBox
tianchen92 commented on pull request #7231: URL: https://github.com/apache/arrow/pull/7231#issuecomment-644557502 > @tianchen92 did you use the merge tool to merge this (if you haven't please read

[GitHub] [arrow] liyafan82 commented on pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-06-16 Thread GitBox
liyafan82 commented on pull request #7326: URL: https://github.com/apache/arrow/pull/7326#issuecomment-644563536 > Hi, I have a question... probably not related to what this PR focus on. > What if the compressor / decompressor for the codec will have JNI call for compression /

[GitHub] [arrow] nevi-me closed pull request #7400: ARROW-9088: [Rust] Make prettyprint optional

2020-06-16 Thread GitBox
nevi-me closed pull request #7400: URL: https://github.com/apache/arrow/pull/7400

[GitHub] [arrow] zeapo commented on pull request #7309: ARROW-8993: [Rust] support reading non-seekable sources

2020-06-16 Thread GitBox
zeapo commented on pull request #7309: URL: https://github.com/apache/arrow/pull/7309#issuecomment-644591933 Hey @nevi-me, it should be done ^_^

[GitHub] [arrow] nevi-me closed pull request #7309: ARROW-8993: [Rust] support reading non-seekable sources

2020-06-16 Thread GitBox
nevi-me closed pull request #7309: URL: https://github.com/apache/arrow/pull/7309

[GitHub] [arrow] github-actions[bot] commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7436: URL: https://github.com/apache/arrow/pull/7436#issuecomment-644697102 Revision: a0c979addaa12b902680b18cfe914df6d3e81832 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] github-actions[bot] commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7436: URL: https://github.com/apache/arrow/pull/7436#issuecomment-644710999 Revision: a0c979addaa12b902680b18cfe914df6d3e81832 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kszucs commented on pull request #6512: ARROW-8430: [CI] Configure self-hosted runners for Github Actions

2020-06-16 Thread GitBox
kszucs commented on pull request #6512: URL: https://github.com/apache/arrow/pull/6512#issuecomment-644714811 Once approved I'll change the arm jobs to be triggered on push instead of on pull request.

[GitHub] [arrow] tianchen92 commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

2020-06-16 Thread GitBox
tianchen92 commented on pull request #7231: URL: https://github.com/apache/arrow/pull/7231#issuecomment-644741925 > The commit looks OK but @tianchen92 please use the merge tool not the GitHub UI to merge patches. I see, sorry for the incorrect merge operation :)

[GitHub] [arrow] wesm commented on issue #7443: module 'pyarrow.fs' has no attribute 'S3FileSystem'

2020-06-16 Thread GitBox
wesm commented on issue #7443: URL: https://github.com/apache/arrow/issues/7443#issuecomment-644741874 S3FileSystem is not yet available in the wheel packages from PyPI. Use conda-forge to install instead. If you run into an issue other than this please open a JIRA issue
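
For readers hitting the same error, a small feature check can tell whether the installed build exposes the class before any S3 code runs. This is a sketch only: `has_s3_filesystem` is a hypothetical helper name, and the check assumes that a build without S3 support simply lacks the `S3FileSystem` attribute, as the issue title reports.

```python
import importlib


def has_s3_filesystem() -> bool:
    """Return True if the installed pyarrow exposes pyarrow.fs.S3FileSystem."""
    try:
        fs = importlib.import_module("pyarrow.fs")
    except ImportError:
        # pyarrow (or its fs submodule) is not installed at all
        return False
    return hasattr(fs, "S3FileSystem")


if __name__ == "__main__":
    if has_s3_filesystem():
        print("pyarrow.fs.S3FileSystem is available")
    else:
        print("S3FileSystem is missing; consider a conda-forge build")
```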

[GitHub] [arrow] wesm commented on pull request #7447: ARROW-9130: [Python] Add deprecation wrapper for pyarrow.compat and guid function for Dask

2020-06-16 Thread GitBox
wesm commented on pull request #7447: URL: https://github.com/apache/arrow/pull/7447#issuecomment-644740913 @kszucs or @jorisvandenbossche could you investigate the failing Dask tests?

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7437: ARROW-8943: [C++][Python][Dataset] Add partitioning support to ParquetDatasetFactory

2020-06-16 Thread GitBox
jorisvandenbossche commented on a change in pull request #7437: URL: https://github.com/apache/arrow/pull/7437#discussion_r440673889 ## File path: python/pyarrow/dataset.py ## @@ -445,7 +446,8 @@ def _union_dataset(children, schema=None, **kwargs): return

[GitHub] [arrow] romainfrancois commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-16 Thread GitBox
romainfrancois commented on a change in pull request #7435: URL: https://github.com/apache/arrow/pull/7435#discussion_r440723407 ## File path: r/tests/testthat/test-Array.R ## @@ -445,6 +445,9 @@ test_that("Array$create() handles vector -> list arrays (ARROW-7662)", {

[GitHub] [arrow] szdrasiak opened a new issue #7443: module 'pyarrow.fs' has no attribute 'S3FileSystem'

2020-06-16 Thread GitBox
szdrasiak opened a new issue #7443: URL: https://github.com/apache/arrow/issues/7443 pyarrow version = 0.17.1 **I have the latest version installed but I can't import `S3FileSystem` from the `pyarrow.fs` module.** I'm trying to read a parquet file from S3 using `read_table` with argument

[GitHub] [arrow] pitrou commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
pitrou commented on pull request #7436: URL: https://github.com/apache/arrow/pull/7436#issuecomment-644696068 @github-actions crossbow submit -g wheel

[GitHub] [arrow] github-actions[bot] commented on pull request #7444: ARROW-9144: [CI] OSS-Fuzz build fails because recent changes in the google repository

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7444: URL: https://github.com/apache/arrow/pull/7444#issuecomment-644720262 https://issues.apache.org/jira/browse/ARROW-9144

[GitHub] [arrow] kszucs opened a new pull request #7445: ARROW-8583: [C++][Doc] Undocumented parameter in Dataset namespace

2020-06-16 Thread GitBox
kszucs opened a new pull request #7445: URL: https://github.com/apache/arrow/pull/7445 Build doxygen documentation for each PR to prevent undocumented C++ parameter issues surfacing on the master branch.

[GitHub] [arrow] github-actions[bot] commented on pull request #7445: ARROW-8583: [C++][Doc] Undocumented parameter in Dataset namespace

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7445: URL: https://github.com/apache/arrow/pull/7445#issuecomment-644732770 https://issues.apache.org/jira/browse/ARROW-8583

[GitHub] [arrow] github-actions[bot] commented on pull request #7446: ARROW-8965: [Python][Doc] Pyarrow documentation for pip nightlies references 404'd location

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7446: URL: https://github.com/apache/arrow/pull/7446#issuecomment-644732769 https://issues.apache.org/jira/browse/ARROW-8965

[GitHub] [arrow] jorisvandenbossche commented on pull request #7438: ARROW-9105: [C++][Dataset][Python] Infer partition schema from partition expression

2020-06-16 Thread GitBox
jorisvandenbossche commented on pull request #7438: URL: https://github.com/apache/arrow/pull/7438#issuecomment-644732807 I think we talked before about the difference between a "physical" schema and a "reader" (dataset) schema. Right now a Fragment only knows about the physical

[GitHub] [arrow] bkietz commented on pull request #7438: ARROW-9105: [C++][Dataset][Python] Infer partition schema from partition expression

2020-06-16 Thread GitBox
bkietz commented on pull request #7438: URL: https://github.com/apache/arrow/pull/7438#issuecomment-644742632 That's doable, and a more minimal change. The schema option would only be relevant to Python (since that's where implicit casts are inserted, so that's where we'd need the extra

[GitHub] [arrow] wesm commented on pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on pull request #7442: URL: https://github.com/apache/arrow/pull/7442#issuecomment-644742275 The RTools 4.0 build is spurious. This is ready for review

[GitHub] [arrow] wesm closed issue #7443: module 'pyarrow.fs' has no attribute 'S3FileSystem'

2020-06-16 Thread GitBox
wesm closed issue #7443: URL: https://github.com/apache/arrow/issues/7443

[GitHub] [arrow] xhochy closed pull request #7342: ARROW-9023: [C++] Use mimalloc conda package

2020-06-16 Thread GitBox
xhochy closed pull request #7342: URL: https://github.com/apache/arrow/pull/7342

[GitHub] [arrow] xhochy commented on pull request #7342: ARROW-9023: [C++] Use mimalloc conda package

2020-06-16 Thread GitBox
xhochy commented on pull request #7342: URL: https://github.com/apache/arrow/pull/7342#issuecomment-644680940 Will have a look at this in some weeks again, closing for now.

[GitHub] [arrow] jorisvandenbossche commented on pull request #7440: ARROW-8631: [C++][Dataset][Python] Raise in discovery on unparsable partition expression

2020-06-16 Thread GitBox
jorisvandenbossche commented on pull request #7440: URL: https://github.com/apache/arrow/pull/7440#issuecomment-644723549 I think ARROW-8631 is not the correct JIRA? (but I also don't directly find what it then should be)

[GitHub] [arrow] kszucs opened a new pull request #7446: ARROW-8965: [Python][Documentation] Pyarrow documentation for pip nightlies references 404'd location

2020-06-16 Thread GitBox
kszucs opened a new pull request #7446: URL: https://github.com/apache/arrow/pull/7446 It was recently renamed by gemfury. `www` queries work only with the new `pypi` subdomain, but both URLs work via pip, so we don't need to rerender the documentation.

[GitHub] [arrow] wesm opened a new pull request #7447: ARROW-9130: [Python] Add deprecation wrapper for pyarrow.compat and guid function for Dask

2020-06-16 Thread GitBox
wesm opened a new pull request #7447: URL: https://github.com/apache/arrow/pull/7447 This fixes the import error but the integration tests fail with other errors https://gist.github.com/wesm/312a26a9fcb2521756410251f6672e99

[GitHub] [arrow] bkietz commented on pull request #7440: ARROW-8613: [C++][Dataset][Python] Raise in discovery on unparsable partition expression

2020-06-16 Thread GitBox
bkietz commented on pull request #7440: URL: https://github.com/apache/arrow/pull/7440#issuecomment-644739464 @jorisvandenbossche I had the last two digits swapped, sorry. https://issues.apache.org/jira/browse/ARROW-8613

[GitHub] [arrow] github-actions[bot] commented on pull request #7440: ARROW-8613: [C++][Dataset][Python] Raise in discovery on unparsable partition expression

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7440: URL: https://github.com/apache/arrow/pull/7440#issuecomment-644739970 https://issues.apache.org/jira/browse/ARROW-8613 This is an automated message from the Apache Git

[GitHub] [arrow] pitrou commented on a change in pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
pitrou commented on a change in pull request #7436: URL: https://github.com/apache/arrow/pull/7436#discussion_r440745088 ## File path: python/manylinux1/scripts/build_boost.sh ## @@ -16,12 +16,12 @@ # specific language governing permissions and limitations # under the

[GitHub] [arrow] romainfrancois commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-16 Thread GitBox
romainfrancois commented on pull request #7435: URL: https://github.com/apache/arrow/pull/7435#issuecomment-644690630 I need to add tests for this: ``` r library(arrow, warn.conflicts = FALSE) nrows <- 1:3 df <- tibble::tibble( id = 1L, data = list(

[GitHub] [arrow] pitrou commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
pitrou commented on pull request #7436: URL: https://github.com/apache/arrow/pull/7436#issuecomment-644710199 @github-actions crossbow submit -g wheel

[GitHub] [arrow] pitrou commented on a change in pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
pitrou commented on a change in pull request #7436: URL: https://github.com/apache/arrow/pull/7436#discussion_r440785252 ## File path: python/manylinux1/scripts/build_boost.sh ## @@ -16,12 +16,12 @@ # specific language governing permissions and limitations # under the

[GitHub] [arrow] kszucs commented on a change in pull request #6512: ARROW-8430: [CI] Configure self-hosted runners for Github Actions

2020-06-16 Thread GitBox
kszucs commented on a change in pull request #6512: URL: https://github.com/apache/arrow/pull/6512#discussion_r440790297 ## File path: .github/workflows/cpp.yml ## @@ -92,6 +92,67 @@ jobs: continue-on-error: true run: archery docker push ${{ matrix.image }}

[GitHub] [arrow] kszucs opened a new pull request #7444: ARROW-9144: [CI] OSS-Fuzz build fails because recent changes in the google repository

2020-06-16 Thread GitBox
kszucs opened a new pull request #7444: URL: https://github.com/apache/arrow/pull/7444

[GitHub] [arrow] wesm commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

2020-06-16 Thread GitBox
wesm commented on pull request #7231: URL: https://github.com/apache/arrow/pull/7231#issuecomment-644734838 The commit looks OK but @tianchen92 please use the merge tool not the GitHub UI to merge patches.

[GitHub] [arrow] xhochy closed pull request #7446: ARROW-8965: [Python][Doc] Pyarrow documentation for pip nightlies references 404'd location

2020-06-16 Thread GitBox
xhochy closed pull request #7446: URL: https://github.com/apache/arrow/pull/7446

[GitHub] [arrow] xhochy commented on pull request #7446: ARROW-8965: [Python][Doc] Pyarrow documentation for pip nightlies references 404'd location

2020-06-16 Thread GitBox
xhochy commented on pull request #7446: URL: https://github.com/apache/arrow/pull/7446#issuecomment-644748682 Canceled the Travis builds as they are unrelated.

[GitHub] [arrow] fsaintjacques commented on pull request #7438: ARROW-9105: [C++][Dataset][Python] Infer partition schema from partition expression

2020-06-16 Thread GitBox
fsaintjacques commented on pull request #7438: URL: https://github.com/apache/arrow/pull/7438#issuecomment-644747978 I agree with the proposition #3, it aligns with the other method exposed.

[GitHub] [arrow] wesm opened a new pull request #7448: ARROW-9143: [C++] Do not produce internal ArrayData with kUnknownNullCount in RecordBatch::Slice if source ArrayData::null_count is set to 0

2020-06-16 Thread GitBox
wesm opened a new pull request #7448: URL: https://github.com/apache/arrow/pull/7448 This field being non-zero caused code paths that assumed `buffers[0]` to be non-null. I also changed `ArrayData::Slice` to return a shared_ptr since there's little useful about a stack-allocated
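
The rule named in the title of this pull request can be sketched in a few lines. The `ArrayData` class below is a toy stand-in, not Arrow's C++ struct, and `UNKNOWN_NULL_COUNT` is an assumed placeholder for `kUnknownNullCount`. The point it illustrates is that a slice of data known to contain no nulls cannot contain nulls either, so the zero should be carried over rather than reset to "unknown":

```python
from dataclasses import dataclass
from typing import List, Optional

UNKNOWN_NULL_COUNT = -1  # assumed stand-in for Arrow's kUnknownNullCount


@dataclass
class ArrayData:
    """Toy model: a list of optional values plus a cached null count."""
    values: List[Optional[int]]
    null_count: int = UNKNOWN_NULL_COUNT

    def slice(self, offset: int, length: int) -> "ArrayData":
        # If the source has no nulls, no slice of it can have any, so keep 0.
        # Otherwise the slice's count is unknown until someone recomputes it.
        new_count = 0 if self.null_count == 0 else UNKNOWN_NULL_COUNT
        return ArrayData(self.values[offset:offset + length], new_count)
```

Keeping the zero lets downstream code skip validity checks on the slice, which matters when there is no validity bitmap to consult.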

[GitHub] [arrow] github-actions[bot] commented on pull request #7448: ARROW-9143: [C++] Do not produce internal ArrayData with kUnknownNullCount in RecordBatch::Slice if source ArrayData::null_count i

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7448: URL: https://github.com/apache/arrow/pull/7448#issuecomment-644776603 https://issues.apache.org/jira/browse/ARROW-9143

[GitHub] [arrow] github-actions[bot] commented on pull request #7450: ARROW-8863: [C++] Ensure that ArrayData::null_count is always set to 0 when using ArrayData::Make and supplying null validity bitm

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7450: URL: https://github.com/apache/arrow/pull/7450#issuecomment-644804944 https://issues.apache.org/jira/browse/ARROW-8863

[GitHub] [arrow] rymurr commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-16 Thread GitBox
rymurr commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r440899130 ## File path: java/memory/src/main/java/org/apache/arrow/memory/util/MemoryUtil.java ## @@ -78,6 +77,55 @@ public Object run() { Field addressField

[GitHub] [arrow] wesm commented on pull request #7342: ARROW-9023: [C++] Use mimalloc conda package

2020-06-16 Thread GitBox
wesm commented on pull request #7342: URL: https://github.com/apache/arrow/pull/7342#issuecomment-644821869 We should try to ship mimalloc in Windows conda builds for 1.0.0, @kszucs do you think you might be able to help?

[GitHub] [arrow] kszucs commented on pull request #7342: ARROW-9023: [C++] Use mimalloc conda package

2020-06-16 Thread GitBox
kszucs commented on pull request #7342: URL: https://github.com/apache/arrow/pull/7342#issuecomment-644835003 I'm a bit confused since mimalloc is already enabled in the [arrow-cpp-feedstock](https://github.com/conda-forge/arrow-cpp-feedstock/blob/master/recipe/bld-arrow.bat#L23) although

[GitHub] [arrow] wesm commented on a change in pull request #7410: ARROW-971: [C++][Compute] IsValid, IsNull kernels

2020-06-16 Thread GitBox
wesm commented on a change in pull request #7410: URL: https://github.com/apache/arrow/pull/7410#discussion_r440952996 ## File path: cpp/src/arrow/testing/random.h ## @@ -250,6 +250,21 @@ class ARROW_EXPORT RandomArrayGenerator {

[GitHub] [arrow] github-actions[bot] commented on pull request #7447: ARROW-9130: [Python] Add deprecation wrapper for pyarrow.compat and guid function for Dask

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7447: URL: https://github.com/apache/arrow/pull/7447#issuecomment-644749299 https://issues.apache.org/jira/browse/ARROW-9130

[GitHub] [arrow] pitrou commented on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
pitrou commented on pull request #7436: URL: https://github.com/apache/arrow/pull/7436#issuecomment-644754431 The wheel build failures are a mixture of failures downloading Boost from SourceForge, and failures uploading the built wheel artifacts.

[GitHub] [arrow] wesm opened a new pull request #7450: ARROW-8863: [C++] Ensure that ArrayData::null_count is always set to 0 when using ArrayData::Make and supplying null validity bitmap

2020-06-16 Thread GitBox
wesm opened a new pull request #7450: URL: https://github.com/apache/arrow/pull/7450

[GitHub] [arrow] wesm commented on pull request #7447: ARROW-9130: [Python] Add deprecation wrapper for pyarrow.compat and guid function for Dask

2020-06-16 Thread GitBox
wesm commented on pull request #7447: URL: https://github.com/apache/arrow/pull/7447#issuecomment-644788995 Appveyor build https://ci.appveyor.com/project/wesm/arrow/builds/33551938

[GitHub] [arrow] wesm closed pull request #7447: ARROW-9130: [Python] Add deprecation wrapper for pyarrow.compat and guid function for Dask

2020-06-16 Thread GitBox
wesm closed pull request #7447: URL: https://github.com/apache/arrow/pull/7447

[GitHub] [arrow] xhochy commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-16 Thread GitBox
xhochy commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-644795203 > We'll need to make utf8proc a proper toolchain library, @pitrou should be able to help you with that. I can take care of that!

[GitHub] [arrow] rymurr commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-16 Thread GitBox
rymurr commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r440896771 ## File path: java/memory/src/main/java/org/apache/arrow/memory/util/MemoryUtil.java ## @@ -78,6 +77,55 @@ public Object run() { Field addressField

[GitHub] [arrow] xhochy opened a new pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-16 Thread GitBox
xhochy opened a new pull request #7452: URL: https://github.com/apache/arrow/pull/7452

[GitHub] [arrow] xhochy commented on pull request #7342: ARROW-9023: [C++] Use mimalloc conda package

2020-06-16 Thread GitBox
xhochy commented on pull request #7342: URL: https://github.com/apache/arrow/pull/7342#issuecomment-644823682 > We should try to ship mimalloc in Windows conda builds for 1.0.0, @kszucs do you think you might be able to help? Vendoring should be a viable solution if we include the

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7447: ARROW-9130: [Python] Add deprecation wrapper for pyarrow.compat and guid function for Dask

2020-06-16 Thread GitBox
jorisvandenbossche edited a comment on pull request #7447: URL: https://github.com/apache/arrow/pull/7447#issuecomment-644752807 @wesm those failures are "expected", in the sense that the integration tests already have been failing for a few weeks with this error. At the time, I

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7441: ARROW-3446: [R] Document mapping of Arrow <-> R types

2020-06-16 Thread GitBox
fsaintjacques commented on a change in pull request #7441: URL: https://github.com/apache/arrow/pull/7441#discussion_r440846823 ## File path: r/vignettes/arrow.Rmd ## @@ -86,7 +88,73 @@ to other applications and services that use Arrow. One example is Spark: the move data to

[GitHub] [arrow] wesm commented on pull request #7448: ARROW-9143: [C++] Do not produce internal ArrayData with kUnknownNullCount in RecordBatch::Slice if source ArrayData::null_count is set to 0

2020-06-16 Thread GitBox
wesm commented on pull request #7448: URL: https://github.com/apache/arrow/pull/7448#issuecomment-644788666 The RTools 4.0 build is flaking this morning

[GitHub] [arrow] romainfrancois commented on pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-16 Thread GitBox
romainfrancois commented on pull request #7435: URL: https://github.com/apache/arrow/pull/7435#issuecomment-644800961 also added `VectorToArrayConverter::Visit` for dictionaries so that we can handle things like list(factor()), the implementation is simpler than the original

[GitHub] [arrow] rymurr commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-06-16 Thread GitBox
rymurr commented on a change in pull request #7326: URL: https://github.com/apache/arrow/pull/7326#discussion_r440909024 ## File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtility.java ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] rymurr commented on a change in pull request #7326: ARROW-9010: [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-06-16 Thread GitBox
rymurr commented on a change in pull request #7326: URL: https://github.com/apache/arrow/pull/7326#discussion_r440915137 ## File path: java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionCodec.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] wesm commented on a change in pull request #7442: ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size

2020-06-16 Thread GitBox
wesm commented on a change in pull request #7442: URL: https://github.com/apache/arrow/pull/7442#discussion_r440918212 ## File path: cpp/src/arrow/compute/kernels/vector_selection.cc ## @@ -0,0 +1,1758 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] kszucs commented on a change in pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
kszucs commented on a change in pull request #7436: URL: https://github.com/apache/arrow/pull/7436#discussion_r440953705 ## File path: python/manylinux1/scripts/build_rapidjson.sh ## @@ -16,16 +16,16 @@ # specific language governing permissions and limitations # under the

[GitHub] [arrow] pitrou closed pull request #7444: ARROW-9144: [CI] OSS-Fuzz build fails because recent changes in the google repository

2020-06-16 Thread GitBox
pitrou closed pull request #7444: URL: https://github.com/apache/arrow/pull/7444

[GitHub] [arrow] pitrou edited a comment on pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
pitrou edited a comment on pull request #7436: URL: https://github.com/apache/arrow/pull/7436#issuecomment-644754431 The wheel build failures are a mixture of failures downloading Boost from SourceForge, and failures uploading the built wheel artifacts.

[GitHub] [arrow] github-actions[bot] commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-644804943 https://issues.apache.org/jira/browse/ARROW-9133

[GitHub] [arrow] github-actions[bot] commented on pull request #7451: ARROW-8769: [C++][R] Add convenience accessor for StructScalar fields

2020-06-16 Thread GitBox
github-actions[bot] commented on pull request #7451: URL: https://github.com/apache/arrow/pull/7451#issuecomment-644804946 https://issues.apache.org/jira/browse/ARROW-8769

[GitHub] [arrow] rymurr commented on a change in pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-06-16 Thread GitBox
rymurr commented on a change in pull request #7347: URL: https://github.com/apache/arrow/pull/7347#discussion_r440906089 ## File path: java/memory/src/main/java/org/apache/arrow/memory/rounding/DefaultRoundingPolicy.java ## @@ -31,19 +28,18 @@ public final long

[GitHub] [arrow] xhochy commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-16 Thread GitBox
xhochy commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-644818480 cc @wesm @maartenbreddels

[GitHub] [arrow] kszucs commented on a change in pull request #7436: ARROW-9094: [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-16 Thread GitBox
kszucs commented on a change in pull request #7436: URL: https://github.com/apache/arrow/pull/7436#discussion_r440956230 ## File path: python/manylinux1/scripts/build_rapidjson.sh ## @@ -16,16 +16,16 @@ # specific language governing permissions and limitations # under the
