[GitHub] [arrow] jorisvandenbossche opened a new pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
jorisvandenbossche opened a new pull request #7688: URL: https://github.com/apache/arrow/pull/7688 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7290: URL: https://github.com/apache/arrow/pull/7290#discussion_r452084718 ## File path: java/vector/src/main/codegen/templates/DenseUnionVector.java ## @@ -283,7 +285,7 @@ public long getValidityBufferAddress() { public

[GitHub] [arrow] pitrou commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
pitrou commented on a change in pull request #7688: URL: https://github.com/apache/arrow/pull/7688#discussion_r452090022 ## File path: python/pyarrow/dataset.py ## @@ -239,15 +240,18 @@ def _ensure_filesystem(fs_or_uri): ) filesystem =

[GitHub] [arrow] pitrou commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-09 Thread GitBox
pitrou commented on pull request #7664: URL: https://github.com/apache/arrow/pull/7664#issuecomment-656033704 Rebased.

[GitHub] [arrow] kszucs commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-09 Thread GitBox
kszucs commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655982612 I'm closing this because the upstream patch has already resolved the integration failure against turbodbc master. The latest release will fail until turbodbc cuts a new release

[GitHub] [arrow] kszucs closed pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-09 Thread GitBox
kszucs closed pull request #7680: URL: https://github.com/apache/arrow/pull/7680

[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7290: URL: https://github.com/apache/arrow/pull/7290#discussion_r452086650 ## File path: java/vector/src/main/codegen/templates/DenseUnionVector.java ## @@ -812,12 +757,17 @@ public int getValueCount() { } public boolean

[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7290: URL: https://github.com/apache/arrow/pull/7290#discussion_r452082717 ## File path: java/vector/src/main/codegen/templates/DenseUnionVector.java ## @@ -268,11 +270,11 @@ public long getDataBufferAddress() { @Override

[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7290: URL: https://github.com/apache/arrow/pull/7290#discussion_r452082646 ## File path: java/vector/src/main/codegen/templates/DenseUnionVector.java ## @@ -268,11 +270,11 @@ public long getDataBufferAddress() { @Override

[GitHub] [arrow] rymurr commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-09 Thread GitBox
rymurr commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-656015125 Thanks again @liyafan82 and @BryanCutler, I have addressed your comments.

[GitHub] [arrow] pitrou opened a new pull request #7689: ARROW-9384: [C++] Avoid memory blowup on invalid IPC input

2020-07-09 Thread GitBox
pitrou opened a new pull request #7689: URL: https://github.com/apache/arrow/pull/7689 Do not attempt to allocate a null bitmap when concatenating null arrays. Also add a test for concatenation of null arrays. Should fix the following issue: *

[GitHub] [arrow] rymurr commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7290: URL: https://github.com/apache/arrow/pull/7290#discussion_r452103046 ## File path: java/vector/src/main/codegen/templates/DenseUnionVector.java ## @@ -812,12 +757,17 @@ public int getValueCount() { } public boolean

[GitHub] [arrow] pitrou commented on pull request #7689: ARROW-9384: [C++] Avoid memory blowup on invalid IPC input

2020-07-09 Thread GitBox
pitrou commented on pull request #7689: URL: https://github.com/apache/arrow/pull/7689#issuecomment-656038881 Other `testing`-updating PRs need to be merged before this one.

[GitHub] [arrow] github-actions[bot] commented on pull request #7589: ARROW-9276: [Dev] Enable ARROW_CUDA when generating API documentations

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7589: URL: https://github.com/apache/arrow/pull/7589#issuecomment-655985904 Revision: 0d645d5260adc639a6fa6cc7a0adab879254f1d6 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r452076588 ## File path: java/memory/memory-core/src/test/java/org/apache/arrow/memory/DefaultAllocationManagerFactory.java ## @@ -0,0 +1,64 @@ +/* + * Licensed to

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r452076883 ## File path: java/memory/memory-netty/pom.xml ## @@ -0,0 +1,106 @@ + + +http://maven.apache.org/POM/4.0.0; +

[GitHub] [arrow] github-actions[bot] commented on pull request #7689: ARROW-9384: [C++] Avoid memory blowup on invalid IPC input

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7689: URL: https://github.com/apache/arrow/pull/7689#issuecomment-656035748 https://issues.apache.org/jira/browse/ARROW-9384

[GitHub] [arrow] github-actions[bot] commented on pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7688: URL: https://github.com/apache/arrow/pull/7688#issuecomment-655964759 https://issues.apache.org/jira/browse/ARROW-9383

[GitHub] [arrow] kszucs commented on pull request #7589: ARROW-9276: [Dev] Enable ARROW_CUDA when generating API documentations

2020-07-09 Thread GitBox
kszucs commented on pull request #7589: URL: https://github.com/apache/arrow/pull/7589#issuecomment-655984986 @github-actions crossbow submit test-ubuntu-18.04-docs

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r452076262 ## File path: java/memory/memory-core/src/main/java/org/apache/arrow/memory/DefaultAllocationManagerOption.java ## @@ -109,7 +109,8 @@ static

[GitHub] [arrow] pitrou commented on a change in pull request #7675: ARROW-9353: [Python][CI] Disable known failures in dask integration tests

2020-07-09 Thread GitBox
pitrou commented on a change in pull request #7675: URL: https://github.com/apache/arrow/pull/7675#discussion_r452112505 ## File path: ci/scripts/integration_dask.sh ## @@ -32,7 +32,11 @@ python -c "import dask.dataframe" # pytest -sv --pyargs dask.bytes.tests.test_hdfs #

[GitHub] [arrow] pitrou closed pull request #7675: ARROW-9353: [Python][CI] Disable known failures in dask integration tests

2020-07-09 Thread GitBox
pitrou closed pull request #7675: URL: https://github.com/apache/arrow/pull/7675

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
jorisvandenbossche commented on a change in pull request #7688: URL: https://github.com/apache/arrow/pull/7688#discussion_r452154974 ## File path: python/pyarrow/fs.py ## @@ -63,6 +63,31 @@ def __getattr__(name): ) +def _ensure_filesystem(filesystem): +if

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
jorisvandenbossche commented on a change in pull request #7688: URL: https://github.com/apache/arrow/pull/7688#discussion_r452154905 ## File path: python/pyarrow/dataset.py ## @@ -239,15 +240,18 @@ def _ensure_filesystem(fs_or_uri): ) filesystem =

[GitHub] [arrow] romainfrancois commented on a change in pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-09 Thread GitBox
romainfrancois commented on a change in pull request #7645: URL: https://github.com/apache/arrow/pull/7645#discussion_r452154972 ## File path: r/src/array_to_vector.cpp ## @@ -180,7 +183,7 @@ class Converter_Date32 : public Converter_SimpleArray { } Status

[GitHub] [arrow] lidavidm commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-09 Thread GitBox
lidavidm commented on a change in pull request #7290: URL: https://github.com/apache/arrow/pull/7290#discussion_r452172011 ## File path: java/vector/src/main/codegen/templates/DenseUnionVector.java ## @@ -812,12 +757,17 @@ public int getValueCount() { } public boolean

[GitHub] [arrow] pitrou commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
pitrou commented on a change in pull request #7688: URL: https://github.com/apache/arrow/pull/7688#discussion_r452171864 ## File path: python/pyarrow/fs.py ## @@ -63,6 +63,32 @@ def __getattr__(name): ) +def _ensure_filesystem(filesystem): +if

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7690: ARROW-9346: [C++][Python][Dataset] Add total_byte_size metadata to RowGroupInfo

2020-07-09 Thread GitBox
jorisvandenbossche opened a new pull request #7690: URL: https://github.com/apache/arrow/pull/7690

[GitHub] [arrow] pitrou closed pull request #7683: ARROW-9326: [Python] Remove setuptools pinning

2020-07-09 Thread GitBox
pitrou closed pull request #7683: URL: https://github.com/apache/arrow/pull/7683

[GitHub] [arrow] jorisvandenbossche commented on pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
jorisvandenbossche commented on pull request #7688: URL: https://github.com/apache/arrow/pull/7688#issuecomment-656095978 @rjzamora latest master now also supports fsspec filesystems in the high-level dataset API, so no need to do the fsspec -> pyarrow.fs filesystem conversion on the dask

[GitHub] [arrow] github-actions[bot] commented on pull request #7690: ARROW-9346: [C++][Python][Dataset] Add total_byte_size metadata to RowGroupInfo

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7690: URL: https://github.com/apache/arrow/pull/7690#issuecomment-656120992 https://issues.apache.org/jira/browse/ARROW-9346

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to deconstruct a partition expression

2020-07-09 Thread GitBox
jorisvandenbossche opened a new pull request #7691: URL: https://github.com/apache/arrow/pull/7691 Not an actual proper fix for ARROW-8655, but it can provide a workaround for now to retrieve the partition fields' name and value from the `partition_expression`

[GitHub] [arrow] pitrou closed pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
pitrou closed pull request #7688: URL: https://github.com/apache/arrow/pull/7688

[GitHub] [arrow] jorisvandenbossche commented on pull request #7690: ARROW-9346: [C++][Python][Dataset] Add total_byte_size metadata to RowGroupInfo

2020-07-09 Thread GitBox
jorisvandenbossche commented on pull request #7690: URL: https://github.com/apache/arrow/pull/7690#issuecomment-656116619 cc @rjzamora

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox
jorisvandenbossche commented on a change in pull request #7688: URL: https://github.com/apache/arrow/pull/7688#discussion_r452158369 ## File path: python/pyarrow/fs.py ## @@ -63,6 +63,32 @@ def __getattr__(name): ) +def _ensure_filesystem(filesystem): +if

[GitHub] [arrow] github-actions[bot] commented on pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to deconstruct a partition expression

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7691: URL: https://github.com/apache/arrow/pull/7691#issuecomment-656147394 https://issues.apache.org/jira/browse/ARROW-8655

[GitHub] [arrow] romainfrancois commented on a change in pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-09 Thread GitBox
romainfrancois commented on a change in pull request #7645: URL: https://github.com/apache/arrow/pull/7645#discussion_r452155774 ## File path: r/src/array_to_vector.cpp ## @@ -180,7 +183,7 @@ class Converter_Date32 : public Converter_SimpleArray { } Status

[GitHub] [arrow] jorisvandenbossche commented on pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-07-09 Thread GitBox
jorisvandenbossche commented on pull request #7272: URL: https://github.com/apache/arrow/pull/7272#issuecomment-656105473 I updated this, and added a small C++ test. > Are you intending to make that change in this PR too? Realistically speaking, not at the moment (I certainly

[GitHub] [arrow] jorisvandenbossche commented on pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to deconstruct a partition expression

2020-07-09 Thread GitBox
jorisvandenbossche commented on pull request #7691: URL: https://github.com/apache/arrow/pull/7691#issuecomment-656143156 Not the cleanest solution, but could do this relatively quickly because it's based on what I did earlier in https://github.com/apache/arrow/pull/7523. But I think a

[GitHub] [arrow] wesm commented on issue #7443: module 'pyarrow.fs' has no attribute 'S3FileSystem'

2020-07-09 Thread GitBox
wesm commented on issue #7443: URL: https://github.com/apache/arrow/issues/7443#issuecomment-656159164 See https://github.com/apache/arrow/commit/d04f54bcf6e70f3b0a46fb36f68d461ef71f764c.

[GitHub] [arrow] nealrichardson commented on pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-07-09 Thread GitBox
nealrichardson commented on pull request #7272: URL: https://github.com/apache/arrow/pull/7272#issuecomment-656201358 I added https://issues.apache.org/jira/browse/ARROW-9387 for using this in R. It might be trivial but in case it isn't I don't want to block this.

[GitHub] [arrow] BryanCutler commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-09 Thread GitBox
BryanCutler commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r452341314 ## File path: java/memory/memory-core/src/main/java/org/apache/arrow/memory/DefaultAllocationManagerOption.java ## @@ -114,10 +114,20 @@ static

[GitHub] [arrow] romainfrancois commented on pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types

2020-07-09 Thread GitBox
romainfrancois commented on pull request #7660: URL: https://github.com/apache/arrow/pull/7660#issuecomment-656224658 Further progress re FixedSizeList: ``` r library(arrow, warn.conflicts = FALSE) a <- Array$create(list(1:4), type = fixed_size_list_of(int32(), 4L)) a

[GitHub] [arrow] lidavidm closed pull request #7587: ARROW-8973: [Java] Support batch value appending for large varchar/varbinary vectors

2020-07-09 Thread GitBox
lidavidm closed pull request #7587: URL: https://github.com/apache/arrow/pull/7587

[GitHub] [arrow] bkietz commented on a change in pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-09 Thread GitBox
bkietz commented on a change in pull request #7645: URL: https://github.com/apache/arrow/pull/7645#discussion_r452321250 ## File path: r/src/array_to_vector.cpp ## @@ -56,15 +56,16 @@ class Converter { // ingest the values from the array into data[ start : (start + n)]

[GitHub] [arrow] kiszk commented on pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-07-09 Thread GitBox
kiszk commented on pull request #7555: URL: https://github.com/apache/arrow/pull/7555#issuecomment-656238676 Are there any other points that I have to address?

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-09 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r452356610 ## File path: java/memory/memory-core/src/main/java/org/apache/arrow/memory/DefaultAllocationManagerOption.java ## @@ -114,10 +114,20 @@ static

[GitHub] [arrow] maxburke opened a new pull request #7693: Padding added to arrays causes float32s to be incorrectly cast to float64s in the case where a record batch only contains one row.

2020-07-09 Thread GitBox
maxburke opened a new pull request #7693: URL: https://github.com/apache/arrow/pull/7693 This issue also applies to 32-bit integers miscast to 64-bit integers, however on little endian machines the cast + truncation results in a correct value. In the float32 case, the mis-read
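The little-endian observation in this preview can be illustrated with a minimal sketch. This is not the Rust code from the pull request: it assumes the padding bytes are zero, and the helper names (`pad_to_8`, `misread_int32_as_int64`, `misread_float32_as_float64`) are hypothetical. It only shows why reading a padded 4-byte integer as 8 bytes and truncating can recover the value, while reading a padded float32 as a float64 cannot.

```python
import struct


def pad_to_8(raw: bytes) -> bytes:
    """Zero-pad a 4-byte value to 8 bytes (assumption: padding is zeroed)."""
    return raw + b"\x00" * (8 - len(raw))


def misread_int32_as_int64(value: int) -> int:
    """Read a padded little-endian int32 as int64, then truncate to 32 bits."""
    buf = pad_to_8(struct.pack("<i", value))
    (wide,) = struct.unpack("<q", buf)
    # The low 4 bytes of the little-endian int64 are the original int32 bytes.
    (narrow,) = struct.unpack("<i", struct.pack("<q", wide)[:4])
    return narrow


def misread_float32_as_float64(value: float) -> float:
    """Read a padded little-endian float32 as if it were a float64."""
    buf = pad_to_8(struct.pack("<f", value))
    (wide,) = struct.unpack("<d", buf)
    return wide


if __name__ == "__main__":
    print(misread_int32_as_int64(42))        # the integer survives the round trip
    print(misread_float32_as_float64(1.5))   # the float does not: bit layouts differ
```

The integer case works only because little-endian layout puts the low-order bytes first; the float case fails because float32 and float64 use different exponent and mantissa widths, so the same bytes decode to an unrelated number.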

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
jorisvandenbossche commented on a change in pull request #7692: URL: https://github.com/apache/arrow/pull/7692#discussion_r452408096 ## File path: python/pyarrow/_dataset.pyx ## @@ -909,13 +909,24 @@ cdef class ParquetFileFragment(FileFragment): def __reduce__(self):

[GitHub] [arrow] nealrichardson closed pull request #7650: ARROW-9340: [R] Use CRAN version of decor package

2020-07-09 Thread GitBox
nealrichardson closed pull request #7650: URL: https://github.com/apache/arrow/pull/7650

[GitHub] [arrow] bkietz opened a new pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
bkietz opened a new pull request #7692: URL: https://github.com/apache/arrow/pull/7692 Populate ParquetFileFragment statistics whenever a reader is opened anyway. Also provides an explicit method for forcing load of statistics. (I exposed this as a public method, but maybe we'd prefer to

[GitHub] [arrow] github-actions[bot] commented on pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7692: URL: https://github.com/apache/arrow/pull/7692#issuecomment-656271917 https://issues.apache.org/jira/browse/ARROW-9321

[GitHub] [arrow] github-actions[bot] commented on pull request #7693: ARROW-9391: [Rust] Padding added to arrays causes float32s to be incorrectly cast to float64s in the case where a record

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7693: URL: https://github.com/apache/arrow/pull/7693#issuecomment-656271916 https://issues.apache.org/jira/browse/ARROW-9391

[GitHub] [arrow] nealrichardson commented on pull request #7694: WIP debugging CRAN fedora clang failure

2020-07-09 Thread GitBox
nealrichardson commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656284917 @github-actions crossbow submit *as-cran*

[GitHub] [arrow] nealrichardson opened a new pull request #7694: WIP debugging CRAN fedora clang failure

2020-07-09 Thread GitBox
nealrichardson opened a new pull request #7694: URL: https://github.com/apache/arrow/pull/7694

[GitHub] [arrow] paddyhoran commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-09 Thread GitBox
paddyhoran commented on pull request #7666: URL: https://github.com/apache/arrow/pull/7666#issuecomment-656306027 cc @maxburke @mcassels you might have opinions on this.

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to get keys from a partition expression

2020-07-09 Thread GitBox
jorisvandenbossche edited a comment on pull request #7691: URL: https://github.com/apache/arrow/pull/7691#issuecomment-656312308 @bkietz Thanks! I indeed basically reimplemented `VisitKeys` in cython ..

[GitHub] [arrow] jorisvandenbossche commented on pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to get keys from a partition expression

2020-07-09 Thread GitBox
jorisvandenbossche commented on pull request #7691: URL: https://github.com/apache/arrow/pull/7691#issuecomment-656312308 Thanks! I indeed basically reimplemented `VisitKeys` in cython ..

[GitHub] [arrow] github-actions[bot] commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static libra

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656317070 https://issues.apache.org/jira/browse/ARROW-7605

[GitHub] [arrow] github-actions[bot] removed a comment on pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels

2020-07-09 Thread GitBox
github-actions[bot] removed a comment on pull request #7668: URL: https://github.com/apache/arrow/pull/7668#issuecomment-655124179 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

[GitHub] [arrow] paddyhoran commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-09 Thread GitBox
paddyhoran commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r452371425 ## File path: rust/arrow/src/record_batch.rs ## @@ -216,15 +216,28 @@ impl Into for RecordBatch { } } -/// Definition of record batch reader.

[GitHub] [arrow] pitrou opened a new pull request #7695: ARROW-8989: [C++][Doc] Document available compute functions

2020-07-09 Thread GitBox
pitrou opened a new pull request #7695: URL: https://github.com/apache/arrow/pull/7695 Also fix glaring bugs in arithmetic kernels (signed overflow detection was broken).
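For readers unfamiliar with what signed overflow detection involves, here is a minimal sketch of one common pre-check technique for 32-bit addition. It is not the Arrow C++ kernel code and says nothing about what the actual bug was; the function name `add_overflows_int32` and the constants are hypothetical. The comparison is done before adding, which is how it has to be written in a fixed-width language where the addition itself could wrap.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def add_overflows_int32(a: int, b: int) -> bool:
    """Return True if a + b would not fit in a signed 32-bit integer.

    Both inputs are assumed to already be valid int32 values.
    """
    if b > 0:
        # Adding a positive number can only overflow upward.
        return a > INT32_MAX - b
    # Adding zero or a negative number can only overflow downward.
    return a < INT32_MIN - b


if __name__ == "__main__":
    print(add_overflows_int32(INT32_MAX, 1))   # overflow upward
    print(add_overflows_int32(INT32_MIN, -1))  # overflow downward
    print(add_overflows_int32(1, 2))           # fits
```

Rearranging the inequality so that only `INT32_MAX - b` or `INT32_MIN - b` is computed keeps every intermediate value inside the 32-bit range.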

[GitHub] [arrow] github-actions[bot] commented on pull request #7695: ARROW-8989: [C++][Doc] Document available compute functions

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7695: URL: https://github.com/apache/arrow/pull/7695#issuecomment-656304218 https://issues.apache.org/jira/browse/ARROW-8989

[GitHub] [arrow] bkietz commented on a change in pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
bkietz commented on a change in pull request #7692: URL: https://github.com/apache/arrow/pull/7692#discussion_r452394725 ## File path: python/pyarrow/_dataset.pyx ## @@ -909,13 +909,24 @@ cdef class ParquetFileFragment(FileFragment): def __reduce__(self):

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to get keys from a partition expression

2020-07-09 Thread GitBox
jorisvandenbossche commented on a change in pull request #7691: URL: https://github.com/apache/arrow/pull/7691#discussion_r452446416 ## File path: python/pyarrow/includes/libarrow_dataset.pxd ## @@ -314,6 +314,10 @@ cdef extern from "arrow/dataset/api.h" namespace

[GitHub] [arrow] nealrichardson commented on a change in pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED stati

2020-07-09 Thread GitBox
nealrichardson commented on a change in pull request #7696: URL: https://github.com/apache/arrow/pull/7696#discussion_r452446894 ## File path: cpp/CMakeLists.txt ## @@ -642,10 +642,6 @@ endif() # # TODO: Also rework how these libs work Review comment: Remove this

[GitHub] [arrow] github-actions[bot] commented on pull request #7668: ARROW-6982: [R] Add bindings for compare and boolean kernels

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7668: URL: https://github.com/apache/arrow/pull/7668#issuecomment-656256685 https://issues.apache.org/jira/browse/ARROW-6982

[GitHub] [arrow] github-actions[bot] commented on pull request #7694: WIP debugging CRAN fedora clang failure

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7694: URL: https://github.com/apache/arrow/pull/7694#issuecomment-656285845 Revision: 621d79c0d0918d3559384759bcaf9a7e39ce40cd Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] BryanCutler commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-09 Thread GitBox
BryanCutler commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-656296147 merged to master, thanks @rymurr !

[GitHub] [arrow] BryanCutler closed pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-09 Thread GitBox
BryanCutler closed pull request #7619: URL: https://github.com/apache/arrow/pull/7619

[GitHub] [arrow] pitrou commented on pull request #7695: ARROW-8989: [C++][Doc] Document available compute functions

2020-07-09 Thread GitBox
pitrou commented on pull request #7695: URL: https://github.com/apache/arrow/pull/7695#issuecomment-65631 @nealrichardson You should like this.

[GitHub] [arrow] wesm opened a new pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-09 Thread GitBox
wesm opened a new pull request #7696: URL: https://github.com/apache/arrow/pull/7696 This PR is a renewed attempt to address the brokenness of our static libraries and their exported CMake targets. To summarize what's wrong: if any BUNDLED library source is used, or if the jemalloc

[GitHub] [arrow] bkietz commented on pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
bkietz commented on pull request #7692: URL: https://github.com/apache/arrow/pull/7692#issuecomment-656331228 @jorisvandenbossche > I exposed this as a public method, but maybe we'd prefer to hide it inside the statistics property the way we do physical schema? I mean it could

[GitHub] [arrow] bkietz commented on a change in pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
bkietz commented on a change in pull request #7692: URL: https://github.com/apache/arrow/pull/7692#discussion_r452472840 ## File path: python/pyarrow/_dataset.pyx ## @@ -909,13 +909,24 @@ cdef class ParquetFileFragment(FileFragment): def __reduce__(self):

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
jorisvandenbossche commented on a change in pull request #7692: URL: https://github.com/apache/arrow/pull/7692#discussion_r452477463 ## File path: python/pyarrow/_dataset.pyx ## @@ -909,13 +909,24 @@ cdef class ParquetFileFragment(FileFragment): def __reduce__(self):

[GitHub] [arrow] kou commented on a change in pull request #7589: ARROW-9276: [Dev] Enable ARROW_CUDA when generating API documentations

2020-07-09 Thread GitBox
kou commented on a change in pull request #7589: URL: https://github.com/apache/arrow/pull/7589#discussion_r452480967 ## File path: dev/release/post-09-docs.sh ## @@ -42,20 +42,20 @@ popd pushd "${ARROW_DIR}" git checkout "${release_tag}" -docker-compose build ubuntu-cpp

[GitHub] [arrow] mrkn commented on pull request #7643: ARROW-9331: [C++] Improve the performance of Tensor-to-SparseTensor conversion

2020-07-09 Thread GitBox
mrkn commented on pull request #7643: URL: https://github.com/apache/arrow/pull/7643#issuecomment-656385805 @wesm Yes, I'm currently almost done with SparseCOOTensor. I think merging this before finishing all the sparse formats is better than merging nothing before 1.0.

[GitHub] [arrow] github-actions[bot] commented on pull request #7698: ARROW-9380: [C++] Fix Filter crashes and bug in kernels with NullHandling::OUTPUT_NOT_NULL

2020-07-09 Thread GitBox
github-actions[bot] commented on pull request #7698: URL: https://github.com/apache/arrow/pull/7698#issuecomment-656392177 https://issues.apache.org/jira/browse/ARROW-9380 This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-07-09 Thread GitBox
wesm commented on pull request #7477: URL: https://github.com/apache/arrow/pull/7477#issuecomment-656396021 @mrkn this needs a rebase -- I can review and then merge this once the build is passing? This is an automated

[GitHub] [arrow] bkietz closed pull request #7690: ARROW-9346: [C++][Python][Dataset] Add total_byte_size metadata to RowGroupInfo

2020-07-09 Thread GitBox
bkietz closed pull request #7690: URL: https://github.com/apache/arrow/pull/7690 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] wesm commented on a change in pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library

2020-07-09 Thread GitBox
wesm commented on a change in pull request #7696: URL: https://github.com/apache/arrow/pull/7696#discussion_r452471124 ## File path: docs/source/developers/cpp/building.rst ## @@ -347,6 +352,50 @@ You can then invoke CMake to create the build directory and it will use the

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
jorisvandenbossche edited a comment on pull request #7692: URL: https://github.com/apache/arrow/pull/7692#issuecomment-656338363 > I mean it could be called inside the statistics property accessor so that the returned statistics are never None Ah, I misunderstood. That might also be

[GitHub] [arrow] bkietz commented on pull request #7686: ARROW-9345: [C++][Dataset] Support casting scalars to dictionary scalars

2020-07-09 Thread GitBox
bkietz commented on pull request #7686: URL: https://github.com/apache/arrow/pull/7686#issuecomment-656342273 @pitrou done, PTAL This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [arrow] wesm commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-09 Thread GitBox
wesm commented on pull request #7664: URL: https://github.com/apache/arrow/pull/7664#issuecomment-656341760 I'm quickly taking care of these small things so this can be merged This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on a change in pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
wesm commented on a change in pull request #7692: URL: https://github.com/apache/arrow/pull/7692#discussion_r452483172 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -335,91 +335,39 @@ static inline bool RowGroupInfosAreComplete(const std::vector& inf

[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-09 Thread GitBox
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656348362 The CI failure https://github.com/apache/arrow/pull/7696/checks?check_run_id=855635184 is because ASF JIRA is hurting right now for some reason

[GitHub] [arrow] wesm commented on pull request #6220: ARROW-7605: [C++] Bundle private jemalloc symbols into static library libarrow.a

2020-07-09 Thread GitBox
wesm commented on pull request #6220: URL: https://github.com/apache/arrow/pull/6220#issuecomment-656348034 This work has been superseded by https://github.com/apache/arrow/pull/7696 This is an automated message from the

[GitHub] [arrow] wesm commented on a change in pull request #7692: ARROW-9321: [C++][Dataset] Populate statistics opportunistically

2020-07-09 Thread GitBox
wesm commented on a change in pull request #7692: URL: https://github.com/apache/arrow/pull/7692#discussion_r452488952 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -508,36 +456,93 @@ ParquetFileFragment::ParquetFileFragment(FileSource source,

[GitHub] [arrow] rymurr commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-07-09 Thread GitBox
rymurr commented on pull request #6402: URL: https://github.com/apache/arrow/pull/6402#issuecomment-656355715 > @rymurr could you take another look at this? sure! Will check first thing my am This is an automated

[GitHub] [arrow] wesm commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight

2020-07-09 Thread GitBox
wesm commented on a change in pull request #7555: URL: https://github.com/apache/arrow/pull/7555#discussion_r452494159 ## File path: cpp/src/arrow/testing/gtest_util.cc ## @@ -389,6 +406,28 @@ void CompareBatch(const RecordBatch& left, const RecordBatch& right, } }

[GitHub] [arrow] nealrichardson commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so

2020-07-09 Thread GitBox
nealrichardson commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656371696 @github-actions crossbow submit -g r This is an automated message from the Apache Git Service. To respond

[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-09 Thread GitBox
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656373613 @tobim FYI, this approach should be less offensive than what I was doing previously (hacking libarrow.a directly)

[GitHub] [arrow] wesm commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-09 Thread GitBox
wesm commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-656376374 ping @BryanCutler, would be good to merge this for the release if possible This is an automated message from the

[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-09 Thread GitBox
/vv4_9tw56nv9k3tkvyszvwg8gn/T/hbtmp/apache-arrow-20200709-78324-1gv9ydo/build/jemalloc_ep-prefix/src/jemalloc_ep && /private/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T/build-apache-arrow/Cellar/cmake/3.12.2/bin/cmake -P /private/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T/hbtm

[GitHub] [arrow] wesm commented on pull request #7696: ARROW-7605: [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that stat

2020-07-09 Thread GitBox
wesm commented on pull request #7696: URL: https://github.com/apache/arrow/pull/7696#issuecomment-656383933 @nealrichardson the "test-conda-r-4.0" failure doesn't appear to be related to this patch This is an automated

[GitHub] [arrow] emkornfield commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-09 Thread GitBox
emkornfield commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-656384231 @jorisvandenbossche Is there a way to skip specific tests? I thought all of them live in the Spark code? This is

[GitHub] [arrow] wesm opened a new pull request #7698: ARROW-9380: [C++] Fix Filter crashes and bug in kernels with NullHandling::OUTPUT_NOT_NULL

2020-07-09 Thread GitBox
wesm opened a new pull request #7698: URL: https://github.com/apache/arrow/pull/7698 A few interrelated fixes: * The `is_null` kernel was returning a slightly malformed `ArrayData` with the null_count set to -1 even though the validity bitmap is null. * Adds

[GitHub] [arrow] lidavidm commented on a change in pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-09 Thread GitBox
lidavidm commented on a change in pull request #7664: URL: https://github.com/apache/arrow/pull/7664#discussion_r452537636 ## File path: python/pyarrow/ipc.pxi ## @@ -18,6 +18,32 @@ import warnings +cpdef enum MetadataVersion: +V1 = CMetadataVersion_V1 +V2 =
