[GitHub] [arrow] BryanCutler closed pull request #7677: ARROW-9370: [Java] Bump Netty version

2020-07-08 Thread GitBox
BryanCutler closed pull request #7677: URL: https://github.com/apache/arrow/pull/7677 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] BryanCutler commented on pull request #7677: ARROW-9370: [Java] Bump Netty version

2020-07-08 Thread GitBox
BryanCutler commented on pull request #7677: URL: https://github.com/apache/arrow/pull/7677#issuecomment-655916604 Merged to master, thanks @rymurr !

[GitHub] [arrow] github-actions[bot] commented on pull request #7687: [Rust] [DataFusuin] Set of simplifications to Hash aggregations

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7687: URL: https://github.com/apache/arrow/pull/7687#issuecomment-655910321 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] jorgecarleitao opened a new pull request #7687: [Rust] [DataFusuin] Set of simplifications to Hash aggregations

2020-07-08 Thread GitBox
jorgecarleitao opened a new pull request #7687: URL: https://github.com/apache/arrow/pull/7687 See individual commits for details

[GitHub] [arrow] kkozmic commented on issue #7443: module 'pyarrow.fs' has no attribute 'S3FileSystem'

2020-07-08 Thread GitBox
kkozmic commented on issue #7443: URL: https://github.com/apache/arrow/issues/7443#issuecomment-655897659 > S3FileSystem is not yet available in the wheel packages from PyPI Is addressing that on the roadmap? Using Conda isn't a viable option for me, so that issue is a bit of a

[GitHub] [arrow] houqp commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-08 Thread GitBox
houqp commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r451955910 ## File path: rust/arrow/src/record_batch.rs ## @@ -216,15 +216,28 @@ impl Into for RecordBatch { } } -/// Definition of record batch reader. +///

[GitHub] [arrow] liyafan82 commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
liyafan82 commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-655849190 > Hey all, > > Thanks @BryanCutler and @liyafan82 for the comments and thanks @jacques-n for the reasoning behind unsafe/netty module split. > > I have addressed

[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
liyafan82 commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451918970 ## File path: java/performance/pom.xml ## @@ -53,7 +53,7 @@ org.apache.arrow -arrow-memory +

[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
liyafan82 commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451917617 ## File path: java/memory/memory-netty/pom.xml ## @@ -0,0 +1,106 @@ + + +http://maven.apache.org/POM/4.0.0; +

[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
liyafan82 commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451916739 ## File path: java/memory/memory-core/src/test/java/org/apache/arrow/memory/DefaultAllocationManagerFactory.java ## @@ -0,0 +1,64 @@ +/* + * Licensed to

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451914666 ## File path: cpp/src/arrow/compute/registry.cc ## @@ -115,6 +116,19 @@ static std::unique_ptr CreateBuiltInRegistry() {

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451916262 ## File path: cpp/src/arrow/compute/kernels/aggregate_sum_avx2.cc ## @@ -0,0 +1,97 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451916161 ## File path: cpp/src/arrow/compute/kernels/aggregate_sum_avx512.cc ## @@ -0,0 +1,99 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451914951 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic_internal.h ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451914906 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic_internal.h ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451915011 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic_internal.h ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451914803 ## File path: cpp/src/arrow/util/bit_block_counter.h ## @@ -76,6 +76,9 @@ struct BitBlockCount { bool AllSet() const { return this->length ==

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451914848 ## File path: cpp/cmake_modules/SetupCxxFlags.cmake ## @@ -52,7 +52,22 @@ if(ARROW_CPU_FLAG STREQUAL "x86")

[GitHub] [arrow] jianxind commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
jianxind commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451914733 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic_internal.h ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] wesm commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
wesm commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451834885 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic_internal.h ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] brills commented on pull request #7632: ARROW-6775: [C++][Python] Implement list_value_lengths and list_parent_indices functions

2020-07-08 Thread GitBox
brills commented on pull request #7632: URL: https://github.com/apache/arrow/pull/7632#issuecomment-655760157 @wesm: Thanks! These will replace our own implementations.

[GitHub] [arrow] wesm commented on pull request #7659: ARROW-9287: [C++] Support unsigned dictionary indices

2020-07-08 Thread GitBox
wesm commented on pull request #7659: URL: https://github.com/apache/arrow/pull/7659#issuecomment-655756907 +1

[GitHub] [arrow] wesm closed pull request #7659: ARROW-9287: [C++] Support unsigned dictionary indices

2020-07-08 Thread GitBox
wesm closed pull request #7659: URL: https://github.com/apache/arrow/pull/7659

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-08 Thread GitBox
jorisvandenbossche commented on a change in pull request #7623: URL: https://github.com/apache/arrow/pull/7623#discussion_r451819257 ## File path: python/pyarrow/_dataset.pyx ## @@ -881,10 +881,15 @@ cdef class RowGroupInfo: name =

[GitHub] [arrow] nealrichardson commented on a change in pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-08 Thread GitBox
nealrichardson commented on a change in pull request #7623: URL: https://github.com/apache/arrow/pull/7623#discussion_r451810399 ## File path: python/pyarrow/_dataset.pyx ## @@ -881,10 +881,15 @@ cdef class RowGroupInfo: name =

[GitHub] [arrow] bkietz commented on a change in pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-08 Thread GitBox
bkietz commented on a change in pull request #7623: URL: https://github.com/apache/arrow/pull/7623#discussion_r451809431 ## File path: python/pyarrow/_dataset.pyx ## @@ -881,10 +881,15 @@ cdef class RowGroupInfo: name =

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-08 Thread GitBox
jorisvandenbossche commented on a change in pull request #7608: URL: https://github.com/apache/arrow/pull/7608#discussion_r451801680 ## File path: cpp/src/arrow/dataset/partition.cc ## @@ -646,15 +657,26 @@ class HivePartitioningFactory : public PartitioningFactory { }

[GitHub] [arrow] BryanCutler commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
BryanCutler commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451797963 ## File path: java/memory/memory-core/src/main/java/org/apache/arrow/memory/DefaultAllocationManagerOption.java ## @@ -109,7 +109,8 @@ static

[GitHub] [arrow] bkietz commented on pull request #7686: ARROW-9345: [C++][Dataset] Support casting scalars to dictionary scalars

2020-07-08 Thread GitBox
bkietz commented on pull request #7686: URL: https://github.com/apache/arrow/pull/7686#issuecomment-655728439 @jorisvandenbossche precisely

[GitHub] [arrow] BryanCutler commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
BryanCutler commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451792885 ## File path: java/memory/memory-unsafe/pom.xml ## @@ -0,0 +1,54 @@ + + +http://maven.apache.org/POM/4.0.0; +

[GitHub] [arrow] rymurr commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
rymurr commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-655716943 > Just a minor nit on making sure there are newlines at the end of the new pom.xml files for consistency. Also, would you mind improving the error message if no`

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7675: ARROW-9353: [Python][CI] Disable known failures in dask integration tests

2020-07-08 Thread GitBox
jorisvandenbossche commented on a change in pull request #7675: URL: https://github.com/apache/arrow/pull/7675#discussion_r451768761 ## File path: ci/scripts/integration_dask.sh ## @@ -32,7 +32,11 @@ python -c "import dask.dataframe" # pytest -sv --pyargs

[GitHub] [arrow] github-actions[bot] commented on pull request #7686: ARROW-9345: [C++][Dataset] Support casting scalars to dictionary scalars

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7686: URL: https://github.com/apache/arrow/pull/7686#issuecomment-655699975 https://issues.apache.org/jira/browse/ARROW-9345

[GitHub] [arrow] bkietz opened a new pull request #7686: ARROW-9345: [C++][Dataset] Support casting scalars to dictionary scalars

2020-07-08 Thread GitBox
bkietz opened a new pull request #7686: URL: https://github.com/apache/arrow/pull/7686

[GitHub] [arrow] lidavidm commented on a change in pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-08 Thread GitBox
lidavidm commented on a change in pull request #7290: URL: https://github.com/apache/arrow/pull/7290#discussion_r451749003 ## File path: java/vector/src/main/codegen/templates/DenseUnionVector.java ## @@ -268,11 +270,11 @@ public long getDataBufferAddress() { @Override

[GitHub] [arrow] andygrove commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-08 Thread GitBox
andygrove commented on pull request #7666: URL: https://github.com/apache/arrow/pull/7666#issuecomment-655677147 I've only skimmed through the changes but LGTM overall. I will try and find time this weekend for a full review. Thanks @paddyhoran I think this is a great improvement.

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451735636 ## File path: java/memory/memory-core/pom.xml ## @@ -0,0 +1,60 @@ + + +http://maven.apache.org/POM/4.0.0; +

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451734778 ## File path: java/memory/memory-unsafe/pom.xml ## @@ -0,0 +1,54 @@ + + +http://maven.apache.org/POM/4.0.0; +

[GitHub] [arrow] lidavidm commented on pull request #7685: ARROW-9362: [Java] Increment default MetadataVersion to V5

2020-07-08 Thread GitBox
lidavidm commented on pull request #7685: URL: https://github.com/apache/arrow/pull/7685#issuecomment-655665159 We should wait for #7290, yes. (Is anyone reviewing it?) Also, this now checks the metadata version against the schema before writing.

[GitHub] [arrow] pitrou closed pull request #7674: ARROW-9368: [Python] Rename predicate argument to filter in split_by_row_group()

2020-07-08 Thread GitBox
pitrou closed pull request #7674: URL: https://github.com/apache/arrow/pull/7674

[GitHub] [arrow] pitrou commented on pull request #7674: ARROW-9368: [Python] Rename predicate argument to filter in split_by_row_group()

2020-07-08 Thread GitBox
pitrou commented on pull request #7674: URL: https://github.com/apache/arrow/pull/7674#issuecomment-655664193 Thank you @jorisvandenbossche !

[GitHub] [arrow] pitrou commented on pull request #7685: ARROW-9362: [Java] Increment default MetadataVersion to V5

2020-07-08 Thread GitBox
pitrou commented on pull request #7685: URL: https://github.com/apache/arrow/pull/7685#issuecomment-655663379 Does Java already implement the required IPC union layout and semantics? Otherwise perhaps we should defer this PR until the union work is done.

[GitHub] [arrow] paddyhoran commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-08 Thread GitBox
paddyhoran commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r451719804 ## File path: rust/arrow/src/record_batch.rs ## @@ -216,15 +216,28 @@ impl Into for RecordBatch { } } -/// Definition of record batch reader.

[GitHub] [arrow] paddyhoran commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-08 Thread GitBox
paddyhoran commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r451719340 ## File path: rust/datafusion/src/datasource/datasource.rs ## @@ -20,13 +20,13 @@ use std::sync::{Arc, Mutex}; use arrow::datatypes::Schema; +use

[GitHub] [arrow] paddyhoran commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-08 Thread GitBox
paddyhoran commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r451717658 ## File path: rust/arrow/src/record_batch.rs ## @@ -216,15 +216,28 @@ impl Into for RecordBatch { } } -/// Definition of record batch reader.

[GitHub] [arrow] BryanCutler commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-08 Thread GitBox
BryanCutler commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451710176 ## File path: java/memory/memory-netty/pom.xml ## @@ -0,0 +1,107 @@ + + +http://maven.apache.org/POM/4.0.0; +

[GitHub] [arrow] nealrichardson commented on a change in pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types

2020-07-08 Thread GitBox
nealrichardson commented on a change in pull request #7660: URL: https://github.com/apache/arrow/pull/7660#discussion_r451700766 ## File path: r/src/symbols.cpp ## @@ -28,36 +28,41 @@ SEXP symbols::row_names = Rf_install("row.names"); SEXP symbols::serialize_arrow_r_metadata

[GitHub] [arrow] nealrichardson commented on a change in pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-08 Thread GitBox
nealrichardson commented on a change in pull request #7645: URL: https://github.com/apache/arrow/pull/7645#discussion_r451696851 ## File path: r/src/array_to_vector.cpp ## @@ -180,7 +183,7 @@ class Converter_Date32 : public Converter_SimpleArray { } Status

[GitHub] [arrow] sbinet closed pull request #7670: ARROW-9365: [Go] Added the rest of the implemented array builders to NewBuilder

2020-07-08 Thread GitBox
sbinet closed pull request #7670: URL: https://github.com/apache/arrow/pull/7670

[GitHub] [arrow] github-actions[bot] commented on pull request #7685: ARROW-9362: [Java] Increment default MetadataVersion to V5

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7685: URL: https://github.com/apache/arrow/pull/7685#issuecomment-655642038 https://issues.apache.org/jira/browse/ARROW-9362

[GitHub] [arrow] github-actions[bot] commented on pull request #7675: ARROW-9353: [Python][CI] Disable known failures in dask integration tests

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7675: URL: https://github.com/apache/arrow/pull/7675#issuecomment-655640336 Revision: ae6b8ed68a570760bab2014e74d65b04ae455a0d Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] lidavidm opened a new pull request #7685: ARROW-9362: [Java] Increment default MetadataVersion to V5

2020-07-08 Thread GitBox
lidavidm opened a new pull request #7685: URL: https://github.com/apache/arrow/pull/7685 This also enables Flight to write differing metadata versions. Not implemented: any checks for unions in read/write based on metadata version. I refactored TestFileWriter very heavily as

[GitHub] [arrow] jorisvandenbossche commented on pull request #7675: ARROW-9353: [Python][CI] Disable known failures in dask integration tests

2020-07-08 Thread GitBox
jorisvandenbossche commented on pull request #7675: URL: https://github.com/apache/arrow/pull/7675#issuecomment-655639520 @github-actions crossbow submit test-conda-python-3.7-dask-latest test-conda-python-3.8-dask-master

[GitHub] [arrow] bkietz commented on a change in pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-08 Thread GitBox
bkietz commented on a change in pull request #7608: URL: https://github.com/apache/arrow/pull/7608#discussion_r451690654 ## File path: cpp/src/arrow/dataset/partition.cc ## @@ -646,15 +657,26 @@ class HivePartitioningFactory : public PartitioningFactory { } } -

[GitHub] [arrow] pitrou commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
pitrou commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451686319 ## File path: cpp/src/arrow/util/bit_block_counter.h ## @@ -76,6 +76,9 @@ struct BitBlockCount { bool AllSet() const { return this->length ==

[GitHub] [arrow] pitrou commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
pitrou commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451684644 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic_internal.h ## @@ -0,0 +1,303 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] pitrou commented on a change in pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
pitrou commented on a change in pull request #7607: URL: https://github.com/apache/arrow/pull/7607#discussion_r451683288 ## File path: cpp/src/arrow/compute/registry.cc ## @@ -115,6 +116,19 @@ static std::unique_ptr CreateBuiltInRegistry() {

[GitHub] [arrow] jorisvandenbossche commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-08 Thread GitBox
jorisvandenbossche commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-655622278 If we agree we can "break" spark's tests like this, @emkornfield can you maybe skip the specific test in the spark integration tests, so that we notice if other tests

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-08 Thread GitBox
jorisvandenbossche commented on a change in pull request #7623: URL: https://github.com/apache/arrow/pull/7623#discussion_r451646062 ## File path: python/pyarrow/_dataset.pyx ## @@ -881,10 +881,15 @@ cdef class RowGroupInfo: name =

[GitHub] [arrow] bkietz commented on a change in pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-08 Thread GitBox
bkietz commented on a change in pull request #7623: URL: https://github.com/apache/arrow/pull/7623#discussion_r451637966 ## File path: python/pyarrow/_dataset.pyx ## @@ -881,10 +881,15 @@ cdef class RowGroupInfo: name =

[GitHub] [arrow] github-actions[bot] commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655595051 Revision: 5053400cb08816ae193a3086c461463f4edca033 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kszucs commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
kszucs commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655593959 @xhochy submitted a crossbow build to test that the upstream patch works

[GitHub] [arrow] kszucs commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
kszucs commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655593772 @github-actions crossbow submit test-conda-python-3.7-turbodbc-master

[GitHub] [arrow] github-actions[bot] commented on pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar [WIP]

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7684: URL: https://github.com/apache/arrow/pull/7684#issuecomment-655592710 https://issues.apache.org/jira/browse/ARROW-9374

[GitHub] [arrow] kszucs commented on a change in pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar [WIP]

2020-07-08 Thread GitBox
kszucs commented on a change in pull request #7684: URL: https://github.com/apache/arrow/pull/7684#discussion_r451626479 ## File path: python/pyarrow/tests/test_array.py ## @@ -297,6 +297,24 @@ def test_nulls(ty): assert arr.type == ty

[GitHub] [arrow] kszucs opened a new pull request #7684: ARROW-9374: [C++][Python] Expose MakeArrayFromScalar [WIP]

2020-07-08 Thread GitBox
kszucs opened a new pull request #7684: URL: https://github.com/apache/arrow/pull/7684 Since we have a complete scalar implementation on the python side, we can implement `pa.repeat(value, size=n)`

[GitHub] [arrow] romainfrancois commented on pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types

2020-07-08 Thread GitBox
romainfrancois commented on pull request #7660: URL: https://github.com/apache/arrow/pull/7660#issuecomment-655584137 So now either : - we have a compatible list, i.e. a list of raw vectors and we request `type = binary/large_binary/fixed_size_binary`. - we have R objects (that

[GitHub] [arrow] wesm commented on a change in pull request #7679: ARROW-9350: [C++] Fix Valgrind failures

2020-07-08 Thread GitBox
wesm commented on a change in pull request #7679: URL: https://github.com/apache/arrow/pull/7679#discussion_r451619897 ## File path: cpp/src/arrow/compute/kernel.cc ## @@ -46,23 +46,16 @@ namespace compute { // KernelContext Result> KernelContext::Allocate(int64_t nbytes)

[GitHub] [arrow] pitrou commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
pitrou commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-655576296 I see similar benchmark results on an AMD Ryzen (AVX2).

[GitHub] [arrow] romainfrancois commented on pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types

2020-07-08 Thread GitBox
romainfrancois commented on pull request #7660: URL: https://github.com/apache/arrow/pull/7660#issuecomment-655574225 Some more progress today. ``` r library(arrow, warn.conflicts = FALSE) # with explicit type= # no deduction, but testing raws <-

[GitHub] [arrow] pitrou commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
pitrou commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-655570194 The approach looks ok to me. I'll take a closer look later.

[GitHub] [arrow] pitrou commented on a change in pull request #7679: ARROW-9350: [C++] Fix Valgrind failures

2020-07-08 Thread GitBox
pitrou commented on a change in pull request #7679: URL: https://github.com/apache/arrow/pull/7679#discussion_r451605705 ## File path: cpp/src/arrow/compute/kernel.cc ## @@ -46,23 +46,16 @@ namespace compute { // KernelContext Result> KernelContext::Allocate(int64_t

[GitHub] [arrow] wesm commented on a change in pull request #7679: ARROW-9350: [C++] Fix Valgrind failures

2020-07-08 Thread GitBox
wesm commented on a change in pull request #7679: URL: https://github.com/apache/arrow/pull/7679#discussion_r451603218 ## File path: cpp/src/arrow/compute/kernel.cc ## @@ -46,23 +46,16 @@ namespace compute { // KernelContext Result> KernelContext::Allocate(int64_t nbytes)

[GitHub] [arrow] github-actions[bot] commented on pull request #7683: ARROW-9326: [Python] Remove setuptools pinning

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7683: URL: https://github.com/apache/arrow/pull/7683#issuecomment-655565819 https://issues.apache.org/jira/browse/ARROW-9326

[GitHub] [arrow] pitrou opened a new pull request #7683: ARROW-9326: [Python] Remove setuptools pinning

2020-07-08 Thread GitBox
pitrou opened a new pull request #7683: URL: https://github.com/apache/arrow/pull/7683 It seems this was only needed as a temporary measure.

[GitHub] [arrow] kszucs commented on a change in pull request #7661: ARROW-9020: [Python] read_json won't respect explicit_schema in parse_options

2020-07-08 Thread GitBox
kszucs commented on a change in pull request #7661: URL: https://github.com/apache/arrow/pull/7661#discussion_r451586029 ## File path: python/pyarrow/_json.pyx ## @@ -91,19 +92,29 @@ cdef class ParseOptions: newlines_in_values: bool, optional (default False)

[GitHub] [arrow] kszucs commented on a change in pull request #7661: ARROW-9020: [Python] read_json won't respect explicit_schema in parse_options

2020-07-08 Thread GitBox
kszucs commented on a change in pull request #7661: URL: https://github.com/apache/arrow/pull/7661#discussion_r451585808 ## File path: python/pyarrow/_json.pyx ## @@ -91,19 +92,29 @@ cdef class ParseOptions: newlines_in_values: bool, optional (default False)

[GitHub] [arrow] pitrou commented on pull request #7682: ARROW-9373: [C++] Fix Parquet crash on invalid input (OSS-Fuzz)

2020-07-08 Thread GitBox
pitrou commented on pull request #7682: URL: https://github.com/apache/arrow/pull/7682#issuecomment-655547849 #7664 needs to be merged first because of updates to the testing repo.

[GitHub] [arrow] kszucs edited a comment on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
kszucs edited a comment on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655546764 Created a PR upstream https://github.com/blue-yonder/turbodbc/pull/273

[GitHub] [arrow] kszucs commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
kszucs commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655546764 Created a PR upstream.

[GitHub] [arrow] wesm closed pull request #7644: ARROW-9330: [C++] Fix crash and undefined behaviour on corrupt IPC input

2020-07-08 Thread GitBox
wesm closed pull request #7644: URL: https://github.com/apache/arrow/pull/7644

[GitHub] [arrow] xhochy commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
xhochy commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655543156 > Sure. Would you prefer that? Yes, keeping patches in the Arrow CI jobs is prone to get lost.

[GitHub] [arrow] wesm closed pull request #7667: ARROW-9339: [Rust] Comments on SIMD in Arrow README are incorrect

2020-07-08 Thread GitBox
wesm closed pull request #7667: URL: https://github.com/apache/arrow/pull/7667

[GitHub] [arrow] kszucs commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
kszucs commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655541713 > > @xhochy we should push the patch upstream (and turbodbc has a couple of compile warnings as well). > > Can you just PR that against `turbodbc`? I can merge and release

[GitHub] [arrow] kszucs closed pull request #7681: ARROW-9334: [Dev][Archery] Push ancestor docker images

2020-07-08 Thread GitBox
kszucs closed pull request #7681: URL: https://github.com/apache/arrow/pull/7681

[GitHub] [arrow] wesm commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean with runtime dispatch

2020-07-08 Thread GitBox
wesm commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-655537328 I'm tied up until mid-day today but plan to review this afternoon (US time)

[GitHub] [arrow] wesm commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-08 Thread GitBox
wesm commented on pull request #7664: URL: https://github.com/apache/arrow/pull/7664#issuecomment-655536712 I just sent an e-mail to the ML. We don't want to continue producing V4 metadata unless we need to for forward compatibility reasons.

[GitHub] [arrow] pitrou commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-08 Thread GitBox
pitrou commented on pull request #7664: URL: https://github.com/apache/arrow/pull/7664#issuecomment-655533965 I see, I will update the PR then.

[GitHub] [arrow] wesm commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-08 Thread GitBox
wesm commented on pull request #7664: URL: https://github.com/apache/arrow/pull/7664#issuecomment-655530738 @lidavidm @pitrou I haven't looked at the patch yet, but V5 must be the default produced by writers but V4 can be opted in to.

[GitHub] [arrow] wesm commented on pull request #7672: WIP ARROW-9348: [C++] Replace usages of TestBase::MakeRandomArray in testing/gtest_util.h with RandomArrayGenerator

2020-07-08 Thread GitBox
wesm commented on pull request #7672: URL: https://github.com/apache/arrow/pull/7672#issuecomment-655527753 I added WIP in the title so that CI builds don't run until the patch is more complete

[GitHub] [arrow] xhochy commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
xhochy commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655526767 > @xhochy we should push the patch upstream (and turbodbc has a couple of compile warnings as well). Can you just PR that against `turbodbc`? I can merge and release

[GitHub] [arrow] kszucs commented on pull request #7680: ARROW-9354: [C++] Turbodbc latest fails to build in the integration tests

2020-07-08 Thread GitBox
kszucs commented on pull request #7680: URL: https://github.com/apache/arrow/pull/7680#issuecomment-655525771 @xhochy we should push the patch upstream (and turbodbc has a couple of compile warnings as well).

[GitHub] [arrow] kszucs closed pull request #7679: ARROW-9350: [C++] Fix Valgrind failures

2020-07-08 Thread GitBox
kszucs closed pull request #7679: URL: https://github.com/apache/arrow/pull/7679

[GitHub] [arrow] github-actions[bot] commented on pull request #7681: ARROW-9334: [Dev][Archery] Push ancestor docker images

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7681: URL: https://github.com/apache/arrow/pull/7681#issuecomment-655521824 https://issues.apache.org/jira/browse/ARROW-9334

[GitHub] [arrow] github-actions[bot] commented on pull request #7682: ARROW-9373: [C++] Fix Parquet crash on invalid input (OSS-Fuzz)

2020-07-08 Thread GitBox
github-actions[bot] commented on pull request #7682: URL: https://github.com/apache/arrow/pull/7682#issuecomment-655521823 https://issues.apache.org/jira/browse/ARROW-9373

[GitHub] [arrow] pitrou closed pull request #7673: ARROW-9363: [C++][Dataset] Preserve schema metadata in ParquetDatasetFactory

2020-07-08 Thread GitBox
pitrou closed pull request #7673: URL: https://github.com/apache/arrow/pull/7673

[GitHub] [arrow] pitrou opened a new pull request #7682: ARROW-9373: [C++] Fix Parquet crash on invalid input (OSS-Fuzz)

2020-07-08 Thread GitBox
pitrou opened a new pull request #7682: URL: https://github.com/apache/arrow/pull/7682 Should fix the following issue: * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=24005

[GitHub] [arrow] kszucs commented on pull request #7678: ARROW-9350: [C++][CI] Nightly valgrind job failures

2020-07-08 Thread GitBox
kszucs commented on pull request #7678: URL: https://github.com/apache/arrow/pull/7678#issuecomment-655519038 Closing.

[GitHub] [arrow] kszucs closed pull request #7678: ARROW-9350: [C++][CI] Nightly valgrind job failures

2020-07-08 Thread GitBox
kszucs closed pull request #7678: URL: https://github.com/apache/arrow/pull/7678
