Re: [C++][Compute] RFC: add SIMD support to C++ kernel

2020-03-19 Thread Yibo Cai
Thanks Wes for the quick response. Yes, inlining can be a problem for a runtime dispatcher. It means we should dispatch the whole loop[1], not the code inside the loop[2]. This may set some traps for developers. [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bpacking.h#L3760
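
The whole-loop point above can be illustrated with a minimal sketch. It is not Arrow code: DoubleValue, TransformPerElement, DoubleLoopScalar and SelectDoubleLoop are hypothetical names invented for this example, and the selector assumes only a scalar variant exists.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-element operation; small enough that compilers normally inline it.
inline uint32_t DoubleValue(uint32_t x) { return x * 2u; }

using ElementFn = uint32_t (*)(uint32_t);
using LoopFn = void (*)(const uint32_t*, uint32_t*, size_t);

// Dispatch inside the loop: every element goes through a function pointer,
// which typically prevents inlining and vectorization of the loop body.
void TransformPerElement(ElementFn fn, const uint32_t* in, uint32_t* out, size_t n) {
  for (size_t i = 0; i < n; ++i) out[i] = fn(in[i]);
}

// Dispatch around the loop: the loop itself is the dispatched unit, so the
// per-element call is a direct call that the compiler is free to inline.
void DoubleLoopScalar(const uint32_t* in, uint32_t* out, size_t n) {
  for (size_t i = 0; i < n; ++i) out[i] = DoubleValue(in[i]);
}

// Hypothetical selector; a real dispatcher would choose among variants
// compiled for different instruction sets. Only the scalar one exists here.
LoopFn SelectDoubleLoop() { return &DoubleLoopScalar; }
```

With this shape the indirect call is paid once per loop rather than once per element.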

Re: [C++][Compute] RFC: add SIMD support to C++ kernel

2020-03-19 Thread Wes McKinney
Hi Yibo, I agree with this: having #ifdef in many places in the codebase is not maintainable longer-term. As for runtime dispatch, we could populate a function table of all machine-dependent functions once, so that dispatch doesn't happen on each function call. Or some similar strategy. This
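
A function table populated once might look roughly like the sketch below. This is an illustration under stated assumptions, not the design discussed in the thread: KernelTable, SumScalar, SumUnrolled, CpuHasAvx2 and Kernels are hypothetical names, SumUnrolled is only a stand-in for a real SIMD variant, and the CPU check relies on the GCC/Clang builtin __builtin_cpu_supports (other compilers fall back to the scalar path).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel signature: sum n 32-bit integers.
using SumFn = int64_t (*)(const int32_t* values, size_t n);

// Portable fallback implementation.
int64_t SumScalar(const int32_t* values, size_t n) {
  assert(values != nullptr || n == 0);
  int64_t total = 0;
  for (size_t i = 0; i < n; ++i) total += values[i];
  return total;
}

// Stand-in for a SIMD variant (assumption: a real one would use intrinsics
// and live in a file compiled with the matching -m flags).
int64_t SumUnrolled(const int32_t* values, size_t n) {
  assert(values != nullptr || n == 0);
  int64_t acc[4] = {0, 0, 0, 0};
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    acc[0] += values[i];
    acc[1] += values[i + 1];
    acc[2] += values[i + 2];
    acc[3] += values[i + 3];
  }
  int64_t total = acc[0] + acc[1] + acc[2] + acc[3];
  for (; i < n; ++i) total += values[i];  // leftover elements
  return total;
}

// Table of all machine-dependent functions; one entry in this sketch.
struct KernelTable {
  SumFn sum;
};

// Runtime CPU check using the GCC/Clang builtin on x86; false elsewhere.
bool CpuHasAvx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

KernelTable MakeKernelTable() {
  KernelTable table;
  table.sum = CpuHasAvx2() ? &SumUnrolled : &SumScalar;
  return table;
}

// The function-local static is initialized exactly once (thread-safe since
// C++11), so the CPU check is not repeated on later calls.
const KernelTable& Kernels() {
  static const KernelTable table = MakeKernelTable();
  return table;
}
```

Callers would then write Kernels().sum(data, n); the only per-call cost is one indirect call through the table.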

Re: [C++][Compute] RFC: add SIMD support to C++ kernel

2020-03-19 Thread Yibo Cai
I'm revisiting this old thread as I see some AVX-512 code was merged recently[1]. Code maintenance will be non-trivial if we want to cover more hardware (SSE/AVX/AVX-512/NEON/SVE/...) and optimize more code in the future. #ifdef is obviously a no-go. So I'm selling my proposal again :) - put all

[jira] [Created] (ARROW-8169) [Java] Improve the performance of JDBC adapter by allocating memory proactively

2020-03-19 Thread Liya Fan (Jira)
Liya Fan created ARROW-8169: --- Summary: [Java] Improve the performance of JDBC adapter by allocating memory proactively Key: ARROW-8169 URL: https://issues.apache.org/jira/browse/ARROW-8169 Project: Apache

[jira] [Created] (ARROW-8168) Improve Java Plasma client off-heap memory usage

2020-03-19 Thread KunshangJi (Jira)
KunshangJi created ARROW-8168: - Summary: Improve Java Plasma client off-heap memory usage Key: ARROW-8168 URL: https://issues.apache.org/jira/browse/ARROW-8168 Project: Apache Arrow Issue Type:

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-19-1

2020-03-19 Thread Crossbow
Arrow Build Report for Job nightly-2020-03-19-1 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-19-1 Failed Tasks: - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-19-1-travis-gandiva-jar-osx -

[jira] [Created] (ARROW-8167) [CI] Add support for skipping builds via commit messages

2020-03-19 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8167: -- Summary: [CI] Add support for skipping builds via commit messages Key: ARROW-8167 URL: https://issues.apache.org/jira/browse/ARROW-8167 Project: Apache Arrow

[jira] [Created] (ARROW-8166) [C++] AVX512 intrinsics fail to compile with clang-8 on Ubuntu 18.04

2020-03-19 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8166: --- Summary: [C++] AVX512 intrinsics fail to compile with clang-8 on Ubuntu 18.04 Key: ARROW-8166 URL: https://issues.apache.org/jira/browse/ARROW-8166 Project: Apache

[jira] [Created] (ARROW-8165) [Packaging] Make nightly wheels available

2020-03-19 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8165: -- Summary: [Packaging] Make nightly wheels available Key: ARROW-8165 URL: https://issues.apache.org/jira/browse/ARROW-8165 Project: Apache Arrow Issue

[jira] [Created] (ARROW-8164) [C++][Dataset] Let datasets be viewable with non-identical schema

2020-03-19 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8164: --- Summary: [C++][Dataset] Let datasets be viewable with non-identical schema Key: ARROW-8164 URL: https://issues.apache.org/jira/browse/ARROW-8164 Project: Apache Arrow

[jira] [Created] (ARROW-8163) [C++][Dataset] Allow FileSystemDataset's file list to be lazy

2020-03-19 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8163: --- Summary: [C++][Dataset] Allow FileSystemDataset's file list to be lazy Key: ARROW-8163 URL: https://issues.apache.org/jira/browse/ARROW-8163 Project: Apache Arrow

Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

2020-03-19 Thread David Li
> That's why it's important that we set ourselves up to do performance testing
> in a realistic environment in AWS rather than simulating it.
For my clarification, what are the plans for this (if any)? I couldn't find any prior discussion, though it sounds like the discussion around cloud CI

[jira] [Created] (ARROW-8162) [Format][Python] Add serialization for CSF sparse tensors

2020-03-19 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-8162: - Summary: [Format][Python] Add serialization for CSF sparse tensors Key: ARROW-8162 URL: https://issues.apache.org/jira/browse/ARROW-8162 Project: Apache Arrow

[jira] [Created] (ARROW-8161) [C++][Gandiva] Consolidate the data generation code for benchmark tests in gandiva into arrow/testing

2020-03-19 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-8161: - Summary: [C++][Gandiva] Consolidate the data generation code for benchmark tests in gandiva into arrow/testing Key: ARROW-8161 URL:

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-19-0

2020-03-19 Thread Crossbow
Arrow Build Report for Job nightly-2020-03-19-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-19-0 Failed Tasks: - gandiva-jar-trusty: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-19-0-travis-gandiva-jar-trusty -

[jira] [Created] (ARROW-8160) [FlightRPC][C++] DoPutPayloadWriter doesn't always expose server error message

2020-03-19 Thread David Li (Jira)
David Li created ARROW-8160: --- Summary: [FlightRPC][C++] DoPutPayloadWriter doesn't always expose server error message Key: ARROW-8160 URL: https://issues.apache.org/jira/browse/ARROW-8160 Project: Apache

[jira] [Created] (ARROW-8159) [Python] pyarrow.Schema.from_pandas doesn't support ExtensionDtype

2020-03-19 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-8159: --- Summary: [Python] pyarrow.Schema.from_pandas doesn't support ExtensionDtype Key: ARROW-8159 URL: https://issues.apache.org/jira/browse/ARROW-8159 Project: Apache Arrow