[GitHub] [arrow] github-actions[bot] commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647292816 Revision: 80cc7570cefe64a0cb20cd530da6241a59e4052a Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] xhochy commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-21 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647292136 @github-actions crossbow submit conda-linux-gcc-py36-cuda This is an automated message from the Apache Git

[GitHub] [arrow] cyb70289 commented on a change in pull request #7512: ARROW-9204: [C++][Flight] Change records_per_stream to int64

2020-06-21 Thread GitBox
cyb70289 commented on a change in pull request #7512: URL: https://github.com/apache/arrow/pull/7512#discussion_r443324836 ## File path: cpp/src/arrow/flight/flight_benchmark.cc ## @@ -42,7 +42,7 @@ DEFINE_int32(server_port, 31337, "The port to connect to");

[GitHub] [arrow] github-actions[bot] commented on pull request #7511: ARROW-9205: [Documentation] Fix typos in Columnar.rst

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7511: URL: https://github.com/apache/arrow/pull/7511#issuecomment-647288808 https://issues.apache.org/jira/browse/ARROW-9205

[GitHub] [arrow] github-actions[bot] commented on pull request #7512: ARROW-9204: [C++][Flight] Change records_per_stream to int64

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7512: URL: https://github.com/apache/arrow/pull/7512#issuecomment-647288806 https://issues.apache.org/jira/browse/ARROW-9204

[GitHub] [arrow] cyb70289 opened a new pull request #7512: ARROW-9204: [C++][Flight] Change records_per_stream to int64

2020-06-21 Thread GitBox
cyb70289 opened a new pull request #7512: URL: https://github.com/apache/arrow/pull/7512 Set `records_per_stream` in flight benchmark to int64 to be consistent with protobuf definition. We can pass a very large value at runtime to keep benchmark running and ease performance profiling

[GitHub] [arrow] cyb70289 opened a new pull request #7511: ARROW-9205: [Documentation] Fix typos in Columnar.rst

2020-06-21 Thread GitBox
cyb70289 opened a new pull request #7511: URL: https://github.com/apache/arrow/pull/7511

[GitHub] [arrow] kiszk commented on a change in pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-21 Thread GitBox
kiszk commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r443306549 ## File path: cpp/src/arrow/ipc/read_write_test.cc ## @@ -1289,6 +1338,10 @@ INSTANTIATE_TEST_SUITE_P(StreamDecoderSmallChunksRoundTripTests,

[GitHub] [arrow] emkornfield commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
emkornfield commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647255915 > But in general I don't think we should be using the "ursabot benchmark" results (which use gcc) to make conclusions about what perf optimizations are working Hmm,

[GitHub] [arrow] kiszk commented on pull request #7505: ARROW-9195: [Java] Fixed UNSAFE.get from bytearray usage

2020-06-21 Thread GitBox
kiszk commented on pull request #7505: URL: https://github.com/apache/arrow/pull/7505#issuecomment-647236139 Looks good

[GitHub] [arrow] wesm edited a comment on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm edited a comment on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647233286 @emkornfield I agree. Realistically we're going to have to look at them both. FWIW, in this particular case it seems that the Clang performance is the most representative of

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647233286 @emkornfield I agree. Realistically we're going to have to look at them both. FWIW, in this particular case it seems that the Clang performance is the most representative of how

[GitHub] [arrow] github-actions[bot] commented on pull request #7510: ARROW-7012: [C++] Add comments explaining high level detail about ChunkedArray class and questions about chunk sizes

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7510: URL: https://github.com/apache/arrow/pull/7510#issuecomment-647233266 https://issues.apache.org/jira/browse/ARROW-7012

[GitHub] [arrow] wesm opened a new pull request #7510: ARROW-7012: [C++] Add comments explaining high level detail about ChunkedArray class and questions about chunk sizes

2020-06-21 Thread GitBox
wesm opened a new pull request #7510: URL: https://github.com/apache/arrow/pull/7510

[GitHub] [arrow] emkornfield commented on pull request #7314: ARROW-8996: [C++] AVX2/AVX512 runtime support for aggregate sum kernel

2020-06-21 Thread GitBox
emkornfield commented on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-647228634 I think we also need a way of setting max runtime instruction set for runtime dispatch (apologies if there is one and I missed it)

[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-21 Thread GitBox
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647226180 > There is one loose end, the growth of the string can cause a utf8 array to be promoted to a large_utf8. I'd like to treat in-kernel type promotions as an anti-pattern in

[GitHub] [arrow] emkornfield commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
emkornfield commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647224569 The gcc vs clang performance has come up a few times. On the SIMD thread on the mailing list, I mentioned trying to standardise on a compiler at least in Linux so we can

[GitHub] [arrow] github-actions[bot] commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647210679 https://issues.apache.org/jira/browse/ARROW-9203

[GitHub] [arrow] github-actions[bot] commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647208647 Revision: 1bc76450a182388c532412e51454899a22a078e6 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kou commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-21 Thread GitBox
kou commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647208260 @github-actions crossbow submit debian-* ubuntu-*

[GitHub] [arrow] kou opened a new pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-21 Thread GitBox
kou opened a new pull request #7509: URL: https://github.com/apache/arrow/pull/7509

[GitHub] [arrow] github-actions[bot] commented on pull request #7508: ARROW-9202: [GLib] Add GArrowDatum

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7508: URL: https://github.com/apache/arrow/pull/7508#issuecomment-647207246 https://issues.apache.org/jira/browse/ARROW-9202

[GitHub] [arrow] kou opened a new pull request #7508: ARROW-9202: [GLib] Add GArrowDatum

2020-06-21 Thread GitBox
kou opened a new pull request #7508: URL: https://github.com/apache/arrow/pull/7508

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647189175 @nealrichardson @romainfrancois check out the `arrow::internal::IntegersCanFit` function within to help with int64->int32 narrowing in R
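
For readers unfamiliar with the narrowing problem mentioned in this comment, here is a minimal sketch of what a "do these integers fit" check looks like before an int64 to int32 cast. The helper names `AllFitInInt32` and `NarrowToInt32` are hypothetical and chosen for illustration only; this is not the signature or implementation of `arrow::internal::IntegersCanFit`, and it assumes plain `std::vector` input rather than Arrow arrays.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper (not Arrow's API): returns true when every value
// can be represented as int32_t without truncation.
bool AllFitInInt32(const std::vector<int64_t>& values) {
  constexpr int64_t kMin = std::numeric_limits<int32_t>::min();
  constexpr int64_t kMax = std::numeric_limits<int32_t>::max();
  for (int64_t v : values) {
    if (v < kMin || v > kMax) return false;
  }
  return true;
}

// Hypothetical helper: narrows only after the range check passes,
// and returns std::nullopt when any value would be truncated.
std::optional<std::vector<int32_t>> NarrowToInt32(
    const std::vector<int64_t>& values) {
  if (!AllFitInInt32(values)) return std::nullopt;
  std::vector<int32_t> out;
  out.reserve(values.size());
  for (int64_t v : values) out.push_back(static_cast<int32_t>(v));
  return out;
}
```

Checking the whole input first and casting afterwards keeps the cast loop free of per-element error handling, which is the general shape a vectorized check would take; whether Arrow's function works this way is not stated in the snippet above.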

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647188209 Pretty massive speedups with MSVC 2017 (mimalloc allocator): https://gist.github.com/wesm/c7efa656ab0a4bd789e6029e5f791417/revisions?diff=split The 2nd revision is the

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647187062 Yeah, I'm trapped between two things that don't do what I need. `archery benchmark diff` doesn't print the results (AFAICT?) in a presentable way -- I opened ARROW-9201 about that

[GitHub] [arrow] projjal commented on pull request #7505: ARROW-9195: [Java] Fixed UNSAFE.get from bytearray usage

2020-06-21 Thread GitBox
projjal commented on pull request #7505: URL: https://github.com/apache/arrow/pull/7505#issuecomment-647180183 > How about adding test cases to cause a failure by the original code? done

[GitHub] [arrow] fsaintjacques commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
fsaintjacques commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647179697 Archery does via --cc and --cxx, but ursabot doesn't support it. It's probably just a matter of forwarding argv correctly.

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647179099 @kszucs @fsaintjacques is there a way to ask the benchmark differ to use clang-8?

[GitHub] [arrow] ursabot commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
ursabot commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647178999 ``` Usage: @ursabot benchmark [OPTIONS] [] Run the benchmark suite in comparison mode. This command will run the benchmark suite for tip of the branch commit

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647178998 @ursabot benchmark --help

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647177191 Wow, really crazy performance investigation. The old code performs badly with clang-8 but actually very well with gcc-8 (maybe some better vectorization)? By contrast the new code

[GitHub] [arrow] houqp commented on pull request #7466: ARROW-9158: [Rust][Datafusion] projection physical plan compilation should preserve nullability

2020-06-21 Thread GitBox
houqp commented on pull request #7466: URL: https://github.com/apache/arrow/pull/7466#issuecomment-647171524 @paddyhoran rebased on top of latest master with merge of #7464, ready for review.

[GitHub] [arrow] github-actions[bot] commented on pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-647171266 https://issues.apache.org/jira/browse/ARROW-8797

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-21 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-647170774 Comments are appreciated. In particular, `arrow::schema` and generation of RecordBatch using non-native endian representation cc @pitrou @kou

[GitHub] [arrow] kiszk commented on pull request #7505: ARROW-9195: [Java] Fixed UNSAFE.get from bytearray usage

2020-06-21 Thread GitBox
kiszk commented on pull request #7505: URL: https://github.com/apache/arrow/pull/7505#issuecomment-647169606 How about adding test cases to cause a failure by the original code?

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647168887 I'll investigate the perf regressions

[GitHub] [arrow] ursabot commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
ursabot commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647167504 [AMD64 Ubuntu 18.04 C++ Benchmark (#113872)](https://ci.ursalabs.org/#builders/73/builds/88) builder has succeeded. Revision: 1f1f5535d2fb2e0e947d4b203b518d026acd8c2e

[GitHub] [arrow] github-actions[bot] commented on pull request #7507: [WIP] ARROW-8797: [C++] Create test to receive RecordBatch for different endian

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-647166030 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] kiszk opened a new pull request #7507: [WIP] ARROW-8797: [C++] Create test to receive RecordBatch for different endian

2020-06-21 Thread GitBox
kiszk opened a new pull request #7507: URL: https://github.com/apache/arrow/pull/7507 This PR creates a test to receive RecordBatch for different endian (e.g. receive RecordBatch with big-endian schema on little-endian platform). This PR changes - Introduce Endianness enum class

[GitHub] [arrow] wesm commented on a change in pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on a change in pull request #7506: URL: https://github.com/apache/arrow/pull/7506#discussion_r443244978 ## File path: cpp/src/arrow/compute/kernels/codegen_internal.h ## @@ -215,27 +215,23 @@ template struct BoxScalar> { using T = typename GetOutputType::T;

[GitHub] [arrow] github-actions[bot] commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647164457 https://issues.apache.org/jira/browse/ARROW-9197

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647164468 @ursabot benchmark --benchmark-filter=Cast 6538173

[GitHub] [arrow] wesm opened a new pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-21 Thread GitBox
wesm opened a new pull request #7506: URL: https://github.com/apache/arrow/pull/7506 Bunch of stuff in this PR: * Speed up safe integer/floating<->integer/floating casts, especially when they are mostly not null * Compiled size of scalar_cast_numeric.cc.o is down to 736KB from

[GitHub] [arrow] wesm commented on pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-21 Thread GitBox
wesm commented on pull request #7504: URL: https://github.com/apache/arrow/pull/7504#issuecomment-647162905 Seems reasonable, I or @pitrou will have to take a closer look. Can we centralize the benchmarks, though, so we don't have benchmarks for stuff in src/arrow/util in src/gandiva?

[GitHub] [arrow] paddyhoran closed pull request #7480: ARROW-9176: [Rust] Fix for memory leaks in Arrow allocator

2020-06-21 Thread GitBox
paddyhoran closed pull request #7480: URL: https://github.com/apache/arrow/pull/7480

[GitHub] [arrow] paddyhoran commented on a change in pull request #7480: ARROW-9176: [Rust] Fix for memory leaks in Arrow allocator

2020-06-21 Thread GitBox
paddyhoran commented on a change in pull request #7480: URL: https://github.com/apache/arrow/pull/7480#discussion_r443242626 ## File path: rust/arrow/src/memory.rs ## @@ -150,25 +150,32 @@ pub fn allocate_aligned(size: usize) -> *mut u8 { } pub unsafe fn free_aligned(ptr:

[GitHub] [arrow] paddyhoran closed pull request #7464: ARROW-9157: [Rust][Datafusion] create_physical_plan should take self as immutable reference

2020-06-21 Thread GitBox
paddyhoran closed pull request #7464: URL: https://github.com/apache/arrow/pull/7464

[GitHub] [arrow] github-actions[bot] commented on pull request #7505: ARROW-9195: [Java] Fixed UNSAFE.get from bytearray usage

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7505: URL: https://github.com/apache/arrow/pull/7505#issuecomment-647157827 https://issues.apache.org/jira/browse/ARROW-9195

[GitHub] [arrow] projjal opened a new pull request #7505: ARROW-9195: [Java] Fixed UNSAFE.get from bytearray usage

2020-06-21 Thread GitBox
projjal opened a new pull request #7505: URL: https://github.com/apache/arrow/pull/7505

[GitHub] [arrow] github-actions[bot] commented on pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7504: URL: https://github.com/apache/arrow/pull/7504#issuecomment-647156126 https://issues.apache.org/jira/browse/ARROW-9193

[GitHub] [arrow] projjal commented on pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-21 Thread GitBox
projjal commented on pull request #7504: URL: https://github.com/apache/arrow/pull/7504#issuecomment-647155468 Hi @wesm can you take a look at this. It seems that the ParseTimestampStrptime function was recently changed from taking c-style null terminated string input to c++ style. If the

[GitHub] [arrow] projjal opened a new pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-21 Thread GitBox
projjal opened a new pull request #7504: URL: https://github.com/apache/arrow/pull/7504

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-21 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647084184 Revision: 10b4aaf1e61b0a91e02f02dd160e8b39b39fce89 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] xhochy commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-21 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647084057 @github-actions crossbow submit conda-linux-gcc-py36-cuda