[jira] [Updated] (ARROW-12317) [Rust] JSON writer does not support time, date or interval types
[ https://issues.apache.org/jira/browse/ARROW-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-12317: --- Labels: pull-request-available (was: ) > [Rust] JSON writer does not support time, date or interval types > > > Key: ARROW-12317 > URL: https://issues.apache.org/jira/browse/ARROW-12317 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andrew Lamb >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > While working on https://issues.apache.org/jira/browse/ARROW-12267 , adding > support for writing Timestamp types, I noticed we were also lacking support > for other time types. Specifically, if you try to write an array with any of > the following types as JSON it will panic: > An example of adding support for timestamps is on > https://github.com/apache/arrow/pull/9968
> ```
> pub type Date32Array = PrimitiveArray<Date32Type>;
> pub type Date64Array = PrimitiveArray<Date64Type>;
> pub type Time32SecondArray = PrimitiveArray<Time32SecondType>;
> pub type Time32MillisecondArray = PrimitiveArray<Time32MillisecondType>;
> pub type Time64MicrosecondArray = PrimitiveArray<Time64MicrosecondType>;
> pub type Time64NanosecondArray = PrimitiveArray<Time64NanosecondType>;
> pub type IntervalYearMonthArray = PrimitiveArray<IntervalYearMonthType>;
> pub type IntervalDayTimeArray = PrimitiveArray<IntervalDayTimeType>;
> pub type DurationSecondArray = PrimitiveArray<DurationSecondType>;
> pub type DurationMillisecondArray = PrimitiveArray<DurationMillisecondType>;
> pub type DurationMicrosecondArray = PrimitiveArray<DurationMicrosecondType>;
> pub type DurationNanosecondArray = PrimitiveArray<DurationNanosecondType>;
> ```
-- This message was sent by Atlassian Jira (v8.3.4#803005)
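The array types listed in this issue store plain integers (for example, a Date32 value is a count of days since 1970-01-01), so a JSON writer has to pick a textual form for them. The following is a minimal sketch of that conversion using only the standard library. The function names `date32_to_iso` and `time32_second_to_iso` are hypothetical, and the choice of ISO 8601 strings is an assumption for illustration, not necessarily what the linked pull request does.

```rust
/// Convert a Date32 value (days since 1970-01-01) into a `YYYY-MM-DD` string.
/// Hypothetical helper; uses the well-known "civil from days" calendar arithmetic.
pub fn date32_to_iso(days: i32) -> String {
    // Shift the epoch to 0000-03-01 so leap days fall at the end of a "year".
    let z = i64::from(days) + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    // January and February belong to the next civil year.
    let year = yoe + era * 400 + i64::from(month <= 2);
    format!("{:04}-{:02}-{:02}", year, month, day)
}

/// Convert a Time32(Second) value (seconds since midnight) into `HH:MM:SS`.
/// Hypothetical helper; assumes a non-negative input below 86400.
pub fn time32_second_to_iso(secs: i32) -> String {
    format!("{:02}:{:02}:{:02}", secs / 3600, (secs / 60) % 60, secs % 60)
}
```

A writer built along these lines would call such a helper per non-null element and emit the result as a JSON string. Interval and duration types have no equally obvious textual form, so they would need a separate decision.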
[jira] [Commented] (ARROW-11593) [Rust] Parquet does not support wasm32-unknown-unknown target
[ https://issues.apache.org/jira/browse/ARROW-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319022#comment-17319022 ] Dominik Moritz commented on ARROW-11593: That's awesome. Do you want to add a note to https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11615, which tracks DataFusion support for wasm? > [Rust] Parquet does not support wasm32-unknown-unknown target > - > > Key: ARROW-11593 > URL: https://issues.apache.org/jira/browse/ARROW-11593 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Dominik Moritz >Priority: Major > > The Arrow crate successfully compiles to WebAssembly (e.g. > https://github.com/domoritz/arrow-wasm) but the Parquet crate currently does > not support the`wasm32-unknown-unknown` target. > Try out the repository at > https://github.com/domoritz/parquet-wasm/commit/e877f9ad9c45c09f73d98fab2a8ad384a802b2e0. > The problem seems to be in liblz4, even if I do not include lz4 in the > feature flags. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-12269) [JS] Move to eslint
[ https://issues.apache.org/jira/browse/ARROW-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominik Moritz reassigned ARROW-12269: -- Assignee: Dominik Moritz > [JS] Move to eslint > --- > > Key: ARROW-12269 > URL: https://issues.apache.org/jira/browse/ARROW-12269 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Dominik Moritz >Assignee: Dominik Moritz >Priority: Major > > Tslint is deprecated so we should switch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12269) [JS] Move to eslint
[ https://issues.apache.org/jira/browse/ARROW-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-12269: --- Labels: pull-request-available (was: ) > [JS] Move to eslint > --- > > Key: ARROW-12269 > URL: https://issues.apache.org/jira/browse/ARROW-12269 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Dominik Moritz >Assignee: Dominik Moritz >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Tslint is deprecated so we should switch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results
[ https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318973#comment-17318973 ] Andy Grove commented on ARROW-12334: I'm now very confused about this issue. I have been working on debugging it and now it suddenly is working, so I don't know if it is an intermittent bug or not. When it works correctly, the query returns 4 rows and takes ~13 seconds for me. When it does not work it returns many times more rows and takes 3x as long. It would be good to get a second pair of eyes on this. > [Rust] [Ballista] Aggregate queries producing incorrect results > --- > > Key: ARROW-12334 > URL: https://issues.apache.org/jira/browse/ARROW-12334 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - Ballista >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 4.0.0 > > > I just ran benchmarks for the first time in a while and I see duplicate > entries for group by keys. > > For example, query 1 has "group by l_returnflag, l_linestatus" and I see > multiple results with l_returnflag = 'A' and l_linestatus = 'F'. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11593) [Rust] Parquet does not support wasm32-unknown-unknown target
[ https://issues.apache.org/jira/browse/ARROW-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318970#comment-17318970 ] David Roher commented on ARROW-11593: - I just got a version of DataFusion working on wasm32-unknown-unknown; it required disabling both the LZ4 and ZSTD features on Parquet and tweaking the hash function: [https://github.com/apache/arrow/compare/master...droher:master] To add to [~AndyRedhead1974]'s point above, it would also be useful in a serverless context. For instance, Cloudflare Workers Unbound is in beta now and will allow WASM functions to run at unlimited CPU usage. In this context, DataFusion could be a serverless data lake engine like AWS Athena. Maybe it could even be useful as a Ballista worker. > [Rust] Parquet does not support wasm32-unknown-unknown target > - > > Key: ARROW-11593 > URL: https://issues.apache.org/jira/browse/ARROW-11593 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Dominik Moritz >Priority: Major > > The Arrow crate successfully compiles to WebAssembly (e.g. > https://github.com/domoritz/arrow-wasm) but the Parquet crate currently does > not support the`wasm32-unknown-unknown` target. > Try out the repository at > https://github.com/domoritz/parquet-wasm/commit/e877f9ad9c45c09f73d98fab2a8ad384a802b2e0. > The problem seems to be in liblz4, even if I do not include lz4 in the > feature flags. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors
[ https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318952#comment-17318952 ] Kouhei Sutou commented on ARROW-11135: -- I don't agree with the former. I agree with the latter. {{libgandiva_jni}} should not depend on other libraries (it should be linked with other libraries statically). Could you try the .jar at https://github.com/ursacomputing/crossbow/releases/tag/nightly-2021-04-09-0-github-gandiva-jar-osx ? We need to improve our release process to resolve these problems. Our current release process generates Java packages in the release manager's environment: https://github.com/apache/arrow/blob/master/dev/release/01-perform.sh The release manager for 3.0.0 used macOS, so arrow-gandiva 3.0.0 works only on macOS. We should build arrow-gandiva and its native libraries (for macOS, Linux and Windows) on CI (we can use macOS, Linux and Windows on CI) and collect the native libraries for all supported platforms into one arrow-gandiva.jar. Our release process should just push the built arrow-gandiva.jar instead of building arrow-gandiva.jar on the release manager's machine. We'll release 4.0.0 soon. This improvement will not be included in 4.0.0 unless a volunteer starts working on it soon. > Using Maven Central artifacts as dependencies produce runtime errors > > > Key: ARROW-11135 > URL: https://issues.apache.org/jira/browse/ARROW-11135 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0, 3.0.0 >Reporter: Michael Mior >Priority: Major > > I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the > integration is working well, but I'm having issues . As [suggested on the > mailing > list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E], > using Dremio's public artifacts solves the problem.
Between two Apache > projects however, there would be strong preference to use Apache artifacts as > a dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12332) [Rust] [Ballista] Api server for scheduler
[ https://issues.apache.org/jira/browse/ARROW-12332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-12332: --- Summary: [Rust] [Ballista] Api server for scheduler (was: Api server for scheduler) > [Rust] [Ballista] Api server for scheduler > -- > > Key: ARROW-12332 > URL: https://issues.apache.org/jira/browse/ARROW-12332 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust - Ballista >Reporter: Sathis >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results
[ https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318951#comment-17318951 ] Andy Grove commented on ARROW-12334: I tracked down the PR that introduced the regression in the original repo and it was [https://github.com/ballista-compute/ballista/pull/574] > [Rust] [Ballista] Aggregate queries producing incorrect results > --- > > Key: ARROW-12334 > URL: https://issues.apache.org/jira/browse/ARROW-12334 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - Ballista >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 4.0.0 > > > I just ran benchmarks for the first time in a while and I see duplicate > entries for group by keys. > > For example, query 1 has "group by l_returnflag, l_linestatus" and I see > multiple results with l_returnflag = 'A' and l_linestatus = 'F'. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-12313) [Rust] [Ballista] Benchmark documentation out of date
[ https://issues.apache.org/jira/browse/ARROW-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-12313. Resolution: Fixed Issue resolved by pull request 9990 [https://github.com/apache/arrow/pull/9990] > [Rust] [Ballista] Benchmark documentation out of date > - > > Key: ARROW-12313 > URL: https://issues.apache.org/jira/browse/ARROW-12313 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - Ballista >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The scheduler/executor were refactored and the documentation for the > benchmarks now needs updating. I plan on fixing this over the weekend. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12335) [Rust] [Ballista] Bump DataFusion version
Andy Grove created ARROW-12335: -- Summary: [Rust] [Ballista] Bump DataFusion version Key: ARROW-12335 URL: https://issues.apache.org/jira/browse/ARROW-12335 Project: Apache Arrow Issue Type: Task Components: Rust - Ballista Reporter: Andy Grove Fix For: 4.0.0 Update Ballista to use latest DataFusion version -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12313) [Rust] [Ballista] Benchmark documentation out of date
[ https://issues.apache.org/jira/browse/ARROW-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-12313: --- Labels: pull-request-available (was: ) > [Rust] [Ballista] Benchmark documentation out of date > - > > Key: ARROW-12313 > URL: https://issues.apache.org/jira/browse/ARROW-12313 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - Ballista >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The scheduler/executor were refactored and the documentation for the > benchmarks now needs updating. I plan on fixing this over the weekend. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results
Andy Grove created ARROW-12334: -- Summary: [Rust] [Ballista] Aggregate queries producing incorrect results Key: ARROW-12334 URL: https://issues.apache.org/jira/browse/ARROW-12334 Project: Apache Arrow Issue Type: Bug Components: Rust - Ballista Reporter: Andy Grove Assignee: Andy Grove Fix For: 4.0.0 I just ran benchmarks for the first time in a while and I see duplicate entries for group by keys. For example, query 1 has "group by l_returnflag, l_linestatus" and I see multiple results with l_returnflag = 'A' and l_linestatus = 'F'. -- This message was sent by Atlassian Jira (v8.3.4#803005)
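For readers unfamiliar with distributed aggregation, the symptom described here (the same group key appearing in several result rows) is what one would see if per-partition partial aggregates reached the client without a final merge by key. That is only one possible explanation and is not confirmed as the cause of this bug. The sketch below shows the merge step in isolation; `merge_partial_counts` and the `GroupKey` alias are hypothetical names, and no Ballista or DataFusion API is used.

```rust
use std::collections::BTreeMap;

/// Group key for TPC-H query 1: (l_returnflag, l_linestatus).
type GroupKey = (String, String);

/// Merge per-partition partial counts into one row per group key.
/// Skipping this step leaves one row per (partition, key) pair,
/// which shows up as duplicate group keys in the output.
pub fn merge_partial_counts(partials: &[Vec<(GroupKey, u64)>]) -> BTreeMap<GroupKey, u64> {
    let mut merged = BTreeMap::new();
    for partition in partials {
        for (key, count) in partition {
            // Accumulate into the single entry for this key.
            *merged.entry(key.clone()).or_insert(0) += *count;
        }
    }
    merged
}
```

Given two partitions that both contain the key ('A', 'F'), the merged map holds a single ('A', 'F') entry whose count is the sum of the two partial counts.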
[jira] [Resolved] (ARROW-12274) [JS] Document how to run tests without building
[ https://issues.apache.org/jira/browse/ARROW-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-12274. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9983 [https://github.com/apache/arrow/pull/9983] > [JS] Document how to run tests without building > --- > > Key: ARROW-12274 > URL: https://issues.apache.org/jira/browse/ARROW-12274 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Dominik Moritz >Assignee: Dominik Moritz >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > https://github.com/apache/arrow/blob/master/js/DEVELOP.md does not document > that one can run `npm run test -- -t src`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12333) [JS] Remove jest-environment-node-debug and do not emit from typescript by default
[ https://issues.apache.org/jira/browse/ARROW-12333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-12333: --- Labels: pull-request-available (was: ) > [JS] Remove jest-environment-node-debug and do not emit from typescript by > default > -- > > Key: ARROW-12333 > URL: https://issues.apache.org/jira/browse/ARROW-12333 > Project: Apache Arrow > Issue Type: Task > Components: JavaScript >Reporter: Dominik Moritz >Assignee: Dominik Moritz >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12333) [JS] Remove jest-environment-node-debug and do not emit from typescript by default
Dominik Moritz created ARROW-12333: -- Summary: [JS] Remove jest-environment-node-debug and do not emit from typescript by default Key: ARROW-12333 URL: https://issues.apache.org/jira/browse/ARROW-12333 Project: Apache Arrow Issue Type: Task Components: JavaScript Reporter: Dominik Moritz Assignee: Dominik Moritz -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-12281) [JS] Remove shx, trash, and rimraf
[ https://issues.apache.org/jira/browse/ARROW-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-12281. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9938 [https://github.com/apache/arrow/pull/9938] > [JS] Remove shx, trash, and rimraf > -- > > Key: ARROW-12281 > URL: https://issues.apache.org/jira/browse/ARROW-12281 > Project: Apache Arrow > Issue Type: Sub-task > Components: JavaScript >Reporter: Dominik Moritz >Assignee: Dominik Moritz >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > We can use del instead -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors
[ https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated ARROW-11135: Affects Version/s: 3.0.0 > Using Maven Central artifacts as dependencies produce runtime errors > > > Key: ARROW-11135 > URL: https://issues.apache.org/jira/browse/ARROW-11135 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0, 3.0.0 >Reporter: Michael Mior >Priority: Major > > I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the > integration is working well, but I'm having issues . As [suggested on the > mailing > list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E], > using Dremio's public artifacts solves the problem. Between two Apache > projects however, there would be strong preference to use Apache artifacts as > a dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors
[ https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318933#comment-17318933 ] Julian Hyde commented on ARROW-11135: - I think this issue boils down to two problems: * The install documentation should state that you need to install protobuf on macOS. That is the cause of the {{/usr/local/opt/protobuf/lib/libprotobuf.24.dylib}} error. * The artifacts in Maven Central only support macOS. They should support Linux and macOS. Do you agree? > Using Maven Central artifacts as dependencies produce runtime errors > > > Key: ARROW-11135 > URL: https://issues.apache.org/jira/browse/ARROW-11135 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0 >Reporter: Michael Mior >Priority: Major > > I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the > integration is working well, but I'm having issues . As [suggested on the > mailing > list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E], > using Dremio's public artifacts solves the problem. Between two Apache > projects however, there would be strong preference to use Apache artifacts as > a dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors
[ https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318931#comment-17318931 ] Julian Hyde edited comment on ARROW-11135 at 4/11/21, 7:33 PM: --- There are no missing packages. But the install instructions should probably say: * The Gandiva library only works on macOS, and requires that you manually install protobuf 2.5. By the way, I compared which files are in the 3.0.0 release jar (which works on macOS) and the 3.0.0-SNAPSHOT jar (which works on Linux).
{noformat}
$ diff -u <(tar tf ./arrow-gandiva-3.0.0-SNAPSHOT.jar | sort) <(tar tf ./arrow-gandiva-3.0.0.jar | sort)
--- /dev/fd/63 2021-04-11 12:25:09.0 -0700
+++ /dev/fd/62 2021-04-11 12:25:09.0 -0700
@@ -11,7 +11,10 @@
 META-INF/maven/org.apache.arrow.gandiva/arrow-gandiva/pom.xml
 Types.proto
 git.properties
-libgandiva_jni.so
+libgandiva_jni.300.0.0.dylib
+libgandiva_jni.300.dylib
+libgandiva_jni.a
+libgandiva_jni.dylib
 org/
 org/apache/
 org/apache/arrow/
@@ -188,3 +191,8 @@
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNode.class
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNodeOrBuilder.class
 org/apache/arrow/gandiva/ipc/GandivaTypes.class
+release/
+release/libgandiva_jni.300.0.0.dylib
+release/libgandiva_jni.300.dylib
+release/libgandiva_jni.a
+release/libgandiva_jni.dylib
{noformat}
It would be awesome if, in the next release, the jar contained ALL of those files, and then I suppose it would work on both Linux and macOS. was (Author: julianhyde): There are no missing packages. But the install instructions should probably say: * The Gandiva library only works on macOS, and requires that you manually install protobuf 2.5. By the way, I compared which files are in the 3.0.0 release jar (which works on macOS) and the 3.0.0-SNAPSHOT jar (which works on Linux).
{noformat}
$ diff -u <(tar tvf ./arrow-gandiva-3.0.0-SNAPSHOT.jar |awk '{print $NF}'|sort) <(tar tvf ./arrow-gandiva-3.0.0.jar |awk '{print $NF}'|sort)
--- /dev/fd/63 2021-04-11 12:25:09.0 -0700
+++ /dev/fd/62 2021-04-11 12:25:09.0 -0700
@@ -11,7 +11,10 @@
 META-INF/maven/org.apache.arrow.gandiva/arrow-gandiva/pom.xml
 Types.proto
 git.properties
-libgandiva_jni.so
+libgandiva_jni.300.0.0.dylib
+libgandiva_jni.300.dylib
+libgandiva_jni.a
+libgandiva_jni.dylib
 org/
 org/apache/
 org/apache/arrow/
@@ -188,3 +191,8 @@
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNode.class
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNodeOrBuilder.class
 org/apache/arrow/gandiva/ipc/GandivaTypes.class
+release/
+release/libgandiva_jni.300.0.0.dylib
+release/libgandiva_jni.300.dylib
+release/libgandiva_jni.a
+release/libgandiva_jni.dylib
{noformat}
It would be awesome if, in the next release, the jar contained ALL of those files, and then I suppose it would work on both Linux and macOS. > Using Maven Central artifacts as dependencies produce runtime errors > > > Key: ARROW-11135 > URL: https://issues.apache.org/jira/browse/ARROW-11135 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0 >Reporter: Michael Mior >Priority: Major > > I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the > integration is working well, but I'm having issues . As [suggested on the > mailing > list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E], > using Dremio's public artifacts solves the problem. Between two Apache > projects however, there would be strong preference to use Apache artifacts as > a dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors
[ https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318931#comment-17318931 ] Julian Hyde commented on ARROW-11135: - There are no missing packages. But the install instructions should probably say: * The Gandiva library only works on macOS, and requires that you manually install protobuf 2.5. By the way, I compared which files are in the 3.0.0 release jar (which works on macOS) and the 3.0.0-SNAPSHOT jar (which works on Linux).
{noformat}
$ diff -u <(tar tvf ./arrow-gandiva-3.0.0-SNAPSHOT.jar |awk '{print $NF}'|sort) <(tar tvf ./arrow-gandiva-3.0.0.jar |awk '{print $NF}'|sort)
--- /dev/fd/63 2021-04-11 12:25:09.0 -0700
+++ /dev/fd/62 2021-04-11 12:25:09.0 -0700
@@ -11,7 +11,10 @@
 META-INF/maven/org.apache.arrow.gandiva/arrow-gandiva/pom.xml
 Types.proto
 git.properties
-libgandiva_jni.so
+libgandiva_jni.300.0.0.dylib
+libgandiva_jni.300.dylib
+libgandiva_jni.a
+libgandiva_jni.dylib
 org/
 org/apache/
 org/apache/arrow/
@@ -188,3 +191,8 @@
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNode.class
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNodeOrBuilder.class
 org/apache/arrow/gandiva/ipc/GandivaTypes.class
+release/
+release/libgandiva_jni.300.0.0.dylib
+release/libgandiva_jni.300.dylib
+release/libgandiva_jni.a
+release/libgandiva_jni.dylib
{noformat}
It would be awesome if, in the next release, the jar contained ALL of those files, and then I suppose it would work on both Linux and macOS. > Using Maven Central artifacts as dependencies produce runtime errors > > > Key: ARROW-11135 > URL: https://issues.apache.org/jira/browse/ARROW-11135 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 2.0.0 >Reporter: Michael Mior >Priority: Major > > I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the > integration is working well, but I'm having issues . As [suggested on the > mailing > list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E], > using Dremio's public artifacts solves the problem. Between two Apache > projects however, there would be strong preference to use Apache artifacts as > a dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12332) Api server for scheduler
[ https://issues.apache.org/jira/browse/ARROW-12332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-12332: --- Labels: pull-request-available (was: ) > Api server for scheduler > > > Key: ARROW-12332 > URL: https://issues.apache.org/jira/browse/ARROW-12332 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust - Ballista >Reporter: Sathis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12332) Api server for scheduler
Sathis created ARROW-12332: -- Summary: Api server for scheduler Key: ARROW-12332 URL: https://issues.apache.org/jira/browse/ARROW-12332 Project: Apache Arrow Issue Type: New Feature Components: Rust - Ballista Reporter: Sathis -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12316) [C++] Switch default memory allocator from jemalloc to mimalloc
[ https://issues.apache.org/jira/browse/ARROW-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318918#comment-17318918 ] Neal Richardson commented on ARROW-12316: - [~jonkeane] can you attach your reports? > [C++] Switch default memory allocator from jemalloc to mimalloc > --- > > Key: ARROW-12316 > URL: https://issues.apache.org/jira/browse/ARROW-12316 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Neal Richardson >Priority: Major > Fix For: 4.0.0 > > > Benchmarking shows that mimalloc seems to be faster on real workflows (at > least on macOS, still collecting data on Ubuntu). We could switch the default > memory pool cases so that mimalloc is preferred. > cc [~jonkeane] [~apitrou] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12316) [C++] Switch default memory allocator from jemalloc to mimalloc
[ https://issues.apache.org/jira/browse/ARROW-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318907#comment-17318907 ] Uwe Korn commented on ARROW-12316: -- [~npr] Where can I find these benchmarks? > [C++] Switch default memory allocator from jemalloc to mimalloc > --- > > Key: ARROW-12316 > URL: https://issues.apache.org/jira/browse/ARROW-12316 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Neal Richardson >Priority: Major > Fix For: 4.0.0 > > > Benchmarking shows that mimalloc seems to be faster on real workflows (at > least on macOS, still collecting data on Ubuntu). We could switch the default > memory pool cases so that mimalloc is preferred. > cc [~jonkeane] [~apitrou] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12260) [Website] [Rust] Announce Ballista donation
[ https://issues.apache.org/jira/browse/ARROW-12260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318904#comment-17318904 ] Andy Grove commented on ARROW-12260: https://github.com/apache/arrow-site/pull/100 > [Website] [Rust] Announce Ballista donation > --- > > Key: ARROW-12260 > URL: https://issues.apache.org/jira/browse/ARROW-12260 > Project: Apache Arrow > Issue Type: Task > Components: Website >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > Once the IP clearance vote passes and the PR has been merged, we should > announce the donation on the Arrow blog. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:58 PM: Thanks for the reference to the blog; I read all of his posts. I ran Travis' final radix_sort7 version through my benchmarks, see below. It rocks! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks like it would be good to ask Travis to contribute to Arrow. What do you think? For now, I've opened an issue on his repo asking him to add a license, see https://github.com/travisdowns/sort-bench/issues/1 was (Author: klykov): Thanks for the reference to the blog; I read all of his posts. I ran Travis' final radix_sort7 version through my benchmarks, see below. It rocks! I'm not yet sure whether this implementation is stable, because it processes the least significant bits first, but that seems easy to change. !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks like it would be good to ask Travis to contribute to Arrow. What do you think? For now, I've opened an issue on his repo asking him to add a license, see https://github.com/travisdowns/sort-bench/issues/1 > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
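As background for the discussion above, here is a minimal sketch of a least-significant-digit radix sort over unsigned 32-bit keys. It is written in Rust only to match the other code in this digest (the issue itself concerns the C++ library), it is not Travis' radix_sort7, and the name `radix_sort_u32` is hypothetical. It sorts values in place, whereas a sort kernel that returns indices would have to carry the indices along with the keys.

```rust
/// Sort `u32` keys with an LSD radix sort, 8 bits per pass (4 passes).
/// Each pass is a counting sort that preserves the order of equal digits.
pub fn radix_sort_u32(values: &mut Vec<u32>) {
    let mut scratch = vec![0u32; values.len()];
    for pass in 0..4 {
        let shift = pass * 8;

        // Histogram of the current 8-bit digit.
        let mut counts = [0usize; 256];
        for &v in values.iter() {
            counts[((v >> shift) & 0xFF) as usize] += 1;
        }

        // Exclusive prefix sum: counts[d] becomes the first output slot for digit d.
        let mut offset = 0;
        for c in counts.iter_mut() {
            let n = *c;
            *c = offset;
            offset += n;
        }

        // Scatter in input order, so elements with equal digits keep their order.
        for &v in values.iter() {
            let bucket = ((v >> shift) & 0xFF) as usize;
            scratch[counts[bucket]] = v;
            counts[bucket] += 1;
        }

        // The freshly written buffer becomes the input of the next pass.
        std::mem::swap(values, &mut scratch);
    }
}
```

Signed integers are not handled in this sketch; a common approach is to flip the sign bit before sorting and flip it back afterwards.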
[jira] [Updated] (ARROW-10920) [Rust] Segmentation fault in Arrow Parquet writer with huge arrays
[ https://issues.apache.org/jira/browse/ARROW-10920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-10920: --- Fix Version/s: (was: 4.0.0) > [Rust] Segmentation fault in Arrow Parquet writer with huge arrays > -- > > Key: ARROW-10920 > URL: https://issues.apache.org/jira/browse/ARROW-10920 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andy Grove >Priority: Major > > I stumbled across this by chance. I am not too surprised that this fails but > I would expect it to fail gracefully and not with a segmentation fault.
> {code:java}
> use std::fs::File;
> use std::sync::Arc;
>
> use arrow::array::StringBuilder;
> use arrow::datatypes::{DataType, Field, Schema};
> use arrow::error::Result;
> use arrow::record_batch::RecordBatch;
> use parquet::arrow::ArrowWriter;
>
> fn main() -> Result<()> {
>     // Two string columns: a short key and a long repeated value.
>     let schema = Schema::new(vec![
>         Field::new("c0", DataType::Utf8, false),
>         Field::new("c1", DataType::Utf8, true),
>     ]);
>     let batch_size = 250;
>     let repeat_count = 140;
>     let file = File::create("/tmp/test.parquet")?;
>     let mut writer = ArrowWriter::try_new(file, Arc::new(schema.clone()), None).unwrap();
>     let mut c0_builder = StringBuilder::new(batch_size);
>     let mut c1_builder = StringBuilder::new(batch_size);
>     println!("Start of loop");
>     for i in 0..batch_size {
>         // c0 is a zero-padded 32-character number; c1 repeats it repeat_count times.
>         let c0_value = format!("{:032}", i);
>         let c1_value = c0_value.repeat(repeat_count);
>         c0_builder.append_value(&c0_value)?;
>         c1_builder.append_value(&c1_value)?;
>     }
>     println!("Finish building c0");
>     let c0 = Arc::new(c0_builder.finish());
>     println!("Finish building c1");
>     let c1 = Arc::new(c1_builder.finish());
>     println!("Creating RecordBatch");
>     let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![c0, c1])?;
>     // write the batch to parquet
>     println!("Writing RecordBatch");
>     writer.write(&batch).unwrap();
>     println!("Closing writer");
>     writer.close().unwrap();
>     Ok(())
> }
> {code}
> output:
> {code:java}
> Start of loop
> Finish building c0
> Finish building c1
> Creating RecordBatch
> Writing RecordBatch
> Segmentation fault (core dumped)
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11625) [Rust] [DataFusion] Move SortExec partition check to constructor
[ https://issues.apache.org/jira/browse/ARROW-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove updated ARROW-11625:
---
    Fix Version/s:     (was: 4.0.0)

> [Rust] [DataFusion] Move SortExec partition check to constructor
>
>
> Key: ARROW-11625
> URL: https://issues.apache.org/jira/browse/ARROW-11625
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust - DataFusion
> Reporter: Andy Grove
> Priority: Major
>
> SortExec has the following error check at execution time and this could be
> moved into the try_new constructor so the error check happens at planning
> time instead.
>
> {code:java}
> if 1 != self.input.output_partitioning().partition_count() {
>     return Err(DataFusionError::Internal(
>         "SortExec requires a single input partition".to_owned(),
>     ));
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
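For illustration only, the proposed move could look roughly like the sketch below. `Input`, `PlanError` and this `SortExec` are hypothetical stand-ins, not DataFusion's actual types; the only point is that the partition check runs inside `try_new`, so an invalid plan is rejected when it is built rather than when it is executed.

```rust
/// Hypothetical stand-in for DataFusion's error type.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    Internal(String),
}

/// Hypothetical stand-in for an input plan; only the partition count matters here.
pub struct Input {
    pub partition_count: usize,
}

/// Hypothetical sort operator that owns its input.
pub struct SortExec {
    input: Input,
}

impl SortExec {
    /// Checks the single-partition requirement at construction (planning) time.
    pub fn try_new(input: Input) -> Result<Self, PlanError> {
        if input.partition_count != 1 {
            return Err(PlanError::Internal(
                "SortExec requires a single input partition".to_owned(),
            ));
        }
        Ok(Self { input })
    }

    /// By construction this is always 1, so execution no longer needs the check.
    pub fn input_partitions(&self) -> usize {
        self.input.partition_count
    }
}
```

With this shape, `SortExec::try_new(Input { partition_count: 4 })` returns the error immediately, and any `SortExec` value that exists is known to have a single input partition.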
[jira] [Updated] (ARROW-11016) [Rust] Parquet ArrayReader should allow reading a subset of row groups
[ https://issues.apache.org/jira/browse/ARROW-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-11016: --- Fix Version/s: (was: 4.0.0) > [Rust] Parquet ArrayReader should allow reading a subset of row groups > -- > > Key: ARROW-11016 > URL: https://issues.apache.org/jira/browse/ARROW-11016 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Andy Grove >Priority: Major > > Parquet ArrayReader currently only supports reading an entire file from start > to finish and does not allow selectively reading a subset of row groups. This > prevents us from parallelizing work across threads when processing a single > parquet file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11094) [Rust] [DataFusion] Implement Sort-Merge Join
[ https://issues.apache.org/jira/browse/ARROW-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-11094: --- Fix Version/s: (was: 4.0.0) > [Rust] [DataFusion] Implement Sort-Merge Join > - > > Key: ARROW-11094 > URL: https://issues.apache.org/jira/browse/ARROW-11094 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust - DataFusion >Reporter: Andy Grove >Priority: Major > > The current hash join works well when one side of the join can be loaded into > memory but cannot scale beyond the available RAM. > The advantage of implementing SMJ (Sort-Merge Join) is that we can sort the > left and right partitions, and write the intermediate results to disk, and > then stream both sides of the join by merging these sorted partitions and we > do not need to load one side into memory. At most, we need to load all > batches from both sides that contain the current join key values. > In order to reduce memory pressure we will want to limit the concurrency of > these sort operations. > We would still want to default to hash join when we know that the build-side > can fit into memory since it is more efficient than using a sort-merge join. > [https://en.wikipedia.org/wiki/Sort-merge_join] -- This message was sent by Atlassian Jira (v8.3.4#803005)
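As a rough sketch of the merge phase described in this ticket (not a proposed DataFusion implementation), the function below inner-joins two in-memory slices that are assumed to be already sorted by key. The name `sort_merge_join` and the `(i64, value)` row shape are assumptions made for the example; spilling to disk and batch handling are left out. It shows why only the rows sharing the current join key need to be held at once.

```rust
/// Inner-joins two inputs that are already sorted by key (ascending).
/// Only the run of rows sharing the current key is looked at together.
pub fn sort_merge_join<L: Clone, R: Clone>(
    left: &[(i64, L)],
    right: &[(i64, R)],
) -> Vec<(i64, L, R)> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        let (lk, rk) = (left[i].0, right[j].0);
        if lk < rk {
            // Left key has no match yet; advance the left side.
            i += 1;
        } else if lk > rk {
            // Right key has no match yet; advance the right side.
            j += 1;
        } else {
            // Find the end of the run of equal keys on each side.
            let i_end = i + left[i..].iter().take_while(|(k, _)| *k == lk).count();
            let j_end = j + right[j..].iter().take_while(|(k, _)| *k == lk).count();
            // Emit the cross product of the two runs.
            for (_, l) in &left[i..i_end] {
                for (_, r) in &right[j..j_end] {
                    out.push((lk, l.clone(), r.clone()));
                }
            }
            i = i_end;
            j = j_end;
        }
    }
    out
}
```

Each side is scanned once from start to finish, which is what would allow both inputs to be streamed from sorted files instead of being loaded into memory.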
[jira] [Updated] (ARROW-11020) [Rust] [DataFusion] Implement better tests for ParquetExec
[ https://issues.apache.org/jira/browse/ARROW-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-11020: --- Fix Version/s: (was: 4.0.0) > [Rust] [DataFusion] Implement better tests for ParquetExec > -- > > Key: ARROW-11020 > URL: https://issues.apache.org/jira/browse/ARROW-11020 > Project: Apache Arrow > Issue Type: Test > Components: Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > Implement better tests for ParquetExec -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10884) [Rust] [DataFusion] Benchmark crate does not have a SIMD feature
[ https://issues.apache.org/jira/browse/ARROW-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-10884: --- Fix Version/s: (was: 4.0.0) > [Rust] [DataFusion] Benchmark crate does not have a SIMD feature > > > Key: ARROW-10884 > URL: https://issues.apache.org/jira/browse/ARROW-10884 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - DataFusion >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > > The benchmarks run without SIMD by default. We need to add a feature to the > Cargo.toml to enable SIMD. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12313) [Rust] [Ballista] Benchmark documentation out of date
[ https://issues.apache.org/jira/browse/ARROW-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove updated ARROW-12313: --- Summary: [Rust] [Ballista] Benchmark documentation out of date (was: [Rust] [Ballista] Benchmark docuementation out of date) > [Rust] [Ballista] Benchmark documentation out of date > - > > Key: ARROW-12313 > URL: https://issues.apache.org/jira/browse/ARROW-12313 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - Ballista >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 4.0.0 > > > The scheduler/executor were refactored and the documentation for the > benchmarks now needs updating. I plan on fixing this over the weekend. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-11059) [Rust] [DataFusion] Implement extensible configuration mechanism
[ https://issues.apache.org/jira/browse/ARROW-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove updated ARROW-11059:
---
    Fix Version/s:     (was: 4.0.0)

> [Rust] [DataFusion] Implement extensible configuration mechanism
>
>
> Key: ARROW-11059
> URL: https://issues.apache.org/jira/browse/ARROW-11059
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust - DataFusion
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
>
> We are getting to the point where there are multiple settings we could add to
> operators to fine-tune performance. Custom operators provided by crates that
> extend DataFusion may also need this capability.
> I propose that we add support for key-value configuration options so that we
> don't need to plumb through each new configuration setting that we add.
> For example, I am about to start on a "coalesce batches" operator and I would
> like a setting such as "coalesce.batch.size".
> For built-in settings like this we can provide information such as
> documentation and default values and generate documentation from this.
> For example, here is how Spark defines configs:
> {code:java}
> val PARQUET_VECTORIZED_READER_ENABLED =
>   buildConf("spark.sql.parquet.enableVectorizedReader")
>     .doc("Enables vectorized parquet decoding.")
>     .version("2.0.0")
>     .booleanConf
>     .createWithDefault(true)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
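One possible shape for such a key-value mechanism, sketched in Rust under stated assumptions: `ConfigEntry`, `BUILT_IN`, `Config` and `render_docs` are hypothetical names, and the default of 4096 for "coalesce.batch.size" is made up for the example. Built-in settings carry a doc string and a default, while extension crates can set arbitrary keys without any new plumbing.

```rust
use std::collections::HashMap;

/// Describes one built-in setting: key, documentation and default value.
pub struct ConfigEntry {
    pub key: &'static str,
    pub doc: &'static str,
    pub default: &'static str,
}

/// Hypothetical registry of built-in settings (the default value is invented).
pub const BUILT_IN: &[ConfigEntry] = &[ConfigEntry {
    key: "coalesce.batch.size",
    doc: "Target number of rows per coalesced batch.",
    default: "4096",
}];

/// String-keyed configuration; unknown keys are allowed so that
/// crates extending the engine can store their own settings.
#[derive(Default)]
pub struct Config {
    values: HashMap<String, String>,
}

impl Config {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn set(&mut self, key: &str, value: &str) {
        self.values.insert(key.to_owned(), value.to_owned());
    }

    /// Returns the explicitly set value, else the built-in default, else None.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.values
            .get(key)
            .map(String::as_str)
            .or_else(|| BUILT_IN.iter().find(|e| e.key == key).map(|e| e.default))
    }

    /// Typed accessor; returns None if the key is missing or not a number.
    pub fn get_usize(&self, key: &str) -> Option<usize> {
        self.get(key)?.parse().ok()
    }
}

/// Generates one documentation line per built-in setting.
pub fn render_docs() -> String {
    BUILT_IN
        .iter()
        .map(|e| format!("{} (default {}): {}\n", e.key, e.default, e.doc))
        .collect()
}
```

An operator would then call something like `config.get_usize("coalesce.batch.size")` instead of receiving a dedicated constructor argument for each setting.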
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:47 PM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It rocks! Yet not sure if this implementation is stable because the order is from less significant bits. But it seems to be easy to change !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It rocks! Yet not sure if this implementation is stable. !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
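On the stability question raised in this comment: a least-significant-digit radix sort is stable as long as every pass is a stable counting sort, and it in fact relies on that property to be correct. The Rust sketch below illustrates this on `(key, payload)` pairs. It is a generic textbook formulation written for this note, not Travis' radix_sort7, and the name `radix_sort_by_key` is an assumption.

```rust
/// LSD radix sort on 32-bit keys, one byte per pass (4 passes).
/// Each pass is a stable counting sort, so items with equal keys
/// keep their original relative order.
pub fn radix_sort_by_key<T: Clone>(items: &mut Vec<(u32, T)>) {
    let mut buf = items.clone();
    for pass in 0..4u32 {
        let shift = pass * 8;
        // Histogram of the byte examined in this pass.
        let mut counts = [0usize; 256];
        for (k, _) in items.iter() {
            counts[((*k >> shift) & 0xFF) as usize] += 1;
        }
        // Exclusive prefix sum: first output slot for each byte value.
        let mut offsets = [0usize; 256];
        let mut sum = 0;
        for b in 0..256 {
            offsets[b] = sum;
            sum += counts[b];
        }
        // Scatter in input order; this is what makes the pass stable.
        for item in items.iter() {
            let b = ((item.0 >> shift) & 0xFF) as usize;
            buf[offsets[b]] = item.clone();
            offsets[b] += 1;
        }
        std::mem::swap(items, &mut buf);
    }
}
```

Sorting `[(3, 'a'), (1, 'b'), (3, 'c')]` this way yields `[(1, 'b'), (3, 'a'), (3, 'c')]`, with the two entries for key 3 left in their original order.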
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:40 PM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It rocks! yet no sure if this implementation is stable. !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It rocks! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:40 PM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It rocks! Yet not sure if this implementation is stable. !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It rocks! yet no sure if this implementation is stable. !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-12251) [Rust] [Ballista] Add Ballista tests to CI
[ https://issues.apache.org/jira/browse/ARROW-12251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Grove resolved ARROW-12251. Resolution: Fixed Issue resolved by pull request 9979 [https://github.com/apache/arrow/pull/9979] > [Rust] [Ballista] Add Ballista tests to CI > -- > > Key: ARROW-12251 > URL: https://issues.apache.org/jira/browse/ARROW-12251 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - Ballista >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Ballista is a standalone project (not part of the Arrow Rust workspace) and > therefore the tests will not run in CI without additional work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12331) [Rust] [Ballista] Make CI build work with snmalloc
Andy Grove created ARROW-12331: -- Summary: [Rust] [Ballista] Make CI build work with snmalloc Key: ARROW-12331 URL: https://issues.apache.org/jira/browse/ARROW-12331 Project: Apache Arrow Issue Type: Improvement Components: Rust - Ballista Reporter: Andy Grove Fix For: 4.0.0 Ballista was added to CI in [https://github.com/apache/arrow/pull/9979] but is building without default features due to snmalloc requiring cmake. An alternative approach would be to build with cc instead of cmake. See the above PR for conversation about this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-12330) [Developer] Restore values in counters column of Archery benchmark
[ https://issues.apache.org/jira/browse/ARROW-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-12330:
---
    Labels: pull-request-available  (was: )

> [Developer] Restore values in counters column of Archery benchmark
> --
>
> Key: ARROW-12330
> URL: https://issues.apache.org/jira/browse/ARROW-12330
> Project: Apache Arrow
> Issue Type: Bug
> Components: Developer Tools
> Affects Versions: 3.0.0
> Reporter: Kazuaki Ishizaki
> Assignee: Kazuaki Ishizaki
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The issue is that ARROW-11189 always suppressed values in {{counters}} column
> of Archery benchmark
> {code:java}
> % archery benchmark diff --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
> ...
> -----------------------------------------------------------------------------------
> Benchmark                 Time          CPU   Iterations UserCounters...
> -----------------------------------------------------------------------------------
> SetBitsTo/2            8.15 ns      8.15 ns     81991087 bytes_per_second=234.044M/s
> SetBitsTo/16           7.78 ns      7.78 ns     89928878 bytes_per_second=1.91429G/s
> SetBitsTo/1024         13.9 ns      13.9 ns     50372172 bytes_per_second=68.6182G/s
> SetBitsTo/131072       3508 ns      3508 ns       199335 bytes_per_second=34.7944G/s
> ------------------------------------------------------------------------
> Non-regressions: (4)
> ------------------------------------------------------------------------
>        benchmark         baseline        contender  change %  counters
>     SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975        {}
>      SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509        {}
> SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207        {}
>   SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037        {}
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-12330) [Developer] Restore values in counters column of Archery benchmark
[ https://issues.apache.org/jira/browse/ARROW-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuaki Ishizaki updated ARROW-12330:
-
    Description:
The issue is that ARROW-11189 always suppressed values in {{counters}} column of Archery benchmark
{code:java}
% archery benchmark diff --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
...
-----------------------------------------------------------------------------------
Benchmark                 Time          CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
SetBitsTo/2            8.15 ns      8.15 ns     81991087 bytes_per_second=234.044M/s
SetBitsTo/16           7.78 ns      7.78 ns     89928878 bytes_per_second=1.91429G/s
SetBitsTo/1024         13.9 ns      13.9 ns     50372172 bytes_per_second=68.6182G/s
SetBitsTo/131072       3508 ns      3508 ns       199335 bytes_per_second=34.7944G/s
------------------------------------------------------------------------
Non-regressions: (4)
------------------------------------------------------------------------
       benchmark         baseline        contender  change %  counters
    SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975        {}
     SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509        {}
SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207        {}
  SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037        {}
{code}

  was:
The issue is that ARROW-11189 always suppressed values in {{counters}} column of Archery benchmark
{code}
% archery benchmark run --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
...
-----------------------------------------------------------------------------------
Benchmark                 Time          CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
SetBitsTo/2            8.15 ns      8.15 ns     81991087 bytes_per_second=234.044M/s
SetBitsTo/16           7.78 ns      7.78 ns     89928878 bytes_per_second=1.91429G/s
SetBitsTo/1024         13.9 ns      13.9 ns     50372172 bytes_per_second=68.6182G/s
SetBitsTo/131072       3508 ns      3508 ns       199335 bytes_per_second=34.7944G/s
------------------------------------------------------------------------
Non-regressions: (4)
------------------------------------------------------------------------
       benchmark         baseline        contender  change %  counters
    SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975        {}
     SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509        {}
SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207        {}
  SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037        {}
{code}


> [Developer] Restore values in counters column of Archery benchmark
> --
>
> Key: ARROW-12330
> URL: https://issues.apache.org/jira/browse/ARROW-12330
> Project: Apache Arrow
> Issue Type: Bug
> Components: Developer Tools
> Affects Versions: 3.0.0
> Reporter: Kazuaki Ishizaki
> Assignee: Kazuaki Ishizaki
> Priority: Minor
> Fix For: 4.0.0
>
>
> The issue is that ARROW-11189 always suppressed values in {{counters}} column
> of Archery benchmark
> {code:java}
> % archery benchmark diff --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
> ...
> -----------------------------------------------------------------------------------
> Benchmark                 Time          CPU   Iterations UserCounters...
> -----------------------------------------------------------------------------------
> SetBitsTo/2            8.15 ns      8.15 ns     81991087 bytes_per_second=234.044M/s
> SetBitsTo/16           7.78 ns      7.78 ns     89928878 bytes_per_second=1.91429G/s
> SetBitsTo/1024         13.9 ns      13.9 ns     50372172 bytes_per_second=68.6182G/s
> SetBitsTo/131072       3508 ns      3508 ns       199335 bytes_per_second=34.7944G/s
> ------------------------------------------------------------------------
> Non-regressions: (4)
> ------------------------------------------------------------------------
>        benchmark         baseline        contender  change %  counters
>     SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975        {}
>      SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509        {}
> SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207        {}
>   SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037        {}
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (ARROW-12330) [Developer] Restore values in counters column of Archery benchmark
[ https://issues.apache.org/jira/browse/ARROW-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuaki Ishizaki reassigned ARROW-12330:

    Assignee: Kazuaki Ishizaki

> [Developer] Restore values in counters column of Archery benchmark
> --
>
> Key: ARROW-12330
> URL: https://issues.apache.org/jira/browse/ARROW-12330
> Project: Apache Arrow
> Issue Type: Bug
> Components: Developer Tools
> Affects Versions: 3.0.0
> Reporter: Kazuaki Ishizaki
> Assignee: Kazuaki Ishizaki
> Priority: Minor
> Fix For: 4.0.0
>
>
> The issue is that ARROW-11189 always suppressed values in {{counters}} column
> of Archery benchmark
> {code}
> % archery benchmark run --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
> ...
> -----------------------------------------------------------------------------------
> Benchmark                 Time          CPU   Iterations UserCounters...
> -----------------------------------------------------------------------------------
> SetBitsTo/2            8.15 ns      8.15 ns     81991087 bytes_per_second=234.044M/s
> SetBitsTo/16           7.78 ns      7.78 ns     89928878 bytes_per_second=1.91429G/s
> SetBitsTo/1024         13.9 ns      13.9 ns     50372172 bytes_per_second=68.6182G/s
> SetBitsTo/131072       3508 ns      3508 ns       199335 bytes_per_second=34.7944G/s
> ------------------------------------------------------------------------
> Non-regressions: (4)
> ------------------------------------------------------------------------
>        benchmark         baseline        contender  change %  counters
>     SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975        {}
>      SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509        {}
> SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207        {}
>   SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037        {}
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-12330) [Developer] Restore values in counters column of Archery benchmark
Kazuaki Ishizaki created ARROW-12330:

             Summary: [Developer] Restore values in counters column of Archery benchmark
                 Key: ARROW-12330
                 URL: https://issues.apache.org/jira/browse/ARROW-12330
             Project: Apache Arrow
          Issue Type: Bug
          Components: Developer Tools
    Affects Versions: 3.0.0
            Reporter: Kazuaki Ishizaki
             Fix For: 4.0.0

The issue is that ARROW-11189 always suppressed values in {{counters}} column of Archery benchmark
{code}
% archery benchmark run --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
...
-----------------------------------------------------------------------------------
Benchmark                 Time          CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
SetBitsTo/2            8.15 ns      8.15 ns     81991087 bytes_per_second=234.044M/s
SetBitsTo/16           7.78 ns      7.78 ns     89928878 bytes_per_second=1.91429G/s
SetBitsTo/1024         13.9 ns      13.9 ns     50372172 bytes_per_second=68.6182G/s
SetBitsTo/131072       3508 ns      3508 ns       199335 bytes_per_second=34.7944G/s
------------------------------------------------------------------------
Non-regressions: (4)
------------------------------------------------------------------------
       benchmark         baseline        contender  change %  counters
    SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975        {}
     SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509        {}
SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207        {}
  SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037        {}
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-10744) [Python] Enable wheel deployment for Mac OS 11 Big Sur
[ https://issues.apache.org/jira/browse/ARROW-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318777#comment-17318777 ] Ismaël Mejía commented on ARROW-10744: -- Is support for Mac OS ARM64 part of this ticket or tracked by a different one? > [Python] Enable wheel deployment for Mac OS 11 Big Sur > -- > > Key: ARROW-10744 > URL: https://issues.apache.org/jira/browse/ARROW-10744 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: David de L. >Priority: Major > > It is currently quite tricky to get pyarrow to build on latest Mac > distributions. > Since GitHub runners > [support|https://docs.github.com/en/free-pro-team@latest/actions/reference/specifications-for-github-hosted-runners#supported-runners-and-hardware-resources] > Mac 11.0 Big Sur, could wheels be built for this OS in CD? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-12318) [Rust][DataFusion] Add support for AVG(Timestamp) types
[ https://issues.apache.org/jira/browse/ARROW-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318712#comment-17318712 ] Andrew Lamb commented on ARROW-12318: - [~Dandandan] notes that PostgreSQL doesn't support SUM or AVG for timestamps: https://www.postgresql.org/docs/13/functions-aggregate.html so perhaps we should not support it in DataFusion either :thinking_face: > [Rust][DataFusion] Add support for AVG(Timestamp) types > --- > > Key: ARROW-12318 > URL: https://issues.apache.org/jira/browse/ARROW-12318 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: Andrew Lamb >Priority: Minor > > This is a follow on to ARROW-12277 > Background: Support for Min/Max/Sum/Count were added for > DataType::Timestamp(*) types in https://github.com/apache/arrow/pull/9970. > This ticket tracks adding support for Avg, which is slightly more involved as > currently Avg assumes the output type is always F64, and in this case I think > Avg(timestamp) should also be (timestamp). We should double check what > postgres does in this case and follow its example -- This message was sent by Atlassian Jira (v8.3.4#803005)
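If Avg(timestamp) were supported with a timestamp result as the ticket suggests, the core arithmetic might look like the sketch below. It assumes timestamps are `i64` nanosecond values with `None` for nulls; `avg_timestamp_nanos` is a hypothetical helper, not DataFusion's accumulator API. The sum is kept in `i128` so that adding many large timestamps cannot overflow, and the result stays an integer timestamp rather than an F64.

```rust
/// Averages nanosecond timestamps, skipping nulls.
/// Returns None when there are no non-null inputs.
pub fn avg_timestamp_nanos(values: &[Option<i64>]) -> Option<i64> {
    let mut sum: i128 = 0;
    let mut count: i128 = 0;
    // `flatten` skips the None (null) entries.
    for v in values.iter().flatten() {
        sum += i128::from(*v);
        count += 1;
    }
    if count == 0 {
        None
    } else {
        // The mean of i64 values always fits in i64; division truncates toward zero.
        Some((sum / count) as i64)
    }
}
```

Whether to expose this at all is the open question in the comment above, given that PostgreSQL does not define AVG for timestamps.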
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It rocks! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks (at least with uniform distributed data)! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks (at least with uniform distributed data)! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license at this point, see https://github.com/travisdowns/sort-bench/issues/1 was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks (at least with uniform distributed data)! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license, see https://github.com/travisdowns/sort-bench/issues/1 > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks (at least with uniform distributed data)! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow. What do you think? I've added issue to his repo to add license, see https://github.com/travisdowns/sort-bench/issues/1 was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks (at least with uniform distributed data)! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow, which will require some more benchmarking, testing and code polishing. What do you think? > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:33 AM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks (at least with uniform distributed data)! !all_random_wholeRange.png|height=350,width=350! There is no license file in his repo, so I cannot share my experiments. There might be several ways to proceed. It looks it would be good to ask Travis to contribute to Arrow, which will require some more benchmarking, testing and code polishing. What do you think? was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks! !all_random_wholeRange.png|height=350,width=350! > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-12267) [Rust] JSON writer does not support timestamp types
[ https://issues.apache.org/jira/browse/ARROW-12267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Lamb resolved ARROW-12267. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 9968 [https://github.com/apache/arrow/pull/9968] > [Rust] JSON writer does not support timestamp types > --- > > Key: ARROW-12267 > URL: https://issues.apache.org/jira/browse/ARROW-12267 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Andrew Lamb >Assignee: Andrew Lamb >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Looks like the json writer.rs code in arrow doesn't support writing out > timestamps. When I tried to write out a `TimestampNanosecondArray` I got the > following error: > ``` > thread 'influxdb_ioxd::http::tests::test_query_json' panicked at 'Unsupported > datatype: Timestamp( > Nanosecond, > None, > )', > /Users/alamb/.cargo/git/checkouts/arrow-3a9cfebb6b7b2bdc/3e825a7/rust/arrow/src/json/writer.rs:326:13 > note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:25 AM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks! !all_random_wholeRange.png|height=350,width=350! was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks! !all_random_wholeRange.png|height=250,width=250!! > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:24 AM: Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks! !all_random_wholeRange.png|height=250,width=250!! was (Author: klykov): Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks! !all_random_wholeRange.png! > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707 ] Kirill Lykov commented on ARROW-10899: -- Thanks for the reference to the blog, I read all of his posts. I've checked with my benchmarks Travis' final radix_sort7 version, see below. It kind of rocks! !all_random_wholeRange.png! > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-10899) [C++] Investigate radix sort for integer arrays
[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Lykov updated ARROW-10899: - Attachment: all_random_wholeRange.png > [C++] Investigate radix sort for integer arrays > --- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png, all_random_wholeRange.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
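For readers unfamiliar with the technique ARROW-10899 proposes, here is a minimal sketch of a least-significant-digit radix sort over 32-bit integers. It is written in Rust for illustration only; it is not Arrow's C++ code and not Travis' radix_sort7, and the function names `radix_sort_u32` and `radix_sort_i32` are hypothetical. It assumes 8-bit digits (four passes per key) and handles signed values by flipping the sign bit.

```rust
/// Sorts `data` ascending with an LSD radix sort using 8-bit digits
/// (4 passes for 32-bit keys). Illustrative sketch only.
fn radix_sort_u32(data: &mut Vec<u32>) {
    const BITS: usize = 8;
    const BUCKETS: usize = 1 << BITS;
    let mut scratch = vec![0u32; data.len()];

    for pass in 0..(32 / BITS) {
        let shift = pass * BITS;

        // Histogram of the current digit.
        let mut counts = [0usize; BUCKETS];
        for &v in data.iter() {
            counts[((v >> shift) as usize) & (BUCKETS - 1)] += 1;
        }

        // Exclusive prefix sum: counts[d] becomes the first output slot for digit d.
        let mut offset = 0;
        for c in counts.iter_mut() {
            let n = *c;
            *c = offset;
            offset += n;
        }

        // Stable scatter into the scratch buffer, preserving the order of equal digits.
        for &v in data.iter() {
            let d = ((v >> shift) as usize) & (BUCKETS - 1);
            scratch[counts[d]] = v;
            counts[d] += 1;
        }

        // The scratch buffer now holds the output of this pass.
        std::mem::swap(data, &mut scratch);
    }
}

/// Signed variant: flipping the sign bit maps i32 order onto u32 order.
fn radix_sort_i32(data: &mut Vec<i32>) {
    let mut keys: Vec<u32> = data.iter().map(|&v| (v as u32) ^ 0x8000_0000).collect();
    radix_sort_u32(&mut keys);
    for (dst, k) in data.iter_mut().zip(keys) {
        *dst = (k ^ 0x8000_0000) as i32;
    }
}
```

Each pass is a stable counting sort on one byte, so the work is linear in the array length regardless of the value range. Whether that beats a comparison-based stable sort in practice depends on the data and the implementation, which is what the benchmarks in this thread are trying to establish.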
[jira] [Commented] (ARROW-12306) [Rust] Read CSV format text from stdin or memory
[ https://issues.apache.org/jira/browse/ARROW-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318668#comment-17318668 ] Siwei commented on ARROW-12306: --- OK, I will do it. > [Rust] Read CSV format text from stdin or memory > > > Key: ARROW-12306 > URL: https://issues.apache.org/jira/browse/ARROW-12306 > Project: Apache Arrow > Issue Type: Wish > Components: Rust - DataFusion >Reporter: Siwei >Priority: Minor > > Hello, > I'm building a command line tool that can run SQL queries on text files (CSV, > JSON lines, ...). But the `CsvExec` in DataFusion can currently only read CSV text from > files. I have checked its inner implementation, the CSV reader in > arrow, and anything that implements `Read` could be a valid input. > > Should this feature (reading CSV from stdin) come with DataFusion, or should I just > put it in my own crate? -- This message was sent by Atlassian Jira (v8.3.4#803005)
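The point about `Read` in ARROW-12306 can be illustrated with the standard library alone. The sketch below is not the arrow CSV reader or DataFusion's `CsvExec`; `read_csv_rows` is a hypothetical helper that splits on commas with no quoting or escaping support. It only shows that a function generic over `std::io::Read` accepts stdin, an in-memory buffer, or a file without any change.

```rust
use std::io::{BufRead, BufReader, Read};

/// Parses simple comma-separated text from any `Read` source.
/// Hypothetical helper: no quoting or escaping, unlike a real CSV reader.
fn read_csv_rows<R: Read>(source: R) -> std::io::Result<Vec<Vec<String>>> {
    let reader = BufReader::new(source);
    let mut rows = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // Skip blank lines rather than emitting empty rows.
        if line.is_empty() {
            continue;
        }
        rows.push(line.split(',').map(|f| f.trim().to_string()).collect());
    }
    Ok(rows)
}
```

Because `std::io::Stdin`, `&[u8]`, `std::io::Cursor<Vec<u8>>` and `std::fs::File` all implement `Read`, the same function serves `read_csv_rows(std::io::stdin())` and `read_csv_rows("a,b\n1,2\n".as_bytes())`.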