date:20210411

[jira] [Updated] (ARROW-12317) [Rust] JSON writer does not support time, date or interval types

2021-04-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12317:
---
Labels: pull-request-available  (was: )

> [Rust] JSON writer does not support time, date or interval types
> 
>
> Key: ARROW-12317
> URL: https://issues.apache.org/jira/browse/ARROW-12317
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While working on https://issues.apache.org/jira/browse/ARROW-12267 , adding 
> support for writing Timestamp types, I noticed we were also lacking support 
> for other time types. Specifically, if you try to write an array with any of 
> the following types as JSON it will panic:
> An example of adding support for timestamps is on 
> https://github.com/apache/arrow/pull/9968
> ```
> pub type Date32Array = PrimitiveArray<Date32Type>;
> pub type Date64Array = PrimitiveArray<Date64Type>;
> pub type Time32SecondArray = PrimitiveArray<Time32SecondType>;
> pub type Time32MillisecondArray = PrimitiveArray<Time32MillisecondType>;
> pub type Time64MicrosecondArray = PrimitiveArray<Time64MicrosecondType>;
> pub type Time64NanosecondArray = PrimitiveArray<Time64NanosecondType>;
> pub type IntervalYearMonthArray = PrimitiveArray<IntervalYearMonthType>;
> pub type IntervalDayTimeArray = PrimitiveArray<IntervalDayTimeType>;
> pub type DurationSecondArray = PrimitiveArray<DurationSecondType>;
> pub type DurationMillisecondArray = PrimitiveArray<DurationMillisecondType>;
> pub type DurationMicrosecondArray = PrimitiveArray<DurationMicrosecondType>;
> pub type DurationNanosecondArray = PrimitiveArray<DurationNanosecondType>;
> ```
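
A minimal sketch of the kind of conversion such support needs, assuming the writer would emit Date32 values (days since 1970-01-01) as ISO 8601 date strings and Time32(second) values (seconds since midnight) as HH:MM:SS strings. That output format is an assumption for illustration, not what the linked pull request or the arrow crate does. The function names are hypothetical and only the standard library is used.

```rust
/// Convert days since 1970-01-01 to a (year, month, day) civil date
/// in the proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    // Shift the epoch to 0000-03-01 so leap days fall at the end of a "year".
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper: render a Date32 value as a quoted JSON string.
fn date32_to_json(days: i32) -> String {
    let (y, m, d) = civil_from_days(i64::from(days));
    format!("\"{:04}-{:02}-{:02}\"", y, m, d)
}

/// Hypothetical helper: render a Time32(second) value as a quoted JSON string.
/// Assumes 0 <= secs < 86_400.
fn time32_second_to_json(secs: i32) -> String {
    format!("\"{:02}:{:02}:{:02}\"", secs / 3600, (secs / 60) % 60, secs % 60)
}
```

For example, `date32_to_json(0)` yields `"1970-01-01"` (quotes included) and `time32_second_to_json(3661)` yields `"01:01:01"`. Interval and duration types have no single obvious JSON form, so they are left out of this sketch.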



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (ARROW-11593) [Rust] Parquet does not support wasm32-unknown-unknown target

2021-04-11 Thread Dominik Moritz (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319022#comment-17319022
 ] 

Dominik Moritz commented on ARROW-11593:


That's awesome. Do you want to add a note to 
https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11615, which tracks 
DataFusion support for wasm?

> [Rust] Parquet does not support wasm32-unknown-unknown target
> -
>
> Key: ARROW-11593
> URL: https://issues.apache.org/jira/browse/ARROW-11593
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Dominik Moritz
>Priority: Major
>
> The Arrow crate successfully compiles to WebAssembly (e.g. 
> https://github.com/domoritz/arrow-wasm) but the Parquet crate currently does 
> not support the `wasm32-unknown-unknown` target. 
> Try out the repository at 
> https://github.com/domoritz/parquet-wasm/commit/e877f9ad9c45c09f73d98fab2a8ad384a802b2e0.
>  The problem seems to be in liblz4, even if I do not include lz4 in the 
> feature flags.  




[jira] [Assigned] (ARROW-12269) [JS] Move to eslint

2021-04-11 Thread Dominik Moritz (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz reassigned ARROW-12269:
--

Assignee: Dominik Moritz

> [JS] Move to eslint
> ---
>
> Key: ARROW-12269
> URL: https://issues.apache.org/jira/browse/ARROW-12269
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>
> Tslint is deprecated so we should switch. 




[jira] [Updated] (ARROW-12269) [JS] Move to eslint

2021-04-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12269:
---
Labels: pull-request-available  (was: )

> [JS] Move to eslint
> ---
>
> Key: ARROW-12269
> URL: https://issues.apache.org/jira/browse/ARROW-12269
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tslint is deprecated so we should switch. 




[jira] [Commented] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-11 Thread Andy Grove (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318973#comment-17318973
 ] 

Andy Grove commented on ARROW-12334:


I'm now very confused about this issue. I have been working on debugging it and 
now it suddenly is working, so I don't know if it is an intermittent bug or 
not. When it works correctly, the query returns 4 rows and takes ~13 seconds 
for me. When it does not work it returns many times more rows and takes 3x as 
long.

It would be good to get a second pair of eyes on this.

> [Rust] [Ballista] Aggregate queries producing incorrect results
> ---
>
> Key: ARROW-12334
> URL: https://issues.apache.org/jira/browse/ARROW-12334
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 4.0.0
>
>
> I just ran benchmarks for the first time in a while and I see duplicate 
> entries for group by keys.
>  
> For example, query 1 has "group by l_returnflag, l_linestatus" and I see 
> multiple results with l_returnflag = 'A' and l_linestatus = 'F'.
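
This is not a diagnosis of the bug, only a sketch of one common way duplicate group keys can appear in a distributed aggregate: each partition produces its own partial result, and the keys show up once per partition unless a final step merges them. The key type and function names below are hypothetical and use only the standard library.

```rust
use std::collections::BTreeMap;

/// Group key modelled on query 1: (l_returnflag, l_linestatus).
type Key = (char, char);

/// Sum the quantity per key within a single partition.
fn partial_sum(rows: &[(Key, i64)]) -> Vec<(Key, i64)> {
    let mut acc: BTreeMap<Key, i64> = BTreeMap::new();
    for &(key, qty) in rows {
        *acc.entry(key).or_insert(0) += qty;
    }
    acc.into_iter().collect()
}

/// Merge the per-partition partial sums so that each key appears exactly once.
fn final_sum(partials: &[Vec<(Key, i64)>]) -> Vec<(Key, i64)> {
    // Concatenating the partials still contains one row per key per partition;
    // aggregating again by key is what removes the duplicates.
    let concatenated: Vec<(Key, i64)> = partials.iter().flatten().copied().collect();
    partial_sum(&concatenated)
}
```

If two partitions both contain rows with key ('A', 'F'), simply concatenating their partial results gives two ('A', 'F') rows, while `final_sum` gives one.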




[jira] [Commented] (ARROW-11593) [Rust] Parquet does not support wasm32-unknown-unknown target

2021-04-11 Thread David Roher (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-11593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318970#comment-17318970
 ] 

David Roher commented on ARROW-11593:
-

I just got a version of DataFusion working on wasm32-unknown-unknown – it 
required disabling both the LZ4 and ZSTD features on Parquet and tweaking the 
hash function: [https://github.com/apache/arrow/compare/master...droher:master]

To add to [~AndyRedhead1974]'s point above, it would also be useful in a 
serverless context. For instance, Cloudflare Workers Unbound is in beta now 
and will allow WASM functions to run at unlimited CPU usage. In this context, 
DataFusion could be a serverless data lake engine like AWS Athena. Maybe it 
could even be useful as a Ballista worker.

> [Rust] Parquet does not support wasm32-unknown-unknown target
> -
>
> Key: ARROW-11593
> URL: https://issues.apache.org/jira/browse/ARROW-11593
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Dominik Moritz
>Priority: Major
>
> The Arrow crate successfully compiles to WebAssembly (e.g. 
> https://github.com/domoritz/arrow-wasm) but the Parquet crate currently does 
> not support the `wasm32-unknown-unknown` target. 
> Try out the repository at 
> https://github.com/domoritz/parquet-wasm/commit/e877f9ad9c45c09f73d98fab2a8ad384a802b2e0.
>  The problem seems to be in liblz4, even if I do not include lz4 in the 
> feature flags.  




[jira] [Commented] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors

2021-04-11 Thread Kouhei Sutou (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318952#comment-17318952
 ] 

Kouhei Sutou commented on ARROW-11135:
--

I don't agree with the former.

I agree with the latter.

{{libgandiva_jni}} should not depend on other libraries (it should be linked 
with other libraries statically). Could you try the .jar at 
https://github.com/ursacomputing/crossbow/releases/tag/nightly-2021-04-09-0-github-gandiva-jar-osx
 ?

We need to improve our release process to resolve them. Our current release 
process generates Java packages in the release manager's environment: 
https://github.com/apache/arrow/blob/master/dev/release/01-perform.sh
The release manager for 3.0.0 used macOS, so arrow-gandiva 3.0.0 works only on 
macOS.

We should build arrow-gandiva and its native libraries (for macOS, Linux and 
Windows) on CI (we can use macOS, Linux and Windows on CI) and collect the 
native libraries for all supported platforms into one arrow-gandiva.jar. Our 
release process should just push the built arrow-gandiva.jar instead of 
building arrow-gandiva.jar on the release manager's machine.

We'll release 4.0.0 soon. This improvement will not be included in 4.0.0 unless 
a volunteer starts working on it soon.

> Using Maven Central artifacts as dependencies produce runtime errors
> 
>
> Key: ARROW-11135
> URL: https://issues.apache.org/jira/browse/ARROW-11135
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Michael Mior
>Priority: Major
>
> I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the 
> integration is working well, but I'm having issues . As [suggested on the 
> mailing 
> list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E],
>  using Dremio's public artifacts solves the problem. Between two Apache 
> projects however, there would be strong preference to use Apache artifacts as 
> a dependency.




[jira] [Updated] (ARROW-12332) [Rust] [Ballista] Api server for scheduler

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-12332:
---
Summary: [Rust] [Ballista] Api server for scheduler  (was: Api server for 
scheduler)

> [Rust] [Ballista] Api server for scheduler
> --
>
> Key: ARROW-12332
> URL: https://issues.apache.org/jira/browse/ARROW-12332
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - Ballista
>Reporter: Sathis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>





[jira] [Commented] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-11 Thread Andy Grove (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-12334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318951#comment-17318951
 ] 

Andy Grove commented on ARROW-12334:


I tracked down the PR that introduced the regression in the original repo and 
it was [https://github.com/ballista-compute/ballista/pull/574]

> [Rust] [Ballista] Aggregate queries producing incorrect results
> ---
>
> Key: ARROW-12334
> URL: https://issues.apache.org/jira/browse/ARROW-12334
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 4.0.0
>
>
> I just ran benchmarks for the first time in a while and I see duplicate 
> entries for group by keys.
>  
> For example, query 1 has "group by l_returnflag, l_linestatus" and I see 
> multiple results with l_returnflag = 'A' and l_linestatus = 'F'.




[jira] [Resolved] (ARROW-12313) [Rust] [Ballista] Benchmark documentation out of date

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-12313.

Resolution: Fixed

Issue resolved by pull request 9990
[https://github.com/apache/arrow/pull/9990]

> [Rust] [Ballista] Benchmark documentation out of date
> -
>
> Key: ARROW-12313
> URL: https://issues.apache.org/jira/browse/ARROW-12313
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The scheduler/executor were refactored and the documentation for the 
> benchmarks now needs updating. I plan on fixing this over the weekend.




[jira] [Created] (ARROW-12335) [Rust] [Ballista] Bump DataFusion version

2021-04-11 Thread Andy Grove (Jira)

Andy Grove created ARROW-12335:
--

 Summary: [Rust] [Ballista] Bump DataFusion version
 Key: ARROW-12335
 URL: https://issues.apache.org/jira/browse/ARROW-12335
 Project: Apache Arrow
  Issue Type: Task
  Components: Rust - Ballista
Reporter: Andy Grove
 Fix For: 4.0.0


Update Ballista to use latest DataFusion version




[jira] [Updated] (ARROW-12313) [Rust] [Ballista] Benchmark documentation out of date

2021-04-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12313:
---
Labels: pull-request-available  (was: )

> [Rust] [Ballista] Benchmark documentation out of date
> -
>
> Key: ARROW-12313
> URL: https://issues.apache.org/jira/browse/ARROW-12313
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The scheduler/executor were refactored and the documentation for the 
> benchmarks now needs updating. I plan on fixing this over the weekend.




[jira] [Created] (ARROW-12334) [Rust] [Ballista] Aggregate queries producing incorrect results

2021-04-11 Thread Andy Grove (Jira)

Andy Grove created ARROW-12334:
--

 Summary: [Rust] [Ballista] Aggregate queries producing incorrect 
results
 Key: ARROW-12334
 URL: https://issues.apache.org/jira/browse/ARROW-12334
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - Ballista
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 4.0.0


I just ran benchmarks for the first time in a while and I see duplicate entries 
for group by keys.

 

For example, query 1 has "group by l_returnflag, l_linestatus" and I see 
multiple results with l_returnflag = 'A' and l_linestatus = 'F'.




[jira] [Resolved] (ARROW-12274) [JS] Document how to run tests without building

2021-04-11 Thread Kouhei Sutou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-12274.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9983
[https://github.com/apache/arrow/pull/9983]

> [JS] Document how to run tests without building
> ---
>
> Key: ARROW-12274
> URL: https://issues.apache.org/jira/browse/ARROW-12274
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/blob/master/js/DEVELOP.md does not document 
> that one can run `npm run test -- -t src`. 




[jira] [Updated] (ARROW-12333) [JS] Remove jest-environment-node-debug and do not emit from typescript by default

2021-04-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12333:
---
Labels: pull-request-available  (was: )

> [JS] Remove jest-environment-node-debug and do not emit from typescript by 
> default
> --
>
> Key: ARROW-12333
> URL: https://issues.apache.org/jira/browse/ARROW-12333
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Created] (ARROW-12333) [JS] Remove jest-environment-node-debug and do not emit from typescript by default

2021-04-11 Thread Dominik Moritz (Jira)

Dominik Moritz created ARROW-12333:
--

 Summary: [JS] Remove jest-environment-node-debug and do not emit 
from typescript by default
 Key: ARROW-12333
 URL: https://issues.apache.org/jira/browse/ARROW-12333
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Dominik Moritz
Assignee: Dominik Moritz







[jira] [Resolved] (ARROW-12281) [JS] Remove shx, trash, and rimraf

2021-04-11 Thread Kouhei Sutou (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-12281.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9938
[https://github.com/apache/arrow/pull/9938]

> [JS] Remove shx, trash, and rimraf
> --
>
> Key: ARROW-12281
> URL: https://issues.apache.org/jira/browse/ARROW-12281
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We can use del instead




[jira] [Updated] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors

2021-04-11 Thread Julian Hyde (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated ARROW-11135:

Affects Version/s: 3.0.0

> Using Maven Central artifacts as dependencies produce runtime errors
> 
>
> Key: ARROW-11135
> URL: https://issues.apache.org/jira/browse/ARROW-11135
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Michael Mior
>Priority: Major
>
> I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the 
> integration is working well, but I'm having issues . As [suggested on the 
> mailing 
> list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E],
>  using Dremio's public artifacts solves the problem. Between two Apache 
> projects however, there would be strong preference to use Apache artifacts as 
> a dependency.




[jira] [Commented] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors

2021-04-11 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318933#comment-17318933
 ] 

Julian Hyde commented on ARROW-11135:
-

I think this issue boils down to two problems:
* The install documentation should state that you need to install protobuf on 
macOS. That is the cause of the 
{{/usr/local/opt/protobuf/lib/libprotobuf.24.dylib}} error.
* The artifacts in Maven Central only support macOS. They should support Linux 
and macOS.

Do you agree?

> Using Maven Central artifacts as dependencies produce runtime errors
> 
>
> Key: ARROW-11135
> URL: https://issues.apache.org/jira/browse/ARROW-11135
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Michael Mior
>Priority: Major
>
> I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the 
> integration is working well, but I'm having issues . As [suggested on the 
> mailing 
> list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E],
>  using Dremio's public artifacts solves the problem. Between two Apache 
> projects however, there would be strong preference to use Apache artifacts as 
> a dependency.




[jira] [Comment Edited] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors

2021-04-11 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318931#comment-17318931
 ] 

Julian Hyde edited comment on ARROW-11135 at 4/11/21, 7:33 PM:
---

There are no missing packages. But the install instructions should probably say:
* The Gandiva library only works on macOS, and requires that you manually 
install protobuf 2.5.

By the way, I compared which files are in the 3.0.0 release jar (which works on 
macOS) and the 3.0.0-SNAPSHOT jar (which works on Linux).

{noformat}
$ diff -u <(tar tf ./arrow-gandiva-3.0.0-SNAPSHOT.jar | sort) \
    <(tar tf ./arrow-gandiva-3.0.0.jar | sort)
--- /dev/fd/63  2021-04-11 12:25:09.0 -0700
+++ /dev/fd/62  2021-04-11 12:25:09.0 -0700
@@ -11,7 +11,10 @@
 META-INF/maven/org.apache.arrow.gandiva/arrow-gandiva/pom.xml
 Types.proto
 git.properties
-libgandiva_jni.so
+libgandiva_jni.300.0.0.dylib
+libgandiva_jni.300.dylib
+libgandiva_jni.a
+libgandiva_jni.dylib
 org/
 org/apache/
 org/apache/arrow/
@@ -188,3 +191,8 @@
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNode.class
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNodeOrBuilder.class
 org/apache/arrow/gandiva/ipc/GandivaTypes.class
+release/
+release/libgandiva_jni.300.0.0.dylib
+release/libgandiva_jni.300.dylib
+release/libgandiva_jni.a
+release/libgandiva_jni.dylib
{noformat}

It would be awesome if, in the next release, the jar contained ALL of those 
files, and then I suppose it would work on both Linux and macOS.


was (Author: julianhyde):
There are no missing packages. But the install instructions should probably say:
* The Gandiva library only works on macOS, and requires that you manually 
install protobuf 2.5.

By the way, I compared which files are in the 3.0.0 release jar (which works on 
macOS) and the 3.0.0-SNAPSHOT jar (which works on Linux).

{noformat}
$ diff -u <(tar tvf ./arrow-gandiva-3.0.0-SNAPSHOT.jar | awk '{print $NF}' | sort) \
    <(tar tvf ./arrow-gandiva-3.0.0.jar | awk '{print $NF}' | sort)
--- /dev/fd/63  2021-04-11 12:25:09.0 -0700
+++ /dev/fd/62  2021-04-11 12:25:09.0 -0700
@@ -11,7 +11,10 @@
 META-INF/maven/org.apache.arrow.gandiva/arrow-gandiva/pom.xml
 Types.proto
 git.properties
-libgandiva_jni.so
+libgandiva_jni.300.0.0.dylib
+libgandiva_jni.300.dylib
+libgandiva_jni.a
+libgandiva_jni.dylib
 org/
 org/apache/
 org/apache/arrow/
@@ -188,3 +191,8 @@
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNode.class
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNodeOrBuilder.class
 org/apache/arrow/gandiva/ipc/GandivaTypes.class
+release/
+release/libgandiva_jni.300.0.0.dylib
+release/libgandiva_jni.300.dylib
+release/libgandiva_jni.a
+release/libgandiva_jni.dylib
{noformat}

It would be awesome if, in the next release, the jar contained ALL of those 
files, and then I suppose it would work on both Linux and macOS.

> Using Maven Central artifacts as dependencies produce runtime errors
> 
>
> Key: ARROW-11135
> URL: https://issues.apache.org/jira/browse/ARROW-11135
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Michael Mior
>Priority: Major
>
> I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the 
> integration is working well, but I'm having issues . As [suggested on the 
> mailing 
> list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E],
>  using Dremio's public artifacts solves the problem. Between two Apache 
> projects however, there would be strong preference to use Apache artifacts as 
> a dependency.




[jira] [Commented] (ARROW-11135) Using Maven Central artifacts as dependencies produce runtime errors

2021-04-11 Thread Julian Hyde (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318931#comment-17318931
 ] 

Julian Hyde commented on ARROW-11135:
-

There are no missing packages. But the install instructions should probably say:
* The Gandiva library only works on macOS, and requires that you manually 
install protobuf 2.5.

By the way, I compared which files are in the 3.0.0 release jar (which works on 
macOS) and the 3.0.0-SNAPSHOT jar (which works on Linux).

{noformat}
$ diff -u <(tar tvf ./arrow-gandiva-3.0.0-SNAPSHOT.jar | awk '{print $NF}' | sort) \
    <(tar tvf ./arrow-gandiva-3.0.0.jar | awk '{print $NF}' | sort)
--- /dev/fd/63  2021-04-11 12:25:09.0 -0700
+++ /dev/fd/62  2021-04-11 12:25:09.0 -0700
@@ -11,7 +11,10 @@
 META-INF/maven/org.apache.arrow.gandiva/arrow-gandiva/pom.xml
 Types.proto
 git.properties
-libgandiva_jni.so
+libgandiva_jni.300.0.0.dylib
+libgandiva_jni.300.dylib
+libgandiva_jni.a
+libgandiva_jni.dylib
 org/
 org/apache/
 org/apache/arrow/
@@ -188,3 +191,8 @@
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNode.class
 org/apache/arrow/gandiva/ipc/GandivaTypes$TreeNodeOrBuilder.class
 org/apache/arrow/gandiva/ipc/GandivaTypes.class
+release/
+release/libgandiva_jni.300.0.0.dylib
+release/libgandiva_jni.300.dylib
+release/libgandiva_jni.a
+release/libgandiva_jni.dylib
{noformat}

It would be awesome if, in the next release, the jar contained ALL of those 
files, and then I suppose it would work on both Linux and macOS.

> Using Maven Central artifacts as dependencies produce runtime errors
> 
>
> Key: ARROW-11135
> URL: https://issues.apache.org/jira/browse/ARROW-11135
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 2.0.0
>Reporter: Michael Mior
>Priority: Major
>
> I'm working on connecting Arrow/Gandiva with Apache Calcite. Overall the 
> integration is working well, but I'm having issues . As [suggested on the 
> mailing 
> list|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E],
>  using Dremio's public artifacts solves the problem. Between two Apache 
> projects however, there would be strong preference to use Apache artifacts as 
> a dependency.




[jira] [Updated] (ARROW-12332) Api server for scheduler

2021-04-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12332:
---
Labels: pull-request-available  (was: )

> Api server for scheduler
> 
>
> Key: ARROW-12332
> URL: https://issues.apache.org/jira/browse/ARROW-12332
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - Ballista
>Reporter: Sathis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Created] (ARROW-12332) Api server for scheduler

2021-04-11 Thread Sathis (Jira)

Sathis created ARROW-12332:
--

 Summary: Api server for scheduler
 Key: ARROW-12332
 URL: https://issues.apache.org/jira/browse/ARROW-12332
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust - Ballista
Reporter: Sathis







[jira] [Commented] (ARROW-12316) [C++] Switch default memory allocator from jemalloc to mimalloc

2021-04-11 Thread Neal Richardson (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318918#comment-17318918
 ] 

Neal Richardson commented on ARROW-12316:
-

[~jonkeane] can you attach your reports?

> [C++] Switch default memory allocator from jemalloc to mimalloc
> ---
>
> Key: ARROW-12316
> URL: https://issues.apache.org/jira/browse/ARROW-12316
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 4.0.0
>
>
> Benchmarking shows that mimalloc seems to be faster on real workflows (at 
> least on macOS, still collecting data on Ubuntu). We could switch the default 
> memory pool cases so that mimalloc is preferred. 
> cc [~jonkeane] [~apitrou]




[jira] [Commented] (ARROW-12316) [C++] Switch default memory allocator from jemalloc to mimalloc

2021-04-11 Thread Uwe Korn (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318907#comment-17318907
 ] 

Uwe Korn commented on ARROW-12316:
--

[~npr] Where can I find these benchmarks?

> [C++] Switch default memory allocator from jemalloc to mimalloc
> ---
>
> Key: ARROW-12316
> URL: https://issues.apache.org/jira/browse/ARROW-12316
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 4.0.0
>
>
> Benchmarking shows that mimalloc seems to be faster on real workflows (at 
> least on macOS, still collecting data on Ubuntu). We could switch the default 
> memory pool cases so that mimalloc is preferred. 
> cc [~jonkeane] [~apitrou]




[jira] [Commented] (ARROW-12260) [Website] [Rust] Announce Ballista donation

2021-04-11 Thread Andy Grove (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-12260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318904#comment-17318904
 ] 

Andy Grove commented on ARROW-12260:


https://github.com/apache/arrow-site/pull/100

> [Website] [Rust] Announce Ballista donation
> ---
>
> Key: ARROW-12260
> URL: https://issues.apache.org/jira/browse/ARROW-12260
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> Once the IP clearance vote passes and the PR has been merged, we should 
> announce the donation on the Arrow blog.




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:58 PM:


Thanks for the reference to the blog, I read all of his posts. 

I benchmarked Travis' final radix_sort7 version; see below.
It rocks! 

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

For now, I've opened an issue on his repo asking him to add a license, see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I benchmarked Travis' final radix_sort7 version; see below.
It rocks! 

I'm not yet sure whether this implementation is stable, because it orders from 
the least significant bits, but that seems easy to change.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

For now, I've opened an issue on his repo asking him to add a license, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.
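
To make the proposal concrete, here is a minimal sketch of a least-significant-digit radix sort for unsigned 32-bit keys. It is written in Rust for illustration only and is not the radix_sort7 implementation discussed in the comments; the function name radix_sort_u32 and the choice of 8-bit digits are assumptions. Each pass is a stable counting sort on one byte, which is what makes the overall sort stable.

```rust
/// Sorts `values` in ascending order using an LSD radix sort.
/// Hypothetical helper for illustration; not part of Arrow.
pub fn radix_sort_u32(values: &mut Vec<u32>) {
    const RADIX_BITS: u32 = 8; // assumed digit width: one byte per pass
    const BUCKETS: usize = 1 << RADIX_BITS;

    let mut scratch = vec![0u32; values.len()];
    for pass in 0..(32 / RADIX_BITS) {
        let shift = pass * RADIX_BITS;

        // Histogram of the current digit.
        let mut counts = [0usize; BUCKETS];
        for &v in values.iter() {
            counts[((v >> shift) as usize) & (BUCKETS - 1)] += 1;
        }

        // Exclusive prefix sum turns counts into starting offsets.
        let mut offset = 0;
        for c in counts.iter_mut() {
            let n = *c;
            *c = offset;
            offset += n;
        }

        // Stable scatter: equal digits keep their relative order.
        for &v in values.iter() {
            let digit = ((v >> shift) as usize) & (BUCKETS - 1);
            scratch[counts[digit]] = v;
            counts[digit] += 1;
        }

        // The scattered output becomes the input of the next pass.
        std::mem::swap(values, &mut scratch);
    }
}
```

The sketch does a fixed four passes regardless of input size and needs one scratch buffer of the same length, so whether it beats a comparison sort depends on array size and key distribution.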




[jira] [Updated] (ARROW-10920) [Rust] Segmentation fault in Arrow Parquet writer with huge arrays

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-10920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-10920:
---
Fix Version/s: (was: 4.0.0)

> [Rust] Segmentation fault in Arrow Parquet writer with huge arrays
> --
>
> Key: ARROW-10920
> URL: https://issues.apache.org/jira/browse/ARROW-10920
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
>
> I stumbled across this by chance. I am not too surprised that this fails but 
> I would expect it to fail gracefully and not with a segmentation fault.
>  
> {code:java}
> use std::fs::File;
> use std::sync::Arc;
>
> use arrow::array::StringBuilder;
> use arrow::datatypes::{DataType, Field, Schema};
> use arrow::error::Result;
> use arrow::record_batch::RecordBatch;
> use parquet::arrow::ArrowWriter;
>
> fn main() -> Result<()> {
>     let schema = Schema::new(vec![
>         Field::new("c0", DataType::Utf8, false),
>         Field::new("c1", DataType::Utf8, true),
>     ]);
>     let batch_size = 250;
>     let repeat_count = 140;
>     let file = File::create("/tmp/test.parquet")?;
>     let mut writer =
>         ArrowWriter::try_new(file, Arc::new(schema.clone()), None).unwrap();
>     let mut c0_builder = StringBuilder::new(batch_size);
>     let mut c1_builder = StringBuilder::new(batch_size);
>     println!("Start of loop");
>     for i in 0..batch_size {
>         // c0 is a 32-character zero-padded index; c1 repeats it repeat_count times
>         let c0_value = format!("{:032}", i);
>         let c1_value = c0_value.repeat(repeat_count);
>         c0_builder.append_value(&c0_value)?;
>         c1_builder.append_value(&c1_value)?;
>     }
>     println!("Finish building c0");
>     let c0 = Arc::new(c0_builder.finish());
>     println!("Finish building c1");
>     let c1 = Arc::new(c1_builder.finish());
>     println!("Creating RecordBatch");
>     let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![c0, c1])?;
>     // write the batch to parquet
>     println!("Writing RecordBatch");
>     writer.write(&batch).unwrap();
>     println!("Closing writer");
>     writer.close().unwrap();
>     Ok(())
> }
> {code}
> output:
> {code:java}
> Start of loop
> Finish building c0
> Finish building c1
> Creating RecordBatch
> Writing RecordBatch
> Segmentation fault (core dumped)
>  {code}
>  




[jira] [Updated] (ARROW-11625) [Rust] [DataFusion] Move SortExec partition check to constructor

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-11625:
---
Fix Version/s: (was: 4.0.0)

> [Rust] [DataFusion] Move SortExec partition check to constructor
> 
>
> Key: ARROW-11625
> URL: https://issues.apache.org/jira/browse/ARROW-11625
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>
> SortExec has the following error check at execution time and this could be 
> moved into the try_new constructor so the error check happens at planning 
> time instead.
>  
> {code:java}
> if 1 != self.input.output_partitioning().partition_count() {
>     return Err(DataFusionError::Internal(
>         "SortExec requires a single input partition".to_owned(),
>     ));
> } {code}
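
A minimal sketch of the proposed change follows. It uses stand-in types rather than the real DataFusion ones: Input, PlanError and SortExecSketch are hypothetical names chosen for illustration. The point is only that the check runs inside try_new, so an invalid plan is rejected when it is built rather than when it is executed.

```rust
/// Stand-in for DataFusionError (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
pub enum PlanError {
    Internal(String),
}

/// Stand-in for an input plan that reports its partition count.
pub struct Input {
    pub partition_count: usize,
}

/// Stand-in for SortExec; only the validation logic is sketched.
#[derive(Debug)]
pub struct SortExecSketch {
    input_partitions: usize,
}

impl SortExecSketch {
    /// Validates the input at construction (planning) time.
    pub fn try_new(input: Input) -> Result<Self, PlanError> {
        if input.partition_count != 1 {
            return Err(PlanError::Internal(
                "SortExec requires a single input partition".to_owned(),
            ));
        }
        Ok(Self {
            input_partitions: input.partition_count,
        })
    }

    /// Execution no longer needs the check: construction guarantees it.
    pub fn execute(&self) -> usize {
        self.input_partitions
    }
}
```

With this shape, SortExecSketch::try_new(Input { partition_count: 2 }) returns the Internal error immediately, and execute can rely on the invariant.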




[jira] [Updated] (ARROW-11016) [Rust] Parquet ArrayReader should allow reading a subset of row groups

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-11016:
---
Fix Version/s: (was: 4.0.0)

> [Rust] Parquet ArrayReader should allow reading a subset of row groups
> --
>
> Key: ARROW-11016
> URL: https://issues.apache.org/jira/browse/ARROW-11016
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Andy Grove
>Priority: Major
>
> Parquet ArrayReader currently only supports reading an entire file from start 
> to finish and does not allow selectively reading a subset of row groups. This 
> prevents us from parallelizing work across threads when processing a single 
> parquet file.
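
As a rough illustration of what selective reading would enable, the sketch below distributes row group indices across workers so that each thread could read its own subset. The function name assign_row_groups and the round-robin policy are assumptions made for this example; no Parquet API is used or implied.

```rust
/// Assigns row group indices 0..num_row_groups to workers round-robin.
/// Hypothetical helper for illustration; not part of the parquet crate.
pub fn assign_row_groups(num_row_groups: usize, num_workers: usize) -> Vec<Vec<usize>> {
    // Treat zero workers as one so every row group is assigned somewhere.
    let workers = num_workers.max(1);
    let mut assignments = vec![Vec::new(); workers];
    for row_group in 0..num_row_groups {
        assignments[row_group % workers].push(row_group);
    }
    assignments
}
```

Each inner vector is the subset of row groups one thread would ask a reader for, which is the capability this ticket requests.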




[jira] [Updated] (ARROW-11094) [Rust] [DataFusion] Implement Sort-Merge Join

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-11094:
---
Fix Version/s: (was: 4.0.0)

> [Rust] [DataFusion] Implement Sort-Merge Join
> -
>
> Key: ARROW-11094
> URL: https://issues.apache.org/jira/browse/ARROW-11094
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>
> The current hash join works well when one side of the join can be loaded into 
> memory but cannot scale beyond the available RAM.
> The advantage of implementing SMJ (Sort-Merge Join) is that we can sort the 
> left and right partitions, and write the intermediate results to disk, and 
> then stream both sides of the join by merging these sorted partitions and we 
> do not need to load one side into memory. At most, we need to load all 
> batches from both sides that contain the current join key values.
> In order to reduce memory pressure we will want to limit the concurrency of 
> these sort operations.
> We would still want to default to hash join when we know that the build-side 
> can fit into memory since it is more efficient than using a sort-merge join.
> [https://en.wikipedia.org/wiki/Sort-merge_join]
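
The merge phase described above can be sketched as follows. This is a simplified in-memory inner join over two slices already sorted by an i64 key, not a DataFusion operator; the function name merge_join and the tuple layout are assumptions for illustration. It shows why only the rows sharing the current key need to be held at once.

```rust
/// Inner-joins two inputs that are already sorted by key.
/// Hypothetical sketch; spilling to disk and batching are omitted.
pub fn merge_join<L: Clone, R: Clone>(
    left: &[(i64, L)],
    right: &[(i64, R)],
) -> Vec<(i64, L, R)> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        let (lk, rk) = (left[i].0, right[j].0);
        if lk < rk {
            i += 1; // left key too small, advance left
        } else if lk > rk {
            j += 1; // right key too small, advance right
        } else {
            // Find the run of rows with this key on each side.
            let i_end = i + left[i..].iter().take_while(|row| row.0 == lk).count();
            let j_end = j + right[j..].iter().take_while(|row| row.0 == lk).count();
            // Emit the cross product of the two runs.
            for (_, l) in &left[i..i_end] {
                for (_, r) in &right[j..j_end] {
                    out.push((lk, l.clone(), r.clone()));
                }
            }
            i = i_end;
            j = j_end;
        }
    }
    out
}
```

Both cursors only move forward, so each side is streamed once; the memory needed at any point is bounded by the two runs for the current key.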




[jira] [Updated] (ARROW-11020) [Rust] [DataFusion] Implement better tests for ParquetExec

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-11020:
---
Fix Version/s: (was: 4.0.0)

> [Rust] [DataFusion] Implement better tests for ParquetExec
> --
>
> Key: ARROW-11020
> URL: https://issues.apache.org/jira/browse/ARROW-11020
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> Implement better tests for ParquetExec




[jira] [Updated] (ARROW-10884) [Rust] [DataFusion] Benchmark crate does not have a SIMD feature

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-10884:
---
Fix Version/s: (was: 4.0.0)

> [Rust] [DataFusion] Benchmark crate does not have a SIMD feature
> 
>
> Key: ARROW-10884
> URL: https://issues.apache.org/jira/browse/ARROW-10884
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Minor
>
> The benchmarks run without SIMD by default. We need to add a feature to the 
> Cargo.toml to enable SIMD.




[jira] [Updated] (ARROW-12313) [Rust] [Ballista] Benchmark documentation out of date

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-12313:
---
Summary: [Rust] [Ballista] Benchmark documentation out of date  (was: 
[Rust] [Ballista] Benchmark docuementation out of date)

> [Rust] [Ballista] Benchmark documentation out of date
> -
>
> Key: ARROW-12313
> URL: https://issues.apache.org/jira/browse/ARROW-12313
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 4.0.0
>
>
> The scheduler/executor were refactored and the documentation for the 
> benchmarks now needs updating. I plan on fixing this over the weekend.




[jira] [Updated] (ARROW-11059) [Rust] [DataFusion] Implement extensible configuration mechanism

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-11059:
---
Fix Version/s: (was: 4.0.0)

> [Rust] [DataFusion] Implement extensible configuration mechanism
> 
>
> Key: ARROW-11059
> URL: https://issues.apache.org/jira/browse/ARROW-11059
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>
> We are getting to the point where there are multiple settings we could add to 
> operators to fine-tune performance. Custom operators provided by crates that 
> extend DataFusion may also need this capability.
> I propose that we add support for key-value configuration options so that we 
> don't need to plumb through each new configuration setting that we add.
> For example, I am about to start on a "coalesce batches" operator and I would 
> like a setting such as "coalesce.batch.size".
> For built-in settings like this we can provide information such as 
> documentation and default values and generate documentation from this.
> For example, here is how Spark defines configs:
> {code:java}
>   val PARQUET_VECTORIZED_READER_ENABLED =
>     buildConf("spark.sql.parquet.enableVectorizedReader")
>       .doc("Enables vectorized parquet decoding.")
>       .version("2.0.0")
>       .booleanConf
>       .createWithDefault(true) {code}
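
One possible shape for such a mechanism in Rust is sketched below. Everything except the key "coalesce.batch.size" is an assumption made for illustration: the ConfigEntry and Config types, the get_usize accessor, and the default of 4096 are hypothetical and are not DataFusion APIs.

```rust
use std::collections::HashMap;

/// A built-in setting: key, documentation, and default (all as strings).
pub struct ConfigEntry {
    pub key: &'static str,
    pub doc: &'static str,
    pub default: &'static str,
}

/// Example entry; the default value 4096 is an assumed placeholder.
pub const COALESCE_BATCH_SIZE: ConfigEntry = ConfigEntry {
    key: "coalesce.batch.size",
    doc: "Target number of rows per coalesced batch.",
    default: "4096",
};

/// Key-value store holding user overrides.
#[derive(Default)]
pub struct Config {
    values: HashMap<String, String>,
}

impl Config {
    /// Sets any key, including keys defined by extension crates.
    pub fn set(&mut self, key: &str, value: &str) {
        self.values.insert(key.to_owned(), value.to_owned());
    }

    /// Reads a setting as usize, falling back to the entry's default.
    /// Returns None if the stored value does not parse.
    pub fn get_usize(&self, entry: &ConfigEntry) -> Option<usize> {
        self.values
            .get(entry.key)
            .map(String::as_str)
            .unwrap_or(entry.default)
            .parse()
            .ok()
    }
}
```

Because values are stored as strings under free-form keys, new settings need no plumbing changes, and the doc and default fields of each entry could be used to generate documentation.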




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:47 PM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! 

I'm not yet sure whether this implementation is stable, because it processes the 
least significant bits first, but that seems easy to change.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

For now, I've opened an issue on his repo asking him to add a license; see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! 
Yet not sure if this implementation is stable.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:40 PM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! yet no sure if this implementation is stable.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

For now, I've opened an issue on his repo asking him to add a license; see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:40 PM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! 
Yet not sure if this implementation is stable.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

For now, I've opened an issue on his repo asking him to add a license; see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! yet no sure if this implementation is stable.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Resolved] (ARROW-12251) [Rust] [Ballista] Add Ballista tests to CI

2021-04-11 Thread Andy Grove (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-12251.

Resolution: Fixed

Issue resolved by pull request 9979
[https://github.com/apache/arrow/pull/9979]

> [Rust] [Ballista] Add Ballista tests to CI
> --
>
> Key: ARROW-12251
> URL: https://issues.apache.org/jira/browse/ARROW-12251
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - Ballista
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Ballista is a standalone project (not part of the Arrow Rust workspace) and 
> therefore the tests will not run in CI without additional work.




[jira] [Created] (ARROW-12331) [Rust] [Ballista] Make CI build work with snmalloc

2021-04-11 Thread Andy Grove (Jira)

Andy Grove created ARROW-12331:
--

 Summary: [Rust] [Ballista] Make CI build work with snmalloc
 Key: ARROW-12331
 URL: https://issues.apache.org/jira/browse/ARROW-12331
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - Ballista
Reporter: Andy Grove
 Fix For: 4.0.0


Ballista was added to CI in [https://github.com/apache/arrow/pull/9979] but is 
building without default features due to snmalloc requiring cmake.

An alternative approach would be to build with cc instead of cmake. See the 
above PR for conversation about this.




[jira] [Updated] (ARROW-12330) [Developer] Restore values in counters column of Archery benchmark

2021-04-11 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-12330:
---
Labels: pull-request-available  (was: )

> [Developer] Restore values in counters column of Archery benchmark
> --
>
> Key: ARROW-12330
> URL: https://issues.apache.org/jira/browse/ARROW-12330
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 3.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is that ARROW-11189 always suppressed values in {{counters}} column 
> of Archery benchmark
> {code:java}
> % archery benchmark diff --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
> ...
> --------------------------------------------------------------------------
> Benchmark            Time      CPU  Iterations UserCounters...
> --------------------------------------------------------------------------
> SetBitsTo/2       8.15 ns  8.15 ns    81991087 bytes_per_second=234.044M/s
> SetBitsTo/16      7.78 ns  7.78 ns    89928878 bytes_per_second=1.91429G/s
> SetBitsTo/1024    13.9 ns  13.9 ns    50372172 bytes_per_second=68.6182G/s
> SetBitsTo/131072  3508 ns  3508 ns      199335 bytes_per_second=34.7944G/s
> ---------------------------------------------------------------------
> Non-regressions: (4)
> ---------------------------------------------------------------------
>        benchmark         baseline        contender  change % counters
>     SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975       {}
>      SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509       {}
> SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207       {}
>   SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037       {}




[jira] [Updated] (ARROW-12330) [Developer] Restore values in counters column of Archery benchmark

2021-04-11 Thread Kazuaki Ishizaki (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated ARROW-12330:
-
Description: 
The issue is that ARROW-11189 always suppressed values in {{counters}} column 
of Archery benchmark
{code:java}
% archery benchmark diff --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
...
--------------------------------------------------------------------------
Benchmark            Time      CPU  Iterations UserCounters...
--------------------------------------------------------------------------
SetBitsTo/2       8.15 ns  8.15 ns    81991087 bytes_per_second=234.044M/s
SetBitsTo/16      7.78 ns  7.78 ns    89928878 bytes_per_second=1.91429G/s
SetBitsTo/1024    13.9 ns  13.9 ns    50372172 bytes_per_second=68.6182G/s
SetBitsTo/131072  3508 ns  3508 ns      199335 bytes_per_second=34.7944G/s
---------------------------------------------------------------------
Non-regressions: (4)
---------------------------------------------------------------------
       benchmark         baseline        contender  change % counters
    SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975       {}
     SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509       {}
SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207       {}
  SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037       {}
{code}

  was:
The issue is that ARROW-11189 always suppressed values in {{counters}} column 
of Archery benchmark

{code}
% archery benchmark run --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
...
--------------------------------------------------------------------------
Benchmark            Time      CPU  Iterations UserCounters...
--------------------------------------------------------------------------
SetBitsTo/2       8.15 ns  8.15 ns    81991087 bytes_per_second=234.044M/s
SetBitsTo/16      7.78 ns  7.78 ns    89928878 bytes_per_second=1.91429G/s
SetBitsTo/1024    13.9 ns  13.9 ns    50372172 bytes_per_second=68.6182G/s
SetBitsTo/131072  3508 ns  3508 ns      199335 bytes_per_second=34.7944G/s
---------------------------------------------------------------------
Non-regressions: (4)
---------------------------------------------------------------------
       benchmark         baseline        contender  change % counters
    SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975       {}
     SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509       {}
SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207       {}
  SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037       {}
{code}


> [Developer] Restore values in counters column of Archery benchmark
> --
>
> Key: ARROW-12330
> URL: https://issues.apache.org/jira/browse/ARROW-12330
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 3.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Fix For: 4.0.0
>
>
> The issue is that ARROW-11189 always suppressed values in {{counters}} column 
> of Archery benchmark
> {code:java}
> % archery benchmark diff --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
> ...
> --------------------------------------------------------------------------
> Benchmark            Time      CPU  Iterations UserCounters...
> --------------------------------------------------------------------------
> SetBitsTo/2       8.15 ns  8.15 ns    81991087 bytes_per_second=234.044M/s
> SetBitsTo/16      7.78 ns  7.78 ns    89928878 bytes_per_second=1.91429G/s
> SetBitsTo/1024    13.9 ns  13.9 ns    50372172 bytes_per_second=68.6182G/s
> SetBitsTo/131072  3508 ns  3508 ns      199335 bytes_per_second=34.7944G/s
> ---------------------------------------------------------------------
> Non-regressions: (4)
> ---------------------------------------------------------------------
>        benchmark         baseline        contender  change % counters
>     SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975       {}
>      SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509       {}
> SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207       {}
>   SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037       {}
> {code}




[jira] [Assigned] (ARROW-12330) [Developer] Restore values in counters column of Archery benchmark

2021-04-11 Thread Kazuaki Ishizaki (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki reassigned ARROW-12330:


Assignee: Kazuaki Ishizaki

> [Developer] Restore values in counters column of Archery benchmark
> --
>
> Key: ARROW-12330
> URL: https://issues.apache.org/jira/browse/ARROW-12330
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Affects Versions: 3.0.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Fix For: 4.0.0
>
>
> The issue is that ARROW-11189 always suppressed values in {{counters}} column 
> of Archery benchmark
> {code}
> % archery benchmark run --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
> ...
> --------------------------------------------------------------------------
> Benchmark            Time      CPU  Iterations UserCounters...
> --------------------------------------------------------------------------
> SetBitsTo/2       8.15 ns  8.15 ns    81991087 bytes_per_second=234.044M/s
> SetBitsTo/16      7.78 ns  7.78 ns    89928878 bytes_per_second=1.91429G/s
> SetBitsTo/1024    13.9 ns  13.9 ns    50372172 bytes_per_second=68.6182G/s
> SetBitsTo/131072  3508 ns  3508 ns      199335 bytes_per_second=34.7944G/s
> ---------------------------------------------------------------------
> Non-regressions: (4)
> ---------------------------------------------------------------------
>        benchmark         baseline        contender  change % counters
>     SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975       {}
>      SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509       {}
> SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207       {}
>   SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037       {}
> {code}




[jira] [Created] (ARROW-12330) [Developer] Restore values in counters column of Archery benchmark

2021-04-11 Thread Kazuaki Ishizaki (Jira)

Kazuaki Ishizaki created ARROW-12330:


 Summary: [Developer] Restore values in counters column of Archery 
benchmark
 Key: ARROW-12330
 URL: https://issues.apache.org/jira/browse/ARROW-12330
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Affects Versions: 3.0.0
Reporter: Kazuaki Ishizaki
 Fix For: 4.0.0


The issue is that ARROW-11189 always suppressed values in {{counters}} column 
of Archery benchmark

{code}
% archery benchmark run --benchmark-filter="SetBitsTo" --output=head2.json HEAD HEAD~1
...
--------------------------------------------------------------------------
Benchmark            Time      CPU  Iterations UserCounters...
--------------------------------------------------------------------------
SetBitsTo/2       8.15 ns  8.15 ns    81991087 bytes_per_second=234.044M/s
SetBitsTo/16      7.78 ns  7.78 ns    89928878 bytes_per_second=1.91429G/s
SetBitsTo/1024    13.9 ns  13.9 ns    50372172 bytes_per_second=68.6182G/s
SetBitsTo/131072  3508 ns  3508 ns      199335 bytes_per_second=34.7944G/s
---------------------------------------------------------------------
Non-regressions: (4)
---------------------------------------------------------------------
       benchmark         baseline        contender  change % counters
    SetBitsTo/16    1.877 GiB/sec    1.914 GiB/sec     1.975       {}
     SetBitsTo/2  230.566 MiB/sec  234.044 MiB/sec     1.509       {}
SetBitsTo/131072   34.722 GiB/sec   34.794 GiB/sec     0.207       {}
  SetBitsTo/1024   68.593 GiB/sec   68.618 GiB/sec     0.037       {}
{code}




[jira] [Commented] (ARROW-10744) [Python] Enable wheel deployment for Mac OS 11 Big Sur

2021-04-11 Thread Jira



[ 
https://issues.apache.org/jira/browse/ARROW-10744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318777#comment-17318777
 ] 

Ismaël Mejía commented on ARROW-10744:
--

Is support for Mac OS ARM64 part of this ticket or tracked by a different one?

> [Python] Enable wheel deployment for Mac OS 11 Big Sur
> --
>
> Key: ARROW-10744
> URL: https://issues.apache.org/jira/browse/ARROW-10744
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: David de L.
>Priority: Major
>
> It is currently quite tricky to get pyarrow to build on latest Mac 
> distributions.
> Since GitHub runners 
> [support|https://docs.github.com/en/free-pro-team@latest/actions/reference/specifications-for-github-hosted-runners#supported-runners-and-hardware-resources]
>  Mac 11.0 Big Sur, could wheels be built for this OS in CD?
>  




[jira] [Commented] (ARROW-12318) [Rust][DataFusion] Add support for AVG(Timestamp) types

2021-04-11 Thread Andrew Lamb (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-12318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318712#comment-17318712
 ] 

Andrew Lamb commented on ARROW-12318:
-

[~Dandandan] notes that PostgreSQL doesn't support SUM or AVG for timestamps: 
https://www.postgresql.org/docs/13/functions-aggregate.html


so perhaps we should not support it in DataFusion either :thinking_face:

> [Rust][DataFusion] Add support for AVG(Timestamp) types
> ---
>
> Key: ARROW-12318
> URL: https://issues.apache.org/jira/browse/ARROW-12318
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: Andrew Lamb
>Priority: Minor
>
> This is a follow on to ARROW-12277
> Background: Support for Min/Max/Sum/Count were added for 
> DataType::Timestamp(*) types in https://github.com/apache/arrow/pull/9970.
> This ticket tracks adding support for Avg, which is slightly more involved as 
> currently Avg assumes the output type is always F64, and in this case I think 
> Avg(timestamp) should also be (timestamp). We should double check what 
> postgres does in this case and follow its example
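
If Avg over timestamps were supported with a timestamp result, the core computation might look like the sketch below. It works on raw i64 values (such as nanoseconds since the epoch) with Option marking nulls; the function name avg_timestamp and the use of an i128 accumulator are assumptions for illustration, not DataFusion's accumulator API.

```rust
/// Averages nullable i64 timestamp values, returning an i64 timestamp.
/// Hypothetical sketch; nulls are skipped, as SQL aggregates do.
pub fn avg_timestamp(values: &[Option<i64>]) -> Option<i64> {
    // Accumulate in i128 so the sum of many i64 values cannot overflow.
    let mut sum: i128 = 0;
    let mut count: i128 = 0;
    for v in values.iter().flatten() {
        sum += i128::from(*v);
        count += 1;
    }
    if count == 0 {
        None // all-null or empty input has no average
    } else {
        // Integer division truncates toward zero; the result fits in i64
        // because it lies between the minimum and maximum input.
        Some((sum / count) as i64)
    }
}
```

Keeping the result as an integer avoids the precision loss that converting nanosecond timestamps to F64 would introduce.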




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

For now, I've opened an issue on his repo asking him to add a license; see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

For now, I've opened an issue on his repo asking him to add a license; see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

I've opened an issue in his repo asking for a license to be added; see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniformly distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow. What do you think? 

I've opened an issue in his repo asking for a license to be added; see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniformly distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow, which will require some more benchmarking, 
testing and code polishing. What do you think?

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:33 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniformly distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks like it would be good to ask 
Travis to contribute to Arrow, which will require some more benchmarking, 
testing and code polishing. What do you think?


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png|height=350,width=350!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Resolved] (ARROW-12267) [Rust] JSON writer does not support timestamp types

2021-04-11 Thread Andrew Lamb (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-12267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Lamb resolved ARROW-12267.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9968
[https://github.com/apache/arrow/pull/9968]

> [Rust] JSON writer does not support timestamp types
> ---
>
> Key: ARROW-12267
> URL: https://issues.apache.org/jira/browse/ARROW-12267
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andrew Lamb
>Assignee: Andrew Lamb
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Looks like the json writer.rs code in arrow doesn't support writing out 
> timestamps. When I tried to write out a `TimestampNanosecondArray` I got the 
> following error:
> ```
> thread 'influxdb_ioxd::http::tests::test_query_json' panicked at 'Unsupported 
> datatype: Timestamp(
> Nanosecond,
> None,
> )', 
> /Users/alamb/.cargo/git/checkouts/arrow-3a9cfebb6b7b2bdc/3e825a7/rust/arrow/src/json/writer.rs:326:13
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> ```




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:25 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png|height=350,width=350!


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png|height=250,width=250!!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:24 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png|height=250,width=250!!


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Commented] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318707#comment-17318707
 ] 

Kirill Lykov commented on ARROW-10899:
--

Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Updated] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)



 [ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Lykov updated ARROW-10899:
-
Attachment: all_random_wholeRange.png

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.




[jira] [Commented] (ARROW-12306) [Rust] Read CSV format text from stdin or memory

2021-04-11 Thread Siwei (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318668#comment-17318668
 ] 

Siwei commented on ARROW-12306:
---

OK, I will do it.

> [Rust] Read CSV format text from stdin or memory
> 
>
> Key: ARROW-12306
> URL: https://issues.apache.org/jira/browse/ARROW-12306
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Rust - DataFusion
>Reporter: Siwei
>Priority: Minor
>
> Hello,
> I'm building a command-line tool that can run SQL queries on text files (CSV, 
> JSON lines, ...). However, the `CsvExec` in DataFusion can currently only read 
> CSV text from files. I have checked its underlying implementation, the CSV 
> reader in arrow: anything that implements `Read` could be a valid input.
>  
> Should this feature (reading CSV from stdin) be part of DataFusion, or should 
> I implement it in my own crate?
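
The point about `Read` can be sketched with the standard library alone. The example below does not use the arrow or DataFusion APIs; `read_csv_rows` is a hypothetical helper written for illustration, and its parsing is deliberately naive (no quoting or escaping). It only shows that a reader written against `std::io::Read` accepts a file, stdin, or an in-memory buffer without modification.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical helper: parse comma-separated text from any `Read` source
/// into rows of string fields. Naive on purpose: quoted fields and
/// escaped commas are not handled.
pub fn read_csv_rows<R: Read>(source: R) -> std::io::Result<Vec<Vec<String>>> {
    let reader = BufReader::new(source);
    let mut rows = Vec::new();
    for line in reader.lines() {
        let line = line?; // propagate I/O errors to the caller
        if line.is_empty() {
            continue; // skip blank lines
        }
        let fields = line
            .split(',')
            .map(|field| field.trim().to_string())
            .collect();
        rows.push(fields);
    }
    Ok(rows)
}
```

With this signature, `read_csv_rows(std::io::stdin())` reads from standard input, `read_csv_rows(std::fs::File::open(path)?)` reads from a file, and `read_csv_rows("a,b\n1,2\n".as_bytes())` reads from memory, since `Stdin`, `File`, and `&[u8]` all implement `Read`.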



