[jira] [Updated] (ARROW-10263) [C++][Compute] Improve numerical stability of variances merging

2020-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-10263:
---
Labels: pull-request-available  (was: )

> [C++][Compute] Improve numerical stability of variances merging
> ---
>
> Key: ARROW-10263
> URL: https://issues.apache.org/jira/browse/ARROW-10263
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Yibo Cai
>Assignee: Yibo Cai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For a chunked array, the variance kernel needs to merge per-chunk variances.
> Tested with two single-value chunks, [400800490] and [400800400], the merged
> variance is 3872. Treated as a single array with two values, the variance is
> 3904, the same as numpy outputs.
> So the current merging method is not stable in extreme cases where chunks are
> very short and have nearly equal mean values.
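
To make the idea of merging per-chunk variances concrete, here is a minimal Python sketch of the pairwise update usually attributed to Chan et al. It combines each chunk's count, mean and sum of squared deviations. This is an illustration only and not necessarily what the Arrow kernel or its pull request does; the names `chunk_stats` and `merge_stats` are hypothetical, and the input values are made up for the example.

```python
import math
import statistics


def chunk_stats(values):
    """Return (count, mean, m2) for one chunk; m2 is the sum of squared deviations."""
    n = len(values)
    mean = math.fsum(values) / n
    m2 = math.fsum((v - mean) ** 2 for v in values)
    return n, mean, m2


def merge_stats(a, b):
    """Merge two (count, mean, m2) triples without revisiting the raw values."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # The cross term is built from the difference of the two means rather than
    # from a difference of large squared sums, which limits cancellation error.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


# Illustrative chunks (assumed values, not taken from the issue).
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
merged = chunk_stats(chunks[0])
for chunk in chunks[1:]:
    merged = merge_stats(merged, chunk_stats(chunk))

count, mean, m2 = merged
population_variance = m2 / count
# Reference value computed over the concatenated data.
reference = statistics.pvariance([v for chunk in chunks for v in chunk])
```

Comparing `population_variance` with `reference` is a quick way to check a merge routine against the single-array result, which is the comparison the issue makes against numpy.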



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10278) [Cmake] Failures when building Arrow unittests from source

2020-10-11 Thread Andrew Wieteska (Jira)
Andrew Wieteska created ARROW-10278:
---

 Summary: [Cmake] Failures when building Arrow unittests from source
 Key: ARROW-10278
 URL: https://issues.apache.org/jira/browse/ARROW-10278
 Project: Apache Arrow
  Issue Type: Test
  Components: C++
Reporter: Andrew Wieteska


I've started to get errors while building the unit tests from source.

Following the developer docs, I run this:

 
{code:java}
cd arrow/cpp/debug 
cmake -DCMAKE_BUILD_TYPE=Debug -DARROW_BUILD_TESTS=ON .. 
make unittest
{code}
On current master I get a number of failures:

 
{code:java}
The following tests FAILED:
 1 - arrow-array-test (Failed)
 2 - arrow-buffer-test (Failed)
 4 - arrow-misc-test (Failed)
 6 - arrow-scalar-test (Failed)
 7 - arrow-type-test (Failed)
 8 - arrow-table-test (Failed)
 9 - arrow-tensor-test (Failed)
 10 - arrow-sparse-tensor-test (Failed)
 11 - arrow-stl-test (Failed)
 12 - arrow-json-integration-test (Failed)
 13 - arrow-concatenate-test (Failed)
 14 - arrow-diff-test (Failed)
 15 - arrow-c-bridge-test (Failed)
 17 - arrow-io-compressed-test (Failed)
 19 - arrow-io-memory-test (Failed)
 20 - arrow-utility-test (Failed)
 21 - arrow-threading-utility-test (Failed)
 23 - arrow-compute-scalar-test (Failed)
 24 - arrow-compute-vector-test (Failed)
 26 - arrow-feather-test (Failed)
 27 - arrow-ipc-json-simple-test (Failed)
 28 - arrow-ipc-read-write-test (Failed)
 29 - arrow-ipc-tensor-test (Failed)
 30 - arrow-json-test (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/unittest.dir/build.make:76: CMakeFiles/unittest] Error 8
make[2]: *** [CMakeFiles/Makefile2:572: CMakeFiles/unittest.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:579: CMakeFiles/unittest.dir/rule] Error 2
make: *** [Makefile:246: unittest] Error 2
 
{code}
Scrolling up I see that these all fail with this message:

 
{code:java}
18/30 Test #23: arrow-compute-scalar-test ***Failed 0.11 sec
Running arrow-compute-scalar-test, redirecting output into /home/andrew/git_repo/arrow/cpp/debug/build/test-logs/arrow-compute-scalar-test.txt (attempt 1/1)
/home/andrew/git_repo/arrow/cpp/debug/debug/arrow-compute-scalar-test: symbol lookup error: /home/andrew/git_repo/arrow/cpp/debug/debug/arrow-compute-scalar-test: undefined symbol: _ZN5arrow6Status14AddContextLineEPKciS2_
cat: /home/andrew/git_repo/arrow/cpp/debug/build/test-logs/arrow-compute-scalar-test.txt.raw: No such file or directory
~/git_repo/arrow/cpp/debug/src/arrow/compute/kernels
 
{code}
I appreciate any comments/ideas on how to fix this!
  





[jira] [Created] (ARROW-10277) [C++] Support comparing scalars approximately

2020-10-11 Thread Liya Fan (Jira)
Liya Fan created ARROW-10277:


 Summary: [C++] Support comparing scalars approximately
 Key: ARROW-10277
 URL: https://issues.apache.org/jira/browse/ARROW-10277
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Liya Fan
Assignee: Liya Fan


As discussed in 
[https://github.com/apache/arrow/pull/7748#discussion_r469997286], we need to 
compare scalars approximately in some scenarios. 
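
As a rough illustration of what approximate scalar comparison means, the following Python sketch treats floating-point values as equal when they fall within a tolerance. It is not Arrow's API; the function name `approx_equal`, its default tolerances and the choice to treat two NaNs as equal are all assumptions made for the example.

```python
import math


def approx_equal(left, right, rel_tol=1e-9, abs_tol=0.0):
    """Compare two numeric scalars, allowing a tolerance when either is a float."""
    if isinstance(left, float) or isinstance(right, float):
        # Design choice for this sketch: two NaN values compare equal.
        if math.isnan(left) and math.isnan(right):
            return True
        return math.isclose(left, right, rel_tol=rel_tol, abs_tol=abs_tol)
    # Non-float scalars fall back to exact equality.
    return left == right


same_within_tolerance = approx_equal(0.1 + 0.2, 0.3)
clearly_different = approx_equal(1.0, 1.1)
```

An exact `==` on `0.1 + 0.2` and `0.3` is false in binary floating point, which is the kind of scenario where a tolerance-based comparison is useful.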





[jira] [Updated] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark

2020-10-11 Thread utsav (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

utsav updated ARROW-10276:
--
Description: 
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature. I have attached images below

Any help would be appreciated. Thanks

Spark Version: 2.4.5.

 

 

  was:
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature. I have attached images below

Any help would be appreciated. Thanks

Spark Version: 2.4.5

 

 


> Armv7 orc and flight not supported for build. Compat error on using with spark
> --
>
> Key: ARROW-10276
> URL: https://issues.apache.org/jira/browse/ARROW-10276
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: utsav
>Priority: Major
> Attachments: arrow_armv7, arrow_compat_error
>
>
> I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
> tried to use it for the Raspberry Pi 3 without luck in previous posts.
> I figured out how to successfully build it for armv7 using the script below 
> but cannot use orc and flight flags. People had looked into it in ARROW-8420 
> but I don't know if they faced these issues.
> I tried converting a spark dataframe to pandas using pyarrow but now it 
> complains about a compat feature. I have attached images below
> Any help would be appreciated. Thanks
> Spark Version: 2.4.5.
>  
>  





[jira] [Updated] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark

2020-10-11 Thread utsav (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

utsav updated ARROW-10276:
--
Description: 
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature. I have attached images below

Any help would be appreciated. Thanks

Spark Version: 2.4.5

 

 

  was:
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature. I have attached images below

Any help would be appreciated. Thanks

 

 


> Armv7 orc and flight not supported for build. Compat error on using with spark
> --
>
> Key: ARROW-10276
> URL: https://issues.apache.org/jira/browse/ARROW-10276
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: utsav
>Priority: Major
> Attachments: arrow_armv7, arrow_compat_error
>
>
> I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
> tried to use it for the Raspberry Pi 3 without luck in previous posts.
> I figured out how to successfully build it for armv7 using the script below 
> but cannot use orc and flight flags. People had looked into it in ARROW-8420 
> but I don't know if they faced these issues.
> I tried converting a spark dataframe to pandas using pyarrow but now it 
> complains about a compat feature. I have attached images below
> Any help would be appreciated. Thanks
> Spark Version: 2.4.5
>  
>  





[jira] [Updated] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark

2020-10-11 Thread utsav (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

utsav updated ARROW-10276:
--
Description: 
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature. I have attached images below

 

Any help would be appreciated

 

 

  was:
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature.

 

Any help would be appreciated

 

 


> Armv7 orc and flight not supported for build. Compat error on using with spark
> --
>
> Key: ARROW-10276
> URL: https://issues.apache.org/jira/browse/ARROW-10276
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: utsav
>Priority: Major
> Attachments: arrow_armv7, arrow_compat_error
>
>
> I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
> tried to use it for the Raspberry Pi 3 without luck in previous posts.
> I figured out how to successfully build it for armv7 using the script below 
> but cannot use orc and flight flags. People had looked into it in ARROW-8420 
> but I don't know if they faced these issues.
> I tried converting a spark dataframe to pandas using pyarrow but now it 
> complains about a compat feature. I have attached images below
>  
> Any help would be appreciated
>  
>  





[jira] [Updated] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark

2020-10-11 Thread utsav (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

utsav updated ARROW-10276:
--
Description: 
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature. I have attached images below

Any help would be appreciated. Thanks

 

 

  was:
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature. I have attached images below

 

Any help would be appreciated

 

 


> Armv7 orc and flight not supported for build. Compat error on using with spark
> --
>
> Key: ARROW-10276
> URL: https://issues.apache.org/jira/browse/ARROW-10276
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: utsav
>Priority: Major
> Attachments: arrow_armv7, arrow_compat_error
>
>
> I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
> tried to use it for the Raspberry Pi 3 without luck in previous posts.
> I figured out how to successfully build it for armv7 using the script below 
> but cannot use orc and flight flags. People had looked into it in ARROW-8420 
> but I don't know if they faced these issues.
> I tried converting a spark dataframe to pandas using pyarrow but now it 
> complains about a compat feature. I have attached images below
> Any help would be appreciated. Thanks
>  
>  





[jira] [Updated] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error on using with spark

2020-10-11 Thread utsav (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

utsav updated ARROW-10276:
--
Summary: Armv7 orc and flight not supported for build. Compat error on 
using with spark  (was: Armv7 orc and flight not supported for build. Compat 
error)

> Armv7 orc and flight not supported for build. Compat error on using with spark
> --
>
> Key: ARROW-10276
> URL: https://issues.apache.org/jira/browse/ARROW-10276
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: utsav
>Priority: Major
> Attachments: arrow_armv7, arrow_compat_error
>
>
> I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
> tried to use it for the Raspberry Pi 3 without luck in previous posts.
> I figured out how to successfully build it for armv7 using the script below 
> but cannot use orc and flight flags. People had looked into it in ARROW-8420 
> but I don't know if they faced these issues.
> I tried converting a spark dataframe to pandas using pyarrow but now it 
> complains about a compat feature.
>  
> Any help would be appreciated
>  
>  





[jira] [Updated] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error

2020-10-11 Thread utsav (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

utsav updated ARROW-10276:
--
Description: 
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and flight flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature.

 

Any help would be appreciated

 

 

  was:
I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and parquet flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature.

 

Any help would be appreciated

 

 


> Armv7 orc and flight not supported for build. Compat error
> --
>
> Key: ARROW-10276
> URL: https://issues.apache.org/jira/browse/ARROW-10276
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: utsav
>Priority: Major
> Attachments: arrow_armv7, arrow_compat_error
>
>
> I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
> tried to use it for the Raspberry Pi 3 without luck in previous posts.
> I figured out how to successfully build it for armv7 using the script below 
> but cannot use orc and flight flags. People had looked into it in ARROW-8420 
> but I don't know if they faced these issues.
> I tried converting a spark dataframe to pandas using pyarrow but now it 
> complains about a compat feature.
>  
> Any help would be appreciated
>  
>  





[jira] [Updated] (ARROW-10276) Armv7 orc and flight not supported for build. Compat error

2020-10-11 Thread utsav (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

utsav updated ARROW-10276:
--
Summary: Armv7 orc and flight not supported for build. Compat error  (was: 
Armv7 orc and parquet not supported for build. Compat error)

> Armv7 orc and flight not supported for build. Compat error
> --
>
> Key: ARROW-10276
> URL: https://issues.apache.org/jira/browse/ARROW-10276
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: utsav
>Priority: Major
> Attachments: arrow_armv7, arrow_compat_error
>
>
> I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
> tried to use it for the Raspberry Pi 3 without luck in previous posts.
> I figured out how to successfully build it for armv7 using the script below 
> but cannot use orc and parquet flags. People had looked into it in ARROW-8420 
> but I don't know if they faced these issues.
> I tried converting a spark dataframe to pandas using pyarrow but now it 
> complains about a compat feature.
>  
> Any help would be appreciated
>  
>  





[jira] [Created] (ARROW-10276) Armv7 orc and parquet not supported for build. Compat error

2020-10-11 Thread utsav (Jira)
utsav created ARROW-10276:
-

 Summary: Armv7 orc and parquet not supported for build. Compat 
error
 Key: ARROW-10276
 URL: https://issues.apache.org/jira/browse/ARROW-10276
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.17.0
Reporter: utsav
 Attachments: arrow_armv7, arrow_compat_error

I'm using an Arm Cortex-A9 processor on the Xilinx Pynq Z2 board. People have 
tried to use it for the Raspberry Pi 3 without luck in previous posts.

I figured out how to successfully build it for armv7 using the script below but 
cannot use orc and parquet flags. People had looked into it in ARROW-8420 but I 
don't know if they faced these issues.

I tried converting a spark dataframe to pandas using pyarrow but now it 
complains about a compat feature.

 

Any help would be appreciated

 

 





[jira] [Resolved] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project

2020-10-11 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-10271.

Resolution: Fixed

Issue resolved by pull request 8433
[https://github.com/apache/arrow/pull/8433]

> [Rust] packed_simd is broken and continued under a new project
> --
>
> Key: ARROW-10271
> URL: https://issues.apache.org/jira/browse/ARROW-10271
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.1
>Reporter: Ritchie
>Assignee: Neville Dipale
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The dependency doesn't compile on newer versions of nightly. This is also 
> known by the (new) project maintainers. Due to complications they continued 
> the project under a new name: `packed_simd_2`.
>  
> packed_simd = { version = "0.3.4", package = "packed_simd_2" }
>  
> See:
> https://github.com/rust-lang/packed_simd





[jira] [Resolved] (ARROW-10251) [Rust] [DataFusion] MemTable::load() should load partitions in parallel

2020-10-11 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved ARROW-10251.

Fix Version/s: (was: 3.0.0)
   2.0.0
   Resolution: Fixed

Issue resolved by pull request 8428
[https://github.com/apache/arrow/pull/8428]

> [Rust] [DataFusion] MemTable::load() should load partitions in parallel
> ---
>
> Key: ARROW-10251
> URL: https://issues.apache.org/jira/browse/ARROW-10251
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> MemTable::load() should load partitions in parallel using async tasks, rather 
> than loading one partition at a time.
> Also, we should make batch size configurable. It is currently hard-coded to 
> 1024*1024 which can be quite inefficient.
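
The shape of the requested change, one task per partition plus a configurable batch size, can be sketched in Python with `asyncio`. This is only an analogy for the Rust code: `load_partition` and `load_all` are hypothetical names, the partitions are plain lists, and `asyncio` schedules tasks concurrently on one thread rather than in parallel across cores.

```python
import asyncio


async def load_partition(partition, batch_size):
    """Hypothetical loader: split one partition's rows into batches."""
    await asyncio.sleep(0)  # stand-in for real I/O; yields to other tasks
    return [partition[i:i + batch_size] for i in range(0, len(partition), batch_size)]


async def load_all(partitions, batch_size=1024):
    """Start one task per partition, then wait for all of them to finish."""
    tasks = [asyncio.create_task(load_partition(p, batch_size)) for p in partitions]
    # gather() returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)


# Illustrative data (assumed): two small partitions and a batch size of 2.
partitions = [list(range(5)), list(range(10, 13))]
loaded = asyncio.run(load_all(partitions, batch_size=2))
```

Passing `batch_size` as a parameter, instead of fixing it inside the loader, mirrors the second half of the request.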





[jira] [Assigned] (ARROW-10251) [Rust] [DataFusion] MemTable::load() should load partitions in parallel

2020-10-11 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reassigned ARROW-10251:
--

Assignee: Andy Grove

> [Rust] [DataFusion] MemTable::load() should load partitions in parallel
> ---
>
> Key: ARROW-10251
> URL: https://issues.apache.org/jira/browse/ARROW-10251
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 3.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> MemTable::load() should load partitions in parallel using async tasks, rather 
> than loading one partition at a time.
> Also, we should make batch size configurable. It is currently hard-coded to 
> 1024*1024 which can be quite inefficient.





[jira] [Commented] (ARROW-8038) [C++][Packaging] Add OpenSSL / encryption support to C++ packages

2020-10-11 Thread Akshay (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211935#comment-17211935
 ] 

Akshay commented on ARROW-8038:
---

Hello sir [~wesm],

I am new to the Parquet file format. I was trying to encrypt some Parquet 
files using the encryption example code in the GitHub repository under 
apache/arrow/cpp/examples. After installing the dependencies using the CMake 
file, I was able to perform read operations on the Parquet file. 
Encryption, however, compiles successfully but gives a runtime error of 
"Build without SSL".

Is this related to this issue? That is, does the error occur because 
OpenSSL/encryption support has not yet been added to the C++ packages for 
Parquet?

If so, is there any other way I can try and test encryption on Parquet 
files in C++?

> [C++][Packaging] Add OpenSSL / encryption support to C++ packages
> -
>
> Key: ARROW-8038
> URL: https://issues.apache.org/jira/browse/ARROW-8038
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> This is an umbrella issue for tackling encryption support in the various 
> packaging targets (Linux deb/rpm, Homebrew, conda, etc.)





[jira] [Updated] (ARROW-10275) [Rust] [Datafusion] GROUP BY with a high cardinality doesn't seem to finish

2020-10-11 Thread Josh Taylor (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Taylor updated ARROW-10275:

Description: 
GROUP BY on high-cardinality columns (columns with lots of unique values) 
doesn't seem to finish.

I've tried with both datafusion-cli and this:

[https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs]

When doing O_ORDERKEY there are ~15 000 000 unique records, so it seems to 
stall. I've tried with limit but it doesn't work either.

My parquet file: 
[https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing]

datafusion-cli:
{code:java}
CREATE EXTERNAL TABLE something STORED AS PARQUET LOCATION 'demo.parquet';
select O_ORDERKEY from something group by O_ORDERKEY;
{code}
 

  was:
GROUP BY on high-cardinality columns (columns with lots of unique values) 
doesn't seem to finish.

I've tried with both datafusion-cli and this:

[https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs]

When doing O_ORDERKEY there are ~15 000 000 unique records, so it seems to 
stall. I've tried with limit but it doesn't work either.

My parquet file: 
[https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing]

datafusion-cli:

 
{code:java}
CREATE EXTERNAL TABLE something STORED AS PARQUET LOCATION 'demo.parquet';
select O_ORDERKEY from something limit 20
{code}
 


> [Rust] [Datafusion] GROUP BY with a high cardinality doesn't seem to finish
> ---
>
> Key: ARROW-10275
> URL: https://issues.apache.org/jira/browse/ARROW-10275
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 2.0.0
> Environment: Ubuntu 20.04
>Reporter: Josh Taylor
>Priority: Minor
>
> GROUP BY on high-cardinality columns (columns with lots of unique values) 
> doesn't seem to finish.
> I've tried with both datafusion-cli and this:
> [https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs]
> When doing O_ORDERKEY there are ~15 000 000 unique records, so it seems to 
> stall. I've tried with limit but it doesn't work either.
> My parquet file: 
> [https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing]
> datafusion-cli:
> {code:java}
> CREATE EXTERNAL TABLE something STORED AS PARQUET LOCATION 'demo.parquet';
> select O_ORDERKEY from something group by O_ORDERKEY;
> {code}
>  





[jira] [Updated] (ARROW-10275) [Rust] [Datafusion] GROUP BY with a high cardinality doesn't seem to finish

2020-10-11 Thread Josh Taylor (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Taylor updated ARROW-10275:

Description: 
GROUP BY on high-cardinality columns (columns with lots of unique values) 
doesn't seem to finish.

I've tried with both datafusion-cli and this:

[https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs]

When doing O_ORDERKEY there are ~15 000 000 unique records, so it seems to 
stall. I've tried with limit but it doesn't work either.

My parquet file: 
[https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing]

datafusion-cli:

 
{code:java}
CREATE EXTERNAL TABLE something STORED AS PARQUET LOCATION 'demo.parquet';
select O_ORDERKEY from something limit 20
{code}
 

  was:
GROUP BY on high-cardinality columns (columns with lots of unique values) 
doesn't seem to finish.

I've tried with both datafusion-cli and this:

[https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs]

When doing O_ORDERKEY there are ~15 000 000 unique records, so it seems to 
stall. I've tried with limit but it doesn't work either.

My parquet file: 
https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing


> [Rust] [Datafusion] GROUP BY with a high cardinality doesn't seem to finish
> ---
>
> Key: ARROW-10275
> URL: https://issues.apache.org/jira/browse/ARROW-10275
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 2.0.0
> Environment: Ubuntu 20.04
>Reporter: Josh Taylor
>Priority: Minor
>
> GROUP BY on high-cardinality columns (columns with lots of unique values) 
> doesn't seem to finish.
> I've tried with both datafusion-cli and this:
> [https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs]
> When doing O_ORDERKEY there are ~15 000 000 unique records, so it seems to 
> stall. I've tried with limit but it doesn't work either.
> My parquet file: 
> [https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing]
> datafusion-cli:
>  
> {code:java}
> CREATE EXTERNAL TABLE something STORED AS PARQUET LOCATION 'demo.parquet';
> select O_ORDERKEY from something limit 20
> {code}
>  





[jira] [Commented] (ARROW-10274) [Rust] arithmetic without SIMD does unnecesary copy

2020-10-11 Thread Jorge Leitão (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211853#comment-17211853
 ] 

Jorge Leitão commented on ARROW-10274:
--

> Maybe we could directly write the arithmetic result to a mutable buffer and 
> prevent this redundant copy?

Yes :)

> [Rust] arithmetic without SIMD does unnecesary copy
> ---
>
> Key: ARROW-10274
> URL: https://issues.apache.org/jira/browse/ARROW-10274
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Ritchie
>Priority: Minor
>
> The arithmetic kernels that don't use SIMD create a `vec` in memory and later 
> copy that data into a Buffer. Maybe we could directly write the arithmetic 
> result to a mutable buffer and prevent this redundant copy?
>  
>  
> {code:java}
> let values = (0..left.len())
>     .map(|i| op(left.value(i), right.value(i)))
>     .collect::<Vec<T::Native>>();
>
> let data = ArrayData::new(
>     T::get_data_type(),
>     left.len(),
>     None,
>     null_bit_buffer,
>     0,
>     vec![Buffer::from(values.to_byte_slice())],
>     vec![],
> );{code}
>  
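
The difference between the two strategies discussed above can be shown with a small Python analogy. This is not the Rust kernel: `add_with_copy` and `add_in_place` are hypothetical names, and a `bytearray` stands in for a mutable buffer. The first function builds an intermediate list and then serializes it; the second allocates the output once and writes each result straight into it.

```python
import struct


def add_with_copy(left, right):
    """Collect results in a temporary list, then copy them into a byte buffer."""
    values = [a + b for a, b in zip(left, right)]
    return struct.pack(f"<{len(values)}i", *values)


def add_in_place(left, right):
    """Allocate the output buffer once and write each result directly into it."""
    out = bytearray(4 * len(left))  # 4 bytes per little-endian 32-bit integer
    for i, (a, b) in enumerate(zip(left, right)):
        struct.pack_into("<i", out, 4 * i, a + b)
    return out


copied = add_with_copy([1, 2, 3], [10, 20, 30])
direct = add_in_place([1, 2, 3], [10, 20, 30])
```

Both paths produce the same bytes; the second simply skips the intermediate collection, which is the saving the issue asks about.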





[jira] [Updated] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project

2020-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-10271:
---
Labels: pull-request-available  (was: )

> [Rust] packed_simd is broken and continued under a new project
> --
>
> Key: ARROW-10271
> URL: https://issues.apache.org/jira/browse/ARROW-10271
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.1
>Reporter: Ritchie
>Assignee: Neville Dipale
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The dependency doesn't compile on newer versions of nightly. This is also 
> known by the (new) project maintainers. Due to complications they continued 
> the project under a new name: `packed_simd_2`.
>  
> packed_simd = { version = "0.3.4", package = "packed_simd_2" }
>  
> See:
> https://github.com/rust-lang/packed_simd





[jira] [Updated] (ARROW-10274) [Rust] arithmetic without SIMD does unnecesary copy

2020-10-11 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-10274:
---
Component/s: Rust

> [Rust] arithmetic without SIMD does unnecessary copy
> ---
>
> Key: ARROW-10274
> URL: https://issues.apache.org/jira/browse/ARROW-10274
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Ritchie
>Priority: Minor
>
> The arithmetic kernels that don't use SIMD collect their results into a `Vec` 
> and then copy that data into a Buffer. Could we write the arithmetic result 
> directly to a mutable buffer and avoid this redundant copy?
>  
>  
> {code:java}
> let values = (0..left.len())
>     .map(|i| op(left.value(i), right.value(i)))
>     .collect::<Vec<T::Native>>();
>
> let data = ArrayData::new(
>     T::get_data_type(),
>     left.len(),
>     None,
>     null_bit_buffer,
>     0,
>     vec![Buffer::from(values.to_byte_slice())],
>     vec![],
> );
> {code}
>  
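As a rough illustration of the idea in the report above, the following sketch contrasts the two approaches using only the standard library. It assumes `i32` values and uses a plain `Vec<u8>` as a stand-in for Arrow's buffer types; the function names `binary_op_two_step` and `binary_op_direct` are hypothetical and are not part of the Arrow API.

```rust
use std::mem::size_of;

/// Two-step approach: collect typed results into a Vec, then copy the
/// bytes into the output buffer. The Vec<u8> is only a stand-in for a
/// real Arrow buffer (assumption for illustration).
fn binary_op_two_step(left: &[i32], right: &[i32], op: impl Fn(i32, i32) -> i32) -> Vec<u8> {
    // Step 1: materialize the typed results.
    let values: Vec<i32> = left.iter().zip(right).map(|(&l, &r)| op(l, r)).collect();
    // Step 2: copy them byte by byte into the output (the redundant copy).
    let mut bytes = Vec::with_capacity(values.len() * size_of::<i32>());
    for v in &values {
        bytes.extend_from_slice(&v.to_ne_bytes());
    }
    bytes
}

/// Direct approach: write each result straight into the pre-sized
/// output buffer, skipping the intermediate Vec<i32>.
fn binary_op_direct(left: &[i32], right: &[i32], op: impl Fn(i32, i32) -> i32) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(left.len() * size_of::<i32>());
    for (&l, &r) in left.iter().zip(right) {
        bytes.extend_from_slice(&op(l, r).to_ne_bytes());
    }
    bytes
}

fn main() {
    let left = [1, 2, 3];
    let right = [10, 20, 30];
    let two_step = binary_op_two_step(&left, &right, |l, r| l + r);
    let direct = binary_op_direct(&left, &right, |l, r| l + r);
    // Both approaches must produce identical bytes.
    assert_eq!(two_step, direct);
}
```

Both functions produce the same bytes; the second one simply avoids the intermediate allocation and copy. Whether this yields a measurable gain in the real kernels would need benchmarking.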





[jira] [Assigned] (ARROW-10112) [Rust] Implement conversion of ArrowArray to array::Array

2020-10-11 Thread Jorge Leitão (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Leitão reassigned ARROW-10112:


Assignee: Jorge Leitão

> [Rust] Implement conversion of ArrowArray to array::Array
> -
>
> Key: ARROW-10112
> URL: https://issues.apache.org/jira/browse/ARROW-10112
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Jorge Leitão
>Assignee: Jorge Leitão
>Priority: Major
>






[jira] [Assigned] (ARROW-10113) [Rust] Implement conversion of array::Array to ArrowArray

2020-10-11 Thread Jorge Leitão (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Leitão reassigned ARROW-10113:


Assignee: Jorge Leitão

> [Rust] Implement conversion of array::Array to ArrowArray
> -
>
> Key: ARROW-10113
> URL: https://issues.apache.org/jira/browse/ARROW-10113
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Jorge Leitão
>Assignee: Jorge Leitão
>Priority: Major
>






[jira] [Created] (ARROW-10275) [Rust] [Datafusion] GROUP BY with a high cardinality doesn't seem to finish

2020-10-11 Thread Josh Taylor (Jira)
Josh Taylor created ARROW-10275:
---

 Summary: [Rust] [Datafusion] GROUP BY with a high cardinality 
doesn't seem to finish
 Key: ARROW-10275
 URL: https://issues.apache.org/jira/browse/ARROW-10275
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Affects Versions: 2.0.0
 Environment: Ubuntu 20.04
Reporter: Josh Taylor


GROUP BY queries on high-cardinality columns (columns with many unique 
values) don't seem to finish.

I've tried with both datafusion-cli and this:

[https://github.com/joshuataylor/parquet-group-by/blob/main/src/main.rs]

When grouping by O_ORDERKEY, which has ~15 000 000 unique values, the query 
seems to stall. Adding a LIMIT doesn't help either.

My parquet file: 
https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing





[jira] [Commented] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project

2020-10-11 Thread Neville Dipale (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211839#comment-17211839
 ] 

Neville Dipale commented on ARROW-10271:


I was planning on doing a pass to check whether there are dependencies that we 
could bump. I'm aware of the packed_simd_2 change and was planning on 
addressing it.

Although we use an old nightly (call it a six-monthly at this stage), this 
issue will definitely break a lot of code for users.

> [Rust] packed_simd is broken and continued under a new project
> --
>
> Key: ARROW-10271
> URL: https://issues.apache.org/jira/browse/ARROW-10271
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.1
>Reporter: Ritchie
>Assignee: Neville Dipale
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The dependency doesn't compile on newer versions of nightly. This is also 
> known by the (new) project maintainers. Due to complications they continued 
> the project under a new name: `packed_simd_2`.
>  
> packed_simd = { version = "0.3.4", package = "packed_simd_2" }
>  
> See:
> https://github.com/rust-lang/packed_simd





[jira] [Updated] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project

2020-10-11 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-10271:
---
Component/s: Rust

> [Rust] packed_simd is broken and continued under a new project
> --
>
> Key: ARROW-10271
> URL: https://issues.apache.org/jira/browse/ARROW-10271
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 1.0.1
>Reporter: Ritchie
>Assignee: Neville Dipale
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The dependency doesn't compile on newer versions of nightly. This is also 
> known by the (new) project maintainers. Due to complications they continued 
> the project under a new name: `packed_simd_2`.
>  
> packed_simd = { version = "0.3.4", package = "packed_simd_2" }
>  
> See:
> https://github.com/rust-lang/packed_simd





[jira] [Assigned] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project

2020-10-11 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale reassigned ARROW-10271:
--

Assignee: Neville Dipale

> [Rust] packed_simd is broken and continued under a new project
> --
>
> Key: ARROW-10271
> URL: https://issues.apache.org/jira/browse/ARROW-10271
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Ritchie
>Assignee: Neville Dipale
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The dependency doesn't compile on newer versions of nightly. This is also 
> known by the (new) project maintainers. Due to complications they continued 
> the project under a new name: `packed_simd_2`.
>  
> packed_simd = { version = "0.3.4", package = "packed_simd_2" }
>  
> See:
> https://github.com/rust-lang/packed_simd





[jira] [Updated] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project

2020-10-11 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-10271:
---
Fix Version/s: 2.0.0

> [Rust] packed_simd is broken and continued under a new project
> --
>
> Key: ARROW-10271
> URL: https://issues.apache.org/jira/browse/ARROW-10271
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Ritchie
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The dependency doesn't compile on newer versions of nightly. This is also 
> known by the (new) project maintainers. Due to complications they continued 
> the project under a new name: `packed_simd_2`.
>  
> packed_simd = { version = "0.3.4", package = "packed_simd_2" }
>  
> See:
> https://github.com/rust-lang/packed_simd





[jira] [Updated] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project

2020-10-11 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-10271:
---
Priority: Blocker  (was: Major)

> [Rust] packed_simd is broken and continued under a new project
> --
>
> Key: ARROW-10271
> URL: https://issues.apache.org/jira/browse/ARROW-10271
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Ritchie
>Priority: Blocker
>
> The dependency doesn't compile on newer versions of nightly. This is also 
> known by the (new) project maintainers. Due to complications they continued 
> the project under a new name: `packed_simd_2`.
>  
> packed_simd = { version = "0.3.4", package = "packed_simd_2" }
>  
> See:
> https://github.com/rust-lang/packed_simd





[jira] [Updated] (ARROW-10271) [Rust] packed_simd is broken and continued under a new project

2020-10-11 Thread Neville Dipale (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-10271:
---
Affects Version/s: 1.0.1

> [Rust] packed_simd is broken and continued under a new project
> --
>
> Key: ARROW-10271
> URL: https://issues.apache.org/jira/browse/ARROW-10271
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Ritchie
>Priority: Major
>
> The dependency doesn't compile on newer versions of nightly. This is also 
> known by the (new) project maintainers. Due to complications they continued 
> the project under a new name: `packed_simd_2`.
>  
> packed_simd = { version = "0.3.4", package = "packed_simd_2" }
>  
> See:
> https://github.com/rust-lang/packed_simd


