[jira] [Created] (ARROW-12437) [Rust] [Ballista] Ballista plans must not include RepartitionExec

2021-04-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-12437:

         Summary: [Rust] [Ballista] Ballista plans must not include RepartitionExec
             Key: ARROW-12437
             URL: https://issues.apache.org/jira/browse/ARROW-12437
         Project: Apache Arrow
      Issue Type: Bug
      Components: Rust - Ballista
        Reporter: Andy Grove


Ballista plans must not include RepartitionExec because it produces incorrect 
results. Ballista needs to manage its own repartitioning in a distributed-aware 
way later on. For now we just need to configure the DataFusion context to 
disable repartitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-12436) [Rust][Ballista] Add watch capabilities to config backend trait

2021-04-17 Thread Ximo Guanter (Jira)
Ximo Guanter created ARROW-12436:


         Summary: [Rust][Ballista] Add watch capabilities to config backend trait
             Key: ARROW-12436
             URL: https://issues.apache.org/jira/browse/ARROW-12436
         Project: Apache Arrow
      Issue Type: Task
      Components: Rust - Ballista
        Reporter: Ximo Guanter


[arrow/lib.rs at 66aa3e7c365a8d4c4eca6e23668f2988e714b493 · apache/arrow 
(github.com)|https://github.com/apache/arrow/blob/66aa3e7c365a8d4c4eca6e23668f2988e714b493/rust/ballista/rust/scheduler/src/lib.rs#L183]





[jira] [Created] (ARROW-12435) [Rust][DataFusion] Remove unnecessary references to namespace in executor

2021-04-17 Thread Ximo Guanter (Jira)
Ximo Guanter created ARROW-12435:


         Summary: [Rust][DataFusion] Remove unnecessary references to namespace in executor
             Key: ARROW-12435
             URL: https://issues.apache.org/jira/browse/ARROW-12435
         Project: Apache Arrow
      Issue Type: Task
      Components: Rust - Ballista
        Reporter: Ximo Guanter


There is no need to support multiple executor clusters from a scheduler, so the 
namespace of an executor is implicitly defined by the scheduler it connects to. 
See [https://the-asf.slack.com/archives/C01QUFS30TD/p1618679585211100] for more 
context.





[jira] [Created] (ARROW-12434) [Rust] [Ballista] Show executed plans with metrics

2021-04-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-12434:

         Summary: [Rust] [Ballista] Show executed plans with metrics
             Key: ARROW-12434
             URL: https://issues.apache.org/jira/browse/ARROW-12434
         Project: Apache Arrow
      Issue Type: New Feature
      Components: Rust - Ballista
        Reporter: Andy Grove
        Assignee: Andy Grove
         Fix For: 5.0.0


Show executed plans with metrics to help with debugging and performance tuning





[jira] [Created] (ARROW-12433) [Rust] Builds failing due to new flatbuffer release introducing const generics

2021-04-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-12433:

         Summary: [Rust] Builds failing due to new flatbuffer release introducing const generics
             Key: ARROW-12433
             URL: https://issues.apache.org/jira/browse/ARROW-12433
         Project: Apache Arrow
      Issue Type: Bug
Affects Versions: 4.0.0
        Reporter: Andy Grove


I filed [https://github.com/google/flatbuffers/issues/6572], but for now we 
should pin the flatbuffers dependency to 0.8.3.





[jira] [Created] (ARROW-12432) [Rust] [DataFusion] Add metrics for SortExec

2021-04-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-12432:

         Summary: [Rust] [DataFusion] Add metrics for SortExec
             Key: ARROW-12432
             URL: https://issues.apache.org/jira/browse/ARROW-12432
         Project: Apache Arrow
      Issue Type: New Feature
      Components: Rust - DataFusion
        Reporter: Andy Grove
         Fix For: 5.0.0


Add metrics for SortExec





[jira] [Created] (ARROW-12431) [Python] pa.array mask inverted when type is binary and value to be converted in numpy array

2021-04-17 Thread Daniel Nugent (Jira)
Daniel Nugent created ARROW-12431:

         Summary: [Python] pa.array mask inverted when type is binary and value to be converted in numpy array
             Key: ARROW-12431
             URL: https://issues.apache.org/jira/browse/ARROW-12431
         Project: Apache Arrow
      Issue Type: Bug
        Reporter: Daniel Nugent


{code:python}
Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import pyarrow as pa
>>>
>>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([False]))

[
  null
]
>>> pa.array(np.array([b'\x00']),type=pa.binary(1), mask = np.array([True]))

[
  00
]
>>> pa.array([b'\x00'],type=pa.binary(1), mask = np.array([False]))

[
  00
]
>>> pa.__version__
'3.0.0'
>>> np.__version__
'1.20.1'
{code}

Happens with both FixedSizeBinary and variable-sized binary (I was working with 
FixedSizeBinary). Does not happen for integers (and presumably other types, 
though I didn't check exhaustively).
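
For reference, the pyarrow documentation describes `mask` as marking null 
entries with True and valid entries with False. A minimal pure-Python sketch of 
that expected behaviour follows. `apply_mask` is a hypothetical helper written 
for illustration only; it is not part of pyarrow and models just the documented 
semantics, not pyarrow's conversion path.

```python
def apply_mask(values, mask):
    """Return values with None wherever mask is True (the documented meaning of mask)."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [None if is_null else value for value, is_null in zip(values, mask)]


# The two mask settings from the session above, under the documented semantics.
expected_unmasked = apply_mask([b"\x00"], [False])  # mask False: value is kept
expected_masked = apply_mask([b"\x00"], [True])     # mask True: value becomes null
```

Under these semantics the first call in the session should return the value 
and the second should return null, which is the opposite of what the numpy 
input path produced.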


