[jira] [Created] (ARROW-10036) [Rust] [DataFusion] Test that the final schema is expected in integration tests

2020-09-17 Thread Jorge (Jira)
Jorge created ARROW-10036:
-

 Summary: [Rust] [DataFusion] Test that the final schema is 
expected in integration tests
 Key: ARROW-10036
 URL: https://issues.apache.org/jira/browse/ARROW-10036
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Jorge


Currently, our integration tests convert a RecordBatch to a string, which we 
use for the assertions, but they do not test that the final schema matches our 
expectations.

We should add a test for this that checks the following for every field in the 
schema:
 # field name
 # field type
 # field nullability
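
A minimal sketch of such a check, assuming hypothetical stand-in `Field` and `DataType` types so that it runs without the arrow crate (in DataFusion the same values would come from the result's schema). It compares name, type, and nullability field by field and returns a message naming the first mismatch:

```rust
// Hypothetical stand-ins for schema metadata; these are NOT the arrow crate types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Float64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Compares the actual fields of a query result against the expected ones,
/// checking name, type, and nullability of every field, in order.
fn check_schema(actual: &[Field], expected: &[Field]) -> Result<(), String> {
    if actual.len() != expected.len() {
        return Err(format!("expected {} fields, got {}", expected.len(), actual.len()));
    }
    for (i, (a, e)) in actual.iter().zip(expected).enumerate() {
        if a.name != e.name {
            return Err(format!("field {}: expected name {:?}, got {:?}", i, e.name, a.name));
        }
        if a.data_type != e.data_type {
            return Err(format!(
                "field {} ({}): expected type {:?}, got {:?}",
                i, e.name, e.data_type, a.data_type
            ));
        }
        if a.nullable != e.nullable {
            return Err(format!(
                "field {} ({}): expected nullable={}, got nullable={}",
                i, e.name, e.nullable, a.nullable
            ));
        }
    }
    Ok(())
}
```

Checking field by field, rather than only asserting that two whole schemas are equal, makes a failing test point directly at the offending field and property.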



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10035) [C++] Bump versions of vendored code

2020-09-17 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-10035:
--

 Summary: [C++] Bump versions of vendored code
 Key: ARROW-10035
 URL: https://issues.apache.org/jira/browse/ARROW-10035
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 2.0.0








[jira] [Created] (ARROW-10034) [Rust] Master build broken

2020-09-17 Thread Andy Grove (Jira)
Andy Grove created ARROW-10034:
--

 Summary: [Rust] Master build broken
 Key: ARROW-10034
 URL: https://issues.apache.org/jira/browse/ARROW-10034
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 2.0.0


I merged quite a few PRs today. There was a conflict and I need to revert one 
of them. I am working on it.





[jira] [Created] (ARROW-10033) ArrowReaderProperties creates thread pool, even when use_threads=False and pre_buffer=False

2020-09-17 Thread Adam Hooper (Jira)
Adam Hooper created ARROW-10033:
---

 Summary: ArrowReaderProperties creates thread pool, even when 
use_threads=False and pre_buffer=False
 Key: ARROW-10033
 URL: https://issues.apache.org/jira/browse/ARROW-10033
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 1.0.1
Reporter: Adam Hooper


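`ArrowReaderProperties` has a `::arrow::io::AsyncContext async_context_;` 
member. The `AsyncContext` constructor calls `GetIOThreadPool()`, which creates 
the global IO thread pool (8 threads here) on first use, even when 
`use_threads=false`.

Stack trace: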

```
#0  arrow::internal::ThreadPool::ThreadPool (this=0x232fa90)
    at /src/apache-arrow-1.0.1/cpp/src/arrow/util/thread_pool.cc:121
#1  0x008e4747 in arrow::internal::ThreadPool::Make (threads=8)
    at /src/apache-arrow-1.0.1/cpp/src/arrow/util/thread_pool.cc:246
#2  0x008e48c9 in arrow::internal::ThreadPool::MakeEternal (threads=8)
    at /src/apache-arrow-1.0.1/cpp/src/arrow/util/thread_pool.cc:252
#3  0x008a20ac in arrow::io::internal::MakeIOThreadPool ()
    at /src/apache-arrow-1.0.1/cpp/src/arrow/io/interfaces.cc:326
#4  0x008a21dd in arrow::io::internal::GetIOThreadPool ()
    at /src/apache-arrow-1.0.1/cpp/src/arrow/io/interfaces.cc:334
#5  0x008a064f in arrow::io::AsyncContext::AsyncContext (this=0xea6bb0 )
    at /src/apache-arrow-1.0.1/cpp/src/arrow/io/interfaces.cc:49
#6  0x0048893e in parquet::ArrowReaderProperties::ArrowReaderProperties (this=0xea6b60 , use_threads=false)
    at /src/apache-arrow-1.0.1/cpp/src/parquet/properties.h:579
#7  0x005e1b98 in parquet::default_arrow_reader_properties ()
    at /src/apache-arrow-1.0.1/cpp/src/parquet/properties.cc:53
#8  0x00414843 in parquet::arrow::FileReaderBuilder::FileReaderBuilder (this=0x7fffb31f0c60)
    at /src/apache-arrow-1.0.1/cpp/src/parquet/arrow/reader.cc:930
#9  0x00414b10 in parquet::arrow::OpenFile (file=..., pool=0xea6cf0 , reader=0x7fffb31f0e08)
    at /src/apache-arrow-1.0.1/cpp/src/parquet/arrow/reader.cc:957
```

As a caller, I expect `use_threads=False` to prevent the creation of threads. 
(Maybe there should be an exception if `pre_buffer && !use_threads`?)
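
One possible direction, sketched under stated assumptions: create the IO pool lazily, only when a threaded code path first asks for it, and reject `pre_buffer && !use_threads` when the properties are built, as suggested above. `ReaderProperties` and `ThreadPool` here are hypothetical Rust stand-ins, not the Arrow C++ classes, and no real threads are spawned (the size 8 only mirrors `threads=8` in the trace):

```rust
/// Hypothetical stand-in for an IO thread pool; it only records its size.
#[derive(Debug)]
struct ThreadPool {
    threads: usize,
}

/// Hypothetical reader options mirroring `use_threads` and `pre_buffer`.
#[allow(dead_code)]
#[derive(Debug)]
struct ReaderProperties {
    use_threads: bool,
    pre_buffer: bool,
    // Stays `None` until a threaded code path first needs the pool.
    io_pool: Option<ThreadPool>,
}

impl ReaderProperties {
    /// Validates the options; constructing the properties never creates a pool.
    fn new(use_threads: bool, pre_buffer: bool) -> Result<Self, String> {
        if pre_buffer && !use_threads {
            return Err("pre_buffer requires use_threads".to_string());
        }
        Ok(ReaderProperties { use_threads, pre_buffer, io_pool: None })
    }

    /// Returns the IO pool, creating it on first use, or `None` when threads are disabled.
    fn io_pool(&mut self) -> Option<&ThreadPool> {
        if !self.use_threads {
            return None;
        }
        if self.io_pool.is_none() {
            self.io_pool = Some(ThreadPool { threads: 8 });
        }
        self.io_pool.as_ref()
    }

    /// Reports whether a pool has been created so far.
    fn pool_created(&self) -> bool {
        self.io_pool.is_some()
    }
}
```

With this layout, `ReaderProperties::new(false, false)` never allocates a pool, and `new(false, true)` fails instead of silently ignoring one of the two options.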





[jira] [Created] (ARROW-10032) [Documentation] C++ Windows docs are out of date

2020-09-17 Thread David Li (Jira)
David Li created ARROW-10032:


 Summary: [Documentation] C++ Windows docs are out of date
 Key: ARROW-10032
 URL: https://issues.apache.org/jira/browse/ARROW-10032
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: David Li


 * The recommended VM does not include the C++ compiler; we should link to the 
build tools and describe which of them need to be installed.
 * Boost: the b2 script now requires --with flags, not -with flags.

Even with these fixes:
 * The developer prompt cannot find cl.exe (the compiler).
 * The PowerShell prompt cannot use conda (it complains that a config file is 
not signed).





[jira] [Created] (ARROW-10031) Support Java benchmark in Ursabot

2020-09-17 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-10031:


 Summary: Support Java benchmark in Ursabot
 Key: ARROW-10031
 URL: https://issues.apache.org/jira/browse/ARROW-10031
 Project: Apache Arrow
  Issue Type: New Feature
  Components: CI, Java
Affects Versions: 2.0.0
Reporter: Kazuaki Ishizaki
Assignee: Kazuaki Ishizaki


Based on [the 
suggestion|https://mail-archives.apache.org/mod_mbox/arrow-dev/202008.mbox/%3ccabnn7+q35j7qwshjbx8omdewkt+f1p_m7r1_f6szs4dqc+l...@mail.gmail.com%3e],
 Ursabot will support Java benchmarks.





[jira] [Created] (ARROW-10030) [Rust] Support fromIter and toIter

2020-09-17 Thread Jorge (Jira)
Jorge created ARROW-10030:
-

 Summary: [Rust] Support fromIter and toIter
 Key: ARROW-10030
 URL: https://issues.apache.org/jira/browse/ARROW-10030
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Jorge


Proposal for comments: 
https://docs.google.com/document/d/1d6rV1WmvIH6uW-bcHKrYBSyPddrpXH8Q4CtVfFHtI04/edit?usp=sharing

 

(dump of the proposal:)

Rust Arrow supports two main computational models:
 # Batch operations, which leverage some form of vectorization
 # Element-by-element operations, which emerge in more complex operations

This document concerns element-by-element operations, which are the most common 
kind of operation outside of the library.
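
To make the distinction concrete, here is a hedged sketch over plain slices (a hypothetical layout with validity as `&[bool]`, not the arrow crate's buffers). The batch kernel runs one branch-free loop over the whole values buffer, which compilers can often auto-vectorize, and reuses the validity unchanged; the element-by-element function inspects each slot and branches on it:

```rust
/// Batch style: one tight loop over the entire values buffer, including slots
/// under nulls (wrapping_add avoids overflow panics on whatever they hold);
/// the validity is copied over unchanged.
fn add_batch(values: &[i32], validity: &[bool], delta: i32) -> (Vec<i32>, Vec<bool>) {
    let out: Vec<i32> = values.iter().map(|v| v.wrapping_add(delta)).collect();
    (out, validity.to_vec())
}

/// Element-by-element style: each slot is inspected individually and may take
/// a different branch; here only non-null even values are kept.
fn keep_even(values: &[i32], validity: &[bool]) -> Vec<Option<i32>> {
    values
        .iter()
        .zip(validity)
        .map(|(v, &valid)| if valid && v % 2 == 0 { Some(*v) } else { None })
        .collect()
}
```
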
h2. Element-by-element operations

These operations are implemented in the following steps:
 # Downcast the array to its specific type
 # Initialize buffers
 # Iterate over indices and perform the operation, appending to the buffers 
accordingly
 # Create ArrayData with the required null bitmap, buffers, children, etc.
 # Return an ArrayRef built from the ArrayData

 

We can split this process into three parts:
 # Initialization (1 and 2)
 # Iteration (3)
 # Finalization (4 and 5)

Currently, the API that we offer to our users is:
 # as_any() to downcast the array based on its DataType
 # Builders for all types, which users initialize to match the downcast array
 # Iteration:
 ## use for i in (0..array.len())
 ## use Array::value(i) and Array::is_valid(i)/is_null(i)
 ## use builder.append_value(new_value) or builder.append_null()
 # Finish the builder and wrap the result in an Arc
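
The current workflow above can be sketched as follows. `Array`, `Int32Array`, and `Int32Builder` are simplified, hypothetical stand-ins whose method names mirror the list (the arrow crate's real signatures differ, e.g. in error handling), and validity is kept as a `Vec<bool>` instead of a packed bitmap:

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical, much-simplified stand-in for an `Array` trait.
trait Array {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

/// Hypothetical nullable Int32 array.
#[derive(Debug, PartialEq)]
struct Int32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl Int32Array {
    fn value(&self, i: usize) -> i32 {
        self.values[i]
    }
    fn is_null(&self, i: usize) -> bool {
        !self.validity[i]
    }
}

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.values.len()
    }
}

/// Hypothetical builder mirroring `append_value` / `append_null` / `finish`.
struct Int32Builder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl Int32Builder {
    fn new(capacity: usize) -> Self {
        Int32Builder { values: Vec::with_capacity(capacity), validity: Vec::with_capacity(capacity) }
    }
    fn append_value(&mut self, v: i32) {
        self.values.push(v);
        self.validity.push(true);
    }
    fn append_null(&mut self) {
        self.values.push(0); // placeholder stored under a null slot
        self.validity.push(false);
    }
    fn finish(self) -> Int32Array {
        Int32Array { values: self.values, validity: self.validity }
    }
}

/// Adds 1 to every non-null value using the builder-based workflow.
fn add_one(array: &dyn Array) -> Option<Arc<dyn Array>> {
    // Initialization: downcast the array based on its type ...
    let array = array.as_any().downcast_ref::<Int32Array>()?;
    // ... and create a builder matching the downcast array.
    let mut builder = Int32Builder::new(array.len());
    // Iteration: visit every index, read the slot, append to the builder.
    for i in 0..array.len() {
        if array.is_null(i) {
            builder.append_null();
        } else {
            builder.append_value(array.value(i) + 1);
        }
    }
    // Finalization: finish the builder and wrap the result in an Arc.
    let result: Arc<dyn Array> = Arc::new(builder.finish());
    Some(result)
}
```

Even in this toy version, the caller has to pick the matching builder, keep the loop index in bounds, and append a null for every null slot, which is the boilerplate the proposal below aims to remove.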

This API has some issues:
 # value(i) +is unsafe+, even though it is not marked as such
 # Builders are usually slow because of the checks that they need to perform
 # The API is not intuitive

h2. Proposal

This proposal aims to improve this API in two specific ways:
 * Implement IntoIterator and Iterator with Item=Option<T> (T being the array's native type)
 * Implement FromIterator with Item=Option<T>

so that users can write:

 
{code:java}
use arrow::array::Int32Array;

fn main() {
    let array = Int32Array::from(vec![Some(0), None, Some(2), None, Some(4)]);
    // to and from iter, adding 1 to every non-null value
    let result: Int32Array = array
        .iter()
        .map(|e| e.map(|r| r + 1))
        .collect();
    let expected = Int32Array::from(vec![Some(1), None, Some(3), None, Some(5)]);
    assert_eq!(result, expected);
}
{code}
 

This results in an API that is:
 # Efficient, as it is our responsibility to write `FromIterator` 
implementations that efficiently populate the buffers, children, etc. from an 
iterator
 # Safe, as it cannot cause segfaults
 # Simple, as users do not need to worry about builders, buffers, etc., only 
native Rust.
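
A dependency-free sketch of how the two traits could fit together on a hypothetical `Int32Array` stand-in (not the arrow crate's type). `iter()` yields `Option<i32>` without any unchecked indexing, and `from_iter` preallocates its buffers from the iterator's `size_hint`, which is one way to keep the `FromIterator` path efficient:

```rust
use std::iter::FromIterator;

/// Hypothetical nullable Int32 array (validity as Vec<bool> for brevity).
/// Null slots always store 0, so the derived equality is consistent.
#[derive(Debug, PartialEq)]
struct Int32Array {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl Int32Array {
    /// Safe iteration: yields `None` for null slots, `Some(v)` otherwise.
    fn iter(&self) -> impl Iterator<Item = Option<i32>> + '_ {
        self.values
            .iter()
            .zip(self.validity.iter())
            .map(|(v, valid)| if *valid { Some(*v) } else { None })
    }
}

impl FromIterator<Option<i32>> for Int32Array {
    fn from_iter<I: IntoIterator<Item = Option<i32>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Preallocate from the lower size bound to avoid regrowing the buffers.
        let (lower, _) = iter.size_hint();
        let mut values = Vec::with_capacity(lower);
        let mut validity = Vec::with_capacity(lower);
        for item in iter {
            match item {
                Some(v) => {
                    values.push(v);
                    validity.push(true);
                }
                None => {
                    values.push(0); // placeholder stored under a null slot
                    validity.push(false);
                }
            }
        }
        Int32Array { values, validity }
    }
}

impl From<Vec<Option<i32>>> for Int32Array {
    fn from(v: Vec<Option<i32>>) -> Self {
        v.into_iter().collect()
    }
}
```

Against this stand-in, the `iter().map(..).collect()` pipeline from the example above compiles and produces the expected array.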


