[arrow-rs] branch llvm-cov created (now 2a3d561c9)

2022-07-31 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch llvm-cov
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


  at 2a3d561c9 Check if llvm-cov will run on CI

This branch includes the following new commits:

 new 2a3d561c9 Check if llvm-cov will run on CI

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-rs] 01/01: Check if llvm-cov will run on CI

2022-07-31 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch llvm-cov
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 2a3d561c9e79381230ff9bf4d5670f4e549d5e74
Author: Wakahisa 
AuthorDate: Mon Aug 1 00:10:05 2022 +0200

Check if llvm-cov will run on CI
---
 .github/workflows/rust.yml | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/rust.yml b/.github/workflows/rust.yml
index 8464a22b6..bd63efe02 100644
--- a/.github/workflows/rust.yml
+++ b/.github/workflows/rust.yml
@@ -76,24 +76,22 @@ jobs:
 arch: [ amd64 ]
 rust: [ stable ]
 steps:
-  - uses: actions/checkout@v2
+  - uses: actions/checkout@v3
 with:
   submodules: true
   - name: Setup Rust toolchain
 run: |
-  rustup toolchain install ${{ matrix.rust }}
+  rustup toolchain install ${{ matrix.rust }} --component llvm-tools-preview
   rustup default ${{ matrix.rust }}
   - name: Cache Cargo
 uses: actions/cache@v3
 with:
   path: /home/runner/.cargo
   key: cargo-coverage-cache3-
+  - name: Install cargo-llvm-cov
+uses: taiki-e/install-action@cargo-llvm-cov
   - name: Run coverage
-run: |
-  rustup toolchain install stable
-  rustup default stable
-  cargo install --version 0.18.2 cargo-tarpaulin
-  cargo tarpaulin --all --out Xml
+run: cargo llvm-cov --all-features --workspace --lcov --output-path lcov.info
   - name: Report coverage
 continue-on-error: true
 run: bash <(curl -s https://codecov.io/bash)



[arrow-rs] branch master updated: Make `Schema::fields` and `Schema::metadata` `pub` (#2239)

2022-07-31 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 3032a521c Make `Schema::fields` and `Schema::metadata` `pub` (#2239)
3032a521c is described below

commit 3032a521c9691d4569a9d277046304bd4e4098fb
Author: Andrew Lamb 
AuthorDate: Sun Jul 31 18:05:00 2022 -0400

Make `Schema::fields` and `Schema::metadata` `pub` (#2239)
---
 arrow/src/datatypes/schema.rs |  4 ++--
 arrow/tests/schema.rs | 46 +++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arrow/src/datatypes/schema.rs b/arrow/src/datatypes/schema.rs
index 1574b1654..f1f28d611 100644
--- a/arrow/src/datatypes/schema.rs
+++ b/arrow/src/datatypes/schema.rs
@@ -33,11 +33,11 @@ use super::Field;
 /// memory layout.
 #[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
 pub struct Schema {
-pub(crate) fields: Vec<Field>,
+pub fields: Vec<Field>,
 /// A map of key-value pairs containing additional meta data.
 #[serde(skip_serializing_if = "HashMap::is_empty")]
 #[serde(default)]
-pub(crate) metadata: HashMap<String, String>,
+pub metadata: HashMap<String, String>,
 }
 
 impl Schema {
diff --git a/arrow/tests/schema.rs b/arrow/tests/schema.rs
new file mode 100644
index 0..ff544b689
--- /dev/null
+++ b/arrow/tests/schema.rs
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::datatypes::{DataType, Field, Schema};
+use std::collections::HashMap;
+/// The tests in this file ensure a `Schema` can be manipulated
+/// outside of the arrow crate
+
+#[test]
+fn schema_destructure() {
+let meta = [("foo".to_string(), "baz".to_string())]
+.into_iter()
+.collect::<HashMap<String, String>>();
+
+let field = Field::new("c1", DataType::Utf8, false);
+let schema = Schema::new(vec![field]).with_metadata(meta);
+
+// Destructuring a Schema allows rewriting fields and metadata
+// without copying
+//
+// Model this use case below:
+
+let Schema {
+mut fields,
+metadata,
+} = schema;
+fields.push(Field::new("c2", DataType::Utf8, false));
+
+let new_schema = Schema::new(fields).with_metadata(metadata);
+
+assert_eq!(new_schema.fields().len(), 2);
+}



[arrow-rs] branch master updated: fix the doc of value_length (#1957)

2022-06-28 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new d6fc77870 fix the doc of value_length (#1957)
d6fc77870 is described below

commit d6fc77870974e8d468689aab94179e738072314e
Author: Remzi Yang <59198230+haoyang...@users.noreply.github.com>
AuthorDate: Wed Jun 29 12:21:05 2022 +0800

fix the doc of value_length (#1957)

Signed-off-by: remzi <1371656737...@gmail.com>
---
 arrow/src/array/array_list.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arrow/src/array/array_list.rs b/arrow/src/array/array_list.rs
index 709e4e7ba..36ad30715 100644
--- a/arrow/src/array/array_list.rs
+++ b/arrow/src/array/array_list.rs
@@ -381,9 +381,9 @@ impl FixedSizeListArray {
 self.value_offset_at(self.data.offset() + i)
 }
 
-/// Returns the length for value at index `i`.
+/// Returns the length for an element.
 ///
-/// Note this doesn't do any bound checking, for performance reason.
+/// All elements have the same length as the array is a fixed size.
 #[inline]
 pub const fn value_length(&self) -> i32 {
 self.length
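
For context, a minimal sketch of the documented behavior, assuming the same `arrow` APIs used elsewhere in this repository (`ArrayData::builder`, `FixedSizeListArray::from`); the two-element array is illustrative only:

```
use arrow::array::{Array, ArrayData, FixedSizeListArray, Int32Array};
use arrow::datatypes::{DataType, Field};

// Two lists of three int32 values each: [1, 2, 3] and [4, 5, 6].
let values = Int32Array::from(vec![1, 2, 3, 4, 5, 6]);
let list_type = DataType::FixedSizeList(
    Box::new(Field::new("item", DataType::Int32, false)),
    3,
);
let data = ArrayData::builder(list_type)
    .len(2)
    .add_child_data(values.data().clone())
    .build()
    .unwrap();
let list = FixedSizeListArray::from(data);

// Every element has the same length, so no index argument is needed.
assert_eq!(list.value_length(), 3);
```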



[arrow-rs] branch master updated: Update indexmap dependency (#1929)

2022-06-23 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 27963e758 Update indexmap dependency (#1929)
27963e758 is described below

commit 27963e758cf14b437c6ba40016f5ac732a4bca6d
Author: Raphael Taylor-Davies <1781103+tustv...@users.noreply.github.com>
AuthorDate: Thu Jun 23 21:03:13 2022 +0100

Update indexmap dependency (#1929)
---
 arrow/Cargo.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arrow/Cargo.toml b/arrow/Cargo.toml
index 136a2ae02..944dda9eb 100644
--- a/arrow/Cargo.toml
+++ b/arrow/Cargo.toml
@@ -41,7 +41,7 @@ bench = false
 serde = { version = "1.0", default-features = false }
 serde_derive = { version = "1.0", default-features = false }
 serde_json = { version = "1.0", default-features = false, features = ["preserve_order"] }
-indexmap = { version = "1.6", default-features = false, features = ["std"] }
+indexmap = { version = "1.9", default-features = false, features = ["std"] }
 rand = { version = "0.8", default-features = false, features = ["std", "std_rng"], optional = true }
 num = { version = "0.4", default-features = false, features = ["std"] }
 half = { version = "2.0", default-features = false }



[arrow-rs] branch master updated: Add ArrowWriter doctest (#1927) (#1930)

2022-06-23 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new f8afc1424 Add ArrowWriter doctest (#1927) (#1930)
f8afc1424 is described below

commit f8afc1424a729c390df6b69e585db7274498106b
Author: Raphael Taylor-Davies <1781103+tustv...@users.noreply.github.com>
AuthorDate: Thu Jun 23 20:03:24 2022 +0100

Add ArrowWriter doctest (#1927) (#1930)
---
 parquet/src/arrow/arrow_writer/mod.rs | 21 +
 1 file changed, 21 insertions(+)

diff --git a/parquet/src/arrow/arrow_writer/mod.rs 
b/parquet/src/arrow/arrow_writer/mod.rs
index 83f1bc70b..a18098ff1 100644
--- a/parquet/src/arrow/arrow_writer/mod.rs
+++ b/parquet/src/arrow/arrow_writer/mod.rs
@@ -48,6 +48,27 @@ mod levels;
 /// to produce row groups with `max_row_group_size` rows. Any remaining rows will be
 /// flushed on close, leading the final row group in the output file to potentially
 /// contain fewer than `max_row_group_size` rows
+///
+/// ```
+/// # use std::sync::Arc;
+/// # use bytes::Bytes;
+/// # use arrow::array::{ArrayRef, Int64Array};
+/// # use arrow::record_batch::RecordBatch;
+/// # use parquet::arrow::{ArrowReader, ArrowWriter, ParquetFileArrowReader};
+/// let col = Arc::new(Int64Array::from_iter_values([1, 2, 3])) as ArrayRef;
+/// let to_write = RecordBatch::try_from_iter([("col", col)]).unwrap();
+///
+/// let mut buffer = Vec::new();
/// let mut writer = ArrowWriter::try_new(&mut buffer, to_write.schema(), None).unwrap();
/// writer.write(&to_write).unwrap();
+/// writer.close().unwrap();
+///
/// let mut reader = ParquetFileArrowReader::try_new(Bytes::from(buffer)).unwrap();
+/// let mut reader = reader.get_record_reader(1024).unwrap();
+/// let read = reader.next().unwrap().unwrap();
+///
+/// assert_eq!(to_write, read);
+/// ```
 pub struct ArrowWriter {
 /// Underlying Parquet writer
 writer: SerializedFileWriter,



[arrow-rs] branch master updated (fcf655e19 -> 9bcd052bd)

2022-06-13 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from fcf655e19 Zero copy page decoding from bytes (#1810)
 add 9bcd052bd Omit validity buffer in PrimitiveArray::from_iter when all 
values are valid (#1859)

No new revisions were added by this update.

Summary of changes:
 arrow/src/array/array.rs   |  4 +++-
 arrow/src/array/array_primitive.rs | 30 +++---
 2 files changed, 26 insertions(+), 8 deletions(-)



[arrow-rs] branch master updated: Remove simd and avx512 bitwise kernels in favor of autovectorization (#1830)

2022-06-12 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new fb697ce43 Remove simd and avx512 bitwise kernels in favor of 
autovectorization (#1830)
fb697ce43 is described below

commit fb697ce4351fae39ebac810508ecc31583c6cdd7
Author: Jörn Horstmann 
AuthorDate: Sun Jun 12 19:09:02 2022 +0200

Remove simd and avx512 bitwise kernels in favor of autovectorization (#1830)

* Remove simd and avx512 bitwise kernels since they are actually slightly 
slower than the autovectorized version

* Add notes about target-cpu to README
---
 arrow/Cargo.toml|   1 -
 arrow/README.md |  14 ++
 arrow/benches/buffer_bit_ops.rs |  61 ++--
 arrow/src/arch/avx512.rs|  73 --
 arrow/src/arch/mod.rs   |  22 ---
 arrow/src/buffer/ops.rs | 307 +---
 arrow/src/lib.rs|   4 -
 7 files changed, 69 insertions(+), 413 deletions(-)

diff --git a/arrow/Cargo.toml b/arrow/Cargo.toml
index ebcdd9e7a..3f69888d5 100644
--- a/arrow/Cargo.toml
+++ b/arrow/Cargo.toml
@@ -61,7 +61,6 @@ bitflags = "1.2.1"
 
 [features]
 default = ["csv", "ipc", "test_utils"]
-avx512 = []
 csv = ["csv_crate"]
 ipc = ["flatbuffers"]
 simd = ["packed_simd"]
diff --git a/arrow/README.md b/arrow/README.md
index 67de57ff0..28240e77d 100644
--- a/arrow/README.md
+++ b/arrow/README.md
@@ -100,3 +100,17 @@ cargo run --example read_csv
 ```
 
 [arrow]: https://arrow.apache.org/
+
+
+## Performance
+
+Most of the compute kernels benefit a lot from being optimized for a specific CPU target.
+This is especially so on x86-64, since without specifying a target the compiler can only assume support for SSE2 vector instructions.
+Passing one of the following values as `-Ctarget-cpu=value` in `RUSTFLAGS` can therefore improve performance significantly:
+
+ - `native`: Target the exact features of the CPU that the build is running on.
+   This should give the best performance when building and running locally, but it should be used carefully, for example when building in a CI pipeline or when shipping pre-compiled software.
+ - `x86-64-v3`: Includes AVX2 support, is close to the Intel `haswell` architecture released in 2013, and should be supported by any recent Intel or AMD CPU.
+ - `x86-64-v4`: Includes AVX512 support, available on Intel `skylake` server and `icelake`/`tigerlake`/`rocketlake` laptop and desktop processors.
+
+These flags should be used in addition to the `simd` feature, since they also affect the code generated by the simd library.
\ No newline at end of file
diff --git a/arrow/benches/buffer_bit_ops.rs b/arrow/benches/buffer_bit_ops.rs
index 063f39c92..6c6bb0463 100644
--- a/arrow/benches/buffer_bit_ops.rs
+++ b/arrow/benches/buffer_bit_ops.rs
@@ -17,11 +17,14 @@
 
 #[macro_use]
 extern crate criterion;
-use criterion::Criterion;
+
+use criterion::{Criterion, Throughput};
 
 extern crate arrow;
 
-use arrow::buffer::{Buffer, MutableBuffer};
+use arrow::buffer::{
+buffer_bin_and, buffer_bin_or, buffer_unary_not, Buffer, MutableBuffer,
+};
 
 ///  Helper function to create arrays
 fn create_buffer(size: usize) -> Buffer {
@@ -42,17 +45,59 @@ fn bench_buffer_or(left: &Buffer, right: &Buffer) {
 criterion::black_box((left | right).unwrap());
 }
 
+fn bench_buffer_not(buffer: &Buffer) {
+criterion::black_box(!buffer);
+}
+
+fn bench_buffer_and_with_offsets(
+left: &Buffer,
+left_offset: usize,
+right: &Buffer,
+right_offset: usize,
+len: usize,
+) {
+criterion::black_box(buffer_bin_and(left, left_offset, right, right_offset, len));
+}
+
+fn bench_buffer_or_with_offsets(
+left: &Buffer,
+left_offset: usize,
+right: &Buffer,
+right_offset: usize,
+len: usize,
+) {
+criterion::black_box(buffer_bin_or(left, left_offset, right, right_offset, len));
+}
+
+fn bench_buffer_not_with_offsets(buffer: &Buffer, offset: usize, len: usize) {
+criterion::black_box(buffer_unary_not(buffer, offset, len));
+}
+
 fn bit_ops_benchmark(c: &mut Criterion) {
 let left = create_buffer(512 * 10);
 let right = create_buffer(512 * 10);
 
-c.bench_function("buffer_bit_ops and", |b| {
-b.iter(|| bench_buffer_and(&left, &right))
-});
+c.benchmark_group("buffer_binary_ops")
+.throughput(Throughput::Bytes(3 * left.len() as u64))
+.bench_function("and", |b| b.iter(|| bench_buffer_and(, )))
+.bench_function("or", |b| b.iter(|| bench_buffer_or(, )))
+.bench_function("and_with_offset", |b| {
+b.iter(|| {
+bench_buffer_and_with_offsets(, 1, , 2, left.len() 
* 8 - 5)
+})
+})
+.bench_function("or_with_offset", |b| {
+b.iter(|
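
For reference, a minimal sketch of the offset-taking kernels benchmarked above, using the `buffer_bin_and` signature imported at the top of this benchmark; the byte values are illustrative only:

```
use arrow::buffer::{buffer_bin_and, Buffer};

let left = Buffer::from(&[0b11110000u8][..]);
let right = Buffer::from(&[0b10101010u8][..]);

// AND the first 8 bits of each buffer, both starting at bit offset 0.
let result = buffer_bin_and(&left, 0, &right, 0, 8);
assert_eq!(result.as_slice(), &[0b10100000u8]);
```

The bit offsets let callers combine the validity bitmaps of sliced arrays directly, without first copying them to a byte boundary.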

[arrow-rs] branch master updated: Read and skip validity buffer of UnionType Array for V4 ipc message (#1789)

2022-06-05 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 73d552a7c Read and skip validity buffer of UnionType Array for V4 ipc 
message (#1789)
73d552a7c is described below

commit 73d552a7cc794d0e3eaa3e5333e5bc1c98deeb45
Author: Liang-Chi Hsieh 
AuthorDate: Sun Jun 5 02:00:44 2022 -0700

Read and skip validity buffer of UnionType Array for V4 ipc message (#1789)

* Read valididy buffer for V4 ipc message

* Add unit test

* Fix clippy
---
 arrow-flight/src/utils.rs  |  1 +
 arrow/src/ipc/reader.rs| 31 --
 arrow/src/ipc/writer.rs| 48 ++
 .../flight_client_scenarios/integration_test.rs|  1 +
 .../flight_server_scenarios/integration_test.rs| 10 -
 5 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/arrow-flight/src/utils.rs b/arrow-flight/src/utils.rs
index 77526917f..dda3fc7fe 100644
--- a/arrow-flight/src/utils.rs
+++ b/arrow-flight/src/utils.rs
@@ -71,6 +71,7 @@ pub fn flight_data_to_arrow_batch(
 schema,
 dictionaries_by_id,
 None,
+&message.version(),
 )
 })?
 }
diff --git a/arrow/src/ipc/reader.rs b/arrow/src/ipc/reader.rs
index 03a960c4c..868098327 100644
--- a/arrow/src/ipc/reader.rs
+++ b/arrow/src/ipc/reader.rs
@@ -52,6 +52,7 @@ fn read_buffer(buf: &ipc::Buffer, a_data: &[u8]) -> Buffer {
 /// - check if the bit width of non-64-bit numbers is 64, and
 /// - read the buffer as 64-bit (signed integer or float), and
 /// - cast the 64-bit array to the appropriate data type
+#[allow(clippy::too_many_arguments)]
 fn create_array(
 nodes: &[ipc::FieldNode],
 field: &Field,
@@ -60,6 +61,7 @@ fn create_array(
 dictionaries_by_id: &HashMap<i64, ArrayRef>,
 mut node_index: usize,
 mut buffer_index: usize,
+metadata: &ipc::MetadataVersion,
 ) -> Result<(ArrayRef, usize, usize)> {
 use DataType::*;
 let data_type = field.data_type();
@@ -106,6 +108,7 @@ fn create_array(
 dictionaries_by_id,
 node_index,
 buffer_index,
+metadata,
 )?;
 node_index = triple.1;
 buffer_index = triple.2;
@@ -128,6 +131,7 @@ fn create_array(
 dictionaries_by_id,
 node_index,
 buffer_index,
+metadata,
 )?;
 node_index = triple.1;
 buffer_index = triple.2;
@@ -153,6 +157,7 @@ fn create_array(
 dictionaries_by_id,
 node_index,
 buffer_index,
+metadata,
 )?;
 node_index = triple.1;
 buffer_index = triple.2;
@@ -201,6 +206,13 @@ fn create_array(
 
 let len = union_node.length() as usize;
 
+// In V4, union types have a validity bitmap
+// In V5 and later, union types have no validity bitmap
+if metadata < &ipc::MetadataVersion::V5 {
+read_buffer(&buffers[buffer_index], data);
+buffer_index += 1;
+}
+
 let type_ids: Buffer =
 read_buffer(&buffers[buffer_index], data)[..len].into();
 
@@ -226,6 +238,7 @@ fn create_array(
 dictionaries_by_id,
 node_index,
 buffer_index,
+metadata,
 )?;
 
 node_index = triple.1;
@@ -582,6 +595,7 @@ pub fn read_record_batch(
 schema: SchemaRef,
 dictionaries_by_id: &HashMap<i64, ArrayRef>,
 projection: Option<&[usize]>,
+metadata: &ipc::MetadataVersion,
 ) -> Result<RecordBatch> {
 let buffers = batch.buffers().ok_or_else(|| {
 ArrowError::IoError("Unable to get buffers from IPC 
RecordBatch".to_string())
@@ -607,6 +621,7 @@ pub fn read_record_batch(
 dictionaries_by_id,
 node_index,
 buffer_index,
+metadata,
 )?;
 node_index = triple.1;
 buffer_index = triple.2;
@@ -640,6 +655,7 @@ pub fn read_record_batch(
 dictionaries_by_id,
 node_index,
 buffer_index,
+metadata,
 )?;
 node_index = triple.1;
 buffer_index = triple.2;
@@ -656,6 +672,7 @@ pub fn read_dictionary(
 batch: ipc::DictionaryBatch,
 schema: ,
 dictionaries_by_id: &mut HashMap<i64, ArrayRef>,
+metadata: &ipc::MetadataVersion,
 ) -> Result<()> {
 if batch.isDelta() {
 return Err(ArrowError::IoError(
@@ -686,6 +703,7 @@ pub fn read_dictionary(
 Arc::new(schema),
 dictionaries_

[arrow-rs] branch master updated (2a12e5043 -> c1a91dc6d)

2022-06-01 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


from 2a12e5043 Revert "Pin nightly version to bypass packed_simd build 
error (#1743)" (#1771)
 add c1a91dc6d Improve ParquetFileArrowReader UX (#1773)

No new revisions were added by this update.

Summary of changes:
 parquet/src/arrow/arrow_reader.rs | 12 
 1 file changed, 12 insertions(+)



[arrow-rs] branch master updated: Support casting Utf8 to Boolean (#1738)

2022-05-30 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 486118cfa Support casting Utf8 to Boolean (#1738)
486118cfa is described below

commit 486118cfa9bc1435edc1745f4025f963712bf631
Author: Alex Qyoun-ae <4062971+mazterq...@users.noreply.github.com>
AuthorDate: Mon May 30 10:51:45 2022 +0400

Support casting Utf8 to Boolean (#1738)
---
 arrow/src/compute/kernels/cast.rs | 69 +++
 1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/arrow/src/compute/kernels/cast.rs 
b/arrow/src/compute/kernels/cast.rs
index 26aacff0b..93a8ebcb6 100644
--- a/arrow/src/compute/kernels/cast.rs
+++ b/arrow/src/compute/kernels/cast.rs
@@ -161,7 +161,7 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool {
 (Dictionary(_, value_type), _) => can_cast_types(value_type, to_type),
 (_, Dictionary(_, value_type)) => can_cast_types(from_type, value_type),
 
-(_, Boolean) => DataType::is_numeric(from_type),
+(_, Boolean) => DataType::is_numeric(from_type) || from_type == &Utf8,
 (Boolean, _) => DataType::is_numeric(to_type) || to_type == &Utf8,
 
 (Utf8, LargeUtf8) => true,
@@ -280,6 +280,8 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool {
 ///
 /// Behavior:
 /// * Boolean to Utf8: `true` => '1', `false` => `0`
+/// * Utf8 to boolean: `true`, `yes`, `on`, `1` => `true`, `false`, `no`, `off`, `0` => `false`,
+///   short variants are accepted, other strings return null or error
 /// * Utf8 to numeric: strings that can't be parsed to numbers return null, float strings
 ///   in integer casts return null
 /// * Numeric to boolean: 0 returns `false`, any other value returns `true`
@@ -293,7 +295,6 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool {
 /// Unsupported Casts
 /// * To or from `StructArray`
 /// * List to primitive
-/// * Utf8 to boolean
 /// * Interval and duration
 pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
 cast_with_options(array, to_type, &DEFAULT_CAST_OPTIONS)
@@ -396,6 +397,8 @@ macro_rules! cast_decimal_to_float {
 ///
 /// Behavior:
 /// * Boolean to Utf8: `true` => '1', `false` => `0`
+/// * Utf8 to boolean: `true`, `yes`, `on`, `1` => `true`, `false`, `no`, `off`, `0` => `false`,
+///   short variants are accepted, other strings return null or error
 /// * Utf8 to numeric: strings that can't be parsed to numbers return null, float strings
 ///   in integer casts return null
 /// * Numeric to boolean: 0 returns `false`, any other value returns `true`
@@ -409,7 +412,6 @@ macro_rules! cast_decimal_to_float {
 /// Unsupported Casts
 /// * To or from `StructArray`
 /// * List to primitive
-/// * Utf8 to boolean
 pub fn cast_with_options(
 array: &ArrayRef,
 to_type: &DataType,
@@ -643,10 +645,7 @@ pub fn cast_with_options(
 Int64 => cast_numeric_to_bool::<Int64Type>(array),
 Float32 => cast_numeric_to_bool::<Float32Type>(array),
 Float64 => cast_numeric_to_bool::<Float64Type>(array),
-Utf8 => Err(ArrowError::CastError(format!(
-"Casting from {:?} to {:?} not supported",
-from_type, to_type,
-))),
+Utf8 => cast_utf8_to_boolean(array, cast_options),
 _ => Err(ArrowError::CastError(format!(
 "Casting from {:?} to {:?} not supported",
 from_type, to_type,
@@ -1661,6 +1660,34 @@ fn cast_string_to_timestamp_ns(
 Ok(Arc::new(array) as ArrayRef)
 }
 
+/// Casts Utf8 to Boolean
+fn cast_utf8_to_boolean(from: &ArrayRef, cast_options: &CastOptions) -> Result<ArrayRef> {
+let array = as_string_array(from);
+
+let output_array = array
+.iter()
+.map(|value| match value {
+Some(value) => match value.to_ascii_lowercase().trim() {
+"t" | "tr" | "tru" | "true" | "y" | "ye" | "yes" | "on" | "1" 
=> {
+Ok(Some(true))
+}
+"f" | "fa" | "fal" | "fals" | "false" | "n" | "no" | "of" | 
"off"
+| "0" => Ok(Some(false)),
+invalid_value => match cast_options.safe {
+true => Ok(None),
+false => Err(ArrowError::CastError(format!(
+"Cannot cast string '{}' to value of Boolean type",
+invalid_value,
+))),
+},
+},
+None => Ok(None),
+})
+.collect::<Result<BooleanArray>>()?;
+
+Ok(Arc::new(output_array))
+}
+
 /// Cast numeric types to Boolean
 ///
 /// Any zero value retu
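
For reference, a minimal sketch of the new cast path from the caller's side, assuming `arrow::compute::cast` with the default (safe) options; the input values are illustrative only:

```
use std::sync::Arc;
use arrow::array::{Array, ArrayRef, BooleanArray, StringArray};
use arrow::compute::cast;
use arrow::datatypes::DataType;

let strings = Arc::new(StringArray::from(vec![
    Some("true"),
    Some("off"),
    Some("not-a-bool"),
    None,
])) as ArrayRef;

// With the default (safe) options, unparseable strings become null
// instead of producing an error.
let result = cast(&strings, &DataType::Boolean).unwrap();
let result = result.as_any().downcast_ref::<BooleanArray>().unwrap();
assert!(result.value(0));
assert!(!result.value(1));
assert!(result.is_null(2));
assert!(result.is_null(3));
```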

[arrow-rs] branch master updated: Read/Write nested dictionaries under FixedSizeList in IPC (#1610)

2022-04-25 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new cbd0303c6 Read/Write nested dictionaries under FixedSizeList in IPC 
(#1610)
cbd0303c6 is described below

commit cbd0303c69d66d4c683fea29787c8d03c8942568
Author: Liang-Chi Hsieh 
AuthorDate: Sun Apr 24 23:15:57 2022 -0700

Read/Write nested dictionaries under FixedSizeList in IPC (#1610)

* Read/Write nested dictionaries under FixedSizeList in IPC

* Fix clippy
---
 arrow/src/ipc/reader.rs | 39 +++
 arrow/src/ipc/writer.rs | 16 ++--
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arrow/src/ipc/reader.rs b/arrow/src/ipc/reader.rs
index 33d608576..8a26167db 100644
--- a/arrow/src/ipc/reader.rs
+++ b/arrow/src/ipc/reader.rs
@@ -1573,4 +1573,43 @@ mod tests {
 offsets,
 );
 }
+
+#[test]
+fn test_roundtrip_stream_dict_of_fixed_size_list_of_dict() {
+let values = StringArray::from(vec![Some("a"), None, Some("c"), None]);
+let keys = Int8Array::from_iter_values([0, 0, 1, 2, 0, 1, 3, 1, 2]);
+let dict_array = DictionaryArray::<Int8Type>::try_new(&keys, &values).unwrap();
+let dict_data = dict_array.data();
+
+let list_data_type = DataType::FixedSizeList(
+Box::new(Field::new_dict(
+"item",
+DataType::Dictionary(Box::new(DataType::Int8), Box::new(DataType::Utf8)),
+true,
+1,
+false,
+)),
+3,
+);
+let list_data = ArrayData::builder(list_data_type)
+.len(3)
+.add_child_data(dict_data.clone())
+.build()
+.unwrap();
+let list_array = FixedSizeListArray::from(list_data);
+
+let keys_for_dict = Int8Array::from_iter_values([0, 1, 0, 1, 1, 2, 0, 1, 2]);
+let dict_dict_array =
+DictionaryArray::<Int8Type>::try_new(&keys_for_dict, &list_array).unwrap();
+
+let schema = Arc::new(Schema::new(vec![Field::new(
+"f1",
+dict_dict_array.data_type().clone(),
+false,
+)]));
+let input_batch =
+RecordBatch::try_new(schema, vec![Arc::new(dict_dict_array)]).unwrap();
+let output_batch = roundtrip_ipc_stream(&input_batch);
+assert_eq!(input_batch, output_batch);
+}
 }
diff --git a/arrow/src/ipc/writer.rs b/arrow/src/ipc/writer.rs
index 1f73d16d2..efc878a12 100644
--- a/arrow/src/ipc/writer.rs
+++ b/arrow/src/ipc/writer.rs
@@ -27,7 +27,7 @@ use flatbuffers::FlatBufferBuilder;
 
 use crate::array::{
 as_large_list_array, as_list_array, as_map_array, as_struct_array, 
as_union_array,
-make_array, Array, ArrayData, ArrayRef,
+make_array, Array, ArrayData, ArrayRef, FixedSizeListArray,
 };
 use crate::buffer::{Buffer, MutableBuffer};
 use crate::datatypes::*;
@@ -147,7 +147,6 @@ impl IpcDataGenerator {
 dictionary_tracker:  DictionaryTracker,
 write_options: ,
 ) -> Result<()> {
-// TODO: Handle other nested types (FixedSizeList)
 match column.data_type() {
 DataType::Struct(fields) => {
 let s = as_struct_array(column);
@@ -181,6 +180,19 @@ impl IpcDataGenerator {
 write_options,
 )?;
 }
+DataType::FixedSizeList(field, _) => {
+let list = column
+.as_any()
+.downcast_ref::<FixedSizeListArray>()
+.expect("Unable to downcast to fixed size list array");
+self.encode_dictionaries(
+field,
+&list.values(),
+encoded_dictionaries,
+dictionary_tracker,
+write_options,
+)?;
+}
 DataType::Map(field, _) => {
 let map_array = as_map_array(column);
 



[arrow-rs] branch master updated: Parquet: schema validation should allow scale == precision for decimal type (#1607)

2022-04-23 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 4e22b8901 Parquet: schema validation should allow scale == precision 
for decimal type (#1607)
4e22b8901 is described below

commit 4e22b890189762c22a1d33ad8fc9662c8582977c
Author: Chao Sun 
AuthorDate: Fri Apr 22 23:42:01 2022 -0700

Parquet: schema validation should allow scale == precision for decimal type 
(#1607)
---
 parquet/src/schema/types.rs | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/parquet/src/schema/types.rs b/parquet/src/schema/types.rs
index 8ae3c4c6e..b156bb671 100644
--- a/parquet/src/schema/types.rs
+++ b/parquet/src/schema/types.rs
@@ -467,13 +467,13 @@ impl<'a> PrimitiveTypeBuilder<'a> {
 return Err(general_err!("Invalid DECIMAL scale: {}", self.scale));
 }
 
-if self.scale >= self.precision {
+if self.scale > self.precision {
 return Err(general_err!(
-"Invalid DECIMAL: scale ({}) cannot be greater than or equal to 
precision \
+"Invalid DECIMAL: scale ({}) cannot be greater than precision \
  ({})",
-self.scale,
-self.precision
-));
+self.scale,
+self.precision
+));
 }
 
 // Check precision and scale based on physical type limitations.
@@ -1345,10 +1345,19 @@ mod tests {
 if let Err(e) = result {
 assert_eq!(
 format!("{}", e),
-"Parquet error: Invalid DECIMAL: scale (2) cannot be greater 
than or equal to precision (1)"
+"Parquet error: Invalid DECIMAL: scale (2) cannot be greater 
than precision (1)"
 );
 }
 
+// It is OK if precision == scale
+result = Type::primitive_type_builder("foo", PhysicalType::BYTE_ARRAY)
+.with_repetition(Repetition::REQUIRED)
+.with_converted_type(ConvertedType::DECIMAL)
+.with_precision(1)
+.with_scale(1)
+.build();
+assert!(result.is_ok());
+
 result = Type::primitive_type_builder("foo", PhysicalType::INT32)
 .with_repetition(Repetition::REQUIRED)
 .with_converted_type(ConvertedType::DECIMAL)



[arrow-datafusion] 01/01: add a Tablesource

2022-04-15 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch rdbms-changes
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit 724f4e3363289607fed44ce30e9a1992df55d58a
Author: Wakahisa 
AuthorDate: Mon Feb 14 22:50:05 2022 +0200

add a Tablesource

Tablesource contains more information about the source of the table.
It can be a relational table, file(s), in-memory or unspecified.
---
 datafusion/core/src/datasource/datasource.rs | 34 
 1 file changed, 34 insertions(+)

diff --git a/datafusion/core/src/datasource/datasource.rs 
b/datafusion/core/src/datasource/datasource.rs
index 1b59c857f..48a2dc09e 100644
--- a/datafusion/core/src/datasource/datasource.rs
+++ b/datafusion/core/src/datasource/datasource.rs
@@ -55,6 +55,35 @@ pub enum TableType {
 Temporary,
 }
 
+/// Indicates the source of this table for metadata/catalog purposes.
+#[derive(Debug, Clone, PartialEq)]
+pub enum TableSource {
+/// An ordinary physical table.
+Relational {
+///
+server: Option<String>,
+///
+database: Option<String>,
+///
+schema: Option<String>,
+///
+table: String
+},
+/// A file on some file system
+File {
+///
+protocol: String,
+///
+path: String,
+///
+format: String,
+},
+/// A transient table.
+InMemory,
+/// An unspecified source, used as the default
+Unspecified,
+}
+
 /// Source table
 #[async_trait]
 pub trait TableProvider: Sync + Send {
@@ -70,6 +99,11 @@ pub trait TableProvider: Sync + Send {
 TableType::Base
 }
 
+/// The source of this table
+fn table_source(&self) -> TableSource {
+TableSource::Unspecified
+}
+
 /// Create an ExecutionPlan that will scan the table.
 /// The table provider will usually be responsible for grouping
 /// the source data into partitions that can be efficiently
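
For context, a sketch of how a consumer might inspect the new enum; the `describe` helper is hypothetical and not part of this commit:

```
// Hypothetical helper, assuming the `TableSource` enum above is in scope.
fn describe(source: &TableSource) -> String {
    match source {
        TableSource::Relational { database, table, .. } => {
            format!("relational table {:?}.{}", database, table)
        }
        TableSource::File { protocol, path, format } => {
            format!("{} file at {}://{}", format, protocol, path)
        }
        TableSource::InMemory => "in-memory table".to_string(),
        TableSource::Unspecified => "unspecified source".to_string(),
    }
}
```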



[arrow-datafusion] branch rdbms-changes updated (e6614aa8f -> 724f4e336)

2022-04-15 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch rdbms-changes
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


 discard e6614aa8f add a Tablesource
 add bed81eade MINOR: fix concat_ws corner bug (#2128)
 add 536210d73 fix df union all bug (#2108)
 add d54ba4e64 feat: 2061 create external table ddl table partition cols 
(#2099)
 add 88dd6ca3d Update sqlparser requirement from 0.15 to 0.16 (#2152)
 add a0d8b6633 cli: add cargo.lock (#2112)
 add fa5cef8c9 Fixed parquet path partitioning when only selecting 
partitioned columns (#2000)
 add 69ba713c4 #2109 schema infer max (#2139)
 add 5ae343404 [MINOR] after sqlparser update to 0.16, enable EXTRACT week. 
(#2157)
 add f99c2719a Update quarterly roadmap for Q2 (#2133)
 add 2a4a835bd fix:  incorrect memory usage track for sort (#2135)
 add ceffb2fca Reduce SortExec memory usage by void constructing single 
huge batch (#2132)
 add 823011590 Add IF NOT EXISTS to `CREATE TABLE` and `CREATE EXTERNAL 
TABLE` (#2143)
 add 38498b7bf Reduce repetition in Decimal binary kernels, upgrade to 
arrow 11.1 (#2107)
 add 8b09a5c6c Add CREATE DATABASE command to SQL (#2094)
 add b890190a6 Add Coalesce function (#1969)
 add 0c4ffd4f7 Add delimiter for create external table (#2162)
 add ea16c30ed [MINOR] ignore suspicious slow test in Ballista (#2167)
 add e5e8125a1 Serialize scalar UDFs in physical plan (#2130)
 add f0200b0a9 [CLI] Add show tables for datafusion-cli (#2137)
 add 0da1f370f minor: Avoid per cell evaluation in Coalesce, use zip in 
CaseWhen (#2171)
 add 6504d2a78 enable explain for ballista (#2163)
 add fa9e01641 Implement fast path of with_new_children() in ExecutionPlan 
(#2168)
 add ddf29f112 implement 'StringConcat' operator to support sql like 
"select 'aa' || 'b' " (#2142)
 add 9815ac6ec Handle merged schemas in parquet pruning (#2170)
 add 70f2b1a9b add ballista plugin manager and udf plugin (#2131)
 add 9cbde6d0e cli: update lockfile (#2178)
 add dec9adcbe Optimize the evaluation of `IN` for large lists using InSet 
(#2156)
 add a63751494 fix: Sort with a lot of repetition values (#2182)
 add 2d908405f fix 'not' expression will 'NULL' constants (#2144)
 add 41d2ff2aa Make PhysicalAggregateExprNode has repeated PhysicalExprNode 
(#2184)
 add 73ed545b7 refactor: simplify `prepare_select_exprs` (#2190)
 add 7558a5591 make nightly clippy happy (#2186)
 add c46c91ff3 Multiple row-layout support, part-1: Restructure code for 
clearness (#2189)
 add 28a6da3d2 MINOR: handle `NULL` in advance to avoid value copy in 
`string_concat` (#2183)
 add f3360d30b Remove tokio::spawn from WindowAggExec (#2201) (#2203)
 add ee95d41cc Add LogicalPlan::SubqueryAlias (#2172)
 add 6d75948b6 Use `filter` (filter_record_batch) instead of `take` to 
avoid using indices (#2218)
 add 231027274 feat: Support simple Arrays with Literals (#2194)
 add d81657de0 `case when` supports `NULL`  constant (#2197)
 add 7a6317a0e Add single line description of ExecutionPlan (#2216) (#2217)
 add f39692932 Make ParquetExec usable outside of a tokio runtime (#2201) 
(#2202)
 add 8058fbb38 Remove tokio::spawn from HashAggregateExec (#2201) (#2215)
 add 774b91bad minor refactor to avoid repeated code (#)
 add e7b08ed0e Range scan support for ParquetExec (#1990)
 add b1a28d077 update cli readme (#2220)
 add 8d5bb47f5 add sql level test for decimal data type (#2200)
 add d631a9ca2 chore: add `debug!` log in some execution operators (#2231)
 add 7e7b3ea02 minor: add editor config file (#2224)
 add 3d2e7b0bf Add type coercion rule for date + interval (#2235)
 new 724f4e336 add a Tablesource

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (e6614aa8f)
\
 N -- N -- N   refs/heads/rdbms-changes (724f4e336)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../integration_hiveserver2.sh => .editorconfig|   23 +-
 .github/workflows/rust.yml |8 +-
 .gitignore

[arrow-datafusion] branch rdbms-changes updated (307abcc -> e6614aa)

2022-04-01 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch rdbms-changes
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git.


 discard 307abcc  add a Tablesource
 add 12996ce  revise document of installing ballista pinned to specified 
version (#2034)
 add 503618f  chore: rearrange the code and add comment (#2037)
 add 74bf7ab  fix bug the optimizer rule filter push down (#2039)
 add c1f6269  Use SessionContext to parse Expr protobuf (#2024)
 add 7ed3be6  I think using info in formal code is better than using 
println. (#2020)
 add d02d969  use cargo-tomlfmt to check Cargo.toml formatting in CI (#2033)
 add 2dcdb1f  Minor: tune log level, lint (#2046)
 add 8de2a76  minor: format the annotation (#2047)
 add 5936edc  Refactor SessionContext, SessionState and SessionConfig to 
support multi-tenancy configurations - Part 2 (#2029)
 add f5c0cea  fix panic in register_catalog if default catalog not named 
"datafusion" and information schema enabled (#2050)
 add 2e6833c  Update to arrow/parquet 11.0 (#2048)
 add 59c6d93  Add `write_json`, `read_json`, `register_json`, and 
`JsonFormat` to `CREATE EXTERNAL TABLE` functionality (#2023)
 add afbeaa6  Allow `CatalogProvider::register_catalog` to return an error 
(#2052)
 add 29d0a65  [Ballista][Scheduler] Change log level for noisy logs (#2060)
 add 634252b  Qualified wildcard (#2012)
 add 257d030  Change the DataFusion explain plans to make it clearer in the 
predicate/filter (#2063)
 add 0194a27  Split datafusion-object-store module (#2065)
 add d3c45c2  [MINOR] fix doc in `EXTRACT(field FROM source) (#2074)
 add e8ed603  #2004 approx percentile with weight (#2031)
 add 04da6a6  [Bug][Datafusion] fix TaskContext session_config bug (#2070)
 add 122837d  *: fix #1727 (#2085)
 add 3d31915  Fix lost filters and projections in ParquetExec, CSVExec etc 
(#2077)
 add d644fae  Remove dependency of common for the storage crate (#2076)
 add 703c789  *: remove duplicate test (#2089)
 add 8159294  fix  issue#2058 file_format/json.rs attempt to subtract with 
overflow (#2066)
 add ff110d6  Short-circuit evaluation for `CaseWhen` (#2068)
 add 73ea6e1  [Ballista] Support Union in ballista. (#2098)
 add a09e1ae  add docs for approx functions (#2082)
 add 2598893  doc: separate and fix link for `extract` and `date_part` 
(#2104)
 add 2d6addd  Refactor SessionContext, BallistaContext to support 
multi-tenancy configurations - Part 3 (#2091)
 add 22fdca3  update zlib version to 1.2.12 (#2106)
 add 41b4e49  Reorganize the project folders (#2081)
 add b7d3bb1  Create jit-expression from datafusion expression (#2103)
 add 86df7ee  minor: replace array_equals in case evaluation with eq_dyn 
from arrow-rs (#2121)
 add 91673b3  Serialize timezone in timestamp scalar values (#2120)
 add 57a3a6a  minor: fix clippy on nightly rust (#2119)
 add f313e43  doc: update release schedule (#2110)
 add 9e3bec8  Fix case evaluation with NULLs (#2118)
 add 3063105  Minor: make disk_manager pub (#2126)
 add c43b9ab  issue#1967 ignore channel close (#2113)
 add f619d43  Minor add clarifying comment in parquet (#2127)
 add 4c2320e  JIT-compille DataFusion expression with column name (#2124)
 new e6614aa  add a Tablesource

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (307abcc)
\
 N -- N -- N   refs/heads/rdbms-changes (e6614aa)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .github/workflows/rust.yml |   57 +-
 Cargo.toml |   15 +-
 ballista-examples/Cargo.toml   |   16 +-
 .../bin/ballista-sql.rs => examples/test_sql.rs}   |   24 +-
 ballista-examples/src/bin/ballista-dataframe.rs|2 +-
 ballista-examples/src/bin/ballista-sql.rs  |2 +-
 ballista/rust/client/Cargo.toml|   10 +-
 ballista/rust/client/README.md |6 +-
 ballista/rust/client/src/context.rs|  275 +++--
 ball

[arrow-datafusion] 01/01: add a Tablesource

2022-04-01 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch rdbms-changes
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit e6614aa8ff84ffc6d36d19ae5eaa3e71602df949
Author: Wakahisa 
AuthorDate: Mon Feb 14 22:50:05 2022 +0200

add a Tablesource

Tablesource contains more information about the source of the table.
It can be a relational table, file(s), in-memory or unspecified.
---
 datafusion/core/src/datasource/datasource.rs | 34 
 1 file changed, 34 insertions(+)

diff --git a/datafusion/core/src/datasource/datasource.rs 
b/datafusion/core/src/datasource/datasource.rs
index 1b59c85..48a2dc0 100644
--- a/datafusion/core/src/datasource/datasource.rs
+++ b/datafusion/core/src/datasource/datasource.rs
@@ -55,6 +55,35 @@ pub enum TableType {
 Temporary,
 }
 
+/// Indicates the source of this table for metadata/catalog purposes.
+#[derive(Debug, Clone, PartialEq)]
+pub enum TableSource {
+/// An ordinary physical table.
+Relational {
+///
+server: Option<String>,
+///
+database: Option<String>,
+///
+schema: Option<String>,
+///
+table: String
+},
+/// A file on some file system
+File {
+///
+protocol: String,
+///
+path: String,
+///
+format: String,
+},
+/// A transient table.
+InMemory,
+/// An unspecified source, used as the default
+Unspecified,
+}
+
 /// Source table
 #[async_trait]
 pub trait TableProvider: Sync + Send {
@@ -70,6 +99,11 @@ pub trait TableProvider: Sync + Send {
 TableType::Base
 }
 
+/// The source of this table
+fn table_source(&self) -> TableSource {
+TableSource::Unspecified
+}
+
 /// Create an ExecutionPlan that will scan the table.
 /// The table provider will usually be responsible for grouping
 /// the source data into partitions that can be efficiently


[arrow-datafusion] 01/01: add a Tablesource

2022-03-18 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch rdbms-changes
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit 307abcc3cc63ddf589b985c335a06cdd5f25650c
Author: Wakahisa 
AuthorDate: Mon Feb 14 22:50:05 2022 +0200

add a Tablesource

Tablesource contains more information about the source of the table.
It can be a relational table, file(s), in-memory or unspecified.
---
 datafusion/src/datasource/datasource.rs | 34 +
 1 file changed, 34 insertions(+)

diff --git a/datafusion/src/datasource/datasource.rs 
b/datafusion/src/datasource/datasource.rs
index 1b59c85..48a2dc0 100644
--- a/datafusion/src/datasource/datasource.rs
+++ b/datafusion/src/datasource/datasource.rs
@@ -55,6 +55,35 @@ pub enum TableType {
 Temporary,
 }
 
+/// Indicates the source of this table for metadata/catalog purposes.
+#[derive(Debug, Clone, PartialEq)]
+pub enum TableSource {
+/// An ordinary physical table.
+Relational {
+///
+server: Option<String>,
+///
+database: Option<String>,
+///
+schema: Option<String>,
+///
+table: String
+},
+/// A file on some file system
+File {
+///
+protocol: String,
+///
+path: String,
+///
+format: String,
+},
+/// A transient table.
+InMemory,
+/// An unspecified source, used as the default
+Unspecified,
+}
+
 /// Source table
 #[async_trait]
 pub trait TableProvider: Sync + Send {
@@ -70,6 +99,11 @@ pub trait TableProvider: Sync + Send {
 TableType::Base
 }
 
+/// The source of this table
+fn table_source(&self) -> TableSource {
+TableSource::Unspecified
+}
+
 /// Create an ExecutionPlan that will scan the table.
 /// The table provider will usually be responsible for grouping
 /// the source data into partitions that can be efficiently


[arrow-datafusion] branch rdbms-changes created (now 307abcc)

2022-03-18 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch rdbms-changes
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git.


  at 307abcc  add a Tablesource

This branch includes the following new commits:

 new 307abcc  add a Tablesource

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[arrow-rs] branch master updated (6b0956a -> a2e629d)

2022-02-23 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 6b0956a  Publicly export arrow::array::MapBuilder (#1355)
 add a2e629d  Remove delimiter from csv Writer (#1342)

No new revisions were added by this update.

Summary of changes:
 arrow/src/csv/writer.rs | 5 -
 1 file changed, 5 deletions(-)


[arrow-rs] branch master updated (bae3087 -> 6b0956a)

2022-02-23 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from bae3087  Make bounds configurable in csv ReaderBuilder (#1341)
 add 6b0956a  Publicly export arrow::array::MapBuilder (#1355)

No new revisions were added by this update.

Summary of changes:
 arrow/src/array/mod.rs | 1 +
 1 file changed, 1 insertion(+)
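
For context, a minimal sketch of what the newly exported builder enables, assuming the builder API of this release (capacity-taking constructors and fallible appends); the map contents are illustrative only:

```
use arrow::array::{Array, Int32Builder, MapBuilder, StringBuilder};

// Build a one-element MapArray containing {"a": 1, "b": 2}.
let mut builder = MapBuilder::new(None, StringBuilder::new(2), Int32Builder::new(2));
builder.keys().append_value("a").unwrap();
builder.values().append_value(1).unwrap();
builder.keys().append_value("b").unwrap();
builder.values().append_value(2).unwrap();
builder.append(true).unwrap();

let map_array = builder.finish();
assert_eq!(map_array.len(), 1);
```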


[arrow-rs] branch master updated (57545b0 -> bae3087)

2022-02-23 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 57545b0  Refactor `StructArray::from` (#1360)
 add bae3087  Make bounds configurable in csv ReaderBuilder (#1341)

No new revisions were added by this update.

Summary of changes:
 arrow/src/csv/reader.rs | 34 --
 1 file changed, 32 insertions(+), 2 deletions(-)


[arrow-rs] branch master updated: Refactor `StructArray::from` (#1360)

2022-02-23 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 57545b0  Refactor `StructArray::from` (#1360)
57545b0 is described below

commit 57545b01f4a784b38a2fdaeda0cfefb4ccbdc5de
Author: Remzi Yang <59198230+haoyang...@users.noreply.github.com>
AuthorDate: Thu Feb 24 15:38:49 2022 +0800

Refactor `StructArray::from` (#1360)

* add async to default features

Signed-off-by: remzi <1371656737...@gmail.com>

* rewrite

Signed-off-by: remzi <1371656737...@gmail.com>

* update

Signed-off-by: remzi <1371656737...@gmail.com>
---
 arrow/src/array/array_struct.rs | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arrow/src/array/array_struct.rs b/arrow/src/array/array_struct.rs
index 316ffc6..b82ee03 100644
--- a/arrow/src/array/array_struct.rs
+++ b/arrow/src/array/array_struct.rs
@@ -108,10 +108,12 @@ impl StructArray {
 
 impl From<ArrayData> for StructArray {
 fn from(data: ArrayData) -> Self {
-let mut boxed_fields = vec![];
-for cd in data.child_data() {
-boxed_fields.push(make_array(cd.clone()));
-}
+let boxed_fields = data
+.child_data()
+.iter()
+.map(|cd| make_array(cd.clone()))
+.collect();
+
 Self { data, boxed_fields }
 }
 }
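
For context, a small sketch of another way to construct a `StructArray`, through its public `From` impl over field/array pairs; the columns are illustrative only:

```
use std::sync::Arc;
use arrow::array::{Array, Int32Array, StringArray, StructArray};
use arrow::datatypes::{DataType, Field};

let struct_array = StructArray::from(vec![
    (
        Field::new("id", DataType::Int32, false),
        Arc::new(Int32Array::from(vec![1, 2])) as Arc<dyn Array>,
    ),
    (
        Field::new("name", DataType::Utf8, false),
        Arc::new(StringArray::from(vec!["a", "b"])) as Arc<dyn Array>,
    ),
]);

assert_eq!(struct_array.len(), 2);
assert_eq!(struct_array.num_columns(), 2);
```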


[arrow-rs] branch master updated: Add with_datetime_format to csv WriterBuilder (#1347)

2022-02-23 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new ef95e52  Add with_datetime_format to csv WriterBuilder (#1347)
ef95e52 is described below

commit ef95e52c012a97facafbba9bc9eaa4ba3fcee8a3
Author: Sergey Glushchenko 
AuthorDate: Wed Feb 23 20:15:28 2022 +0100

Add with_datetime_format to csv WriterBuilder (#1347)
---
 arrow/src/csv/writer.rs | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arrow/src/csv/writer.rs b/arrow/src/csv/writer.rs
index 7752367..18e5c59 100644
--- a/arrow/src/csv/writer.rs
+++ b/arrow/src/csv/writer.rs
@@ -456,6 +456,12 @@ impl WriterBuilder {
 self
 }
 
+/// Set the CSV file's datetime format
+pub fn with_datetime_format(mut self, format: String) -> Self {
+self.datetime_format = Some(format);
+self
+}
+
 /// Set the CSV file's time format
 pub fn with_time_format(mut self, format: String) -> Self {
 self.time_format = Some(format);
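
For context, a minimal sketch of the new builder option in use, assuming a `RecordBatch` named `batch` with a timestamp column (not shown):

```
use std::fs::File;
use arrow::csv::WriterBuilder;

let file = File::create("out.csv").unwrap();

// Timestamp columns will be rendered with the custom pattern
// instead of the builder's default format.
let mut writer = WriterBuilder::new()
    .has_headers(true)
    .with_datetime_format("%Y-%m-%dT%H:%M:%S".to_string())
    .build(file);

// writer.write(&batch).unwrap();
```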


svn commit: r52698 - in /release/arrow/arrow-rs-9.1.0: ./ apache-arrow-rs-9.1.0.tar.gz apache-arrow-rs-9.1.0.tar.gz.asc apache-arrow-rs-9.1.0.tar.gz.sha256 apache-arrow-rs-9.1.0.tar.gz.sha512

2022-02-22 Thread nevime
Author: nevime
Date: Tue Feb 22 17:16:30 2022
New Revision: 52698

Log:
Apache Arrow Rust 9.1.0

Added:
release/arrow/arrow-rs-9.1.0/
release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz   (with props)
release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.asc
release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.sha256
release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.sha512

Added: release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz
==
Binary file - no diff available.

Propchange: release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.asc
==
--- release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.asc (added)
+++ release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.asc Tue Feb 22 
17:16:30 2022
@@ -0,0 +1,7 @@
+-----BEGIN PGP SIGNATURE-----
+
+iHUEABYKAB0WIQQ5BfJU+eUEtA//bPYABIjXcX0/sgUCYhEidQAKCRAABIjXcX0/
+somcAQDZT4ZXRV8g+Lv6WMf5Sn8KiJYmicwC2B2oouNMeiWLtQEA7WL/zMR2KEM9
+9RhX08BC9ljw+PIalrHHlLeZakbUOwo=
+=Z23e
+-----END PGP SIGNATURE-----

Added: release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.sha256
==
--- release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.sha256 (added)
+++ release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.sha256 Tue Feb 22 
17:16:30 2022
@@ -0,0 +1 @@
+3a60df0d820e3be77a99644fe443e108ff161c3da5227234e5807489eaec9561  
apache-arrow-rs-9.1.0.tar.gz

Added: release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.sha512
==
--- release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.sha512 (added)
+++ release/arrow/arrow-rs-9.1.0/apache-arrow-rs-9.1.0.tar.gz.sha512 Tue Feb 22 
17:16:30 2022
@@ -0,0 +1 @@
+44adb67bf3559560fdeeadd5ae9188d022b818cb0da3d27cec7ad67d87660d0baab7eb4510c18bf3b8901f45b313de7b19553d42362b9b5867c29519432fcfd8
  apache-arrow-rs-9.1.0.tar.gz




svn commit: r52697 - /release/arrow/KEYS

2022-02-22 Thread nevime
Author: nevime
Date: Tue Feb 22 16:16:27 2022
New Revision: 52697

Log:
Insert Neville Dipale keys to release

Modified:
release/arrow/KEYS

Modified: release/arrow/KEYS
==
--- release/arrow/KEYS (original)
+++ release/arrow/KEYS Tue Feb 22 16:16:27 2022
@@ -1167,3 +1167,23 @@ HoHsSwWTuz2UvPmxhH0LwKHBBmPOZWVF/2iN+cGN
 0rT1eQ==
 =awom
 -----END PGP PUBLIC KEY BLOCK-----
+pub   ed25519 2022-02-19 [SC] [expires: 2024-02-19]
+  3905F254F9E504B40FFF6CF6000488D7717D3FB2
+uid   [ultimate] Neville Dipale 
+sig 3000488D7717D3FB2 2022-02-19  Neville Dipale 
+sub   cv25519 2022-02-19 [E] [expires: 2024-02-19]
+sig  000488D7717D3FB2 2022-02-19  Neville Dipale 
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mDMEYhEgWBYJKwYBBAHaRw8BAQdAXN9r2gDzqnm3M14+5gjzOQGfE9Y7syUZPkZK
+IXFGigS0Ik5ldmlsbGUgRGlwYWxlIDxuZXZpbWVAYXBhY2hlLm9yZz6ImgQTFgoA
+QhYhBDkF8lT55QS0D/9s9gAEiNdxfT+yBQJiESBYAhsDBQkDwmcABQsJCAcCAyIC
+AQYVCgkICwIEFgIDAQIeBwIXgAAKCRAABIjXcX0/ssb7AP96RAhkNNRuaQa2uwbL
+jOSWZipmeW7flCxVKrEhntTIaAEA8oYIwNxuo73+zM9azRNCZbvvZIFlN+09qQMC
+xfkssAm4OARiESBYEgorBgEEAZdVAQUBAQdA2PqrNkrWXfOHuPrj1xeNfIG37fW8
+JXPzqy4/MaIUGSsDAQgHiH4EGBYKACYWIQQ5BfJU+eUEtA//bPYABIjXcX0/sgUC
+YhEgWAIbDAUJA8JnAAAKCRAABIjXcX0/sp36AQCS2vIDq364qtOQzWbotWgjgWH2
+yW1iX/b2CJSl0CZHTgD8CuqXjMk3WequwZhLb61ZqdeUWXvVqny4dxkSg3LFsQw=
+=4aGL
+-----END PGP PUBLIC KEY BLOCK-----




[arrow-rs] branch master updated: Arrow Rust + Conbench Integration (#1289)

2022-02-22 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 4b89f7e  Arrow Rust + Conbench Integration (#1289)
4b89f7e is described below

commit 4b89f7ee3549c24fa5997056b16a1cde60ce7043
Author: diana 
AuthorDate: Tue Feb 22 02:20:54 2022 -0700

Arrow Rust + Conbench Integration (#1289)

* Arrow Rust + Conbench Integration

* remove --src-dir
---
 conbench/.flake8  |   2 +
 conbench/.gitignore   | 130 
 conbench/.isort.cfg   |   2 +
 conbench/README.md| 251 ++
 conbench/_criterion.py|  98 +++
 conbench/benchmarks.json  |   8 ++
 conbench/benchmarks.py|  41 +++
 conbench/requirements-test.txt|   3 +
 conbench/requirements.txt |   1 +
 dev/release/rat_exclude_files.txt |   5 +
 10 files changed, 541 insertions(+)

diff --git a/conbench/.flake8 b/conbench/.flake8
new file mode 100644
index 000..e44b810
--- /dev/null
+++ b/conbench/.flake8
@@ -0,0 +1,2 @@
+[flake8]
+ignore = E501
diff --git a/conbench/.gitignore b/conbench/.gitignore
new file mode 100755
index 000..aa44ee2
--- /dev/null
+++ b/conbench/.gitignore
@@ -0,0 +1,130 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
diff --git a/conbench/.isort.cfg b/conbench/.isort.cfg
new file mode 100644
index 000..f238bf7
--- /dev/null
+++ b/conbench/.isort.cfg
@@ -0,0 +1,2 @@
+[settings]
+profile = black
diff --git a/conbench/README.md b/conbench/README.md
new file mode 100644
index 000..8c7f38c
--- /dev/null
+++ b/conbench/README.md
@@ -0,0 +1,251 @@
+
+
+# Arrow Rust + Conbench Integration
+
+
+## Quick start
+
+```
+$ cd ~/arrow-rs/conbench/
+$ conda create -y -n conbench python=3.9
+$ conda activate conbench
+(conbench) $ pip install -r requirements.txt
+(conbench) $ conbench arrow-rs
+```
+
+## Example output
+
+```
+{
+"batch_id": "b68c559358cc43a3aab02d893d2693f4",
+"context": {
+"benchmark_language": "Rust"
+},
+"github": {
+"commit": "ca33a0a50494f95840ade2e9509c3c3d4df35249",
+"repository": "https://github.com/dianaclarke/arrow-rs;
+},
+"info": {},
+"machine_info": {
+"architecture_name": "x86_64",
+"cpu_core_count": "8",
+"cpu_frequency_max_hz": "24",
+"cpu_l1d_cache_bytes": "65536",
+"cpu_l1i_cache_bytes": "131072",
+"cpu_l2_cache_bytes": "4194304",
+"cpu_l3_cache_bytes": "0",
+"cpu_model_name": "Apple M1",
+"cpu_thread_count": "8",
+"gpu_count": "0&qu

svn commit: r52639 - /dev/arrow/KEYS

2022-02-20 Thread nevime
Author: nevime
Date: Sun Feb 20 08:40:26 2022
New Revision: 52639

Log:
Add Neville Dipale keys

Modified:
dev/arrow/KEYS

Modified: dev/arrow/KEYS
==
--- dev/arrow/KEYS (original)
+++ dev/arrow/KEYS Sun Feb 20 08:40:26 2022
@@ -1263,3 +1263,23 @@ HoHsSwWTuz2UvPmxhH0LwKHBBmPOZWVF/2iN+cGN
 0rT1eQ==
 =awom
 -END PGP PUBLIC KEY BLOCK-
+pub   ed25519 2022-02-19 [SC] [expires: 2024-02-19]
+  3905F254F9E504B40FFF6CF6000488D7717D3FB2
uid   [ultimate] Neville Dipale <nevime@apache.org>
sig 3        000488D7717D3FB2 2022-02-19  Neville Dipale <nevime@apache.org>
sub   cv25519 2022-02-19 [E] [expires: 2024-02-19]
sig          000488D7717D3FB2 2022-02-19  Neville Dipale <nevime@apache.org>
+
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mDMEYhEgWBYJKwYBBAHaRw8BAQdAXN9r2gDzqnm3M14+5gjzOQGfE9Y7syUZPkZK
+IXFGigS0Ik5ldmlsbGUgRGlwYWxlIDxuZXZpbWVAYXBhY2hlLm9yZz6ImgQTFgoA
+QhYhBDkF8lT55QS0D/9s9gAEiNdxfT+yBQJiESBYAhsDBQkDwmcABQsJCAcCAyIC
+AQYVCgkICwIEFgIDAQIeBwIXgAAKCRAABIjXcX0/ssb7AP96RAhkNNRuaQa2uwbL
+jOSWZipmeW7flCxVKrEhntTIaAEA8oYIwNxuo73+zM9azRNCZbvvZIFlN+09qQMC
+xfkssAm4OARiESBYEgorBgEEAZdVAQUBAQdA2PqrNkrWXfOHuPrj1xeNfIG37fW8
+JXPzqy4/MaIUGSsDAQgHiH4EGBYKACYWIQQ5BfJU+eUEtA//bPYABIjXcX0/sgUC
+YhEgWAIbDAUJA8JnAAAKCRAABIjXcX0/sp36AQCS2vIDq364qtOQzWbotWgjgWH2
+yW1iX/b2CJSl0CZHTgD8CuqXjMk3WequwZhLb61ZqdeUWXvVqny4dxkSg3LFsQw=
+=4aGL
+-END PGP PUBLIC KEY BLOCK-




svn commit: r52634 - in /dev/arrow/apache-arrow-rs-9.1.0-rc1: ./ apache-arrow-rs-9.1.0.tar.gz apache-arrow-rs-9.1.0.tar.gz.asc apache-arrow-rs-9.1.0.tar.gz.sha256 apache-arrow-rs-9.1.0.tar.gz.sha512

2022-02-19 Thread nevime
Author: nevime
Date: Sat Feb 19 17:02:05 2022
New Revision: 52634

Log:
Apache Arrow Rust 9.1.0 1

Added:
dev/arrow/apache-arrow-rs-9.1.0-rc1/
dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz   (with props)
dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.asc
dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.sha256
dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.sha512

Added: dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.asc
==
--- dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.asc (added)
+++ dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.asc Sat Feb 19 17:02:05 2022
@@ -0,0 +1,7 @@
+-BEGIN PGP SIGNATURE-
+
+iHUEABYKAB0WIQQ5BfJU+eUEtA//bPYABIjXcX0/sgUCYhEidQAKCRAABIjXcX0/
+somcAQDZT4ZXRV8g+Lv6WMf5Sn8KiJYmicwC2B2oouNMeiWLtQEA7WL/zMR2KEM9
+9RhX08BC9ljw+PIalrHHlLeZakbUOwo=
+=Z23e
+-END PGP SIGNATURE-

Added: dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.sha256
==
--- dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.sha256 (added)
+++ dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.sha256 Sat Feb 19 17:02:05 2022
@@ -0,0 +1 @@
+3a60df0d820e3be77a99644fe443e108ff161c3da5227234e5807489eaec9561  apache-arrow-rs-9.1.0.tar.gz

Added: dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.sha512
==
--- dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.sha512 (added)
+++ dev/arrow/apache-arrow-rs-9.1.0-rc1/apache-arrow-rs-9.1.0.tar.gz.sha512 Sat Feb 19 17:02:05 2022
@@ -0,0 +1 @@
+44adb67bf3559560fdeeadd5ae9188d022b818cb0da3d27cec7ad67d87660d0baab7eb4510c18bf3b8901f45b313de7b19553d42362b9b5867c29519432fcfd8  apache-arrow-rs-9.1.0.tar.gz




[arrow-rs] tag 9.1.0 created (now ecba7dc)

2022-02-19 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to tag 9.1.0
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


  at ecba7dc  (commit)
No new revisions were added by this update.


[arrow-rs] branch master updated (041b77d -> ecba7dc)

2022-02-19 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 041b77d  Update the document of function `MutableArrayData::extend` (#1336)
 add ecba7dc  Update versions and CHANGELOG for 9.1.0 release (#1325)

No new revisions were added by this update.

Summary of changes:
 CHANGELOG.md   | 65 +-
 arrow-flight/Cargo.toml|  4 +-
 arrow-pyarrow-integration-testing/Cargo.toml   |  4 +-
 arrow/Cargo.toml   |  2 +-
 arrow/README.md|  2 +-
 arrow/test/dependency/default-features/Cargo.toml  |  2 +-
 .../test/dependency/no-default-features/Cargo.toml |  2 +-
 arrow/test/dependency/simd/Cargo.toml  |  2 +-
 dev/release/update_change_log.sh   |  4 +-
 integration-testing/Cargo.toml |  2 +-
 parquet/Cargo.toml |  6 +-
 parquet_derive/Cargo.toml  |  4 +-
 parquet_derive/README.md   |  4 +-
 .../test/dependency/default-features/Cargo.toml|  2 +-
 parquet_derive_test/Cargo.toml |  6 +-
 15 files changed, 87 insertions(+), 24 deletions(-)


[arrow-rs] branch master updated (193b64c -> 041b77d)

2022-02-19 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 193b64c  Clean up DictionaryArray construction in test (#1314)
 add 041b77d  Update the document of function `MutableArrayData::extend` (#1336)

No new revisions were added by this update.

Summary of changes:
 arrow/src/array/transform/mod.rs | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)


[arrow-rs] branch master updated: Clean up DictionaryArray construction in test (#1314)

2022-02-19 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 193b64c  Clean up DictionaryArray construction in test (#1314)
193b64c is described below

commit 193b64c69f0560a1a01ae4c04004b81afb02fab6
Author: Andrew Lamb 
AuthorDate: Sat Feb 19 11:02:28 2022 -0500

Clean up DictionaryArray construction in test (#1314)
---
 arrow/src/array/array_dictionary.rs | 25 -
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/arrow/src/array/array_dictionary.rs b/arrow/src/array/array_dictionary.rs
index 57153f1..7e82ad2 100644
--- a/arrow/src/array/array_dictionary.rs
+++ b/arrow/src/array/array_dictionary.rs
@@ -302,6 +302,7 @@ mod tests {
 use super::*;
 
 use crate::array::Int8Array;
+use crate::datatypes::Int16Type;
 use crate::{
 array::Int16DictionaryArray, array::PrimitiveDictionaryBuilder,
 datatypes::DataType,
@@ -472,29 +473,11 @@ mod tests {
 #[test]
 fn test_dictionary_iter() {
 // Construct a value array
-let value_data = ArrayData::builder(DataType::Int8)
-.len(8)
-.add_buffer(Buffer::from(
-&[10_i8, 11, 12, 13, 14, 15, 16, 17].to_byte_slice(),
-))
-.build()
-.unwrap();
-
-// Construct a buffer for value offsets, for the nested array:
-let keys = Buffer::from(&[2_i16, 3, 4].to_byte_slice());
+let values = Int8Array::from_iter_values([10_i8, 11, 12, 13, 14, 15, 16, 17]);
+let keys = Int16Array::from_iter_values([2_i16, 3, 4]);
 
 // Construct a dictionary array from the above two
-let key_type = DataType::Int16;
-let value_type = DataType::Int8;
-let dict_data_type =
-DataType::Dictionary(Box::new(key_type), Box::new(value_type));
-let dict_data = ArrayData::builder(dict_data_type)
-.len(3)
-.add_buffer(keys)
-.add_child_data(value_data)
-.build()
-.unwrap();
-let dict_array = Int16DictionaryArray::from(dict_data);
+let dict_array = DictionaryArray::<Int16Type>::try_new(&keys, &values).unwrap();
 
 let mut key_iter = dict_array.keys_iter();
 assert_eq!(2, key_iter.next().unwrap().unwrap());

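As a usage sketch of the simplified construction above (outside the diff; it assumes the `arrow` crate and the `try_new` constructor that the new test code calls):

```rust
use arrow::array::{DictionaryArray, Int16Array, Int8Array};
use arrow::datatypes::Int16Type;

fn main() {
    // Dictionary values, and keys that index into them.
    let values = Int8Array::from_iter_values([10i8, 11, 12, 13, 14, 15, 16, 17]);
    let keys = Int16Array::from_iter_values([2i16, 3, 4]);

    // One call replaces the manual ArrayData plumbing removed in the diff.
    let dict = DictionaryArray::<Int16Type>::try_new(&keys, &values).unwrap();
    assert_eq!(dict.keys().value(0), 2);
}
```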

[arrow-rs] branch master updated: Cleanup: remove some dead / test only code (#1331)

2022-02-19 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new c0351f8  Cleanup: remove some dead / test only code (#1331)
c0351f8 is described below

commit c0351f84e61172f8403c468868919ec01538ce09
Author: Andrew Lamb 
AuthorDate: Sat Feb 19 10:57:52 2022 -0500

Cleanup: remove some dead / test only code (#1331)
---
 arrow/src/array/data.rs   | 23 
 arrow/src/compute/util.rs | 93 ++-
 2 files changed, 35 insertions(+), 81 deletions(-)

diff --git a/arrow/src/array/data.rs b/arrow/src/array/data.rs
index d2db0d0..cbbc56a 100644
--- a/arrow/src/array/data.rs
+++ b/arrow/src/array/data.rs
@@ -198,29 +198,6 @@ pub(crate) fn new_buffers(data_type: &DataType, capacity: usize) -> [MutableBuffer; 2] {
 }
 }
 
-/// Ensures that at least `min_size` elements of type `data_type` can
-/// be stored in a buffer of `buffer_size`.
-///
-/// `buffer_index` is used in error messages to identify which buffer
-/// had the invalid index
-#[allow(dead_code)]
-fn ensure_size(
-data_type: &DataType,
-min_size: usize,
-buffer_size: usize,
-buffer_index: usize,
-) -> Result<()> {
-// if min_size is zero, may not have buffers (e.g. NullArray)
-if min_size > 0 && buffer_size < min_size {
-Err(ArrowError::InvalidArgumentError(format!(
-"Need at least {} bytes in buffers[{}] in array of type {:?}, but 
got {}",
-buffer_size, buffer_index, data_type, min_size
-)))
-} else {
-Ok(())
-}
-}
-
 /// Maps 2 [`MutableBuffer`]s into a vector of [Buffer]s whose size depends on `data_type`.
 #[inline]
 pub(crate) fn into_buffers(
diff --git a/arrow/src/compute/util.rs b/arrow/src/compute/util.rs
index 3f168c1..62c3be6 100644
--- a/arrow/src/compute/util.rs
+++ b/arrow/src/compute/util.rs
@@ -18,7 +18,7 @@
 //! Common utilities for computation kernels.
 
 use crate::array::*;
-use crate::buffer::{buffer_bin_and, buffer_bin_or, Buffer};
+use crate::buffer::{buffer_bin_and, Buffer};
 use crate::datatypes::*;
 use crate::error::{ArrowError, Result};
 use num::{One, ToPrimitive, Zero};
@@ -58,41 +58,6 @@ pub(super) fn combine_option_bitmap(
 }
 }
 
-/// Compares the null bitmaps of two arrays using a bitwise `or` operation.
-///
-/// This function is useful when implementing operations on higher level arrays.
-#[allow(clippy::unnecessary_wraps)]
-#[allow(dead_code)]
-pub(super) fn compare_option_bitmap(
-left_data: &ArrayData,
-right_data: &ArrayData,
-len_in_bits: usize,
-) -> Result<Option<Buffer>> {
-let left_offset_in_bits = left_data.offset();
-let right_offset_in_bits = right_data.offset();
-
-let left = left_data.null_buffer();
-let right = right_data.null_buffer();
-
-match left {
-None => match right {
-None => Ok(None),
-Some(r) => Ok(Some(r.bit_slice(right_offset_in_bits, len_in_bits))),
-},
-Some(l) => match right {
-None => Ok(Some(l.bit_slice(left_offset_in_bits, len_in_bits))),
-
-Some(r) => Ok(Some(buffer_bin_or(
-l,
-left_offset_in_bits,
-r,
-right_offset_in_bits,
-len_in_bits,
-))),
-},
-}
-}
-
 /// Takes/filters a list array's inner data using the offsets of the list array.
 ///
 /// Where a list array has indices `[0,2,5,10]`, taking indices of `[2,0]` returns
@@ -176,10 +141,44 @@ pub(super) mod tests {
 
 use std::sync::Arc;
 
+use crate::buffer::buffer_bin_or;
 use crate::datatypes::DataType;
 use crate::util::bit_util;
 use crate::{array::ArrayData, buffer::MutableBuffer};
 
+/// Compares the null bitmaps of two arrays using a bitwise `or` operation.
+///
+/// This function is useful when implementing operations on higher level arrays.
+pub(super) fn compare_option_bitmap(
+left_data: &ArrayData,
+right_data: &ArrayData,
+len_in_bits: usize,
+) -> Result<Option<Buffer>> {
+let left_offset_in_bits = left_data.offset();
+let right_offset_in_bits = right_data.offset();
+
+let left = left_data.null_buffer();
+let right = right_data.null_buffer();
+
+match left {
+None => match right {
+None => Ok(None),
+Some(r) => Ok(Some(r.bit_slice(right_offset_in_bits, len_in_bits))),
+},
+Some(l) => match right {
+None => Ok(Some(l.bit_slice(left_offset_in_bits, len_in_bits))),
+
+Some(r) => Ok(Some(buffer_bin_or(
+l,
+left_offset_in_bits,
+r,
+right_offset_in_bits,
+len_in_bits,
+))),
+   

[arrow-rs] branch master updated: fix failing csv_writer bench (#1293)

2022-02-09 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 39f3f71  fix failing csv_writer bench (#1293)
39f3f71 is described below

commit 39f3f711876ff113545b1a2d7023f66de77bb731
Author: Andy Grove 
AuthorDate: Thu Feb 10 00:58:08 2022 -0700

fix failing csv_writer bench (#1293)
---
 arrow/benches/csv_writer.rs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arrow/benches/csv_writer.rs b/arrow/benches/csv_writer.rs
index 62c5da9..3ecf514 100644
--- a/arrow/benches/csv_writer.rs
+++ b/arrow/benches/csv_writer.rs
@@ -25,6 +25,7 @@ use arrow::array::*;
 use arrow::csv;
 use arrow::datatypes::*;
 use arrow::record_batch::RecordBatch;
+use std::env;
 use std::fs::File;
 use std::sync::Arc;
 
@@ -56,7 +57,8 @@ fn criterion_benchmark(c: &mut Criterion) {
 vec![Arc::new(c1), Arc::new(c2), Arc::new(c3), Arc::new(c4)],
 )
 .unwrap();
-let file = File::create("target/bench_write_csv.csv").unwrap();
+let path = env::temp_dir().join("bench_write_csv.csv");
+let file = File::create(path).unwrap();
 let mut writer = csv::Writer::new(file);
 let batches = vec![&batch, &batch, &batch, &batch, &batch, &batch, &batch, &batch, &batch, &batch, &batch];
 
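The point of the fix is portability: `target/` only exists relative to the crate root, while the OS temp directory is always available; a minimal std-only sketch of the pattern:

```rust
use std::env;
use std::fs::File;

fn main() -> std::io::Result<()> {
    // Resolves to the platform temp dir (e.g. /tmp); no assumption about
    // the current working directory or a target/ folder being present.
    let path = env::temp_dir().join("bench_write_csv.csv");
    let _file = File::create(&path)?;
    Ok(())
}
```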


[arrow-rs] branch master updated: JSON reader - empty nested list should not create child value (#826)

2021-10-13 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new e898de5  JSON reader - empty nested list should not create child value 
(#826)
e898de5 is described below

commit e898de57e4587c64387939f8a557bc5fa2dffeb8
Author: Wakahisa 
AuthorDate: Wed Oct 13 15:46:07 2021 +0200

JSON reader - empty nested list should not create child value (#826)

* JSON reader - empty nested list should not create child value

* PR review
---
 arrow/src/json/reader.rs | 41 ++
 arrow/src/json/writer.rs | 52 
 2 files changed, 71 insertions(+), 22 deletions(-)

diff --git a/arrow/src/json/reader.rs b/arrow/src/json/reader.rs
index 9592b59..c2a2de9 100644
--- a/arrow/src/json/reader.rs
+++ b/arrow/src/json/reader.rs
@@ -1048,31 +1048,27 @@ impl Decoder {
 }
 DataType::Struct(fields) => {
 // extract list values, with non-lists converted to Value::Null
-let array_item_count = rows
-.iter()
-.map(|row| match row {
-Value::Array(values) => values.len(),
-_ => 1,
-})
-.sum();
+let array_item_count = cur_offset.to_usize().unwrap();
 let num_bytes = bit_util::ceil(array_item_count, 8);
 let mut null_buffer = 
MutableBuffer::from_len_zeroed(num_bytes);
 let mut struct_index = 0;
 let rows: Vec<Value> = rows
 .iter()
-.flat_map(|row| {
-if let Value::Array(values) = row {
-values.iter().for_each(|_| {
-bit_util::set_bit(
-null_buffer.as_slice_mut(),
-struct_index,
-);
+.flat_map(|row| match row {
+Value::Array(values) if !values.is_empty() => {
+values.iter().for_each(|value| {
+if !value.is_null() {
+bit_util::set_bit(
+null_buffer.as_slice_mut(),
+struct_index,
+);
+}
 struct_index += 1;
 });
 values.clone()
-} else {
-struct_index += 1;
-vec![Value::Null]
+}
+_ => {
+vec![]
 }
 })
 .collect();
@@ -2209,6 +2205,7 @@ mod tests {
 {"a": [{"b": true, "c": {"d": "c_text"}}, {"b": null, "c": {"d": 
"d_text"}}, {"b": true, "c": {"d": null}}]}
 {"a": null}
 {"a": []}
+{"a": [null]}
 "#;
 let mut reader = builder.build(Cursor::new(json_content)).unwrap();
 
@@ -2243,23 +2240,23 @@ mod tests {
 .null_bit_buffer(Buffer::from(vec![0b0011]))
 .build();
 let a_list = ArrayDataBuilder::new(a_field.data_type().clone())
-.len(5)
-.add_buffer(Buffer::from_slice_ref(&[0i32, 2, 3, 6, 6, 6]))
+.len(6)
+.add_buffer(Buffer::from_slice_ref(&[0i32, 2, 3, 6, 6, 6, 7]))
 .add_child_data(a)
-.null_bit_buffer(Buffer::from(vec![0b00010111]))
+.null_bit_buffer(Buffer::from(vec![0b00110111]))
 .build();
 let expected = make_array(a_list);
 
 // compare `a` with result from json reader
 let batch = reader.next().unwrap().unwrap();
 let read = batch.column(0);
-assert_eq!(read.len(), 5);
+assert_eq!(read.len(), 6);
 // compare the arrays the long way around, to better detect differences
 let read: &ListArray = read.as_any().downcast_ref::<ListArray>().unwrap();
 let expected = expected.as_any().downcast_ref::<ListArray>().unwrap();
 assert_eq!(
 read.data().buffers()[0],
-Buffer::from_slice_ref(&[0i32, 2, 3, 6, 6, 6])
+Buffer::from_slice_ref(&[0i32, 2, 3, 6, 6, 6, 7])
 );
 // compare list null buffers
 assert_eq!(read.data().null_buffer(), expected.data().null_buffer());
diff --git a/arrow/src/json/writer.rs b/arrow/src/json/wr

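To make the offset arithmetic concrete: a null list row and an empty list row both repeat the previous offset, and only the validity bitmap tells them apart; a small sketch in plain Rust, with values taken from the test above:

```rust
fn main() {
    // Offsets for the six JSON rows in the test:
    //   [2 items], [1 item], [3 items], null, [], [null]
    let offsets = [0i32, 2, 3, 6, 6, 6, 7];
    // Row 3 (null) and row 4 (empty) both repeat offset 6; the validity
    // bitmap 0b00110111 distinguishes them: bit 3 is unset, bit 4 is set.
    for w in offsets.windows(2) {
        assert!(w[0] <= w[1]); // offsets must be monotonically non-decreasing
    }
}
```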
[arrow-rs] branch master updated: Fix null count when casting ListArray (#816)

2021-10-06 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new a835f2c  Fix null count when casting ListArray (#816)
a835f2c is described below

commit a835f2cd1c1c8f7aca092eeafab16f76a07f285f
Author: Andrew Lamb 
AuthorDate: Wed Oct 6 20:29:04 2021 -0400

Fix null count when casting ListArray (#816)
---
 arrow/src/compute/kernels/cast.rs | 38 ++
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/arrow/src/compute/kernels/cast.rs b/arrow/src/compute/kernels/cast.rs
index 593adec..a0847d1 100644
--- a/arrow/src/compute/kernels/cast.rs
+++ b/arrow/src/compute/kernels/cast.rs
@@ -1680,12 +1680,8 @@ fn cast_list_inner(
 let array_data = ArrayData::new(
 to_type.clone(),
 array.len(),
-Some(cast_array.null_count()),
-cast_array
-.data()
-.null_bitmap()
-.clone()
-.map(|bitmap| bitmap.bits),
+Some(data.null_count()),
+data.null_bitmap().clone().map(|bitmap| bitmap.bits),
 array.offset(),
 // reuse offset buffer
 data.buffers().to_vec(),
@@ -2025,7 +2021,6 @@ mod tests {
 
 #[test]
 fn test_cast_list_i32_to_list_u16() {
-// Construct a value array
-let value_data = Int32Array::from(vec![0, 0, 0, -1, -2, -1, 2, 100000000])
 .data()
 .clone();
@@ -2033,6 +2028,7 @@ mod tests {
 let value_offsets = Buffer::from_slice_ref(&[0, 3, 6, 8]);
 
 // Construct a list array from the above two
+// [[0,0,0], [-1, -2, -1], [2, 1]]
 let list_data_type =
 DataType::List(Box::new(Field::new("item", DataType::Int32, 
true)));
 let list_data = ArrayData::builder(list_data_type)
@@ -2047,9 +2043,13 @@ mod tests {
 &DataType::List(Box::new(Field::new("item", DataType::UInt16, true))),
 )
 .unwrap();
+
+// For the ListArray itself, there are no null values (as there were no nulls when they went in)
+//
 // 3 negative values should get lost when casting to unsigned,
 // 1 value should overflow
-assert_eq!(4, cast_array.null_count());
+assert_eq!(0, cast_array.null_count());
+
 // offsets should be the same
 assert_eq!(
 list_array.data().buffers().to_vec(),
@@ -2061,23 +2061,21 @@ mod tests {
 .downcast_ref::()
 .unwrap();
 assert_eq!(DataType::UInt16, array.value_type());
-assert_eq!(4, array.values().null_count());
 assert_eq!(3, array.value_length(0));
 assert_eq!(3, array.value_length(1));
 assert_eq!(2, array.value_length(2));
+
+// expect 4 nulls: negative numbers and overflow
 let values = array.values();
+assert_eq!(4, values.null_count());
 let u16arr = values.as_any().downcast_ref::<UInt16Array>().unwrap();
-assert_eq!(8, u16arr.len());
-assert_eq!(4, u16arr.null_count());
-
-assert_eq!(0, u16arr.value(0));
-assert_eq!(0, u16arr.value(1));
-assert_eq!(0, u16arr.value(2));
-assert!(!u16arr.is_valid(3));
-assert!(!u16arr.is_valid(4));
-assert!(!u16arr.is_valid(5));
-assert_eq!(2, u16arr.value(6));
-assert!(!u16arr.is_valid(7));
+
+let expected: UInt16Array =
+vec![Some(0), Some(0), Some(0), None, None, None, Some(2), None]
+.into_iter()
+.collect();
+
+assert_eq!(u16arr, &expected);
 }
 
 #[test]


[arrow-datafusion] branch master updated: reduce ScalarValue from trait boilerplate with macro (#989)

2021-09-11 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
 new bb82ca1  reduce ScalarValue from trait boilerplate with macro (#989)
bb82ca1 is described below

commit bb82ca100b233653811d14a4cc18cad5e5bd7536
Author: QP Hou 
AuthorDate: Sat Sep 11 13:58:48 2021 -0700

reduce ScalarValue from trait boilerplate with macro (#989)

Co-authored-by: Jorge Leitao 

Co-authored-by: Jorge Leitao 
---
 datafusion/src/scalar.rs | 152 ---
 1 file changed, 24 insertions(+), 128 deletions(-)

diff --git a/datafusion/src/scalar.rs b/datafusion/src/scalar.rs
index 86d1765..77d4c82 100644
--- a/datafusion/src/scalar.rs
+++ b/datafusion/src/scalar.rs
@@ -1122,137 +1122,33 @@ impl ScalarValue {
 }
 }
 
-impl From<f64> for ScalarValue {
-fn from(value: f64) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<f64>> for ScalarValue {
-fn from(value: Option<f64>) -> Self {
-ScalarValue::Float64(value)
-}
-}
-
-impl From<f32> for ScalarValue {
-fn from(value: f32) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<f32>> for ScalarValue {
-fn from(value: Option<f32>) -> Self {
-ScalarValue::Float32(value)
-}
-}
-
-impl From<i8> for ScalarValue {
-fn from(value: i8) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<i8>> for ScalarValue {
-fn from(value: Option<i8>) -> Self {
-ScalarValue::Int8(value)
-}
-}
-
-impl From<i16> for ScalarValue {
-fn from(value: i16) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<i16>> for ScalarValue {
-fn from(value: Option<i16>) -> Self {
-ScalarValue::Int16(value)
-}
-}
-
-impl From<i32> for ScalarValue {
-fn from(value: i32) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<i32>> for ScalarValue {
-fn from(value: Option<i32>) -> Self {
-ScalarValue::Int32(value)
-}
-}
-
-impl From<i64> for ScalarValue {
-fn from(value: i64) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<i64>> for ScalarValue {
-fn from(value: Option<i64>) -> Self {
-ScalarValue::Int64(value)
-}
-}
-
-impl From<bool> for ScalarValue {
-fn from(value: bool) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<bool>> for ScalarValue {
-fn from(value: Option<bool>) -> Self {
-ScalarValue::Boolean(value)
-}
-}
-
-impl From<u8> for ScalarValue {
-fn from(value: u8) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<u8>> for ScalarValue {
-fn from(value: Option<u8>) -> Self {
-ScalarValue::UInt8(value)
-}
-}
-
-impl From<u16> for ScalarValue {
-fn from(value: u16) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<u16>> for ScalarValue {
-fn from(value: Option<u16>) -> Self {
-ScalarValue::UInt16(value)
-}
-}
-
-impl From<u32> for ScalarValue {
-fn from(value: u32) -> Self {
-Some(value).into()
-}
-}
-
-impl From<Option<u32>> for ScalarValue {
-fn from(value: Option<u32>) -> Self {
-ScalarValue::UInt32(value)
-}
-}
+macro_rules! impl_scalar {
+($ty:ty, $scalar:tt) => {
+impl From<$ty> for ScalarValue {
+fn from(value: $ty) -> Self {
+ScalarValue::$scalar(Some(value))
+}
+}

-impl From<u64> for ScalarValue {
-fn from(value: u64) -> Self {
-Some(value).into()
-}
+impl From<Option<$ty>> for ScalarValue {
+fn from(value: Option<$ty>) -> Self {
+ScalarValue::$scalar(value)
+}
+}
+};
 }

-impl From<Option<u64>> for ScalarValue {
-fn from(value: Option<u64>) -> Self {
-ScalarValue::UInt64(value)
-}
-}
+impl_scalar!(f64, Float64);
+impl_scalar!(f32, Float32);
+impl_scalar!(i8, Int8);
+impl_scalar!(i16, Int16);
+impl_scalar!(i32, Int32);
+impl_scalar!(i64, Int64);
+impl_scalar!(bool, Boolean);
+impl_scalar!(u8, UInt8);
+impl_scalar!(u16, UInt16);
+impl_scalar!(u32, UInt32);
+impl_scalar!(u64, UInt64);

 impl From<&str> for ScalarValue {
 fn from(value: &str) -> Self {

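Call sites are unaffected by the macro rewrite, since it expands to the same pair of `From` impls; a quick usage sketch, assuming datafusion's `ScalarValue`:

```rust
use datafusion::scalar::ScalarValue;

fn main() {
    // Both conversions come from a single impl_scalar!(i64, Int64) expansion.
    let a: ScalarValue = 42i64.into();
    let b: ScalarValue = Option::<i64>::None.into();
    assert_eq!(a, ScalarValue::Int64(Some(42)));
    assert_eq!(b, ScalarValue::Int64(None));
}
```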

[arrow-rs] branch master updated: Added PartialEq to RecordBatch (#750)

2021-09-11 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 0e4e75b  Added PartialEq to RecordBatch (#750)
0e4e75b is described below

commit 0e4e75b7cc5ac8e934b5846df75612ce8e641bfb
Author: Matthew Turner 
AuthorDate: Sat Sep 11 12:52:23 2021 -0400

Added PartialEq to RecordBatch (#750)

* Added PartialEq to RecordBatch

* derive PartialEq and add tests
---
 arrow/src/record_batch.rs | 159 +-
 1 file changed, 158 insertions(+), 1 deletion(-)

diff --git a/arrow/src/record_batch.rs b/arrow/src/record_batch.rs
index bb4b301..b6e5566 100644
--- a/arrow/src/record_batch.rs
+++ b/arrow/src/record_batch.rs
@@ -37,7 +37,7 @@ use crate::error::{ArrowError, Result};
 /// serialization and computation functions, possibly incremental.
 /// See also [CSV reader](crate::csv::Reader) and
 /// [JSON reader](crate::json::Reader).
-#[derive(Clone, Debug)]
+#[derive(Clone, Debug, PartialEq)]
 pub struct RecordBatch {
 schema: SchemaRef,
 columns: Vec<Arc<dyn Array>>,
@@ -741,4 +741,161 @@ mod tests {
 "Invalid argument error: batches[1] schema is different with 
argument schema.",
 );
 }
+
+#[test]
+fn record_batch_equality() {
+let id_arr1 = Int32Array::from(vec![1, 2, 3, 4]);
+let val_arr1 = Int32Array::from(vec![5, 6, 7, 8]);
+let schema1 = Schema::new(vec![
+Field::new("id", DataType::Int32, false),
+Field::new("val", DataType::Int32, false),
+]);
+
+let id_arr2 = Int32Array::from(vec![1, 2, 3, 4]);
+let val_arr2 = Int32Array::from(vec![5, 6, 7, 8]);
+let schema2 = Schema::new(vec![
+Field::new("id", DataType::Int32, false),
+Field::new("val", DataType::Int32, false),
+]);
+
+let batch1 = RecordBatch::try_new(
+Arc::new(schema1),
+vec![Arc::new(id_arr1), Arc::new(val_arr1)],
+)
+.unwrap();
+
+let batch2 = RecordBatch::try_new(
+Arc::new(schema2),
+vec![Arc::new(id_arr2), Arc::new(val_arr2)],
+)
+.unwrap();
+
+assert_eq!(batch1, batch2);
+}
+
+#[test]
+fn record_batch_vals_ne() {
+let id_arr1 = Int32Array::from(vec![1, 2, 3, 4]);
+let val_arr1 = Int32Array::from(vec![5, 6, 7, 8]);
+let schema1 = Schema::new(vec![
+Field::new("id", DataType::Int32, false),
+Field::new("val", DataType::Int32, false),
+]);
+
+let id_arr2 = Int32Array::from(vec![1, 2, 3, 4]);
+let val_arr2 = Int32Array::from(vec![1, 2, 3, 4]);
+let schema2 = Schema::new(vec![
+Field::new("id", DataType::Int32, false),
+Field::new("val", DataType::Int32, false),
+]);
+
+let batch1 = RecordBatch::try_new(
+Arc::new(schema1),
+vec![Arc::new(id_arr1), Arc::new(val_arr1)],
+)
+.unwrap();
+
+let batch2 = RecordBatch::try_new(
+Arc::new(schema2),
+vec![Arc::new(id_arr2), Arc::new(val_arr2)],
+)
+.unwrap();
+
+assert_ne!(batch1, batch2);
+}
+
+#[test]
+fn record_batch_column_names_ne() {
+let id_arr1 = Int32Array::from(vec![1, 2, 3, 4]);
+let val_arr1 = Int32Array::from(vec![5, 6, 7, 8]);
+let schema1 = Schema::new(vec![
+Field::new("id", DataType::Int32, false),
+Field::new("val", DataType::Int32, false),
+]);
+
+let id_arr2 = Int32Array::from(vec![1, 2, 3, 4]);
+let val_arr2 = Int32Array::from(vec![5, 6, 7, 8]);
+let schema2 = Schema::new(vec![
+Field::new("id", DataType::Int32, false),
+Field::new("num", DataType::Int32, false),
+]);
+
+let batch1 = RecordBatch::try_new(
+Arc::new(schema1),
+vec![Arc::new(id_arr1), Arc::new(val_arr1)],
+)
+.unwrap();
+
+let batch2 = RecordBatch::try_new(
+Arc::new(schema2),
+vec![Arc::new(id_arr2), Arc::new(val_arr2)],
+)
+.unwrap();
+
+assert_ne!(batch1, batch2);
+}
+
+#[test]
+fn record_batch_column_number_ne() {
+let id_arr1 = Int32Array::from(vec![1, 2, 3, 4]);
+let val_arr1 = Int32Array::from(vec![5, 6, 7, 8]);
+let schema1 = Schema::new(vec![
+Field::new("id", DataType::Int32, false),
+Field::new("val", DataType::Int32, false),
+]);
+
+let id_arr2 = Int32Array::from(vec![1, 2, 3, 4]);
+let val_arr2 = Int32Array::from(vec![5, 6, 7, 8]);

[arrow-rs] branch master updated: fix: Handle slices in unary kernel (#739)

2021-09-02 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 7ae6910  fix: Handle slices in unary kernel (#739)
7ae6910 is described below

commit 7ae691049b89e2ae54c4315021f305560ff167b6
Author: Ben Chambers <35960+bjchamb...@users.noreply.github.com>
AuthorDate: Thu Sep 2 17:12:47 2021 -0700

fix: Handle slices in unary kernel (#739)
---
 arrow/src/buffer/immutable.rs  |  2 +-
 arrow/src/compute/kernels/arity.rs | 24 +++-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arrow/src/buffer/immutable.rs b/arrow/src/buffer/immutable.rs
index c00af6e..f0aefd9 100644
--- a/arrow/src/buffer/immutable.rs
+++ b/arrow/src/buffer/immutable.rs
@@ -184,7 +184,7 @@ impl Buffer {
 /// If the offset is byte-aligned the returned buffer is a shallow clone,
 /// otherwise a new buffer is allocated and filled with a copy of the bits in the range.
 pub fn bit_slice(, offset: usize, len: usize) -> Self {
-if offset % 8 == 0 && len % 8 == 0 {
+if offset % 8 == 0 {
 return self.slice(offset / 8);
 }
 
diff --git a/arrow/src/compute/kernels/arity.rs b/arrow/src/compute/kernels/arity.rs
index 4aa7f3d..d7beae6 100644
--- a/arrow/src/compute/kernels/arity.rs
+++ b/arrow/src/compute/kernels/arity.rs
@@ -30,7 +30,10 @@ fn into_primitive_array_data<I: ArrowPrimitiveType, O: ArrowPrimitiveType>(
 O::DATA_TYPE,
 array.len(),
 None,
-array.data_ref().null_buffer().cloned(),
+array
+.data_ref()
+.null_buffer()
+.map(|b| b.bit_slice(array.offset(), array.len())),
 0,
 vec![buffer],
 vec![],
@@ -72,3 +75,22 @@ where
 let data = into_primitive_array_data::<_, O>(array, buffer);
 PrimitiveArray::<O>::from(data)
 }
+
+#[cfg(test)]
+mod tests {
+use super::*;
+use crate::array::{as_primitive_array, Float64Array};
+
+#[test]
+fn test_unary_f64_slice() {
+let input =
+Float64Array::from(vec![Some(5.1f64), None, Some(6.8), None, Some(7.2)]);
+let input_slice = input.slice(1, 4);
+let input_slice: &Float64Array = as_primitive_array(&input_slice);
+let result = unary(input_slice, |n| n.round());
+assert_eq!(
+result,
+Float64Array::from(vec![None, Some(7.0), None, Some(7.0)])
+)
+}
+}

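The root cause is that a slice shares its parent's buffers and records only an offset and a length, so validity must be read relative to that offset; a minimal sketch, assuming the `arrow` crate:

```rust
use arrow::array::{Array, Float64Array};

fn main() {
    let input = Float64Array::from(vec![Some(5.1f64), None, Some(6.8), None, Some(7.2)]);
    // The slice shares buffers with `input`; only offset/length differ.
    let s = input.slice(1, 4);
    assert_eq!(s.offset(), 1);
    // Index 0 of the slice is index 1 of the parent, which is null; a
    // kernel that copies the null buffer without re-slicing gets this wrong.
    assert!(s.is_null(0));
}
```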

[arrow-rs] branch master updated: Write boolean stats for boolean columns (not i32 stats) (#661)

2021-08-08 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 857dbaf  Write boolean stats for boolean columns (not i32 stats) (#661)
857dbaf is described below

commit 857dbafcbaa721f22ac485f38ccaff3faf8d2ab9
Author: Andrew Lamb 
AuthorDate: Sun Aug 8 08:32:47 2021 -0400

Write boolean stats for boolean columns (not i32 stats) (#661)
---
 parquet/src/column/writer.rs | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/parquet/src/column/writer.rs b/parquet/src/column/writer.rs
index 3cb17e1..af76c84 100644
--- a/parquet/src/column/writer.rs
+++ b/parquet/src/column/writer.rs
@@ -919,7 +919,7 @@ impl<T: DataType> ColumnWriterImpl<T> {
 };
 match self.descr.physical_type() {
 Type::INT32 => gen_stats_section!(i32, int32, min, max, distinct, nulls),
-Type::BOOLEAN => gen_stats_section!(i32, int32, min, max, distinct, nulls),
+Type::BOOLEAN => gen_stats_section!(bool, boolean, min, max, distinct, nulls),
 Type::INT64 => gen_stats_section!(i64, int64, min, max, distinct, nulls),
 Type::INT96 => gen_stats_section!(Int96, int96, min, max, distinct, nulls),
 Type::FLOAT => gen_stats_section!(f32, float, min, max, distinct, nulls),
@@ -1691,13 +1691,11 @@ mod tests {
 fn test_bool_statistics() {
 let stats = statistics_roundtrip::<BoolType>(&[true, false, false, true]);
 assert!(stats.has_min_max_set());
-// should it be BooleanStatistics??
-// https://github.com/apache/arrow-rs/issues/659
-if let Statistics::Int32(stats) = stats {
-assert_eq!(stats.min(), &0);
-assert_eq!(stats.max(), &1);
+if let Statistics::Boolean(stats) = stats {
+assert_eq!(stats.min(), &false);
+assert_eq!(stats.max(), &true);
 } else {
-panic!("expecting Statistics::Int32, got {:?}", stats);
+panic!("expecting Statistics::Boolean, got {:?}", stats);
 }
 }
 


[arrow-rs] branch master updated: allocate enough bytes when writing booleans (#658)

2021-08-08 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 75432ed  allocate enough bytes when writing booleans (#658)
75432ed is described below

commit 75432edb05ff001481df728607fc5b9be969c266
Author: Ben Chambers <35960+bjchamb...@users.noreply.github.com>
AuthorDate: Sun Aug 8 00:57:17 2021 -0700

allocate enough bytes when writing booleans (#658)

* allocate enough bytes when writing booleans

* round up to nearest multiple of 256
---
 parquet/src/arrow/arrow_writer.rs | 28 +++-
 parquet/src/data_type.rs  |  8 +++-
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/parquet/src/arrow/arrow_writer.rs b/parquet/src/arrow/arrow_writer.rs
index 4726734..7728cd4 100644
--- a/parquet/src/arrow/arrow_writer.rs
+++ b/parquet/src/arrow/arrow_writer.rs
@@ -227,7 +227,7 @@ fn write_leaves(
 ArrowDataType::FixedSizeList(_, _) | ArrowDataType::Union(_) => {
 Err(ParquetError::NYI(
 format!(
-"Attempting to write an Arrow type {:?} to parquet that is 
not yet implemented", 
+"Attempting to write an Arrow type {:?} to parquet that is 
not yet implemented",
 array.data_type()
 )
 ))
@@ -1200,6 +1200,32 @@ mod tests {
 }
 
 #[test]
+fn bool_large_single_column() {
+let values = Arc::new(
+[None, Some(true), Some(false)]
+.iter()
+.cycle()
+.copied()
+.take(200_000)
+.collect::<BooleanArray>(),
+);
+let schema =
+Schema::new(vec![Field::new("col", values.data_type().clone(), 
true)]);
+let expected_batch =
+RecordBatch::try_new(Arc::new(schema), vec![values]).unwrap();
+let file = get_temp_file("bool_large_single_column", &[]);
+
+let mut writer = ArrowWriter::try_new(
+file.try_clone().unwrap(),
+expected_batch.schema(),
+None,
+)
+.expect("Unable to write file");
+writer.write(_batch).unwrap();
+writer.close().unwrap();
+}
+
+#[test]
 fn i8_single_column() {
 required_and_optional::<Int8Array, _>(0..SMALL_SIZE as i8, "i8_single_column");
 }
diff --git a/parquet/src/data_type.rs b/parquet/src/data_type.rs
index 127ba95..3573362 100644
--- a/parquet/src/data_type.rs
+++ b/parquet/src/data_type.rs
@@ -588,6 +588,7 @@ pub(crate) mod private {
 use crate::util::bit_util::{BitReader, BitWriter};
 use crate::util::memory::ByteBufferPtr;
 
+use arrow::util::bit_util::round_upto_power_of_2;
 use byteorder::ByteOrder;
 use std::convert::TryInto;
 
@@ -669,7 +670,12 @@ pub(crate) mod private {
 bit_writer: &mut BitWriter,
 ) -> Result<()> {
 if bit_writer.bytes_written() + values.len() / 8 >= bit_writer.capacity() {
-bit_writer.extend(256);
+let bits_available =
+(bit_writer.capacity() - bit_writer.bytes_written()) * 8;
+let bits_needed = values.len() - bits_available;
+let bytes_needed = (bits_needed + 7) / 8;
+let bytes_needed = round_upto_power_of_2(bytes_needed, 256);
+bit_writer.extend(bytes_needed);
 }
 for value in values {
 if !bit_writer.put_value(*value as u64, 1) {


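To see why the flat `extend(256)` was insufficient, here is the new sizing arithmetic on concrete numbers (the rounding helper is inlined for illustration, and the figures are hypothetical: a 1024-byte writer with 1000 bytes written and 200_000 incoming booleans):

```rust
fn main() {
    let capacity = 1024usize;
    let bytes_written = 1000usize;
    let values_len = 200_000usize; // one bit per boolean value

    let bits_available = (capacity - bytes_written) * 8; // 192 bits left
    let bits_needed = values_len - bits_available;       // 199_808 bits
    let bytes_needed = (bits_needed + 7) / 8;            // 24_976 bytes
    // round_upto_power_of_2(bytes_needed, 256) rounds up to a multiple of 256:
    let rounded = (bytes_needed + 255) & !255;
    assert_eq!(rounded, 25_088); // vs. the flat 256 bytes grown previously
}
```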
[arrow-rs] branch master updated: Fix parquet string statistics generation (#643)

2021-08-08 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 4618ef5  Fix parquet string statistics generation (#643)
4618ef5 is described below

commit 4618ef539521a09a1e46246a29ea31807e98bb7c
Author: Andrew Lamb 
AuthorDate: Sun Aug 8 03:46:14 2021 -0400

Fix parquet string statistics generation (#643)

* Fix string statistics generation, add tests

* fix Int96 stats test

* Add notes for additional tickets
---
 parquet/src/column/writer.rs | 122 +++
 parquet/src/data_type.rs |  29 +-
 2 files changed, 134 insertions(+), 17 deletions(-)

diff --git a/parquet/src/column/writer.rs b/parquet/src/column/writer.rs
index d5b8457..3cb17e1 100644
--- a/parquet/src/column/writer.rs
+++ b/parquet/src/column/writer.rs
@@ -1688,6 +1688,128 @@ mod tests {
 }
 
 #[test]
+fn test_bool_statistics() {
+let stats = statistics_roundtrip::(&[true, false, false, 
true]);
+assert!(stats.has_min_max_set());
+// should it be BooleanStatistics??
+// https://github.com/apache/arrow-rs/issues/659
+if let Statistics::Int32(stats) = stats {
+assert_eq!(stats.min(), &0);
+assert_eq!(stats.max(), &1);
+} else {
+panic!("expecting Statistics::Int32, got {:?}", stats);
+}
+}
+
+#[test]
+fn test_int32_statistics() {
+let stats = statistics_roundtrip::(&[-1, 3, -2, 2]);
+assert!(stats.has_min_max_set());
+if let Statistics::Int32(stats) = stats {
+assert_eq!(stats.min(), &-2);
+assert_eq!(stats.max(), &3);
+} else {
+panic!("expecting Statistics::Int32, got {:?}", stats);
+}
+}
+
+#[test]
+fn test_int64_statistics() {
+let stats = statistics_roundtrip::(&[-1, 3, -2, 2]);
+assert!(stats.has_min_max_set());
+if let Statistics::Int64(stats) = stats {
+assert_eq!(stats.min(), &-2);
+assert_eq!(stats.max(), &3);
+} else {
+panic!("expecting Statistics::Int64, got {:?}", stats);
+}
+}
+
+#[test]
+fn test_int96_statistics() {
+let input = vec![
+Int96::from(vec![1, 20, 30]),
+Int96::from(vec![3, 20, 10]),
+Int96::from(vec![0, 20, 30]),
+Int96::from(vec![2, 20, 30]),
+]
+.into_iter()
+.collect::>();
+
+let stats = statistics_roundtrip::();
+assert!(stats.has_min_max_set());
+if let Statistics::Int96(stats) = stats {
+assert_eq!(stats.min(), &Int96::from(vec![0, 20, 30]));
+assert_eq!(stats.max(), &Int96::from(vec![3, 20, 10]));
+} else {
+panic!("expecting Statistics::Int96, got {:?}", stats);
+}
+}
+
+#[test]
+fn test_float_statistics() {
+let stats = statistics_roundtrip::(&[-1.0, 3.0, -2.0, 2.0]);
+assert!(stats.has_min_max_set());
+if let Statistics::Float(stats) = stats {
+assert_eq!(stats.min(), &-2.0);
+assert_eq!(stats.max(), &3.0);
+} else {
+panic!("expecting Statistics::Float, got {:?}", stats);
+}
+}
+
+#[test]
+fn test_double_statistics() {
+let stats = statistics_roundtrip::(&[-1.0, 3.0, -2.0, 
2.0]);
+assert!(stats.has_min_max_set());
+if let Statistics::Double(stats) = stats {
+assert_eq!(stats.min(), &-2.0);
+assert_eq!(stats.max(), &3.0);
+} else {
+panic!("expecting Statistics::Double, got {:?}", stats);
+}
+}
+
+#[test]
+fn test_byte_array_statistics() {
+let input = vec!["aawaa", "zz", "aaw", "m", "qrs"]
+.iter()
+.map(|| s.into())
+.collect::>();
+
+let stats = statistics_roundtrip::();
+assert!(stats.has_min_max_set());
+if let Statistics::ByteArray(stats) = stats {
+assert_eq!(stats.min(), ::from("aaw"));
+assert_eq!(stats.max(), ::from("zz"));
+} else {
+panic!("expecting Statistics::ByteArray, got {:?}", stats);
+}
+}
+
+#[test]
+fn test_fixed_len_byte_array_statistics() {
+let input = vec!["aawaa", "zz   ", "aaw  ", "m", "qrs  "]
+.iter()
+.map(|| {
+let b: ByteArray = s.into();
+b.into()
+})
+.collect::>();
+
+let stats = statistics_roun

[arrow-rs] branch master updated: Remove undefined behavior in `value` method of boolean and primitive arrays (#644)

2021-08-03 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bf1988  Remove undefined behavior in `value` method of boolean and 
primitive arrays (#644)
6bf1988 is described below

commit 6bf1988852f87da21a163203eec4c83a7b692901
Author: Daniël Heres 
AuthorDate: Tue Aug 3 09:11:24 2021 +0200

Remove undefined behavior in `value` method of boolean and primitive arrays 
(#644)

* Remove UB in `value`

* Add safety note
---
 arrow/src/array/array_boolean.rs|  6 --
 arrow/src/array/array_primitive.rs  |  6 ++
 arrow/src/array/array_string.rs | 25 +
 arrow/src/compute/kernels/comparison.rs | 11 ---
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/arrow/src/array/array_boolean.rs b/arrow/src/array/array_boolean.rs
index 5357614..9274e65 100644
--- a/arrow/src/array/array_boolean.rs
+++ b/arrow/src/array/array_boolean.rs
@@ -115,9 +115,11 @@ impl BooleanArray {
 
 /// Returns the boolean value at index `i`.
 ///
-/// Note this doesn't do any bound checking, for performance reason.
+/// Panics of offset `i` is out of bounds
 pub fn value(, i: usize) -> bool {
-debug_assert!(i < self.len());
+assert!(i < self.len());
+// Safety:
+// `i < self.len()
 unsafe { self.value_unchecked(i) }
 }
 }
diff --git a/arrow/src/array/array_primitive.rs 
b/arrow/src/array/array_primitive.rs
index 0765629..9c14f88 100644
--- a/arrow/src/array/array_primitive.rs
+++ b/arrow/src/array/array_primitive.rs
@@ -101,12 +101,10 @@ impl PrimitiveArray {
 
 /// Returns the primitive value at index `i`.
 ///
-/// Note this doesn't do any bound checking, for performance reason.
-/// # Safety
-/// caller must ensure that the passed in offset is less than the array len()
+/// Panics if offset `i` is out of bounds
 #[inline]
 pub fn value(&self, i: usize) -> T::Native {
-debug_assert!(i < self.len());
+assert!(i < self.len());
 unsafe { self.value_unchecked(i) }
 }
 
diff --git a/arrow/src/array/array_string.rs b/arrow/src/array/array_string.rs
index 0b48e57..2fa4c48 100644
--- a/arrow/src/array/array_string.rs
+++ b/arrow/src/array/array_string.rs
@@ -81,6 +81,7 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
 /// Returns the element at index
 /// # Safety
 /// caller is responsible for ensuring that index is within the array bounds
+#[inline]
 pub unsafe fn value_unchecked(&self, i: usize) -> &str {
 let end = self.value_offsets().get_unchecked(i + 1);
 let start = self.value_offsets().get_unchecked(i);
@@ -103,28 +104,12 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
 }
 
 /// Returns the element at index `i` as &str
+#[inline]
 pub fn value(&self, i: usize) -> &str {
 assert!(i < self.data.len(), "StringArray out of bounds access");
-//Soundness: length checked above, offset buffer length is 1 larger than logical array length
-let end = unsafe { self.value_offsets().get_unchecked(i + 1) };
-let start = unsafe { self.value_offsets().get_unchecked(i) };
-
-// Soundness
-// pointer alignment & location is ensured by RawPtrBox
-// buffer bounds/offset is ensured by the value_offset invariants
-// ISSUE: utf-8 well formedness is not checked
-unsafe {
-// Safety of `to_isize().unwrap()`
-// `start` and `end` are &OffsetSize, which is a generic type that implements the
-// OffsetSizeTrait. Currently, only i32 and i64 implement OffsetSizeTrait,
-// both of which should cleanly cast to isize on an architecture that supports
-// 32/64-bit offsets
-let slice = std::slice::from_raw_parts(
-self.value_data.as_ptr().offset(start.to_isize().unwrap()),
-(*end - *start).to_usize().unwrap(),
-);
-std::str::from_utf8_unchecked(slice)
-}
+// Safety:
+// `i < self.data.len()
+unsafe { self.value_unchecked(i) }
 }
 
 fn from_list(v: GenericListArray) -> Self {
diff --git a/arrow/src/compute/kernels/comparison.rs 
b/arrow/src/compute/kernels/comparison.rs
index f54d305..a899d5b 100644
--- a/arrow/src/compute/kernels/comparison.rs
+++ b/arrow/src/compute/kernels/comparison.rs
@@ -46,7 +46,10 @@ macro_rules! compare_op {
 let null_bit_buffer =
 combine_option_bitmap($left.data_ref(), $right.data_ref(), $left.len())?;
 
-let comparison = (0..$left.len()).map(|i| $op($left.value(i), $right.value(i)));
+// Safety:
+// `i < $left.len()` and $left.len() == $right.len()
+let comparison = (0..$left.len())
+.map(|i| unsafe { $o

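The visible behavioral change is at the API edge: an out-of-bounds `value(i)` now panics in release builds instead of silently reading past the buffer; a minimal sketch:

```rust
use arrow::array::Int32Array;

fn main() {
    // In-bounds access is unchanged.
    let a = Int32Array::from(vec![1, 2, 3]);
    assert_eq!(a.value(2), 3);

    // Out of bounds now panics in release builds too (previously only a
    // debug_assert!, i.e. unchecked in release).
    let result = std::panic::catch_unwind(|| {
        let b = Int32Array::from(vec![1, 2, 3]);
        b.value(3)
    });
    assert!(result.is_err());
}
```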
[arrow-rs] branch master updated: update documentation (#648)

2021-08-02 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new fe1e1f6  update documentation (#648)
fe1e1f6 is described below

commit fe1e1f68eb78bf093b1f6faa62a0fddcf0a69f82
Author: Ruihang Xia 
AuthorDate: Tue Aug 3 01:32:41 2021 +0800

update documentation (#648)

Signed-off-by: Ruihang Xia 
---
 dev/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/README.md b/dev/README.md
index b4ea02b..f9d2070 100644
--- a/dev/README.md
+++ b/dev/README.md
@@ -30,8 +30,8 @@ We have provided a script to assist with verifying release candidates:
 bash dev/release/verify-release-candidate.sh 0.7.0 0
 ```
 
-Currently this only works on Linux (patches to expand to macOS welcome!). Read
-the script for information about system dependencies.
+This works on Linux and macOS. Read the script for information about system
+dependencies.
 
 On Windows, we have a script that verifies C++ and Python (requires Visual
 Studio 2015):


[arrow-rs] branch master updated: Fix data corruption in json decoder f64-to-i64 cast (#652)

2021-08-02 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new b075c3c  Fix data corruption in json decoder f64-to-i64 cast (#652)
b075c3c is described below

commit b075c3cef6d48e1e5ffce7e5c555d9c740885fae
Author: Christian Williams 
AuthorDate: Mon Aug 2 13:29:09 2021 -0400

Fix data corruption in json decoder f64-to-i64 cast (#652)

* Add failing test for JSON writer i64 bug

* Add special handling for i64/u64 to json decoder array builder

* Fix linter error - linter wants .flatten on a new line
---
 arrow/src/json/reader.rs| 13 +++--
 arrow/test/data/arrays.json |  2 +-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arrow/src/json/reader.rs b/arrow/src/json/reader.rs
index c4e8470..4912c5e 100644
--- a/arrow/src/json/reader.rs
+++ b/arrow/src/json/reader.rs
@@ -927,8 +927,16 @@ impl Decoder {
 rows.iter()
 .map(|row| {
 row.get(_name)
-.and_then(|value| value.as_f64())
-.and_then(num::cast::cast)
+.and_then(|value| {
+if value.is_i64() {
+value.as_i64().map(num::cast::cast)
+} else if value.is_u64() {
+value.as_u64().map(num::cast::cast)
+} else {
+value.as_f64().map(num::cast::cast)
+}
+})
+.flatten()
 })
 .collect::>(),
 ))
@@ -1933,6 +1941,7 @@ mod tests {
 .unwrap();
 assert_eq!(1, aa.value(0));
 assert_eq!(-10, aa.value(1));
+assert_eq!(162766868459400, aa.value(2));
 let bb = batch
 .column(b.0)
 .as_any()
diff --git a/arrow/test/data/arrays.json b/arrow/test/data/arrays.json
index 5dbdd19..6de2b03 100644
--- a/arrow/test/data/arrays.json
+++ b/arrow/test/data/arrays.json
@@ -1,3 +1,3 @@
 {"a":1, "b":[2.0, 1.3, -6.1], "c":[false, true], "d":"4"}
 {"a":-10, "b":[2.0, 1.3, -6.1], "c":[true, true], "d":"4"}
-{"a":2, "b":[2.0, null, -6.1], "c":[false, null], "d":"text"}
+{"a":162766868459400, "b":[2.0, null, -6.1], "c":[false, null], "d":"text"}

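The corruption class fixed here is the f64 mantissa limit: routing a 64-bit integer through `as_f64()` silently loses low bits once the value exceeds 2^53; a self-contained demonstration in plain Rust:

```rust
fn main() {
    // 2^53 + 1 is the first integer an f64 cannot represent exactly.
    let big: i64 = 9_007_199_254_740_993;
    let round_tripped = (big as f64) as i64;
    // Comes back as 9_007_199_254_740_992: one off, with no error raised.
    assert_ne!(big, round_tripped);
}
```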

[arrow-rs] branch master updated (9be938e -> b38a4b6)

2021-07-31 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 9be938e  Minimal MapArray support (#491)
 add b38a4b6  Add human readable Format for parquet ByteArray (#642)

No new revisions were added by this update.

Summary of changes:
 parquet/src/data_type.rs | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)


[arrow-rs] branch master updated: Minimal MapArray support (#491)

2021-07-30 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 9be938e  Minimal MapArray support (#491)
9be938e is described below

commit 9be938e8d2847cf8d41bc59f0c907f23ff61cc3c
Author: Wakahisa 
AuthorDate: Sat Jul 31 07:20:56 2021 +0200

Minimal MapArray support (#491)

* add DataType::Map to datatypes

* barebones MapArray and MapBuilder

This commit adds the MapArray and MapBuilder.
The interfaces are however incomplete at this stage.

* minimal IPC read and write

* barebones MapArray (missed)

* add equality for map, relying on list

A map is a list with some specific rules, so for equality it is the same as 
a list

* json reader for MapArray

* add schema roundtrip

* read and write maps from/to arrow map

* clippy

* Calculate map levels separately

Avoids the generic case of list > struct > [key, value], which adds overhead

* Fix map reader context and path

* Map array tests

* add doc comments and clean up code

* wip: review feedback

* add test for map

* fix clippy 1.54 lints
---
 arrow/src/array/array.rs   |  26 +++
 arrow/src/array/array_map.rs   | 421 +
 arrow/src/array/builder.rs | 211 +++
 arrow/src/array/data.rs|   5 +-
 arrow/src/array/equal/mod.rs   |  11 +-
 arrow/src/array/equal/utils.rs |   2 +-
 arrow/src/array/equal_json.rs  |  32 +++
 arrow/src/array/mod.rs |   2 +
 arrow/src/datatypes/datatype.rs|  31 +++
 arrow/src/datatypes/field.rs   |  33 +++
 arrow/src/datatypes/mod.rs | 175 +++
 arrow/src/ipc/convert.rs   |  18 ++
 arrow/src/ipc/reader.rs|  22 +-
 arrow/src/ipc/writer.rs|   4 +
 arrow/src/json/reader.rs   | 177 
 arrow/src/util/integration_util.rs |  50 +
 parquet/src/arrow/array_reader.rs  | 235 +++--
 parquet/src/arrow/arrow_reader.rs  |  16 ++
 parquet/src/arrow/arrow_writer.rs  |  39 
 parquet/src/arrow/levels.rs| 132 +++-
 parquet/src/arrow/schema.rs| 312 +--
 21 files changed, 1914 insertions(+), 40 deletions(-)

diff --git a/arrow/src/array/array.rs b/arrow/src/array/array.rs
index d715bc4..4702179 100644
--- a/arrow/src/array/array.rs
+++ b/arrow/src/array/array.rs
@@ -296,6 +296,7 @@ pub fn make_array(data: ArrayData) -> ArrayRef {
 DataType::List(_) => Arc::new(ListArray::from(data)) as ArrayRef,
 DataType::LargeList(_) => Arc::new(LargeListArray::from(data)) as 
ArrayRef,
 DataType::Struct(_) => Arc::new(StructArray::from(data)) as ArrayRef,
+DataType::Map(_, _) => Arc::new(MapArray::from(data)) as ArrayRef,
 DataType::Union(_) => Arc::new(UnionArray::from(data)) as ArrayRef,
 DataType::FixedSizeList(_, _) => {
 Arc::new(FixedSizeListArray::from(data)) as ArrayRef
@@ -452,6 +453,9 @@ pub fn new_null_array(data_type: &DataType, length: usize) -> ArrayRef {
 .map(|field| ArrayData::new_empty(field.data_type()))
 .collect(),
 )),
+DataType::Map(field, _keys_sorted) => {
+new_null_list_array::<i32>(data_type, field.data_type(), length)
+}
 DataType::Union(_) => {
 unimplemented!("Creating null Union array not yet supported")
 }
@@ -658,6 +662,28 @@ mod tests {
 }
 
 #[test]
+fn test_null_map() {
+let data_type = DataType::Map(
+Box::new(Field::new(
+"entry",
+DataType::Struct(vec![
+Field::new("key", DataType::Utf8, false),
+Field::new("key", DataType::Int32, true),
+]),
+false,
+)),
+false,
+);
+let array = new_null_array(&data_type, 9);
+let a = array.as_any().downcast_ref::<MapArray>().unwrap();
+assert_eq!(a.len(), 9);
+assert_eq!(a.value_offsets()[9], 0i32);
+for i in 0..9 {
+assert!(a.is_null(i));
+}
+}
+
+#[test]
 fn test_null_dictionary() {
 let values = vec![None, None, None, None, None, None, None, None, None]
 as Vec>;
diff --git a/arrow/src/array/array_map.rs b/arrow/src/array/array_map.rs
new file mode 100644
index 000..b10c39e
--- /dev/null
+++ b/arrow/src/array/array_map.rs
@@ -0,0 +1,421 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information

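As a usage sketch of the builder introduced by this commit (signatures as of this change are assumptions on my part and may differ across versions):

```rust
use arrow::array::{Int32Builder, MapBuilder, StringBuilder};

fn main() {
    // One map per slot: {"joe": 1, "mark": 2}, then a null map.
    let mut builder = MapBuilder::new(None, StringBuilder::new(16), Int32Builder::new(16));
    builder.keys().append_value("joe").unwrap();
    builder.values().append_value(1).unwrap();
    builder.keys().append_value("mark").unwrap();
    builder.values().append_value(2).unwrap();
    builder.append(true).unwrap();  // close the first (valid) map entry
    builder.append(false).unwrap(); // append a null map entry
    let map_array = builder.finish();
    assert_eq!(map_array.len(), 2);
}
```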
[arrow-rs] branch master updated: Remove Git SHA from created_by Parquet file metadata (#590)

2021-07-22 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 9f40f89  Remove Git SHA from created_by Parquet file metadata (#590)
9f40f89 is described below

commit 9f40f899e439d072fc859e0b4abf46776387e0d1
Author: Carol (Nichols || Goulding) 
<193874+carols10ce...@users.noreply.github.com>
AuthorDate: Thu Jul 22 04:17:17 2021 -0400

Remove Git SHA from created_by Parquet file metadata (#590)

So that Parquet files will contain the same content whether or not your
home directory is checked into Git or not ;)

Fixes #589.
---
 parquet/build.rs | 23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/parquet/build.rs b/parquet/build.rs
index b42b2a4..8aada18 100644
--- a/parquet/build.rs
+++ b/parquet/build.rs
@@ -15,29 +15,10 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use std::process::Command;
-
 fn main() {
-// Set Parquet version, build hash and "created by" string.
+// Set Parquet version and "created by" string.
 let version = env!("CARGO_PKG_VERSION");
-let mut created_by = format!("parquet-rs version {}", version);
-if let Ok(git_hash) = run(Command::new("git").arg("rev-parse").arg("HEAD")) {
-created_by.push_str(format!(" (build {})", git_hash).as_str());
-println!("cargo:rustc-env=PARQUET_BUILD={}", git_hash);
-}
+let created_by = format!("parquet-rs version {}", version);
 println!("cargo:rustc-env=PARQUET_VERSION={}", version);
 println!("cargo:rustc-env=PARQUET_CREATED_BY={}", created_by);
 }
-
-/// Runs command and returns either content of stdout for successful execution,
-/// or an error message otherwise.
-fn run(command:  Command) -> Result {
-println!("Running: `{:?}`", command);
-match command.output() {
-Ok(ref output) if output.status.success() => {
-Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
-}
-Ok(ref output) => Err(format!("Failed: `{:?}` ({})", command, 
output.status)),
-Err(error) => Err(format!("Failed: `{:?}` ({})", command, error)),
-}
-}


[arrow-rs] branch master updated: Exclude .github in rat files (#551)

2021-07-14 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new fc78af6  Exclude .github in rat files (#551)
fc78af6 is described below

commit fc78af6324513cc3da9fea8c80658d85dfcd8263
Author: Andrew Lamb 
AuthorDate: Wed Jul 14 10:45:29 2021 -0400

Exclude .github in rat files (#551)
---
 dev/release/rat_exclude_files.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/dev/release/rat_exclude_files.txt 
b/dev/release/rat_exclude_files.txt
index d64a431..c5435d0 100644
--- a/dev/release/rat_exclude_files.txt
+++ b/dev/release/rat_exclude_files.txt
@@ -12,3 +12,4 @@ filtered_rat.txt
 rat.txt
 # auto-generated
 arrow-flight/src/arrow.flight.protocol.rs
+.github/*


[arrow-rs] branch master updated: refactor: remove lifetime from DynComparator (#542)

2021-07-14 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new fde79a2  refactor: remove lifetime from DynComparator (#542)
fde79a2 is described below

commit fde79a2d58ac4076d3450549ae042fc112ad026d
Author: Edd Robinson 
AuthorDate: Wed Jul 14 07:14:32 2021 +0100

refactor: remove lifetime from DynComparator (#542)

This commit removes the need for an explicit lifetime on the 
`DynComparator`.

The rationale behind this change is that callers may wish to share this 
comparator amongst threads and the explicit lifetime makes this harder to 
achieve.

As a nice side-effect, performance of the sort kernel seems to have 
improved:

```
$ critcmp master pr

group                          master                  pr
-----                          ------                  --
bool sort 2^12                 1.03    310.8±1.34µs    1.00    302.8±7.78µs
bool sort nulls 2^12           1.01    287.4±2.22µs    1.00    284.0±3.23µs
sort 2^10                      1.04     98.7±3.58µs    1.00     94.6±0.50µs
sort 2^12                      1.05    510.7±5.56µs    1.00    486.2±9.94µs
sort 2^12 limit 10             1.05     48.1±0.38µs    1.00     45.6±0.30µs
sort 2^12 limit 100            1.04     52.8±0.37µs    1.00     50.6±0.41µs
sort 2^12 limit 1000           1.06    141.1±0.94µs    1.00    132.7±0.95µs
sort 2^12 limit 2^12           1.03    501.2±4.01µs    1.00    486.5±4.87µs
sort nulls 2^10                1.02     70.9±0.72µs    1.00     69.4±0.51µs
sort nulls 2^12                1.02    369.7±3.51µs    1.00    363.0±18.52µs
sort nulls 2^12 limit 10       1.01     70.6±1.22µs    1.00     70.0±1.27µs
sort nulls 2^12 limit 100      1.00     71.7±0.82µs    1.00     71.8±1.60µs
sort nulls 2^12 limit 1000     1.01     80.5±1.55µs    1.00     79.4±1.41µs
sort nulls 2^12 limit 2^12     1.05    375.4±4.78µs    1.00    356.1±3.04µs
```
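A minimal sketch of what dropping the lifetime enables, assuming `build_compare` is exported from `arrow::array` as in this crate: the `Send + Sync` comparator can be placed in an `Arc` and called from the `'static` closures that `thread::spawn` requires.

```rust
use std::sync::Arc;
use std::thread;

use arrow::array::{build_compare, Int32Array};

fn main() -> arrow::error::Result<()> {
    let left = Int32Array::from(vec![3, 1, 2]);
    let right = Int32Array::from(vec![2, 2, 2]);

    // Without the `'a` lifetime, the boxed comparator is `'static` and
    // `Send + Sync`, so it can be shared between threads via an Arc.
    let cmp = Arc::new(build_compare(&left, &right)?);

    let handles: Vec<_> = (0..3)
        .map(|i| {
            let cmp = Arc::clone(&cmp);
            thread::spawn(move || (*cmp)(i, i))
        })
        .collect();

    for handle in handles {
        println!("{:?}", handle.join().unwrap());
    }
    Ok(())
}
```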
---
 arrow/src/array/ord.rs| 48 ---
 arrow/src/compute/kernels/sort.rs |  6 ++---
 2 files changed, 22 insertions(+), 32 deletions(-)

diff --git a/arrow/src/array/ord.rs b/arrow/src/array/ord.rs
index 187542a..7fb4668 100644
--- a/arrow/src/array/ord.rs
+++ b/arrow/src/array/ord.rs
@@ -27,7 +27,7 @@ use crate::error::{ArrowError, Result};
 use num::Float;
 
 /// Compare the values at two arbitrary indices in two arrays.
-pub type DynComparator<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;
+pub type DynComparator = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync>;
 
 /// compares two floats, placing NaNs at last
 fn cmp_nans_last<T: Float>(a: &T, b: &T) -> Ordering {
@@ -39,60 +39,50 @@ fn cmp_nans_last(a: , b: ) -> Ordering {
 }
 }
 
-fn compare_primitives<'a, T: ArrowPrimitiveType>(
-    left: &'a Array,
-    right: &'a Array,
-) -> DynComparator<'a>
+fn compare_primitives<T: ArrowPrimitiveType>(left: &Array, right: &Array) -> DynComparator
 where
     T::Native: Ord,
 {
-    let left = left.as_any().downcast_ref::<PrimitiveArray<T>>().unwrap();
-    let right = right.as_any().downcast_ref::<PrimitiveArray<T>>().unwrap();
+    let left: PrimitiveArray<T> = PrimitiveArray::from(left.data().clone());
+    let right: PrimitiveArray<T> = PrimitiveArray::from(right.data().clone());
     Box::new(move |i, j| left.value(i).cmp(&right.value(j)))
 }
 
-fn compare_boolean<'a>(left: &'a Array, right: &'a Array) -> DynComparator<'a> {
-    let left = left.as_any().downcast_ref::<BooleanArray>().unwrap();
-    let right = right.as_any().downcast_ref::<BooleanArray>().unwrap();
+fn compare_boolean(left: &Array, right: &Array) -> DynComparator {
+    let left: BooleanArray = BooleanArray::from(left.data().clone());
+    let right: BooleanArray = BooleanArray::from(right.data().clone());
+
     Box::new(move |i, j| left.value(i).cmp(&right.value(j)))
 }
 
-fn compare_float<'a, T: ArrowPrimitiveType>(
-    left: &'a Array,
-    right: &'a Array,
-) -> DynComparator<'a>
+fn compare_float<T: ArrowPrimitiveType>(left: &Array, right: &Array) -> DynComparator
 where
     T::Native: Float,
 {
-    let left = left.as_any().downcast_ref::<PrimitiveArray<T>>().unwrap();
-    let right = right.as_any().downcast_ref::<PrimitiveArray<T>>().unwrap();
+    let left: PrimitiveArray<T> = PrimitiveArray::from(left.data().clone());
+    let right: PrimitiveArray<T> = PrimitiveArray::from(right.data().clone());
     Box::new(move |i, j| cmp_nans_last(&left.value(i), &right.value(j)))
 }
 
-fn compare_string<'a, T>(left: &'a Array, right: &'a Array) -> 

[arrow-rs] branch master updated: Fix build, Make the js package a feature that can be enabled for wasm, rather than always on (#545)

2021-07-13 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new cdcf013  Fix build, Make the js package a feature that can be enabled 
for wasm, rather than always on (#545)
cdcf013 is described below

commit cdcf013f610c169d2a2efa493d586c76da521053
Author: Andrew Lamb 
AuthorDate: Wed Jul 14 00:35:41 2021 -0400

Fix build, Make the js package a feature that can be enabled for wasm, 
rather than always on (#545)

* Fix build, add js feature

* fix command
---
 .github/workflows/rust.yml | 2 +-
 arrow/Cargo.toml   | 3 ++-
 arrow/README.md| 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/rust.yml b/.github/workflows/rust.yml
index 76511bf..5579072 100644
--- a/.github/workflows/rust.yml
+++ b/.github/workflows/rust.yml
@@ -332,7 +332,7 @@ jobs:
   export CARGO_HOME="/github/home/.cargo"
   export CARGO_TARGET_DIR="/github/home/target"
   cd arrow
-  cargo build --target wasm32-unknown-unknown
+  cargo build --features=js --target wasm32-unknown-unknown
 
   # test builds with various feature flags
   default-build:
diff --git a/arrow/Cargo.toml b/arrow/Cargo.toml
index eef7dbc..ca343eb 100644
--- a/arrow/Cargo.toml
+++ b/arrow/Cargo.toml
@@ -43,7 +43,7 @@ indexmap = "1.6"
 rand = { version = "0.8", default-features = false }
 # getrandom is a dependency of rand, not (directly) of arrow
 # need to specify `js` feature to build on wasm
-getrandom = { version = "0.2", features = ["js"] }
+getrandom = { version = "0.2", optional = true }
 num = "0.4"
 csv_crate = { version = "1.1", optional = true, package="csv" }
 regex = "1.3"
@@ -64,6 +64,7 @@ csv = ["csv_crate"]
 ipc = ["flatbuffers"]
 simd = ["packed_simd"]
 prettyprint = ["prettytable-rs"]
+js = ["getrandom/js"]
 # The test utils feature enables code used in benchmarks and tests but
 # not the core arrow code itself
 test_utils = ["rand/std", "rand/std_rng"]
diff --git a/arrow/README.md b/arrow/README.md
index f9b7308..77e36ec 100644
--- a/arrow/README.md
+++ b/arrow/README.md
@@ -30,6 +30,7 @@ The arrow crate provides the following optional features:
 - `csv` (default) - support for reading and writing Arrow arrays to/from csv 
files
 - `ipc` (default) - support for the 
[arrow-flight]((https://crates.io/crates/arrow-flight) IPC and wire format
 - `prettyprint` - support for formatting record batches as textual columns
+- `js` - support for building arrow for WebAssembly / JavaScript
 - `simd` - (_Requires Nightly Rust_) alternate optimized
   implementations of some 
[compute](https://github.com/apache/arrow/tree/master/rust/arrow/src/compute)
   kernels using explicit SIMD processor intrinsics.


[arrow-rs] branch master updated: Remove unused futures dependency from arrow-flight (#528)

2021-07-09 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 6538fe5  Remove unused futures dependency from arrow-flight (#528)
6538fe5 is described below

commit 6538fe597b5952af02f45b715d9363845583129b
Author: Andrew Lamb 
AuthorDate: Fri Jul 9 08:14:11 2021 -0400

Remove unused futures dependency from arrow-flight (#528)
---
 arrow-flight/Cargo.toml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arrow-flight/Cargo.toml b/arrow-flight/Cargo.toml
index 941cc2b..693da46 100644
--- a/arrow-flight/Cargo.toml
+++ b/arrow-flight/Cargo.toml
@@ -33,6 +33,8 @@ bytes = "1"
 prost = "0.7"
 prost-derive = "0.7"
 tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread"] }
+
+[dev-dependencies]
 futures = { version = "0.3", default-features = false, features = ["alloc"]}
 
 [build-dependencies]


[arrow-rs] branch master updated: simplify interactions with arrow flight APIs (#377)

2021-07-05 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 21d69ca  simplify interactions with arrow flight APIs (#377)
21d69ca is described below

commit 21d69cab9b21398b0947da28b5aac3e22139e818
Author: Gary Pennington <31890086+garyanap...@users.noreply.github.com>
AuthorDate: Mon Jul 5 07:44:48 2021 +0100

simplify interactions with arrow flight APIs (#377)

* simplify interactions with arrow flight APIs

Initial work to implement some basic traits

* more polishing and introduction of a couple of wrapper types

Some more polishing of the basic code I provided last week.

* More polishing

Add support for representing tickets as base64 encoded strings.

Also: more polishing of Display, etc...

* improve BOOLEAN writing logic and report error on encoding fail

When writing BOOLEAN data, writing more than 2048 rows of data will
overflow the hard-coded 256-byte buffer set for the bit-writer in the
PlainEncoder. Once this occurs, further attempts to write to the encoder
fail, because capacity is exceeded, but the errors are silently ignored.

This fix improves the error detection and reporting at the point of
encoding and modifies the logic for bit_writing (BOOLEANS). The
bit_writer is initially allocated 256 bytes (as at present), then each
time the capacity is exceeded the capacity is incremented by another
256 bytes.

This certainly resolves the current problem, but it's not exactly a
great fix because the capacity of the bit_writer could now grow
substantially.

Other data types seem to have a more sophisticated mechanism for writing
data which doesn't involve growing or having a fixed size buffer. It
would be desirable to make the BOOLEAN type use this same mechanism if
possible, but that level of change is more intrusive and probably
requires greater knowledge of the implementation than I possess.

resolves: #349

* only manipulate the bit_writer for BOOLEAN data

Tacky, but I can't think of a better way to do this without
specialization.

* better isolation of changes

Remove the byte tracking from the PlainEncoder and use the existing
bytes_written() method in BitWriter.

This is neater.

* add test for boolean writer

The test ensures that we can write > 2048 rows to a parquet file and
that when we read the data back, it finishes without hanging (defined as
taking < 5 seconds).

If we don't want that extra complexity, we could remove the
thread/channel stuff and just try to read the file and let the test
runner terminate hanging tests.

* fix capacity calculation error in bool encoding

The values.len() reports the number of values to be encoded and so must
be divided by 8 (bits in a bytes) to determine the effect on the byte
capacity of the bit_writer.

* make BasicAuth accessible

Following merge with master, make sure this is exposed so that
integration tests work.

also: there has been a release since I last looked at this so update the
deprecation warnings.

* fix documentation for ipc_message_from_arrow_schema

TryFrom, not From

* replace deprecated functions in integrations tests with traits

clippy complains about using deprecated functions, so replace them with
the new trait support.

also: fix the trait documentation

* address review comments

 - update deprecated warnings
 - improve TryFrom for DescriptorType
---
 arrow-flight/Cargo.toml|   1 +
 arrow-flight/src/lib.rs| 429 -
 arrow-flight/src/utils.rs  | 137 ++-
 .../flight_client_scenarios/integration_test.rs|   7 +-
 .../flight_server_scenarios/integration_test.rs|  24 +-
 5 files changed, 484 insertions(+), 114 deletions(-)

diff --git a/arrow-flight/Cargo.toml b/arrow-flight/Cargo.toml
index c6027f8..04a1a93 100644
--- a/arrow-flight/Cargo.toml
+++ b/arrow-flight/Cargo.toml
@@ -27,6 +27,7 @@ license = "Apache-2.0"
 
 [dependencies]
 arrow = { path = "../arrow", version = "5.0.0-SNAPSHOT" }
+base64 = "0.13"
 tonic = "0.4"
 bytes = "1"
 prost = "0.7"
diff --git a/arrow-flight/src/lib.rs b/arrow-flight/src/lib.rs
index 6af2e74..a431cfc 100644
--- a/arrow-flight/src/lib.rs
+++ b/arrow-flight/src/lib.rs
@@ -15,6 +15,433 @@
 // specific language governing permissions and limitations
 // under the License.
 
-include!("arrow.flight.protocol.rs");
+use arrow::datatypes::Sch

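The BOOLEAN-writer fix described in the commit message above amounts to growing the bit buffer in 256-byte steps instead of silently dropping writes once a fixed 256-byte buffer fills up. A standalone sketch of that growth strategy (hypothetical type, not the parquet-rs `PlainEncoder`):

```rust
/// Hypothetical stand-in for the bit-writer buffer described above.
struct BitBuffer {
    bytes: Vec<u8>,
    bit_len: usize,
}

impl BitBuffer {
    fn new() -> Self {
        // start from the original fixed 256-byte allocation
        Self { bytes: vec![0; 256], bit_len: 0 }
    }

    fn put_bool(&mut self, v: bool) {
        if self.bit_len / 8 >= self.bytes.len() {
            // capacity exceeded: grow by another 256-byte increment
            let new_len = self.bytes.len() + 256;
            self.bytes.resize(new_len, 0);
        }
        if v {
            self.bytes[self.bit_len / 8] |= 1 << (self.bit_len % 8);
        }
        self.bit_len += 1;
    }
}

fn main() {
    let mut buf = BitBuffer::new();
    for i in 0..5000 {
        // more than 2048 values would overflow the old fixed buffer
        buf.put_bool(i % 2 == 0);
    }
    assert_eq!(buf.bytes.len(), 768); // grew twice, 256 bytes at a time
}
```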
[arrow-rs] branch master updated: fix reader schema (#513)

2021-06-30 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new ef88876  fix reader schema (#513)
ef88876 is described below

commit ef8887609017680d94b2a35f9889aa10cf3b3de8
Author: Wakahisa 
AuthorDate: Wed Jun 30 23:56:44 2021 +0200

fix reader schema (#513)

We aren't comparing the right values
---
 parquet/benches/arrow_array_reader.rs | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/parquet/benches/arrow_array_reader.rs 
b/parquet/benches/arrow_array_reader.rs
index 6e87512..acc5141 100644
--- a/parquet/benches/arrow_array_reader.rs
+++ b/parquet/benches/arrow_array_reader.rs
@@ -31,13 +31,9 @@ fn build_test_schema() -> SchemaDescPtr {
 let message_type = "
 message test_schema {
 REQUIRED INT32 mandatory_int32_leaf;
-REPEATED Group test_mid_int32 {
-OPTIONAL INT32 optional_int32_leaf;
-}
+OPTIONAL INT32 optional_int32_leaf;
 REQUIRED BYTE_ARRAY mandatory_string_leaf (UTF8);
-REPEATED Group test_mid_string {
-OPTIONAL BYTE_ARRAY optional_string_leaf (UTF8);
-}
+OPTIONAL BYTE_ARRAY optional_string_leaf (UTF8);
 }
 ";
 parse_message_type(message_type)


[arrow-rs] branch parquet-fix-list-reader created (now 3d6523a)

2021-06-30 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch parquet-fix-list-reader
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


  at 3d6523a  fix reader schema

This branch includes the following new commits:

 new 3d6523a  fix reader schema

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[arrow-rs] 01/01: fix reader schema

2021-06-30 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch parquet-fix-list-reader
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 3d6523afd89be5b0b3d681ab0b12073eb63c9fc6
Author: Neville Dipale 
AuthorDate: Sat Jun 26 13:08:40 2021 +0200

fix reader schema

We aren't comparing the right values
---
 parquet/benches/arrow_array_reader.rs | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/parquet/benches/arrow_array_reader.rs 
b/parquet/benches/arrow_array_reader.rs
index 6e87512..acc5141 100644
--- a/parquet/benches/arrow_array_reader.rs
+++ b/parquet/benches/arrow_array_reader.rs
@@ -31,13 +31,9 @@ fn build_test_schema() -> SchemaDescPtr {
 let message_type = "
 message test_schema {
 REQUIRED INT32 mandatory_int32_leaf;
-REPEATED Group test_mid_int32 {
-OPTIONAL INT32 optional_int32_leaf;
-}
+OPTIONAL INT32 optional_int32_leaf;
 REQUIRED BYTE_ARRAY mandatory_string_leaf (UTF8);
-REPEATED Group test_mid_string {
-OPTIONAL BYTE_ARRAY optional_string_leaf (UTF8);
-}
+OPTIONAL BYTE_ARRAY optional_string_leaf (UTF8);
 }
 ";
 parse_message_type(message_type)


[arrow-rs] branch master updated: Implement function slice for RecordBatch (#490)

2021-06-25 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new de62168  Implement function slice for RecordBatch (#490)
de62168 is described below

commit de62168a4f428e3c334e1cfa5c5db23272f313d7
Author: baishen 
AuthorDate: Fri Jun 25 11:36:44 2021 -0500

Implement function slice for RecordBatch (#490)

* Implement RecordBatch::slice()

* optimize

* optimize

* add test case

* fix clippy
---
 arrow/src/record_batch.rs | 91 +++
 1 file changed, 84 insertions(+), 7 deletions(-)

diff --git a/arrow/src/record_batch.rs b/arrow/src/record_batch.rs
index f1fd867..4d2abc3 100644
--- a/arrow/src/record_batch.rs
+++ b/arrow/src/record_batch.rs
@@ -244,6 +244,31 @@ impl RecordBatch {
 &self.columns[..]
 }
 
+/// Return a new RecordBatch where each column is sliced
+/// according to `offset` and `length`
+///
+/// # Panics
+///
+/// Panics if `offset` with `length` is greater than column length.
+pub fn slice(&self, offset: usize, length: usize) -> RecordBatch {
+if self.schema.fields().is_empty() {
+assert!((offset + length) == 0);
+return RecordBatch::new_empty(self.schema.clone());
+}
+assert!((offset + length) <= self.num_rows());
+
+let columns = self
+.columns()
+.iter()
+.map(|column| column.slice(offset, length))
+.collect();
+
+Self {
+schema: self.schema.clone(),
+columns,
+}
+}
+
 /// Create a `RecordBatch` from an iterable list of pairs of the
 /// form `(field_name, array)`, with the same requirements on
 /// fields and arrays as [`RecordBatch::try_new`]. This method is
@@ -414,16 +439,68 @@ mod tests {
 let record_batch =
 RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a), 
Arc::new(b)])
 .unwrap();
-check_batch(record_batch)
+check_batch(record_batch, 5)
 }
 
-fn check_batch(record_batch: RecordBatch) {
-assert_eq!(5, record_batch.num_rows());
+fn check_batch(record_batch: RecordBatch, num_rows: usize) {
+assert_eq!(num_rows, record_batch.num_rows());
 assert_eq!(2, record_batch.num_columns());
 assert_eq!(&DataType::Int32, record_batch.schema().field(0).data_type());
 assert_eq!(&DataType::Utf8, record_batch.schema().field(1).data_type());
-assert_eq!(5, record_batch.column(0).data().len());
-assert_eq!(5, record_batch.column(1).data().len());
+assert_eq!(num_rows, record_batch.column(0).data().len());
+assert_eq!(num_rows, record_batch.column(1).data().len());
+}
+
+#[test]
+#[should_panic(expected = "assertion failed: (offset + length) <= 
self.num_rows()")]
+fn create_record_batch_slice() {
+let schema = Schema::new(vec![
+Field::new("a", DataType::Int32, false),
+Field::new("b", DataType::Utf8, false),
+]);
+let expected_schema = schema.clone();
+
+let a = Int32Array::from(vec![1, 2, 3, 4, 5, 6, 7, 8]);
+let b = StringArray::from(vec!["a", "b", "c", "d", "e", "f", "h", 
"i"]);
+
+let record_batch =
+RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a), 
Arc::new(b)])
+.unwrap();
+
+let offset = 2;
+let length = 5;
+let record_batch_slice = record_batch.slice(offset, length);
+
+assert_eq!(record_batch_slice.schema().as_ref(), &expected_schema);
+check_batch(record_batch_slice, 5);
+
+let offset = 2;
+let length = 0;
+let record_batch_slice = record_batch.slice(offset, length);
+
+assert_eq!(record_batch_slice.schema().as_ref(), &expected_schema);
+check_batch(record_batch_slice, 0);
+
+let offset = 2;
+let length = 10;
+let _record_batch_slice = record_batch.slice(offset, length);
+}
+
+#[test]
+#[should_panic(expected = "assertion failed: (offset + length) == 0")]
+fn create_record_batch_slice_empty_batch() {
+let schema = Schema::new(vec![]);
+
+let record_batch = RecordBatch::new_empty(Arc::new(schema));
+
+let offset = 0;
+let length = 0;
+let record_batch_slice = record_batch.slice(offset, length);
+assert_eq!(0, record_batch_slice.schema().fields().len());
+
+let offset = 1;
+let length = 2;
+let _record_batch_slice = record_batch.slice(offset, length);
 }
 
 #[test]
@@ -445,7 +522,7 @@ mod tests {
 Field::new("b", DataType::Utf8, false),
 ]);
 assert_eq

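A minimal usage sketch of the new API (standard arrow imports; the sliced batch shares the underlying buffers rather than copying them):

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn main() -> arrow::error::Result<()> {
    let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int32, false)]));
    let column: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3, 4, 5]));
    let batch = RecordBatch::try_new(schema, vec![column])?;

    // Each column is sliced by adjusting offsets, so this is zero-copy.
    let window = batch.slice(1, 3);
    assert_eq!(window.num_rows(), 3);
    Ok(())
}
```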
[arrow-rs] branch master updated: remove stale comment and update unit tests (#472)

2021-06-19 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 6e2f684  remove stale comment and update unit tests (#472)
6e2f684 is described below

commit 6e2f68420e03fe6926e8c2ffbd4441fc8cc1aeab
Author: Jiayu Liu 
AuthorDate: Sun Jun 20 00:40:15 2021 +0800

remove stale comment and update unit tests (#472)
---
 arrow/src/array/array_struct.rs | 24 ++--
 arrow/src/array/builder.rs  | 24 ++--
 2 files changed, 4 insertions(+), 44 deletions(-)

diff --git a/arrow/src/array/array_struct.rs b/arrow/src/array/array_struct.rs
index 9c11b83..f721d35 100644
--- a/arrow/src/array/array_struct.rs
+++ b/arrow/src/array/array_struct.rs
@@ -362,28 +362,8 @@ mod tests {
 .add_buffer(Buffer::from(&[1, 2, 0, 4].to_byte_slice()))
 .build();
 
-assert_eq!(&expected_string_data, arr.column(0).data());
-
-// TODO: implement equality for ArrayData
-assert_eq!(expected_int_data.len(), arr.column(1).data().len());
-assert_eq!(
-expected_int_data.null_count(),
-arr.column(1).data().null_count()
-);
-assert_eq!(
-expected_int_data.null_bitmap(),
-arr.column(1).data().null_bitmap()
-);
-let expected_value_buf = expected_int_data.buffers()[0].clone();
-let actual_value_buf = arr.column(1).data().buffers()[0].clone();
-for i in 0..expected_int_data.len() {
-if !expected_int_data.is_null(i) {
-assert_eq!(
-expected_value_buf.as_slice()[i * 4..(i + 1) * 4],
-actual_value_buf.as_slice()[i * 4..(i + 1) * 4]
-);
-}
-}
+assert_eq!(expected_string_data, *arr.column(0).data());
+assert_eq!(expected_int_data, *arr.column(1).data());
 }
 
 #[test]
diff --git a/arrow/src/array/builder.rs b/arrow/src/array/builder.rs
index eacd764..66f2d81 100644
--- a/arrow/src/array/builder.rs
+++ b/arrow/src/array/builder.rs
@@ -3050,28 +3050,8 @@ mod tests {
 .add_buffer(Buffer::from_slice_ref(&[1, 2, 0, 4]))
 .build();
 
-assert_eq!(&expected_string_data, arr.column(0).data());
-
-// TODO: implement equality for ArrayData
-assert_eq!(expected_int_data.len(), arr.column(1).data().len());
-assert_eq!(
-expected_int_data.null_count(),
-arr.column(1).data().null_count()
-);
-assert_eq!(
-expected_int_data.null_bitmap(),
-arr.column(1).data().null_bitmap()
-);
-let expected_value_buf = expected_int_data.buffers()[0].clone();
-let actual_value_buf = arr.column(1).data().buffers()[0].clone();
-for i in 0..expected_int_data.len() {
-if !expected_int_data.is_null(i) {
-assert_eq!(
-expected_value_buf.as_slice()[i * 4..(i + 1) * 4],
-actual_value_buf.as_slice()[i * 4..(i + 1) * 4]
-);
-}
-}
+assert_eq!(expected_string_data, *arr.column(0).data());
+assert_eq!(expected_int_data, *arr.column(1).data());
 }
 
 #[test]

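For context, the manual field-by-field checks deleted here became replaceable once `ArrayData` gained the `PartialEq` support the removed `TODO` had asked for. A small sketch assuming that support:

```rust
use arrow::array::{Array, Int32Array};

fn main() {
    let a = Int32Array::from(vec![Some(1), None, Some(3)]);
    let b = Int32Array::from(vec![Some(1), None, Some(3)]);

    // One assertion over the whole ArrayData replaces the manual
    // len/null_count/null_bitmap/buffer comparisons removed above.
    assert_eq!(a.data(), b.data());
}
```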

[arrow-rs] branch master updated: remove unused patch file (#471)

2021-06-19 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 8bdbf9d  remove unused patch file (#471)
8bdbf9d is described below

commit 8bdbf9d593c9270a0fe6ed9746d8d96c2bb27a19
Author: Jiayu Liu 
AuthorDate: Sun Jun 20 00:07:58 2021 +0800

remove unused patch file (#471)
---
 arrow/format-0ed34c83.patch | 220 
 1 file changed, 220 deletions(-)

diff --git a/arrow/format-0ed34c83.patch b/arrow/format-0ed34c83.patch
deleted file mode 100644
index 5da0a0c..000
--- a/arrow/format-0ed34c83.patch
+++ /dev/null
@@ -1,220 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-diff --git a/format/Message.fbs b/format/Message.fbs
-index 1a7e0dfff..f1c18d765 100644
 a/format/Message.fbs
-+++ b/format/Message.fbs
-@@ -28,7 +28,7 @@ namespace org.apache.arrow.flatbuf;
- /// Metadata about a field at some level of a nested type tree (but not
- /// its children).
- ///
--/// For example, a List<Int16> with values [[1, 2, 3], null, [4], [5, 6], null]
-+/// For example, a List<Int16> with values `[[1, 2, 3], null, [4], [5, 6], null]`
null]`
- /// would have {length: 5, null_count: 2} for its List node, and {length: 6,
- /// null_count: 0} for its Int16 node, as separate FieldNode structs
- struct FieldNode {
-diff --git a/format/Schema.fbs b/format/Schema.fbs
-index 3b37e5d85..3b00dd478 100644
 a/format/Schema.fbs
-+++ b/format/Schema.fbs
-@@ -110,10 +110,11 @@ table FixedSizeList {
- /// not enforced.
- ///
- /// Map
-+/// ```text
- ///   - child[0] entries: Struct
- /// - child[0] key: K
- /// - child[1] value: V
--///
-+/// ```
- /// Neither the "entries" field nor the "key" field may be nullable.
- ///
- /// The metadata is structured so that Arrow systems without special handling
-@@ -129,7 +130,7 @@ enum UnionMode:short { Sparse, Dense }
- /// A union is a complex type with children in Field
- /// By default ids in the type vector refer to the offsets in the children
- /// optionally typeIds provides an indirection between the child offset and 
the type id
--/// for each child typeIds[offset] is the id used in the type vector
-+/// for each child `typeIds[offset]` is the id used in the type vector
- table Union {
-   mode: UnionMode;
-   typeIds: [ int ]; // optional, describes typeid of each child.
-diff --git a/format/SparseTensor.fbs b/format/SparseTensor.fbs
-index 3fe8a7582..a6fd2f9e7 100644
 a/format/SparseTensor.fbs
-+++ b/format/SparseTensor.fbs
-@@ -37,21 +37,21 @@ namespace org.apache.arrow.flatbuf;
- ///
- /// For example, let X be a 2x3x4x5 tensor, and it has the following
- /// 6 non-zero values:
--///
-+/// ```text
- ///   X[0, 1, 2, 0] := 1
- ///   X[1, 1, 2, 3] := 2
- ///   X[0, 2, 1, 0] := 3
- ///   X[0, 1, 3, 0] := 4
- ///   X[0, 1, 2, 1] := 5
- ///   X[1, 2, 0, 4] := 6
--///
-+/// ```
- /// In COO format, the index matrix of X is the following 4x6 matrix:
--///
-+/// ```text
- ///   [[0, 0, 0, 0, 1, 1],
- ///[1, 1, 1, 2, 1, 2],
- ///[2, 2, 3, 1, 2, 0],
- ///[0, 1, 0, 0, 3, 4]]
--///
-+/// ```
- /// When isCanonical is true, the indices is sorted in lexicographical order
- /// (row-major order), and it does not have duplicated entries.  Otherwise,
- /// the indices may not be sorted, or may have duplicated entries.
-@@ -86,26 +86,27 @@ table SparseMatrixIndexCSX {
- 
-   /// indptrBuffer stores the location and size of indptr array that
-   /// represents the range of the rows.
--  /// The i-th row spans from indptr[i] to indptr[i+1] in the data.
-+  /// The i-th row spans from `indptr[i]` to `indptr[i+1]` in the data.
-   /// The length of this array is 1 + (the number of rows), and the type
-   /// of index value is long.
-   ///
-   /// For example, let X be the following 6x4 matrix:
--  ///
-+  /// ```text
-   ///   X := [[0, 1, 2, 0],
-   /// [0, 0, 3, 0],
-   /// [0, 4, 0, 5],
-   /// [0, 0, 0, 0],
-   /// [6, 0, 7, 8],
-   /// [0, 9, 0, 0]].
--  ///
-+  /// ```
-   /// The array of non-zero

[arrow-rs] branch master updated: Implement the Iterator trait for the json Reader. (#451)

2021-06-12 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new e5cda31  Implement the Iterator trait for the json Reader. (#451)
e5cda31 is described below

commit e5cda312b697c3d610637b28c58b6f1b104b41cc
Author: Laurent Mazare 
AuthorDate: Sun Jun 13 08:22:38 2021 +0800

Implement the Iterator trait for the json Reader. (#451)

* Implement the Iterator trait for the json Reader.

* Use transpose.
---
 arrow/src/json/reader.rs | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/arrow/src/json/reader.rs b/arrow/src/json/reader.rs
index d0b9c19..9235142 100644
--- a/arrow/src/json/reader.rs
+++ b/arrow/src/json/reader.rs
@@ -1569,6 +1569,14 @@ impl ReaderBuilder {
 }
 }
 
+impl<R: Read> Iterator for Reader<R> {
+type Item = Result<RecordBatch>;
+
+fn next(&mut self) -> Option<Self::Item> {
+self.next().transpose()
+}
+}
+
 #[cfg(test)]
 mod tests {
 use crate::{
@@ -2946,4 +2954,35 @@ mod tests {
 assert_eq!(batch.num_columns(), 1);
 assert_eq!(batch.num_rows(), 3);
 }
+
+#[test]
+fn test_json_iterator() {
+let builder = ReaderBuilder::new().infer_schema(None).with_batch_size(5);
+let reader: Reader<File> = builder
+.build::<File>(File::open("test/data/basic.json").unwrap())
+.unwrap();
+let schema = reader.schema();
+let (col_a_index, _) = schema.column_with_name("a").unwrap();
+
+let mut sum_num_rows = 0;
+let mut num_batches = 0;
+let mut sum_a = 0;
+for batch in reader {
+let batch = batch.unwrap();
+assert_eq!(4, batch.num_columns());
+sum_num_rows += batch.num_rows();
+num_batches += 1;
+let batch_schema = batch.schema();
+assert_eq!(schema, batch_schema);
+let a_array = batch
+.column(col_a_index)
+.as_any()
+.downcast_ref::<Int64Array>()
+.unwrap();
+sum_a += (0..a_array.len()).map(|i| a_array.value(i)).sum::<i64>();
+}
+assert_eq!(12, sum_num_rows);
+assert_eq!(3, num_batches);
+assert_eq!(111, sum_a);
+}
 }


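With the `Iterator` impl in place, consuming a JSON reader no longer needs an explicit `while let` over `next()`. A minimal sketch (file path borrowed from the test above):

```rust
use std::fs::File;

use arrow::error::Result;
use arrow::json::ReaderBuilder;

fn main() -> Result<()> {
    let file = File::open("test/data/basic.json")?;
    let reader = ReaderBuilder::new()
        .infer_schema(None)
        .with_batch_size(64)
        .build(file)?;

    // `Reader` is now an `Iterator<Item = Result<RecordBatch>>`, so plain
    // for-loops and iterator adapters both work.
    for batch in reader {
        println!("{} rows", batch?.num_rows());
    }
    Ok(())
}
```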
[arrow-rs] branch master updated: Add Decimal to CsvWriter and improve debug display (#406)

2021-06-12 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new fb45112  Add Decimal to CsvWriter and improve debug display (#406)
fb45112 is described below

commit fb451125c4ed49a425de10afb6f42af0d9723a19
Author: Ádám Lippai 
AuthorDate: Sun Jun 13 02:20:08 2021 +0200

Add Decimal to CsvWriter and improve debug display (#406)

* Add Decimal to CsvWriter and improve debug display

* Measure CSV writer instead of file and data creation

* Re-use decimal formatting
---
 arrow/benches/csv_writer.rs | 19 ++-
 arrow/src/array/array_binary.rs | 36 
 arrow/src/csv/writer.rs | 23 ---
 arrow/src/util/display.rs   | 27 ---
 4 files changed, 62 insertions(+), 43 deletions(-)

diff --git a/arrow/benches/csv_writer.rs b/arrow/benches/csv_writer.rs
index 50b94d6..62c5da9 100644
--- a/arrow/benches/csv_writer.rs
+++ b/arrow/benches/csv_writer.rs
@@ -28,14 +28,14 @@ use arrow::record_batch::RecordBatch;
 use std::fs::File;
 use std::sync::Arc;
 
-fn record_batches_to_csv() {
+fn criterion_benchmark(c: &mut Criterion) {
 #[cfg(feature = "csv")]
 {
 let schema = Schema::new(vec![
 Field::new("c1", DataType::Utf8, false),
 Field::new("c2", DataType::Float64, true),
 Field::new("c3", DataType::UInt32, false),
-Field::new("c3", DataType::Boolean, true),
+Field::new("c4", DataType::Boolean, true),
 ]);
 
 let c1 = StringArray::from(vec![
@@ -59,16 +59,17 @@ fn record_batches_to_csv() {
 let file = File::create("target/bench_write_csv.csv").unwrap();
 let mut writer = csv::Writer::new(file);
 let batches = vec![&batch, &batch, &batch, &batch, &batch, &batch, &batch, &batch, &batch, &batch, &batch];
-#[allow(clippy::unit_arg)]
-criterion::black_box(for batch in batches {
-writer.write(batch).unwrap()
+
+c.bench_function("record_batches_to_csv", |b| {
+b.iter(|| {
+#[allow(clippy::unit_arg)]
+criterion::black_box(for batch in  {
+writer.write(batch).unwrap()
+});
+});
 });
 }
 }
 
-fn criterion_benchmark(c: &mut Criterion) {
-c.bench_function("record_batches_to_csv", |b| b.iter(record_batches_to_csv));
-}
-
 criterion_group!(benches, criterion_benchmark);
 criterion_main!(benches);
diff --git a/arrow/src/array/array_binary.rs b/arrow/src/array/array_binary.rs
index 0cb4db4..0b374db 100644
--- a/arrow/src/array/array_binary.rs
+++ b/arrow/src/array/array_binary.rs
@@ -666,6 +666,17 @@ impl DecimalArray {
 self.length * i as i32
 }
 
+#[inline]
+pub fn value_as_string(&self, row: usize) -> String {
+let decimal_string = self.value(row).to_string();
+if self.scale == 0 {
+decimal_string
+} else {
+let splits = decimal_string.split_at(decimal_string.len() - self.scale);
+format!("{}.{}", splits.0, splits.1)
+}
+}
+
 pub fn from_fixed_size_list_array(
 v: FixedSizeListArray,
 precision: usize,
@@ -729,7 +740,9 @@ impl fmt::Debug for DecimalArray {
 fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
 write!(f, "DecimalArray<{}, {}>\n[\n", self.precision, self.scale)?;
 print_long_array(self, f, |array, index, f| {
-fmt::Debug::fmt(&array.value(index), f)
+let formatted_decimal = array.value_as_string(index);
+
+write!(f, "{}", formatted_decimal)
 })?;
 write!(f, "]")
 }
@@ -758,7 +771,7 @@ impl Array for DecimalArray {
 #[cfg(test)]
 mod tests {
 use crate::{
-array::{LargeListArray, ListArray},
+array::{DecimalBuilder, LargeListArray, ListArray},
 datatypes::Field,
 };
 
@@ -1163,17 +1176,16 @@ mod tests {
 
 #[test]
 fn test_decimal_array_fmt_debug() {
-let values: [u8; 32] = [
-192, 219, 180, 17, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 36, 75, 
238, 253,
-255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
-];
-let array_data = ArrayData::builder(DataType::Decimal(23, 6))
-.len(2)
-.add_buffer(Buffer::from(&values[..]))
-.build();
-let arr = DecimalArray::from(array_data);
+let values: Vec<i128> = vec![888700, -888700];
+let mut decimal_builder = DecimalBuilder::new(3, 23, 6);
+
+values.iter().for_each(|&value| {
+decimal_builder.append_value(value).unwrap();
+});
+decimal_builder.append_null().unwrap();
+let arr = decimal_builder.

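A small worked sketch of the new `value_as_string` scale handling (hypothetical values: with precision 10 and scale 2, the raw integer 12345 represents 123.45):

```rust
use arrow::array::DecimalBuilder;

fn main() {
    // capacity 2, precision 10, scale 2
    let mut builder = DecimalBuilder::new(2, 10, 2);
    builder.append_value(12345).unwrap();
    builder.append_value(-6789).unwrap();
    let array = builder.finish();

    // value_as_string splits the raw decimal string `scale` digits from
    // the right, the same formatting now used by Debug and the CSV writer.
    assert_eq!(array.value_as_string(0), "123.45");
    assert_eq!(array.value_as_string(1), "-67.89");
}
```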
[arrow-rs] branch master updated: remove unnecessary wraps in sortk (#445)

2021-06-12 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new efe86cd  remove unnecessary wraps in sortk (#445)
efe86cd is described below

commit efe86cdf329ec4bfad3b72bd23ee6558340fa297
Author: Jiayu Liu 
AuthorDate: Sun Jun 13 08:00:35 2021 +0800

remove unnecessary wraps in sortk (#445)
---
 arrow/src/compute/kernels/sort.rs | 96 +--
 1 file changed, 51 insertions(+), 45 deletions(-)

diff --git a/arrow/src/compute/kernels/sort.rs 
b/arrow/src/compute/kernels/sort.rs
index dff5695..b0eecb9 100644
--- a/arrow/src/compute/kernels/sort.rs
+++ b/arrow/src/compute/kernels/sort.rs
@@ -163,7 +163,7 @@ pub fn sort_to_indices(
 
 let (v, n) = partition_validity(values);
 
-match values.data_type() {
+Ok(match values.data_type() {
 DataType::Boolean => sort_boolean(values, v, n, &options, limit),
 DataType::Int8 => {
 sort_primitive::<Int8Type, _>(values, v, n, cmp, &options, limit)
@@ -278,10 +278,12 @@ pub fn sort_to_indices(
 DataType::Float64 => {
 sort_list::<i32, Float64Type>(values, v, n, &options, limit)
 }
-t => Err(ArrowError::ComputeError(format!(
-"Sort not supported for list type {:?}",
-t
-))),
+t => {
+return Err(ArrowError::ComputeError(format!(
+"Sort not supported for list type {:?}",
+t
+)))
+}
 },
 DataType::LargeList(field) => match field.data_type() {
 DataType::Int8 => sort_list::<i64, Int8Type>(values, v, n, &options, limit),
@@ -304,10 +306,12 @@ pub fn sort_to_indices(
 DataType::Float64 => {
 sort_list::<i64, Float64Type>(values, v, n, &options, limit)
 }
-t => Err(ArrowError::ComputeError(format!(
-"Sort not supported for list type {:?}",
-t
-))),
+t => {
+return Err(ArrowError::ComputeError(format!(
+"Sort not supported for list type {:?}",
+t
+)))
+}
 },
 DataType::FixedSizeList(field, _) => match field.data_type() {
 DataType::Int8 => sort_list::<i32, Int8Type>(values, v, n, &options, limit),
@@ -330,10 +334,12 @@ pub fn sort_to_indices(
 DataType::Float64 => {
 sort_list::<i32, Float64Type>(values, v, n, &options, limit)
 }
-t => Err(ArrowError::ComputeError(format!(
-"Sort not supported for list type {:?}",
-t
-))),
+t => {
+return Err(ArrowError::ComputeError(format!(
+"Sort not supported for list type {:?}",
+t
+)))
+}
 },
 DataType::Dictionary(key_type, value_type)
 if *value_type.as_ref() == DataType::Utf8 =>
@@ -363,17 +369,21 @@ pub fn sort_to_indices(
 DataType::UInt64 => {
 sort_string_dictionary::<UInt64Type>(values, v, n, &options, limit)
 }
-t => Err(ArrowError::ComputeError(format!(
-"Sort not supported for dictionary key type {:?}",
-t
-))),
+t => {
+return Err(ArrowError::ComputeError(format!(
+"Sort not supported for dictionary key type {:?}",
+t
+)))
+}
 }
 }
-t => Err(ArrowError::ComputeError(format!(
-"Sort not supported for data type {:?}",
-t
-))),
-}
+t => {
+return Err(ArrowError::ComputeError(format!(
+"Sort not supported for data type {:?}",
+t
+)))
+}
+})
 }
 
 /// Options that define how sort kernels should behave
@@ -396,14 +406,13 @@ impl Default for SortOptions {
 }
 
 /// Sort primitive values
-#[allow(clippy::unnecessary_wraps)]
 fn sort_boolean(
 values: &ArrayRef,
 value_indices: Vec<u32>,
 null_indices: Vec<u32>,
 options: &SortOptions,
 limit: Option<usize>,
-) -> Result<UInt32Array> {
+) -> UInt32Array {
 let values = values
 .as_any()
 .downcast_ref::<BooleanArray>()
@@ -469,11 +478,10 @@ fn sort_boolean(
 vec![],
 );
 
-Ok(UInt32Array::from(result_data))
+UInt32Array::from(result_data)
 }
 
 /// Sort primitive values
-#[allow(clippy::unnecessary_wraps)]
 fn sort_primitive<T, F>(
 values: &ArrayRef,
 value_indices: Vec<u32>,
@@ -481,7 +489,7 @@ fn sort_primitive<T, F>(
 cmp: F,
 options: &SortOptions,
 limit: Option<usize>,
-) -> Result<UInt32Array>
+) -> UInt32Array
 where
 T

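The public entry point is unchanged by this refactor; only the private helpers stopped returning `Result`. A quick usage sketch of `sort_to_indices`, which is still fallible at the API boundary:

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array};
use arrow::compute::kernels::sort::{sort_to_indices, SortOptions};

fn main() -> arrow::error::Result<()> {
    let values: ArrayRef = Arc::new(Int32Array::from(vec![3, 1, 2]));

    // Unsupported data types still surface as Err at this level.
    let indices = sort_to_indices(&values, Some(SortOptions::default()), None)?;
    assert_eq!(indices.values(), &[1, 2, 0]);
    Ok(())
}
```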
[arrow-datafusion] 01/01: add expr::like and expr::notlike to pruning logic

2021-06-05 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch i507-string-like-prune
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git

commit 1062d5c8e77291bd7ae2245b2f701c12d4d27310
Author: Neville Dipale 
AuthorDate: Sat Jun 5 11:57:56 2021 +0200

add expr::like and expr::notlike to pruning logic
---
 datafusion/src/physical_optimizer/pruning.rs | 96 +++-
 1 file changed, 94 insertions(+), 2 deletions(-)

diff --git a/datafusion/src/physical_optimizer/pruning.rs 
b/datafusion/src/physical_optimizer/pruning.rs
index c65733b..0e43e4e 100644
--- a/datafusion/src/physical_optimizer/pruning.rs
+++ b/datafusion/src/physical_optimizer/pruning.rs
@@ -42,6 +42,7 @@ use crate::{
 logical_plan::{Expr, Operator},
 optimizer::utils,
 physical_plan::{planner::DefaultPhysicalPlanner, ColumnarValue, 
PhysicalExpr},
+scalar::ScalarValue,
 };
 
 /// Interface to pass statistics information to [`PruningPredicates`]
@@ -548,7 +549,7 @@ fn build_predicate_expression(
 // allow partial failure in predicate expression generation
 // this can still produce a useful predicate when multiple conditions 
are joined using AND
 Err(_) => {
-return Ok(logical_plan::lit(true));
+return Ok(unhandled);
 }
 };
 let corrected_op = expr_builder.correct_operator(op);
@@ -586,8 +587,45 @@ fn build_predicate_expression(
 .min_column_expr()?
 .lt_eq(expr_builder.scalar_expr().clone())
 }
+Operator::Like => {
+match &**right {
+// If the literal is a 'starts_with'
+Expr::Literal(ScalarValue::Utf8(Some(string)))
+if !string.starts_with('%') =>
+{
+let scalar_expr =
+Expr::Literal(ScalarValue::Utf8(Some(string.replace('%', ""))));
+// Behaves like Eq
+let min_column_expr = expr_builder.min_column_expr()?;
+let max_column_expr = expr_builder.max_column_expr()?;
+min_column_expr
+.lt_eq(scalar_expr.clone())
+.and(scalar_expr.lt_eq(max_column_expr))
+}
+_ => unhandled,
+}
+}
+Operator::NotLike => {
+match &**right {
+// If the literal is a 'starts_with'
+Expr::Literal(ScalarValue::Utf8(Some(string)))
+if !string.starts_with('%') =>
+{
+let scalar_expr =
+Expr::Literal(ScalarValue::Utf8(Some(string.replace('%', ""))));
+// Behaves like Eq
+let min_column_expr = expr_builder.min_column_expr()?;
+let max_column_expr = expr_builder.max_column_expr()?;
+// Inverse of Like
+min_column_expr
+.gt_eq(scalar_expr.clone())
+.and(scalar_expr.gt_eq(max_column_expr))
+}
+_ => unhandled,
+}
+}
 // other expressions are not supported
-_ => logical_plan::lit(true),
+_ => unhandled,
 };
 Ok(statistics_expr)
 }
@@ -1096,6 +1134,60 @@ mod tests {
 }
 
 #[test]
+fn row_group_predicate_starts_with() -> Result<()> {
+let schema = Schema::new(vec![Field::new("c1", DataType::Utf8, true)]);
+// test LIKE operator that is converted to a 'starts_with'
+let expr = col("c1").like(lit("Banana%"));
+let expected_expr =
+"#c1_min LtEq Utf8(\"Banana\") And Utf8(\"Banana\") LtEq #c1_max";
+let predicate_expr =
+build_predicate_expression(&expr, &schema, &mut RequiredStatColumns::new())?;
+assert_eq!(format!("{:?}", predicate_expr), expected_expr);
+
+Ok(())
+}
+
+#[test]
+fn row_group_predicate_like() -> Result<()> {
+let schema = Schema::new(vec![Field::new("c1", DataType::Utf8, true)]);
+// test LIKE operator that can't be converted to a 'starts_with'
+let expr = col("c1").like(lit("%Banana%"));
+let expected_expr = "Boolean(true)";
+let predicate_expr =
+build_predicate_expression(&expr, &schema, &mut RequiredStatColumns::new())?;
+assert_eq!(format!("{:?}", predicate_expr), expected_expr);
+
+Ok(())
+}
+
+#[test]
+fn row_group_predicate_not_starts_with() -> Result<()> {
+let schema = Schema::new(vec![Field::new("c1", DataType::Utf8, true)]);
+// test LIKE operator that can't be converted to a 'star

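A standalone sketch of the rewrite this commit performs (hypothetical helper, not the DataFusion API): `c1 LIKE 'Banana%'` is pruned with the same min/max bounds check that an equality predicate on the literal's prefix would use.

```rust
// Mirrors the Eq-style predicate built above for LIKE 'Banana%':
// keep a row group only if min <= "Banana" && "Banana" <= max.
fn keep_row_group(min: &str, max: &str, prefix: &str) -> bool {
    min <= prefix && prefix <= max
}

fn main() {
    // A group spanning "Apple".."Cherry" may hold Banana-prefixed rows,
    // while one spanning "Dill".."Fig" cannot, so it is pruned.
    assert!(keep_row_group("Apple", "Cherry", "Banana"));
    assert!(!keep_row_group("Dill", "Fig", "Banana"));
}
```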
[arrow-datafusion] branch i507-string-like-prune created (now 1062d5c)

2021-06-05 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch i507-string-like-prune
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git.


  at 1062d5c  add expr::like and expr::notlike to pruning logic

This branch includes the following new commits:

 new 1062d5c  add expr::like and expr::notlike to pruning logic

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[arrow-rs] branch master updated: use prettier to auto format md files (#398)

2021-06-04 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 2ddc717  use prettier to auto format md files (#398)
2ddc717 is described below

commit 2ddc7174af170e923c77d02ad9bd58027bd260e1
Author: Jiayu Liu 
AuthorDate: Sat Jun 5 13:01:58 2021 +0800

use prettier to auto format md files (#398)
---
 .github/workflows/dev.yml  | 14 +++-
 CODE_OF_CONDUCT.md |  4 +--
 CONTRIBUTING.md| 26 +++
 README.md  | 34 ++--
 arrow/README.md| 36 ++---
 .../tests/fixtures/crossbow-success-message.md | 12 +++
 dev/release/README.md  | 35 
 integration-testing/README.md  | 10 +++---
 parquet/README.md  | 37 +++---
 9 files changed, 107 insertions(+), 101 deletions(-)

diff --git a/.github/workflows/dev.yml b/.github/workflows/dev.yml
index 9d8146a..545cb97 100644
--- a/.github/workflows/dev.yml
+++ b/.github/workflows/dev.yml
@@ -27,7 +27,6 @@ env:
   ARCHERY_DOCKER_PASSWORD: ${{ secrets.DOCKERHUB_TOKEN }}
 
 jobs:
-
   lint:
 name: Lint C++, Python, R, Rust, Docker, RAT
 runs-on: ubuntu-latest
@@ -41,3 +40,16 @@ jobs:
 run: pip install -e dev/archery[docker]
   - name: Lint
 run: archery lint --rat
+  prettier:
+name: Use prettier to check formatting of documents
+runs-on: ubuntu-latest
+steps:
+  - uses: actions/checkout@v2
+  - uses: actions/setup-node@v2
+with:
+  node-version: "14"
+  - name: Prettier check
+run: |
+  # if you encounter error, try rerun the command below with --write 
instead of --check
+  # and commit the changes
+  npx prettier@2.3.0 --check 
{arrow,arrow-flight,dev,integration-testing,parquet}/**/*.md README.md 
CODE_OF_CONDUCT.md CONTRIBUTING.md
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
index 2efe740..9a24b9b 100644
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -19,6 +19,6 @@
 
 # Code of Conduct
 
-* [Code of Conduct for The Apache Software Foundation][1]
+- [Code of Conduct for The Apache Software Foundation][1]
 
-[1]: https://www.apache.org/foundation/policies/conduct.html
\ No newline at end of file
+[1]: https://www.apache.org/foundation/policies/conduct.html
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 3e636d9..18d6a7b 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -21,15 +21,15 @@
 
 ## Did you find a bug?
 
-The Arrow project uses JIRA as a bug tracker.  To report a bug, you'll have
+The Arrow project uses JIRA as a bug tracker. To report a bug, you'll have
 to first create an account on the
-[Apache Foundation JIRA](https://issues.apache.org/jira/).  The JIRA server
-hosts bugs and issues for multiple Apache projects.  The JIRA project name
+[Apache Foundation JIRA](https://issues.apache.org/jira/). The JIRA server
+hosts bugs and issues for multiple Apache projects. The JIRA project name
 for Arrow is "ARROW".
 
 To be assigned to an issue, ask an Arrow JIRA admin to go to
 [Arrow 
Roles](https://issues.apache.org/jira/plugins/servlet/project-config/ARROW/roles),
-click "Add users to a role," and add you to the "Contributor" role.  Most
+click "Add users to a role," and add you to the "Contributor" role. Most
 committers are authorized to do this; if you're a committer and aren't
 able to load that project admin page, have someone else add you to the
 necessary role.
@@ -39,15 +39,15 @@ Before you create a new bug entry, we recommend you first
 among existing Arrow issues.
 
 When you create a new JIRA entry, please don't forget to fill the "Component"
-field.  Arrow has many subcomponents and this helps triaging and filtering
-tremendously.  Also, we conventionally prefix the issue title with the 
component
+field. Arrow has many subcomponents and this helps triaging and filtering
+tremendously. Also, we conventionally prefix the issue title with the component
 name in brackets, such as "[C++] Crash in Array::Frobnicate()", so as to make
 lists more easy to navigate, and we'd be grateful if you did the same.
 
 ## Did you write a patch that fixes a bug or brings an improvement?
 
-First create a JIRA entry as described above.  Then, submit your changes
-as a GitHub Pull Request.  We'll ask you to prefix the pull request title
+First create a JIRA entry as described above. Then, submit your changes
+as a GitHub Pull Request. We'll ask you to prefix the pull request title
 with the JIRA issue number and the component name in brackets.
 

[arrow-rs] branch master updated: MINOR: update install instruction (#400)

2021-06-04 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new db63714  MINOR: update install instruction (#400)
db63714 is described below

commit db6371400ec4dae83e49859a13c8173f8501b1e4
Author: Ádám Lippai 
AuthorDate: Sat Jun 5 06:54:32 2021 +0200

MINOR: update install instruction (#400)

We have frequent releases and honor semver, so the minor and patch version pinning has been removed.
---
 parquet/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/parquet/README.md b/parquet/README.md
index 326c966..7f47b56 100644
--- a/parquet/README.md
+++ b/parquet/README.md
@@ -27,7 +27,7 @@ Add this to your Cargo.toml:
 
 ```toml
 [dependencies]
-parquet = "4.1.0"
+parquet = "^4"
 ```
 
 and this to your crate root:


[arrow-rs] branch master updated: Fix typo in release script, update release location (#380)

2021-05-30 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new f41cb17  Fix typo in release script, update release location (#380)
f41cb17 is described below

commit f41cb17066146552701bb7eb67bc13b2ef9ff1b6
Author: Andrew Lamb 
AuthorDate: Sun May 30 02:25:18 2021 -0400

Fix typo in release script, update release location (#380)

* Fix typo in release script

* release to `arrow-rs-{version}` directory
---
 dev/release/create-tarball.sh  | 2 +-
 dev/release/release-tarball.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/release/create-tarball.sh b/dev/release/create-tarball.sh
index ab3e1d2..9fadedf 100755
--- a/dev/release/create-tarball.sh
+++ b/dev/release/create-tarball.sh
@@ -73,7 +73,7 @@ echo ""
 echo "-"
 cat <https://dist.apache.org/repos/dist/release/arrow ${tmp_dir}/release
 
 echo "Copy ${version}-rc${rc} to release working copy"
-release_version=arrow-${version}
+release_version=arrow-rs-${version}
 mkdir -p ${tmp_dir}/release/${release_version}
 cp -r ${tmp_dir}/dev/* ${tmp_dir}/release/${release_version}/
 svn add ${tmp_dir}/release/${release_version}


[arrow-rs] branch active_release updated: Add crate badges (#362) (#373)

2021-05-27 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch active_release
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/active_release by this push:
 new 58d53cf  Add crate badges (#362) (#373)
58d53cf is described below

commit 58d53cfc8dcf018baf5e15097c3f8a402dc48ea1
Author: Andrew Lamb 
AuthorDate: Thu May 27 02:20:22 2021 -0400

Add crate badges (#362) (#373)

* Add crate badges

* Format markdown

Co-authored-by: Dominik Moritz 
---
 arrow-flight/README.md |  5 ++---
 arrow/README.md|  2 ++
 parquet/README.md  | 16 
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arrow-flight/README.md b/arrow-flight/README.md
index ba63f65..4205ebb 100644
--- a/arrow-flight/README.md
+++ b/arrow-flight/README.md
@@ -19,11 +19,10 @@
 
 # Apache Arrow Flight
 
+[![Crates.io](https://img.shields.io/crates/v/arrow-flight.svg)](https://crates.io/crates/arrow-flight)
+
 Apache Arrow Flight is a gRPC based protocol for exchanging Arrow data between 
processes. See the blog post [Introducing Apache Arrow Flight: A Framework for 
Fast Data 
Transport](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) 
for more information.
 
 This crate simply provides the Rust implementation of the 
[Flight.proto](../../format/Flight.proto) gRPC protocol and provides an example 
that demonstrates how to build a Flight server implemented with Tonic.
 
 Note that building a Flight server also requires an implementation of Arrow 
IPC which is based on the Flatbuffers serialization framework. The Rust 
implementation of Arrow IPC is not yet complete although the generated 
Flatbuffers code is available as part of the core Arrow crate.
-
-
-
diff --git a/arrow/README.md b/arrow/README.md
index 674c3fc..e873509 100644
--- a/arrow/README.md
+++ b/arrow/README.md
@@ -19,6 +19,8 @@
 
 # Native Rust implementation of Apache Arrow
 
+[![Crates.io](https://img.shields.io/crates/v/arrow.svg)](https://crates.io/crates/arrow)
+
 This crate contains a native Rust implementation of the [Arrow columnar 
format](https://arrow.apache.org/docs/format/Columnar.html).
 
 ## Developer's guide
diff --git a/parquet/README.md b/parquet/README.md
index 836a23b..d032fed 100644
--- a/parquet/README.md
+++ b/parquet/README.md
@@ -19,19 +19,25 @@
 
 # An Apache Parquet implementation in Rust
 
+[![Crates.io](https://img.shields.io/crates/v/parquet.svg)](https://crates.io/crates/parquet)
+
 ## Usage
+
 Add this to your Cargo.toml:
+
 ```toml
 [dependencies]
 parquet = "5.0.0-SNAPSHOT"
 ```
 
 and this to your crate root:
+
 ```rust
 extern crate parquet;
 ```
 
 Example usage of reading data:
+
 ```rust
 use std::fs::File;
 use std::path::Path;
@@ -44,6 +50,7 @@ while let Some(record) = iter.next() {
 println!("{}", record);
 }
 ```
+
 See [crate documentation](https://docs.rs/crate/parquet/5.0.0-SNAPSHOT) on 
available API.
 
 ## Upgrading from versions prior to 4.0
@@ -61,12 +68,14 @@ It is preferred that `LogicalType` is used, as it supports 
nanosecond
 precision timestamps without using the deprecated `Int96` Parquet type.
 
 ## Supported Parquet Version
+
 - Parquet-format 2.6.0
 
 To update Parquet format to a newer version, check if 
[parquet-format](https://github.com/sunchao/parquet-format-rs)
 version is available. Then simply update version of `parquet-format` crate in 
Cargo.toml.
 
 ## Features
+
 - [X] All encodings supported
 - [X] All compression codecs supported
 - [X] Read support
@@ -87,15 +96,18 @@ Parquet requires LLVM.  Our windows CI image includes LLVM 
but to build the libr
 users will have to install LLVM. Follow 
[this](https://github.com/appveyor/ci/issues/2651) link for info.
 
 ## Build
+
 Run `cargo build` or `cargo build --release` to build in release mode.
 Some features take advantage of SSE4.2 instructions, which can be
 enabled by adding `RUSTFLAGS="-C target-feature=+sse4.2"` before the
 `cargo build` command.
 
 ## Test
+
 Run `cargo test` for unit tests. To also run tests related to the binaries, 
use `cargo test --features cli`.
 
 ## Binaries
+
 The following binaries are provided (use `cargo install --features cli` to 
install them):
 - **parquet-schema** for printing Parquet file schema and metadata.
 `Usage: parquet-schema <file-path>`, where `file-path` is the path to a Parquet file. Use `-v/--verbose` flag
@@ -111,16 +123,20 @@ be printed). Use `-j/--json` to print records in JSON 
lines format.
 files to read.
 
 If you see `Library not loaded` error, please make sure `LD_LIBRARY_PATH` is 
set properly:
+
 ```
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
 ```
 
 ## Benchmarks
+
 Run `cargo bench` for benchmarks.
 
 ## Docs
+
 To build documentation, run `cargo doc --no-deps`.
 To compile and view in the browser, run `cargo doc --no-deps --open`.
 
 ## License
+
 Licensed und

[arrow-rs] branch active_release updated: Only register Flight.proto with cargo if it exists (#351) (#374)

2021-05-27 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch active_release
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/active_release by this push:
 new f0702df  Only register Flight.proto with cargo if it exists (#351) 
(#374)
f0702df is described below

commit f0702df314434a1c79184c019b09d2aa2c39c00f
Author: Andrew Lamb 
AuthorDate: Thu May 27 02:19:50 2021 -0400

Only register Flight.proto with cargo if it exists (#351) (#374)

Co-authored-by: Raphael Taylor-Davies 
<1781103+tustv...@users.noreply.github.com>
---
 arrow-flight/build.rs | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arrow-flight/build.rs b/arrow-flight/build.rs
index bc84f37..1cbfceb 100644
--- a/arrow-flight/build.rs
+++ b/arrow-flight/build.rs
@@ -23,9 +23,6 @@ use std::{
 };
 
 fn main() -> Result<(), Box<dyn std::error::Error>> {
-// avoid rerunning build if the file has not changed
-println!("cargo:rerun-if-changed=../format/Flight.proto");
-
 // override the build location, in order to check in the changes to proto 
files
 env::set_var("OUT_DIR", "src");
 
@@ -33,6 +30,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
 // built or released so we build an absolute path to the proto file
 let path = Path::new("../format/Flight.proto");
 if path.exists() {
+// avoid rerunning build if the file has not changed
+println!("cargo:rerun-if-changed=../format/Flight.proto");
+
 tonic_build::compile_protos("../format/Flight.proto")?;
 // read file contents to string
 let mut file = OpenOptions::new()


[arrow-rs] branch master updated (7753f41 -> f26ffb3)

2021-05-26 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 7753f41  Only register Flight.proto with cargo if it exists (#351)
 add f26ffb3  Remove superfluous space (#363)

No new revisions were added by this update.

Summary of changes:
 .github/pull_request_template.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[arrow-rs] branch master updated (4a27a3b -> 7753f41)

2021-05-26 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 4a27a3b  Add crate badges (#362)
 add 7753f41  Only register Flight.proto with cargo if it exists (#351)

No new revisions were added by this update.

Summary of changes:
 arrow-flight/build.rs | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


[arrow-rs] branch master updated: Add crate badges (#362)

2021-05-26 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 4a27a3b  Add crate badges (#362)
4a27a3b is described below

commit 4a27a3b3c797e801d919ac30cd432f27f9a3d28c
Author: Dominik Moritz 
AuthorDate: Wed May 26 13:20:04 2021 -0700

Add crate badges (#362)

* Add crate badges

* Format markdown
---
 arrow-flight/README.md |  5 ++---
 arrow/README.md|  2 ++
 parquet/README.md  | 16 
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arrow-flight/README.md b/arrow-flight/README.md
index ba63f65..4205ebb 100644
--- a/arrow-flight/README.md
+++ b/arrow-flight/README.md
@@ -19,11 +19,10 @@
 
 # Apache Arrow Flight
 
+[![Crates.io](https://img.shields.io/crates/v/arrow-flight.svg)](https://crates.io/crates/arrow-flight)
+
 Apache Arrow Flight is a gRPC based protocol for exchanging Arrow data between processes. See the blog post [Introducing Apache Arrow Flight: A Framework for Fast Data Transport](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for more information.
 
 This crate simply provides the Rust implementation of the [Flight.proto](../../format/Flight.proto) gRPC protocol and provides an example that demonstrates how to build a Flight server implemented with Tonic.
 
 Note that building a Flight server also requires an implementation of Arrow IPC which is based on the Flatbuffers serialization framework. The Rust implementation of Arrow IPC is not yet complete although the generated Flatbuffers code is available as part of the core Arrow crate.
-
-
-
diff --git a/arrow/README.md b/arrow/README.md
index 7c54da0..f67d582 100644
--- a/arrow/README.md
+++ b/arrow/README.md
@@ -19,6 +19,8 @@
 
 # Native Rust implementation of Apache Arrow
 
+[![Crates.io](https://img.shields.io/crates/v/arrow.svg)](https://crates.io/crates/arrow)
+
 This crate contains a native Rust implementation of the [Arrow columnar format](https://arrow.apache.org/docs/format/Columnar.html).
 
 ## Developer's guide
diff --git a/parquet/README.md b/parquet/README.md
index 836a23b..d032fed 100644
--- a/parquet/README.md
+++ b/parquet/README.md
@@ -19,19 +19,25 @@
 
 # An Apache Parquet implementation in Rust
 
+[![Crates.io](https://img.shields.io/crates/v/parquet.svg)](https://crates.io/crates/parquet)
+
 ## Usage
+
 Add this to your Cargo.toml:
+
 ```toml
 [dependencies]
 parquet = "5.0.0-SNAPSHOT"
 ```
 
 and this to your crate root:
+
 ```rust
 extern crate parquet;
 ```
 
 Example usage of reading data:
+
 ```rust
 use std::fs::File;
 use std::path::Path;
@@ -44,6 +50,7 @@ while let Some(record) = iter.next() {
 println!("{}", record);
 }
 ```
+
 See [crate documentation](https://docs.rs/crate/parquet/5.0.0-SNAPSHOT) on available API.
 
 ## Upgrading from versions prior to 4.0
@@ -61,12 +68,14 @@ It is preferred that `LogicalType` is used, as it supports nanosecond
 precision timestamps without using the deprecated `Int96` Parquet type.
 
 ## Supported Parquet Version
+
 - Parquet-format 2.6.0
 
 To update Parquet format to a newer version, check if [parquet-format](https://github.com/sunchao/parquet-format-rs)
 version is available. Then simply update version of `parquet-format` crate in Cargo.toml.
 
 ## Features
+
 - [X] All encodings supported
 - [X] All compression codecs supported
 - [X] Read support
@@ -87,15 +96,18 @@ Parquet requires LLVM.  Our windows CI image includes LLVM but to build the libr
 users will have to install LLVM. Follow [this](https://github.com/appveyor/ci/issues/2651) link for info.
 
 ## Build
+
 Run `cargo build` or `cargo build --release` to build in release mode.
 Some features take advantage of SSE4.2 instructions, which can be
 enabled by adding `RUSTFLAGS="-C target-feature=+sse4.2"` before the
 `cargo build` command.
 
 ## Test
+
 Run `cargo test` for unit tests. To also run tests related to the binaries, use `cargo test --features cli`.
 
 ## Binaries
+
 The following binaries are provided (use `cargo install --features cli` to install them):
 - **parquet-schema** for printing Parquet file schema and metadata.
 `Usage: parquet-schema <file-path>`, where `file-path` is the path to a Parquet file. Use `-v/--verbose` flag
@@ -111,16 +123,20 @@ be printed). Use `-j/--json` to print records in JSON lines format.
 files to read.
 
 If you see `Library not loaded` error, please make sure `LD_LIBRARY_PATH` is set properly:
+
 ```
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib
 ```
 
 ## Benchmarks
+
 Run `cargo bench` for benchmarks.
 
 ## Docs
+
 To build documentation, run `cargo doc --no-deps`.
 To compile and view in the browser, run `cargo doc --no-deps --open`.
 
 ## License
+
 Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.


[arrow-rs] branch master updated: Version upgrades (#304)

2021-05-17 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new a959c85  Version upgrades (#304)
a959c85 is described below

commit a959c85f8e567e7f117445f78a7c524e57edfaf4
Author: Daniël Heres 
AuthorDate: Mon May 17 08:09:38 2021 +0200

Version upgrades (#304)
---
 arrow/Cargo.toml   | 2 +-
 parquet/Cargo.toml | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arrow/Cargo.toml b/arrow/Cargo.toml
index d66ac25..6d532ce 100644
--- a/arrow/Cargo.toml
+++ b/arrow/Cargo.toml
 serde_json = { version = "1.0", features = ["preserve_order"] }
 indexmap = "1.6"
 rand = "0.7"
 csv = "1.1"
-num = "0.3"
+num = "0.4"
 regex = "1.3"
 lazy_static = "1.4"
 packed_simd = { version = "0.3.4", optional = true, package = "packed_simd_2" }
diff --git a/parquet/Cargo.toml b/parquet/Cargo.toml
index fc221b0..1e54047 100644
--- a/parquet/Cargo.toml
+++ b/parquet/Cargo.toml
@@ -38,11 +38,11 @@ snap = { version = "1.0", optional = true }
 brotli = { version = "3.3", optional = true }
 flate2 = { version = "1.0", optional = true }
 lz4 = { version = "1.23", optional = true }
-zstd = { version = "0.7", optional = true }
+zstd = { version = "0.8", optional = true }
 chrono = "0.4"
-num-bigint = "0.3"
+num-bigint = "0.4"
 arrow = { path = "../arrow", version = "5.0.0-SNAPSHOT", optional = true }
-base64 = { version = "0.12", optional = true }
+base64 = { version = "0.13", optional = true }
 clap = { version = "2.33.3", optional = true }
 serde_json = { version = "1.0", features = ["preserve_order"], optional = true }
 
@@ -53,7 +53,7 @@ snap = "1.0"
 brotli = "3.3"
 flate2 = "1.0"
 lz4 = "1.23"
-zstd = "0.7"
+zstd = "0.8"
 arrow = { path = "../arrow", version = "5.0.0-SNAPSHOT" }
 serde_json = { version = "1.0", features = ["preserve_order"] }
 


[arrow-rs] branch master updated: Fix subtraction underflow when sorting string arrays with many nulls (#285)

2021-05-13 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new ce8e67c  Fix subtraction underflow when sorting string arrays with many nulls (#285)
ce8e67c is described below

commit ce8e67c28ad1431cda36b38434e53871c2dd520a
Author: Michael Edwards 
AuthorDate: Thu May 13 13:28:46 2021 +0200

Fix subtraction underflow when sorting string arrays with many nulls (#285)
---
 arrow/src/compute/kernels/sort.rs | 285 --
 1 file changed, 274 insertions(+), 11 deletions(-)

diff --git a/arrow/src/compute/kernels/sort.rs b/arrow/src/compute/kernels/sort.rs
index 9287425..7cd463d 100644
--- a/arrow/src/compute/kernels/sort.rs
+++ b/arrow/src/compute/kernels/sort.rs
@@ -410,24 +410,27 @@ fn sort_boolean(
 len = limit.min(len);
 }
 if !descending {
-sort_by(&mut valids, len - nulls_len, |a, b| cmp(a.1, b.1));
+sort_by(&mut valids, len.saturating_sub(nulls_len), |a, b| {
+cmp(a.1, b.1)
+});
 } else {
-sort_by(&mut valids, len - nulls_len, |a, b| cmp(a.1, b.1).reverse());
+sort_by(&mut valids, len.saturating_sub(nulls_len), |a, b| {
+cmp(a.1, b.1).reverse()
+});
 // reverse to keep a stable ordering
 nulls.reverse();
 }
 
-// collect results directly into a buffer instead of a vec to avoid another aligned allocation
-let mut result = MutableBuffer::new(values.len() * std::mem::size_of::<u32>());
+let result_capacity = len * std::mem::size_of::<u32>();
+let mut result = MutableBuffer::new(result_capacity);
 // sets len to capacity so we can access the whole buffer as a typed slice
-result.resize(values.len() * std::mem::size_of::<u32>(), 0);
+result.resize(result_capacity, 0);
 let result_slice: &mut [u32] = result.typed_data_mut();
 
-debug_assert_eq!(result_slice.len(), nulls_len + valids_len);
-
 if options.nulls_first {
 let size = nulls_len.min(len);
-result_slice[0..nulls_len.min(len)].copy_from_slice(&nulls);
+result_slice[0..size].copy_from_slice(&nulls[0..size]);
 if nulls_len < len {
 insert_valid_values(result_slice, nulls_len, &valids[0..len - size]);
 }
 }
@@ -626,9 +629,13 @@ where
 len = limit.min(len);
 }
 if !descending {
-sort_by(&mut valids, len - nulls_len, |a, b| cmp(a.1, b.1));
+sort_by(&mut valids, len.saturating_sub(nulls_len), |a, b| {
+cmp(a.1, b.1)
+});
 } else {
-sort_by(&mut valids, len - nulls_len, |a, b| cmp(a.1, b.1).reverse());
+sort_by(&mut valids, len.saturating_sub(nulls_len), |a, b| {
+cmp(a.1, b.1).reverse()
+});
 // reverse to keep a stable ordering
 nulls.reverse();
 }
@@ -689,11 +696,11 @@ where
 len = limit.min(len);
 }
 if !descending {
-sort_by(&mut valids, len - nulls_len, |a, b| {
+sort_by(&mut valids, len.saturating_sub(nulls_len), |a, b| {
 cmp_array(a.1.as_ref(), b.1.as_ref())
 });
 } else {
-sort_by(&mut valids, len - nulls_len, |a, b| {
+sort_by(&mut valids, len.saturating_sub(nulls_len), |a, b| {
 cmp_array(a.1.as_ref(), b.1.as_ref()).reverse()
 });
 // reverse to keep a stable ordering
@@ -1285,6 +1292,48 @@ mod tests {
 None,
 vec![5, 0, 2, 1, 4, 3],
 );
+
+// valid values less than limit with extra nulls
+test_sort_to_indices_primitive_arrays::<Float64Type>(
+vec![Some(2.0), None, None, Some(1.0)],
+Some(SortOptions {
+descending: false,
+nulls_first: false,
+}),
+Some(3),
+vec![3, 0, 1],
+);
+
+test_sort_to_indices_primitive_arrays::<Float64Type>(
+vec![Some(2.0), None, None, Some(1.0)],
+Some(SortOptions {
+descending: false,
+nulls_first: true,
+}),
+Some(3),
+vec![1, 2, 3],
+);
+
+// more nulls than limit
+test_sort_to_indices_primitive_arrays::<Float64Type>(
+vec![Some(1.0), None, None, None],
+Some(SortOptions {
+descending: false,
+nulls_first: true,
+}),
+Some(2),
+vec![1, 2],
+);
+
+test_sort_to_indices_primitive_arrays::<Float64Type>(
+vec![Some(1.0), None, None, None],
+Some(SortOptions {
+descending: false,
+nulls_first: false,
+}),
+Some(2),
+vec![0, 1],
+);
 }
 
 #[test]
@@ -1329,6 +1378,48 @@ mod tests {
 Some(3),
 vec![5, 0, 2],
 );
+
+// valid values less than limit with extra nulls
+test_sort_to_indices_boolean_arr

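For context, the underflow this commit fixes arises when a `limit` makes `len` smaller than `nulls_len`, so the old `len - nulls_len` wraps on `usize`. A standalone sketch of the `saturating_sub` behaviour the fix relies on (illustrative values only):

```rust
fn main() {
    let len: usize = 2;       // limit smaller than the null count
    let nulls_len: usize = 3; // more nulls than the limit
    // `len - nulls_len` would panic in debug builds (and wrap in release);
    // saturating_sub clamps to zero, so no valid values get sorted and
    // only null slots are emitted.
    assert_eq!(len.saturating_sub(nulls_len), 0);
}
```
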
[arrow-rs] branch master updated: Fix null struct and list roundtrip (#270)

2021-05-10 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 8226219  Fix null struct and list roundtrip (#270)
8226219 is described below

commit 8226219fe7104f6c8a2740806f96f02c960d991c
Author: Wakahisa 
AuthorDate: Tue May 11 07:42:41 2021 +0200

Fix null struct and list roundtrip (#270)

* fix null struct and list inconsistencies in writer

* fix list reader null and empty slot calculation

* remove stray TODOs
---
 parquet/src/arrow/array_reader.rs |  95 -
 parquet/src/arrow/arrow_writer.rs |  54 ++---
 parquet/src/arrow/levels.rs   | 430 +-
 3 files changed, 265 insertions(+), 314 deletions(-)

diff --git a/parquet/src/arrow/array_reader.rs b/parquet/src/arrow/array_reader.rs
index f209b8b..f54e446 100644
--- a/parquet/src/arrow/array_reader.rs
+++ b/parquet/src/arrow/array_reader.rs
@@ -615,6 +615,8 @@ pub struct ListArrayReader {
 item_type: ArrowType,
 list_def_level: i16,
 list_rep_level: i16,
+list_empty_def_level: i16,
+list_null_def_level: i16,
 def_level_buffer: Option,
 rep_level_buffer: Option,
 _marker: PhantomData,
@@ -628,6 +630,8 @@ impl ListArrayReader {
 item_type: ArrowType,
 def_level: i16,
 rep_level: i16,
+list_null_def_level: i16,
+list_empty_def_level: i16,
 ) -> Self {
 Self {
 item_reader,
@@ -635,6 +639,8 @@ impl ListArrayReader {
 item_type,
 list_def_level: def_level,
 list_rep_level: rep_level,
+list_null_def_level,
+list_empty_def_level,
 def_level_buffer: None,
 rep_level_buffer: None,
 _marker: PhantomData,
@@ -843,61 +849,49 @@ impl ArrayReader for ListArrayReader {
 // Where n is the max definition level of the list's parent.
 // If a Parquet schema's only leaf is the list, then n = 0.
 
-// TODO: ARROW-10391 - add a test case with a non-nullable child, check if max is 3
-let list_field_type = match self.get_data_type() {
-ArrowType::List(field)
-| ArrowType::FixedSizeList(field, _)
-| ArrowType::LargeList(field) => field,
-_ => {
-// Panic: this is safe as we only write lists from list datatypes
-unreachable!()
-}
-};
-let max_list_def_range = if list_field_type.is_nullable() { 3 } else { 2 };
-let max_list_definition = *(def_levels.iter().max().unwrap());
-// TODO: ARROW-10391 - Find a reliable way of validating deeply-nested lists
-// debug_assert!(
-// max_list_definition >= max_list_def_range,
-// "Lift definition max less than range"
-// );
-let list_null_def = max_list_definition - max_list_def_range;
-let list_empty_def = max_list_definition - 1;
-let mut null_list_indices: Vec<usize> = Vec::new();
-for i in 0..def_levels.len() {
-if def_levels[i] == list_null_def {
-null_list_indices.push(i);
-}
-}
+// If the list index is at empty definition, the child slot is null
+let null_list_indices: Vec<usize> = def_levels
+.iter()
+.enumerate()
+.filter_map(|(index, def)| {
+if *def <= self.list_empty_def_level {
+Some(index)
+} else {
+None
+}
+})
+.collect();
 let batch_values = match null_list_indices.len() {
 0 => next_batch_array.clone(),
 _ => remove_indices(next_batch_array.clone(), item_type, null_list_indices)?,
 };
 
-// null list has def_level = 0
-// empty list has def_level = 1
-// null item in a list has def_level = 2
-// non-null item has def_level = 3
 // first item in each list has rep_level = 0, subsequent items have rep_level = 1
-
 let mut offsets: Vec<OffsetSize> = Vec::new();
 let mut cur_offset = OffsetSize::zero();
-for i in 0..rep_levels.len() {
-if rep_levels[i] == 0 {
-offsets.push(cur_offset)
+def_levels.iter().zip(rep_levels).for_each(|(d, r)| {
+if *r == 0 || d == &self.list_empty_def_level {
+offsets.push(cur_offset);
 }
-if def_levels[i] >= list_empty_def {
+if d > &self.list_empty_def_level {
 cur_offset += OffsetSize::one();
 }
-}
+});
 offsets.push(cur_offset);
 
 let num_bytes = bit_util::ceil(offsets.len(), 8);
-let mut null_buf = MutableBuffer::new(num_bytes).with_bitse

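To make the definition-level comments in the diff concrete, here is a standalone sketch (not the reader's actual code) of how list offsets fall out of def/rep levels for a list column using the mapping given above: null list = 0, empty list = 1, null item = 2, non-null item = 3.

```rust
fn main() {
    // Column [[1, 2], [], null] encoded as def/rep levels.
    let def_levels = [3i16, 3, 1, 0];
    let rep_levels = [0i16, 1, 0, 0];
    let list_empty_def_level = 1i16;

    let mut offsets = vec![];
    let mut cur = 0i32;
    for (d, r) in def_levels.iter().zip(rep_levels.iter()) {
        // a new list starts at rep 0; empty and null slots also close one
        if *r == 0 || *d == list_empty_def_level {
            offsets.push(cur);
        }
        if *d > list_empty_def_level {
            cur += 1; // this level carries an actual child value
        }
    }
    offsets.push(cur);
    // [1, 2] spans 0..2; the empty and null lists both span 2..2
    assert_eq!(offsets, vec![0, 2, 2, 2]);
}
```
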
[arrow-rs] branch master updated: Speed up bound checking in `take` (#281)

2021-05-10 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 510f02f  Speed up bound checking in `take` (#281)
510f02f is described below

commit 510f02f449193bea9df3f423d18ce7a9e4112bdf
Author: Daniël Heres 
AuthorDate: Tue May 11 07:35:05 2021 +0200

Speed up bound checking in `take` (#281)

* WIP improve take performance

* WIP

* Bound checking speed

* Simplify

* fmt

* Improve formatting
---
 arrow/benches/take_kernels.rs | 19 ++-
 arrow/src/compute/kernels/take.rs | 25 +++--
 2 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/arrow/benches/take_kernels.rs b/arrow/benches/take_kernels.rs
index 2853eb5..b1d03d7 100644
--- a/arrow/benches/take_kernels.rs
+++ b/arrow/benches/take_kernels.rs
@@ -23,7 +23,7 @@ use rand::Rng;
 
 extern crate arrow;
 
-use arrow::compute::take;
+use arrow::compute::{take, TakeOptions};
 use arrow::datatypes::*;
 use arrow::util::test_util::seedable_rng;
 use arrow::{array::*, util::bench_util::*};
@@ -46,6 +46,12 @@ fn bench_take(values: &dyn Array, indices: &UInt32Array) {
 criterion::black_box(take(values, &indices, None).unwrap());
 }
 
+fn bench_take_bounds_check(values: &dyn Array, indices: &UInt32Array) {
+criterion::black_box(
+take(values, &indices, Some(TakeOptions { check_bounds: true })).unwrap(),
+);
+}
+
 fn add_benchmark(c: &mut Criterion) {
 let values = create_primitive_array::<Int32Type>(512, 0.0);
 let indices = create_random_index(512, 0.0);
@@ -56,6 +62,17 @@ fn add_benchmark(c: &mut Criterion) {
 b.iter(|| bench_take(&values, &indices))
 });
 
+let values = create_primitive_array::<Int32Type>(512, 0.0);
+let indices = create_random_index(512, 0.0);
+c.bench_function("take check bounds i32 512", |b| {
+b.iter(|| bench_take_bounds_check(&values, &indices))
+});
+let values = create_primitive_array::<Int32Type>(1024, 0.0);
+let indices = create_random_index(1024, 0.0);
+c.bench_function("take check bounds i32 1024", |b| {
+b.iter(|| bench_take_bounds_check(&values, &indices))
+});
+
 let indices = create_random_index(512, 0.5);
 c.bench_function("take i32 nulls 512", |b| {
 b.iter(|| bench_take(&values, &indices))
diff --git a/arrow/src/compute/kernels/take.rs b/arrow/src/compute/kernels/take.rs
index 0217573..d325ce4 100644
--- a/arrow/src/compute/kernels/take.rs
+++ b/arrow/src/compute/kernels/take.rs
@@ -100,17 +100,30 @@ where
 let options = options.unwrap_or_default();
 if options.check_bounds {
 let len = values.len();
-for i in 0..indices.len() {
-if indices.is_valid(i) {
-let ix = ToPrimitive::to_usize(&indices.value(i)).ok_or_else(|| {
+if indices.null_count() > 0 {
+indices.iter().flatten().try_for_each(|index| {
+let ix = ToPrimitive::to_usize(&index).ok_or_else(|| {
 ArrowError::ComputeError("Cast to usize failed".to_string())
 })?;
 if ix >= len {
 return Err(ArrowError::ComputeError(
-format!("Array index out of bounds, cannot get item at index {} from {} entries", ix, len))
-);
+format!("Array index out of bounds, cannot get item at index {} from {} entries", ix, len))
+);
 }
-}
+Ok(())
+})?;
+} else {
+indices.values().iter().try_for_each(|index| {
+let ix = ToPrimitive::to_usize(index).ok_or_else(|| {
+ArrowError::ComputeError("Cast to usize failed".to_string())
+})?;
+if ix >= len {
+return Err(ArrowError::ComputeError(
+format!("Array index out of bounds, cannot get item at index {} from {} entries", ix, len))
+);
+}
+Ok(())
+})?
 }
 }
 match values.data_type() {

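Usage-wise, the options struct is opt-in; a short sketch of what a caller sees with bounds checking enabled, assuming the `take` signature shown in the diff above (illustrative values):

```rust
use arrow::array::{Int32Array, UInt32Array};
use arrow::compute::{take, TakeOptions};

fn main() {
    let values = Int32Array::from(vec![10, 20, 30]);
    let indices = UInt32Array::from(vec![2, 0, 7]); // 7 is out of bounds
    // With check_bounds the kernel validates every index up front and
    // returns a ComputeError instead of failing mid-gather.
    let checked = take(&values, &indices, Some(TakeOptions { check_bounds: true }));
    assert!(checked.is_err());
}
```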

[arrow-rs] branch master updated: support full u32 and u64 roundtrip through parquet (#258)

2021-05-10 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 2f5f58a  support full u32 and u64 roundtrip through parquet (#258)
2f5f58a is described below

commit 2f5f58a2087be67b0109f0c8843c216a4fd1
Author: Marco Neumann 
AuthorDate: Mon May 10 18:44:58 2021 +0200

support full u32 and u64 roundtrip through parquet (#258)

* re-export arity kernels in `arrow::compute`

Seems logical since all other kernels are re-exported as well under this
flat hierarchy.

* return file from `parquet::arrow::arrow_writer::tests::[one_column]_roundtrip`

* support full arrow u64 through parquet

- updates arrow to parquet type mapping to use reinterpret/overflow cast
  for u64<->i64 similar to what the C++ stack does
- changes statistics calculation to account for the fact that u64 should
  be compared unsigned (as per spec)

Fixes #254.

* avoid copying array when reading u64 from parquet

* support full arrow u32 through parquet

This is idential to the solution we now have for u64.
---
 arrow/src/compute/mod.rs  |   1 +
 parquet/src/arrow/array_reader.rs |  30 ++--
 parquet/src/arrow/arrow_writer.rs | 141 +-
 parquet/src/column/writer.rs  |  59 
 4 files changed, 193 insertions(+), 38 deletions(-)

diff --git a/arrow/src/compute/mod.rs b/arrow/src/compute/mod.rs
index be1aa27..166f156 100644
--- a/arrow/src/compute/mod.rs
+++ b/arrow/src/compute/mod.rs
@@ -23,6 +23,7 @@ mod util;
 
 pub use self::kernels::aggregate::*;
 pub use self::kernels::arithmetic::*;
+pub use self::kernels::arity::*;
 pub use self::kernels::boolean::*;
 pub use self::kernels::cast::*;
 pub use self::kernels::comparison::*;
diff --git a/parquet/src/arrow/array_reader.rs b/parquet/src/arrow/array_reader.rs
index d125cf6..f209b8b 100644
--- a/parquet/src/arrow/array_reader.rs
+++ b/parquet/src/arrow/array_reader.rs
@@ -268,10 +268,29 @@ impl ArrayReader for PrimitiveArrayReader {
 }
 }
 
+let target_type = self.get_data_type().clone();
 let arrow_data_type = match T::get_physical_type() {
 PhysicalType::BOOLEAN => ArrowBooleanType::DATA_TYPE,
-PhysicalType::INT32 => ArrowInt32Type::DATA_TYPE,
-PhysicalType::INT64 => ArrowInt64Type::DATA_TYPE,
+PhysicalType::INT32 => {
+match target_type {
+ArrowType::UInt32 => {
+// follow C++ implementation and use overflow/reinterpret cast from i32 to u32 which will map
+// `i32::MIN..0` to `(i32::MAX as u32)..u32::MAX`
+ArrowUInt32Type::DATA_TYPE
+}
+_ => ArrowInt32Type::DATA_TYPE,
+}
+}
+PhysicalType::INT64 => {
+match target_type {
+ArrowType::UInt64 => {
+// follow C++ implementation and use overflow/reinterpret cast from i64 to u64 which will map
+// `i64::MIN..0` to `(i64::MAX as u64)..u64::MAX`
+ArrowUInt64Type::DATA_TYPE
+}
+_ => ArrowInt64Type::DATA_TYPE,
+}
+}
 PhysicalType::FLOAT => ArrowFloat32Type::DATA_TYPE,
 PhysicalType::DOUBLE => ArrowFloat64Type::DATA_TYPE,
 PhysicalType::INT96
@@ -343,15 +362,14 @@ impl ArrayReader for PrimitiveArrayReader {
 // are datatypes which we must convert explicitly.
 // These are:
 // - date64: we should cast int32 to date32, then date32 to date64.
-let target_type = self.get_data_type();
 let array = match target_type {
 ArrowType::Date64 => {
 // this is cheap as it internally reinterprets the data
 let a = arrow::compute::cast(&array, &ArrowType::Date32)?;
-arrow::compute::cast(&a, target_type)?
+arrow::compute::cast(&a, &target_type)?
 }
 ArrowType::Decimal(p, s) => {
-let mut builder = DecimalBuilder::new(array.len(), *p, *s);
+let mut builder = DecimalBuilder::new(array.len(), p, s);
 match array.data_type() {
 ArrowType::Int32 => {
 let values = array.as_any().downcast_ref::<Int32Array>().unwrap();
@@ -380,7 +398,7 @@ impl ArrayReader for PrimitiveArrayReader {
 }
 Arc::new(builder.finish()) as ArrayRef
 }
-_ => arrow::compute::cast(&array, target_type)?,
+_ => arrow::compute::cast(&array, &target_type)?,
 };
 
 // save 

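The reinterpret/overflow cast mentioned in the commit message is lossless in both directions, which is what makes the roundtrip safe; a small standalone sketch:

```rust
fn main() {
    // u64 values are stored in parquet's signed INT64 and recovered by
    // casting back: i64::MIN..0 maps onto (i64::MAX as u64)+1..=u64::MAX.
    let original: u64 = u64::MAX - 1;
    let stored: i64 = original as i64;  // what lands in the INT64 column
    let restored: u64 = stored as u64;  // what the reader recovers
    assert_eq!(original, restored);
}
```
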
[arrow-rs] branch nevi-me-patch-1 created (now 4e61130)

2021-05-10 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch nevi-me-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


  at 4e61130  Update PR template by commenting out instructions

This branch includes the following new commits:

 new 4e61130  Update PR template by commenting out instructions

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



[arrow-rs] 01/01: Update PR template by commenting out instructions

2021-05-10 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch nevi-me-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git

commit 4e6113026a186aff92ff304af5faffceefa1cdd4
Author: Wakahisa 
AuthorDate: Mon May 10 18:35:27 2021 +0200

Update PR template by commenting out instructions

Some contributors don't remove the guidelines when creating PRs, so it might be more convenient if we hide them behind comments.
The comments are still visible when editing, but are not displayed when the markdown is rendered.
---
 .github/pull_request_template.md | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index 5da0d08..95403e1 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -1,19 +1,31 @@
 # Which issue does this PR close?
 
+
 
 Closes #.
 
  # Rationale for this change
+ 
+ 
 
 # What changes are included in this PR?
 
+
 
 # Are there any user-facing changes?
 
+
+
 
+


[arrow-rs] branch master updated: Fix typo in csv/reader.rs (#265)

2021-05-06 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new a870b24  Fix typo in csv/reader.rs (#265)
a870b24 is described below

commit a870b24bd4eb76d3e0e5c718c9956a7dcdee52fd
Author: Dominik Moritz 
AuthorDate: Thu May 6 22:36:56 2021 -0700

Fix typo in csv/reader.rs (#265)
---
 arrow/src/csv/reader.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arrow/src/csv/reader.rs b/arrow/src/csv/reader.rs
index 9fafc38..00f1d7f 100644
--- a/arrow/src/csv/reader.rs
+++ b/arrow/src/csv/reader.rs
@@ -353,7 +353,7 @@ impl Reader {
 }
 
 // Initialize batch_records with StringRecords so they
-// can be reused accross batches
+// can be reused across batches
 let mut batch_records = Vec::with_capacity(batch_size);
 batch_records.resize_with(batch_size, Default::default);
 


[arrow-rs] branch master updated: Fix empty Schema::metadata deserialization error (#260)

2021-05-06 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 64ea8da  Fix empty Schema::metadata deserialization error (#260)
64ea8da is described below

commit 64ea8dae64b05a1a4ffcde739b02411219653dc2
Author: hulunbier 
AuthorDate: Fri May 7 13:32:32 2021 +0800

Fix empty Schema::metadata deserialization error (#260)

* Fix empty Schema::metadata deserialization error

Hope this fixes issue #241

* Rename UT name to `test_ser_de_metadata`

Co-authored-by: hulunbier 
---
 arrow/src/datatypes/schema.rs | 33 +
 1 file changed, 33 insertions(+)

diff --git a/arrow/src/datatypes/schema.rs b/arrow/src/datatypes/schema.rs
index ad89b29..cfc0744 100644
--- a/arrow/src/datatypes/schema.rs
+++ b/arrow/src/datatypes/schema.rs
@@ -35,6 +35,7 @@ pub struct Schema {
 pub(crate) fields: Vec<Field>,
 /// A map of key-value pairs containing additional meta data.
 #[serde(skip_serializing_if = "HashMap::is_empty")]
+#[serde(default)]
 pub(crate) metadata: HashMap<String, String>,
 }
 
@@ -335,3 +336,35 @@ struct MetadataKeyValue {
 key: String,
 value: String,
 }
+
+#[cfg(test)]
+mod tests {
+use crate::datatypes::DataType;
+
+use super::*;
+
+#[test]
+fn test_ser_de_metadata() {
+// ser/de with empty metadata
+let mut schema = Schema::new(vec![
+Field::new("name", DataType::Utf8, false),
+Field::new("address", DataType::Utf8, false),
+Field::new("priority", DataType::UInt8, false),
+]);
+
+let json = serde_json::to_string(&schema).unwrap();
+let de_schema = serde_json::from_str(&json).unwrap();
+
+assert_eq!(schema, de_schema);
+
+// ser/de with non-empty metadata
+schema.metadata = [("key".to_owned(), "val".to_owned())]
+.iter()
+.cloned()
+.collect();
+let json = serde_json::to_string(&schema).unwrap();
+let de_schema = serde_json::from_str(&json).unwrap();
+
+assert_eq!(schema, de_schema);
+}
+}

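The root cause is the interaction of the two serde attributes: `skip_serializing_if` drops the empty map from the JSON entirely, so deserialization needs `#[serde(default)]` to tolerate the absent field. A self-contained sketch (assuming the `serde` crate with the derive feature plus `serde_json`; `Meta` is a hypothetical stand-in for `Schema`):

```rust
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Meta {
    #[serde(skip_serializing_if = "HashMap::is_empty")]
    #[serde(default)] // fall back to an empty map when the field is absent
    metadata: HashMap<String, String>,
}

fn main() {
    let m = Meta { metadata: HashMap::new() };
    let json = serde_json::to_string(&m).unwrap();
    assert_eq!(json, "{}"); // the empty map was skipped entirely
    // Without #[serde(default)] this from_str would fail with
    // "missing field `metadata`".
    let back: Meta = serde_json::from_str(&json).unwrap();
    assert_eq!(m, back);
}
```
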

[arrow-rs] branch master updated: Added env to run rust in integration. (#253)

2021-05-04 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 508f25c  Added env to run rust in integration. (#253)
508f25c is described below

commit 508f25c10032857da34ea88cc8166f0741616a32
Author: Jorge Leitao 
AuthorDate: Wed May 5 06:47:26 2021 +0200

Added env to run rust in integration. (#253)
---
 .github/workflows/integration.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/integration.yml b/.github/workflows/integration.yml
index 8dd2bd8..115bfad 100644
--- a/.github/workflows/integration.yml
+++ b/.github/workflows/integration.yml
@@ -48,4 +48,4 @@ jobs:
   - name: Setup Archery
 run: pip install -e dev/archery[docker]
   - name: Execute Docker Build
-run: archery docker run conda-integration
+run: archery docker run -e ARCHERY_INTEGRATION_WITH_RUST=1 conda-integration


[arrow-rs] branch master updated: fix NaN handling in parquet statistics (#256)

2021-05-04 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
 new 04779e0  fix NaN handling in parquet statistics (#256)
04779e0 is described below

commit 04779e0b57efa2f88c75abc080cd5feb70737484
Author: Marco Neumann 
AuthorDate: Wed May 5 06:46:24 2021 +0200

fix NaN handling in parquet statistics (#256)

Closes #255.
---
 parquet/src/column/writer.rs | 91 +---
 1 file changed, 86 insertions(+), 5 deletions(-)

diff --git a/parquet/src/column/writer.rs b/parquet/src/column/writer.rs
index 0b56594..64e4880 100644
--- a/parquet/src/column/writer.rs
+++ b/parquet/src/column/writer.rs
@@ -921,12 +921,16 @@ impl ColumnWriterImpl {
 }
 }
 
+#[allow(clippy::eq_op)]
 fn update_page_min_max(&mut self, val: &T::T) {
-if self.min_page_value.as_ref().map_or(true, |min| min > val) {
-self.min_page_value = Some(val.clone());
-}
-if self.max_page_value.as_ref().map_or(true, |max| max < val) {
-self.max_page_value = Some(val.clone());
+// simple "isNaN" check that works for all types
+if val == val {
+if self.min_page_value.as_ref().map_or(true, |min| min > val) {
+self.min_page_value = Some(val.clone());
+}
+if self.max_page_value.as_ref().map_or(true, |max| max < val) {
+self.max_page_value = Some(val.clone());
+}
 }
 }
 
@@ -1652,6 +1656,68 @@ mod tests {
 );
 }
 
+#[test]
+fn test_float_statistics_nan_middle() {
+let stats = statistics_roundtrip::<FloatType>(&[1.0, f32::NAN, 2.0]);
+assert!(stats.has_min_max_set());
+if let Statistics::Float(stats) = stats {
+assert_eq!(stats.min(), &1.0);
+assert_eq!(stats.max(), &2.0);
+} else {
+panic!("expecting Statistics::Float");
+}
+}
+
+#[test]
+fn test_float_statistics_nan_start() {
+let stats = statistics_roundtrip::<FloatType>(&[f32::NAN, 1.0, 2.0]);
+assert!(stats.has_min_max_set());
+if let Statistics::Float(stats) = stats {
+assert_eq!(stats.min(), &1.0);
+assert_eq!(stats.max(), &2.0);
+} else {
+panic!("expecting Statistics::Float");
+}
+}
+
+#[test]
+fn test_float_statistics_nan_only() {
+let stats = statistics_roundtrip::<FloatType>(&[f32::NAN, f32::NAN]);
+assert!(!stats.has_min_max_set());
+assert!(matches!(stats, Statistics::Float(_)));
+}
+
+#[test]
+fn test_double_statistics_nan_middle() {
+let stats = statistics_roundtrip::<DoubleType>(&[1.0, f64::NAN, 2.0]);
+assert!(stats.has_min_max_set());
+if let Statistics::Double(stats) = stats {
+assert_eq!(stats.min(), &1.0);
+assert_eq!(stats.max(), &2.0);
+} else {
+panic!("expecting Statistics::Float");
+}
+}
+
+#[test]
+fn test_double_statistics_nan_start() {
+let stats = statistics_roundtrip::<DoubleType>(&[f64::NAN, 1.0, 2.0]);
+assert!(stats.has_min_max_set());
+if let Statistics::Double(stats) = stats {
+assert_eq!(stats.min(), &1.0);
+assert_eq!(stats.max(), &2.0);
+} else {
+panic!("expecting Statistics::Float");
+}
+}
+
+#[test]
+fn test_double_statistics_nan_only() {
+let stats = statistics_roundtrip::<DoubleType>(&[f64::NAN, f64::NAN]);
+assert!(!stats.has_min_max_set());
+assert!(matches!(stats, Statistics::Double(_)));
+}
+
 /// Performs write-read roundtrip with randomly generated values and levels.
 /// `max_size` is maximum number of values or levels (if `max_def_level` > 0) to write
 /// for a column.
@@ -1905,4 +1971,19 @@ mod tests {
 Ok(())
 }
 }
+
+/// Write data into parquet using [`get_test_page_writer`] and [`get_test_column_writer`] and returns generated statistics.
+fn statistics_roundtrip<T: DataType>(values: &[<T as DataType>::T]) -> Statistics {
+let page_writer = get_test_page_writer();
+let props = Arc::new(WriterProperties::builder().build());
+let mut writer = get_test_column_writer::<T>(page_writer, 0, 0, props);
+writer.write_batch(values, None, None).unwrap();
+
+let (_bytes_written, _rows_written, metadata) = writer.close().unwrap();
+if let Some(stats) = metadata.statistics() {
+stats.clone()
+} else {
+panic!("metadata missing statistics");
+}
+}
 }

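The `val == val` guard works because NaN is the only value that compares unequal to itself under IEEE-754, and it needs no float-specific trait bound; a standalone sketch of the min/max update it protects:

```rust
fn main() {
    let values = [1.0f32, f32::NAN, 2.0];
    let mut min: Option<f32> = None;
    let mut max: Option<f32> = None;
    for &v in values.iter() {
        // NaN != NaN, so NaN never reaches the min/max comparisons
        if v == v {
            min = Some(min.map_or(v, |m| if v < m { v } else { m }));
            max = Some(max.map_or(v, |m| if v > m { v } else { m }));
        }
    }
    assert_eq!(min, Some(1.0));
    assert_eq!(max, Some(2.0));
}
```
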

[arrow-datafusion] branch master updated: Revert "Add datafusion-python (#69)" (#257)

2021-05-04 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
 new d0af907  Revert "Add datafusion-python  (#69)" (#257)
d0af907 is described below

commit d0af907652aa8773d1de21dfd2f15bbcf6f50ce3
Author: Andy Grove 
AuthorDate: Tue May 4 08:51:44 2021 -0600

Revert "Add datafusion-python  (#69)" (#257)

This reverts commit 46bde0bd148aacf1677a575cb9ddbc154b6c4fb3.
---
 .github/workflows/python_build.yml |  89 ---
 .github/workflows/python_test.yaml |  58 
 Cargo.toml |   4 +-
 dev/release/rat_exclude_files.txt  |   1 -
 python/.cargo/config   |  22 ---
 python/.dockerignore   |  19 ---
 python/.gitignore  |  20 ---
 python/Cargo.toml  |  57 ---
 python/README.md   | 146 --
 python/pyproject.toml  |  20 ---
 python/rust-toolchain  |   1 -
 python/src/context.rs  | 115 ---
 python/src/dataframe.rs| 161 
 python/src/errors.rs   |  61 
 python/src/expression.rs   | 162 
 python/src/functions.rs| 165 -
 python/src/lib.rs  |  44 --
 python/src/scalar.rs   |  36 -
 python/src/to_py.rs|  77 --
 python/src/to_rust.rs  | 111 --
 python/src/types.rs|  76 --
 python/src/udaf.rs | 147 ---
 python/src/udf.rs  |  62 
 python/tests/__init__.py   |  16 --
 python/tests/generic.py|  75 --
 python/tests/test_df.py| 115 ---
 python/tests/test_sql.py   | 294 -
 python/tests/test_udaf.py  |  91 
 28 files changed, 1 insertion(+), 2244 deletions(-)

diff --git a/.github/workflows/python_build.yml b/.github/workflows/python_build.yml
deleted file mode 100644
index c86bb81..000
--- a/.github/workflows/python_build.yml
+++ /dev/null
@@ -1,89 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-name: Build
-on:
-  push:
-tags:
-  - v*
-
-jobs:
-  build-python-mac-win:
-name: Mac/Win
-runs-on: ${{ matrix.os }}
-strategy:
-  fail-fast: false
-  matrix:
-python-version: [3.6, 3.7, 3.8]
-os: [macos-latest, windows-latest]
-steps:
-  - uses: actions/checkout@v2
-
-  - uses: actions/setup-python@v1
-with:
-  python-version: ${{ matrix.python-version }}
-
-  - uses: actions-rs/toolchain@v1
-with:
-  toolchain: nightly-2021-01-06
-
-  - name: Install dependencies
-run: |
-  python -m pip install --upgrade pip
-  pip install maturin
-
-  - name: Build Python package
-run: cd python && maturin build --release --no-sdist --strip --interpreter python${{matrix.python_version}}
-
-  - name: List wheels
-if: matrix.os == 'windows-latest'
-run: dir python/target\wheels\
-
-  - name: List wheels
-if:  matrix.os != 'windows-latest'
-run: find ./python/target/wheels/
-
-  - name: Archive wheels
-uses: actions/upload-artifact@v2
-with:
-  name: dist
-  path: python/target/wheels/*
-
-  build-manylinux:
-name: Manylinux
-runs-on: ubuntu-latest
-steps:
-  - uses: actions/checkout@v2
-  - name: Build wheels
-run: docker run --rm -v $(pwd):/io konstin2/maturin build --release --manylinux
-  - name: Archive wheels
-uses: actions/upload-artifact@v2
-with:
-  name: dist
-  path: python/target/wheels/*
-
-  release:
-name: Publish in PyPI
-needs: [build-manylinux, build-python-mac-win]
-runs-on: ubuntu-latest
-steps:
-  - uses: actions/download-artifact@v2
-  - name: Publish to PyPI
-

[arrow-rs] branch master updated (8f030db -> 6a65543)

2021-05-03 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 8f030db  Made integration tests always run. (#248)
 add 6a65543  fix parquet max_definition for non-null structs (#246)

No new revisions were added by this update.

Summary of changes:
 parquet/src/arrow/arrow_writer.rs |  60 --
 parquet/src/arrow/levels.rs   | 124 +++---
 2 files changed, 170 insertions(+), 14 deletions(-)


[arrow-rs] branch master updated (51513c1 -> 111d5d6)

2021-04-27 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git.


from 51513c1  ARROW-12411: [Rust] Create RecordBatches from Iterators (#7)
 add 111d5d6  Support string dictionaries in csv reader (#228) (#229)

No new revisions were added by this update.

Summary of changes:
 arrow/src/csv/reader.rs | 147 +++-
 1 file changed, 121 insertions(+), 26 deletions(-)


[arrow] branch master updated (249fa7c -> 892776f)

2021-03-31 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 249fa7c  ARROW-12123: [Rust][DataFusion] Use smallvec for indices for better join performance
 add 892776f  ARROW-12153: [Rust] [Parquet] Return file stats after writing file

No new revisions were added by this update.

Summary of changes:
 rust/datafusion/src/execution/context.rs |  2 +-
 rust/parquet/src/arrow/arrow_reader.rs   |  2 +-
 rust/parquet/src/arrow/arrow_writer.rs   |  2 +-
 rust/parquet/src/file/writer.rs  | 14 +++---
 4 files changed, 10 insertions(+), 10 deletions(-)


[arrow] branch master updated (8de898d -> cd4379d)

2021-03-30 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 8de898d  ARROW-12138: [Go][IPC] Update flatbuffers definitions
 add cd4379d  ARROW-12121: [Rust] [Parquet] Arrow writer benchmarks

No new revisions were added by this update.

Summary of changes:
 rust/parquet/Cargo.toml  |   5 +
 rust/parquet/benches/arrow_writer.rs | 201 +++
 2 files changed, 206 insertions(+)
 create mode 100644 rust/parquet/benches/arrow_writer.rs


[arrow] branch master updated (9aa0f85 -> 4de0ed7)

2021-03-30 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 9aa0f85  ARROW-11973 [Rust][DataFusion] Boolean kleene kernels
 add 4de0ed7  ARROW-12120: [Rust] Generate random arrays and batches

No new revisions were added by this update.

Summary of changes:
 rust/arrow/benches/aggregate_kernels.rs  |   4 +-
 rust/arrow/benches/comparison_kernels.rs |   2 +-
 rust/arrow/benches/concatenate_kernel.rs |   8 +-
 rust/arrow/benches/equal.rs  |   4 +-
 rust/arrow/benches/filter_kernels.rs |   2 +-
 rust/arrow/benches/mutable_array.rs  |   4 +-
 rust/arrow/benches/take_kernels.rs   |  12 +-
 rust/arrow/src/util/bench_util.rs|  50 -
 rust/arrow/src/util/data_gen.rs  | 347 +++
 rust/arrow/src/util/mod.rs   |   1 +
 10 files changed, 415 insertions(+), 19 deletions(-)
 create mode 100644 rust/arrow/src/util/data_gen.rs


[arrow] branch master updated: ARROW-12043: [Rust] [Parquet] Write FSB arrays

2021-03-28 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 894dd17  ARROW-12043: [Rust] [Parquet] Write FSB arrays
894dd17 is described below

commit 894dd17c9602439c2b84c0b849fb0966606ceb1c
Author: Neville Dipale 
AuthorDate: Sun Mar 28 11:01:56 2021 +0200

ARROW-12043: [Rust] [Parquet] Write FSB arrays

Minor change to compute the levels for FSB arrays and write them out. Added a roundtrip test.

Closes #9771 from nevi-me/ARROW-12043

Authored-by: Neville Dipale 
Signed-off-by: Neville Dipale 
---
 rust/parquet/src/arrow/arrow_writer.rs | 28 ++--
 rust/parquet/src/arrow/levels.rs   | 30 --
 rust/parquet/src/arrow/mod.rs  |  2 +-
 3 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/rust/parquet/src/arrow/arrow_writer.rs b/rust/parquet/src/arrow/arrow_writer.rs
index 1ce907f..a3577ca 100644
--- a/rust/parquet/src/arrow/arrow_writer.rs
+++ b/rust/parquet/src/arrow/arrow_writer.rs
@@ -146,7 +146,8 @@ fn write_leaves(
 | ArrowDataType::Binary
 | ArrowDataType::Utf8
 | ArrowDataType::LargeUtf8
-| ArrowDataType::Decimal(_, _) => {
+| ArrowDataType::Decimal(_, _)
+| ArrowDataType::FixedSizeBinary(_) => {
 let mut col_writer = get_col_writer( row_group_writer)?;
 write_leaf(
  col_writer,
@@ -189,11 +190,14 @@ fn write_leaves(
 ArrowDataType::Float16 => Err(ParquetError::ArrowError(
 "Float16 arrays not supported".to_string(),
 )),
-ArrowDataType::FixedSizeList(_, _)
-| ArrowDataType::FixedSizeBinary(_)
-| ArrowDataType::Union(_) => Err(ParquetError::NYI(
-"Attempting to write an Arrow type that is not yet 
implemented".to_string(),
-)),
+ArrowDataType::FixedSizeList(_, _) | ArrowDataType::Union(_) => {
+Err(ParquetError::NYI(
+format!(
+"Attempting to write an Arrow type {:?} to parquet that is 
not yet implemented", 
+array.data_type()
+)
+))
+}
 }
 }
 
@@ -1225,6 +1229,18 @@ mod tests {
 }
 
 #[test]
+fn fixed_size_binary_single_column() {
+let mut builder = FixedSizeBinaryBuilder::new(16, 4);
+builder.append_value(b"0123").unwrap();
+builder.append_null().unwrap();
+builder.append_value(b"8910").unwrap();
+builder.append_value(b"1112").unwrap();
+let array = Arc::new(builder.finish());
+
+one_column_roundtrip("timestamp_millisecond_single_column", array, true);
+}
+
+#[test]
 fn string_single_column() {
 let raw_values: Vec<_> = (0..SMALL_SIZE).map(|i| i.to_string()).collect();
 let raw_strs = raw_values.iter().map(|s| s.as_str());
diff --git a/rust/parquet/src/arrow/levels.rs b/rust/parquet/src/arrow/levels.rs
index 641e330..2168670 100644
--- a/rust/parquet/src/arrow/levels.rs
+++ b/rust/parquet/src/arrow/levels.rs
@@ -136,7 +136,8 @@ impl LevelInfo {
 | DataType::Interval(_)
 | DataType::Binary
 | DataType::LargeBinary
-| DataType::Decimal(_, _) => {
+| DataType::Decimal(_, _)
+| DataType::FixedSizeBinary(_) => {
 // we return a vector of 1 value to represent the primitive
 vec![self.calculate_child_levels(
 array_offsets,
@@ -145,7 +146,6 @@ impl LevelInfo {
 field.is_nullable(),
 )]
 }
-DataType::FixedSizeBinary(_) => unimplemented!(),
 DataType::List(list_field) | DataType::LargeList(list_field) => {
 // Calculate the list level
 let list_level = self.calculate_child_levels(
@@ -189,7 +189,8 @@ impl LevelInfo {
 | DataType::Utf8
 | DataType::LargeUtf8
 | DataType::Dictionary(_, _)
-| DataType::Decimal(_, _) => {
+| DataType::Decimal(_, _)
+| DataType::FixedSizeBinary(_) => {
 vec![list_level.calculate_child_levels(
 child_offsets,
 child_mask,
@@ -197,7 +198,6 @@ impl LevelInfo {
 list_field.is_nullable(),
 )]
 }
-DataType::FixedSizeBinary(_) => unimplemented!(),
 DataType::List(_) | DataType::LargeList(_) | DataType::Struct(_) => {
 list_level.calculate_ar

[arrow] branch master updated: ARROW-12116: [Rust] Fix and ignore 1.51 clippy lints

2021-03-27 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 60011c0  ARROW-12116: [Rust] Fix and ignore 1.51 clippy lints
60011c0 is described below

commit 60011c081508b09724469d7a4d1d93b4bd015fe4
Author: Neville Dipale 
AuthorDate: Sun Mar 28 00:59:48 2021 +0200

ARROW-12116: [Rust] Fix and ignore 1.51 clippy lints

There's an acronym Rust lint that started failing after 1.51 was announced.
The lint is in the `arrow::ffi` and `arrow::ipc::gen` modules, so I'm instead ignoring it and documenting this.

Closes #9815 from nevi-me/1-51-lints

Authored-by: Neville Dipale 
Signed-off-by: Neville Dipale 
---
 rust/arrow/src/lib.rs   | 13 ++---
 rust/arrow/src/util/pretty.rs   |  3 +--
 rust/datafusion/src/lib.rs  |  4 +++-
 rust/datafusion/src/logical_plan/builder.rs |  4 ++--
 rust/parquet/src/lib.rs |  5 -
 5 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/rust/arrow/src/lib.rs b/rust/arrow/src/lib.rs
index 68a820b..30f968c9 100644
--- a/rust/arrow/src/lib.rs
+++ b/rust/arrow/src/lib.rs
@@ -129,11 +129,18 @@
 #![cfg_attr(feature = "avx512", feature(avx512_target_feature))]
 #![allow(dead_code)]
 #![allow(non_camel_case_types)]
+#![deny(clippy::redundant_clone)]
+#![allow(
+// introduced to ignore lint errors when upgrading from 2020-04-22 to 2020-11-14
+clippy::float_equality_without_abs,
+clippy::type_complexity,
+// upper_case_acronyms lint was introduced in Rust 1.51.
+// It is triggered in the ffi module, and ipc::gen, which we have no control over
+clippy::upper_case_acronyms,
+clippy::vec_init_then_push
+)]
 #![allow(bare_trait_objects)]
 #![warn(missing_debug_implementations)]
-#![deny(clippy::redundant_clone)]
-// introduced to ignore lint errors when upgrading from 2020-04-22 to 
2020-11-14
-#![allow(clippy::float_equality_without_abs, clippy::type_complexity)]
 
 pub mod alloc;
 mod arch;
diff --git a/rust/arrow/src/util/pretty.rs b/rust/arrow/src/util/pretty.rs
index 7baf559..f354899 100644
--- a/rust/arrow/src/util/pretty.rs
+++ b/rust/arrow/src/util/pretty.rs
@@ -93,8 +93,7 @@ fn create_column(field: &Field, columns: &[ArrayRef]) -> Result<Table> {
 
 for col in columns {
 for row in 0..col.len() {
-let mut cells = Vec::new();
-cells.push(Cell::new(&array_value_to_string(&col, row)?));
+let cells = vec![Cell::new(&array_value_to_string(&col, row)?)];
 table.add_row(Row::new(cells));
 }
 }
diff --git a/rust/datafusion/src/lib.rs b/rust/datafusion/src/lib.rs
index 3e1e1e2..2733430 100644
--- a/rust/datafusion/src/lib.rs
+++ b/rust/datafusion/src/lib.rs
@@ -18,9 +18,11 @@
 // Clippy lints, some should be disabled incrementally
 #![allow(
 clippy::float_cmp,
+clippy::from_over_into,
 clippy::module_inception,
 clippy::new_without_default,
-clippy::type_complexity
+clippy::type_complexity,
+clippy::upper_case_acronyms
 )]
 
 //! [DataFusion](https://github.com/apache/arrow/tree/master/rust/datafusion)
diff --git a/rust/datafusion/src/logical_plan/builder.rs b/rust/datafusion/src/logical_plan/builder.rs
index aa0380e..e748872 100644
--- a/rust/datafusion/src/logical_plan/builder.rs
+++ b/rust/datafusion/src/logical_plan/builder.rs
@@ -303,8 +303,8 @@ impl LogicalPlanBuilder {
 
 Ok(Self::from(&LogicalPlan::Aggregate {
 input: Arc::new(self.plan.clone()),
-group_expr: group_expr,
-aggr_expr: aggr_expr,
+group_expr,
+aggr_expr,
 schema: DFSchemaRef::new(aggr_schema),
 }))
 }
diff --git a/rust/parquet/src/lib.rs b/rust/parquet/src/lib.rs
index 19e1a0f..a931b95 100644
--- a/rust/parquet/src/lib.rs
+++ b/rust/parquet/src/lib.rs
@@ -23,13 +23,16 @@
 clippy::cast_ptr_alignment,
 clippy::float_cmp,
 clippy::float_equality_without_abs,
+clippy::from_over_into,
 clippy::many_single_char_names,
 clippy::needless_range_loop,
 clippy::new_without_default,
 clippy::or_fun_call,
 clippy::same_item_push,
 clippy::too_many_arguments,
-clippy::transmute_ptr_to_ptr
+clippy::transmute_ptr_to_ptr,
+clippy::upper_case_acronyms,
+clippy::vec_init_then_push
 )]
 
 #[macro_use]

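As a side note, the `pretty.rs` hunk above is exactly the shape of rewrite that the newly allowed `vec_init_then_push` lint asks for; a minimal illustration:

```rust
fn main() {
    // vec_init_then_push flags this two-step form:
    //     let mut cells = Vec::new();
    //     cells.push("a");
    // and prefers constructing the vector in one expression:
    let cells = vec!["a"];
    assert_eq!(cells.len(), 1);
}
```
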

[arrow] branch master updated (143c2be -> 2c5e264)

2021-03-26 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 143c2be  ARROW-11736: [R] Allow string compute functions to be optional
 add 2c5e264  ARROW-11365: [Rust] [Parquet] Logical type printer and parser

No new revisions were added by this update.

Summary of changes:
 rust/parquet/src/arrow/schema.rs   |  51 +++-
 rust/parquet/src/basic.rs  |  73 +-
 rust/parquet/src/schema/parser.rs  | 484 -
 rust/parquet/src/schema/printer.rs | 423 +---
 4 files changed, 903 insertions(+), 128 deletions(-)


[arrow] branch master updated (0bea590 -> 4eefa35)

2021-03-25 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 0bea590  ARROW-11422: [C#] add decimal support
 add 4eefa35  ARROW-12019: [Rust] [Parquet] Update README for 2.6.0 support

No new revisions were added by this update.

Summary of changes:
 rust/parquet/README.md| 18 +++---
 rust/parquet/src/basic.rs |  7 ---
 2 files changed, 15 insertions(+), 10 deletions(-)


[arrow] branch master updated (41833d3 -> 21483ad)

2021-03-25 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 41833d3  ARROW-12071: [GLib] Keep input stream reference of GArrowJSONReader
 add 21483ad  ARROW-12076: [Rust] Fix build

No new revisions were added by this update.

Summary of changes:
 rust/arrow/src/compute/kernels/comparison.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


[arrow] branch master updated (ae87509 -> eebf64b)

2021-03-22 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from ae87509  ARROW-12038: [Rust][DataFusion] Upgrade hashbrown to 0.11
 add eebf64b  ARROW-11511: [Rust] Replace `Arc<ArrayData>` by `ArrayData` in all arrays

No new revisions were added by this update.

Summary of changes:
 rust/arrow/examples/dynamic_types.rs   |   2 +-
 rust/arrow/src/array/array.rs  |  82 +++
 rust/arrow/src/array/array_binary.rs   |  45 ++--
 rust/arrow/src/array/array_boolean.rs  |  16 +-
 rust/arrow/src/array/array_dictionary.rs   |  28 +--
 rust/arrow/src/array/array_list.rs |  48 ++--
 rust/arrow/src/array/array_primitive.rs|  34 +--
 rust/arrow/src/array/array_string.rs   |  24 +-
 rust/arrow/src/array/array_struct.rs   |  34 ++-
 rust/arrow/src/array/array_union.rs|  28 +--
 rust/arrow/src/array/builder.rs|  10 +-
 rust/arrow/src/array/data.rs   |  36 ++-
 rust/arrow/src/array/equal/dictionary.rs   |   4 +-
 rust/arrow/src/array/equal/fixed_list.rs   |   4 +-
 rust/arrow/src/array/equal/list.rs |   4 +-
 rust/arrow/src/array/equal/mod.rs  | 255 +++--
 rust/arrow/src/array/ffi.rs|  15 +-
 rust/arrow/src/array/null.rs   |  19 +-
 rust/arrow/src/array/ord.rs|   4 +-
 rust/arrow/src/array/transform/mod.rs  | 116 +-
 rust/arrow/src/compute/kernels/arithmetic.rs   |  21 +-
 rust/arrow/src/compute/kernels/arity.rs|   2 +-
 rust/arrow/src/compute/kernels/boolean.rs  |  13 +-
 rust/arrow/src/compute/kernels/cast.rs |  47 ++--
 rust/arrow/src/compute/kernels/comparison.rs   |  24 +-
 rust/arrow/src/compute/kernels/concat.rs   |   9 +-
 rust/arrow/src/compute/kernels/filter.rs   |   6 +-
 rust/arrow/src/compute/kernels/length.rs   |   3 +-
 rust/arrow/src/compute/kernels/limit.rs|   4 +-
 rust/arrow/src/compute/kernels/sort.rs |  13 +-
 rust/arrow/src/compute/kernels/substring.rs|   3 +-
 rust/arrow/src/compute/kernels/take.rs |  57 +++--
 rust/arrow/src/compute/kernels/window.rs   |   3 +-
 rust/arrow/src/compute/kernels/zip.rs  |   3 +-
 rust/arrow/src/compute/util.rs |  12 +-
 rust/arrow/src/ffi.rs  |  21 +-
 rust/arrow/src/ipc/reader.rs   |  12 +-
 rust/arrow/src/ipc/writer.rs   |   6 +-
 rust/arrow/src/json/reader.rs  |  26 ++-
 rust/arrow/src/json/writer.rs  |   6 +-
 rust/arrow/src/record_batch.rs |   6 +-
 rust/arrow/src/util/integration_util.rs|  10 +-
 .../src/physical_plan/math_expressions.rs  |   4 +-
 rust/integration-testing/src/lib.rs|  10 +-
 rust/parquet/src/arrow/array_reader.rs |  16 +-
 rust/parquet/src/arrow/arrow_writer.rs |  81 +--
 rust/parquet/src/arrow/levels.rs   |   4 +-
 47 files changed, 614 insertions(+), 616 deletions(-)


[arrow] branch master updated (6112255 -> ae87509)

2021-03-22 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 6112255  ARROW-10250: [C++][FlightRPC] Consistently use FlightClientOptions::Defaults
 add ae87509  ARROW-12038: [Rust][DataFusion] Upgrade hashbrown to 0.11

No new revisions were added by this update.

Summary of changes:
 rust/datafusion/Cargo.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[arrow] branch master updated (775a714 -> ef64d00)

2021-03-19 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 775a714  ARROW-10903 [Rust] Implement FromIter<Option<Vec<u8>>> constructor for FixedSizeBinaryArray
 add ef64d00  ARROW-11824: [Rust] [Parquet] Use logical types in Arrow schema conversion

No new revisions were added by this update.

Summary of changes:
 rust/arrow/src/array/array_binary.rs |   8 +-
 rust/parquet/src/arrow/schema.rs | 254 ++
 rust/parquet/src/schema/parser.rs|  10 +-
 rust/parquet/src/schema/types.rs | 293 ++-
 4 files changed, 419 insertions(+), 146 deletions(-)


[arrow] branch master updated (976ddbf -> 69d436d)

2021-03-07 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 976ddbf  ARROW-11896: [Rust] Disable Debug symbols on CI test builds
 add 69d436d  ARROW-11803: [Rust] [Parquet] Support v2 LogicalType

No new revisions were added by this update.

Summary of changes:
 rust/parquet/src/arrow/array_reader.rs |   13 +-
 rust/parquet/src/arrow/schema.rs   |   96 +--
 rust/parquet/src/basic.rs  | 1098 +++-
 rust/parquet/src/column/reader.rs  |4 +-
 rust/parquet/src/file/footer.rs|1 +
 rust/parquet/src/file/writer.rs|   57 +-
 rust/parquet/src/record/api.rs |  116 ++--
 rust/parquet/src/record/reader.rs  |   10 +-
 rust/parquet/src/schema/mod.rs |4 +-
 rust/parquet/src/schema/parser.rs  |   39 +-
 rust/parquet/src/schema/printer.rs |   42 +-
 rust/parquet/src/schema/types.rs   |  181 --
 rust/parquet/src/schema/visitor.rs |8 +-
 13 files changed, 1143 insertions(+), 526 deletions(-)



[arrow] branch master updated (b07027e -> bfa99d9)

2021-03-05 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from b07027e  ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components
 add bfa99d9  ARROW-11881: [Rust][DataFusion] Fix clippy lint

No new revisions were added by this update.

Summary of changes:
 rust/datafusion/src/physical_plan/merge.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


