[arrow] branch decimal256 updated (4e06c1e -> d201b13)

2020-09-14 Thread emkornfield
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a change to branch decimal256
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 4e06c1e  ARROW-9711: [Rust] Add new benchmark derived from TPC-H
 add e553b73  ARROW-9743: [R] Sanitize paths in open_dataset
 add 2dcc9a1  ARROW-9654: [Rust][DataFusion] Add `EXPLAIN ` statement
 add 5677f9e  ARROW-8581: [C#] Accept and return DateTime from DateXXArray
 add 3941b66  ARROW-9739: [CI][Ruby] Don't install gem documents
 add 222859d  ARROW-9358: [Integration] remove generated_large_batch.json
 add 0d0a0cf  ARROW-9377: [Java] Support unsigned dictionary indices
 add 5d88f10  ARROW-8402: [Java] Support ValidateFull methods in Java
 add afa3eed  ARROW-9729: [Java] Disable Error Prone when project is imported into …
 add 597ad62  ARROW-9617: [Rust] [DataFusion] Add length of string array
 add 613ab4a  ARROW-9742: [Rust] [DataFusion] Improved DataFrame trait (formerly known as the Table trait)
 add 2c58141  ARROW-9758: [Rust] [DataFusion] Allow physical planner to be replaced
 add a94f2b3  ARROW-9673: [Rust] [DataFusion] Add a param "dialect" for DFParser::parse_sql
 add 58b38a6  ARROW-9618: [Rust] [DataFusion] Made it easier to write optimizers
 add 2e3d7ec  ARROW-9528: [Python] Honor tzinfo when converting from datetime
 add 9bd3d50  ARROW-9759: [Rust] [DataFusion] Implement DataFrame.sort()
 add 51e574f  ARROW-9764: [CI][Java] Fix wrong image name for push
 add 4d836ef  ARROW-9757: [Rust] [DataFusion] Add prelude.rs
 add 7593c9a  ARROW-9556: [Python][C++] Segfaults in UnionArray with null values
 add 1018a4f  ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options
 add 18181fe  ARROW-9768 [Rust] [DataFusion] Rename PhysicalPlannerImpl to DefaultPhysicalPlanner
 add c4f8436  ARROW-9495: [C++] Equality assertions don't handle Inf / -Inf properly
 add 2f98d1e  ARROW-9710: [C++] Improve performance of Decimal128::ToString by 10x, and make the implementation reusable for Decimal256.
 add 8a0db9e  ARROW-9783: [Rust] [DataFusion] Remove aggregate expression data type
 add 59dbe54  ARROW-9785: [Python] Fix excessively slow S3 options test
 add d61c8a6  ARROW-9744: [Python] Fix build failure on aarch64
 add ae60bad  ARROW-9789: [C++] Don't install jemalloc in parallel
 add 197f903  ARROW-9619: [Rust] [DataFusion] Add predicate push-down
 add fa4b8d4  ARROW-9781: [C++] Fix valgrind uninitialized value warnings
 add 4db4859  ARROW-9670: [C++][FlightRPC] don't hang if Close and Read called simultaneously
 add 0cced8f  ARROW-9793: [Rust] [DataFusion] Fixed unit tests
 add 41fa221  ARROW-9792: [Rust] [DataFusion] Aggregate expression functions should not return result
 add 5abe72f  ARROW-9788: [Rust] [DataFusion] Rename SelectionExec to FilterExec
 add 2ebde1c  ARROW-9800: [Rust][Parquet] Remove println! when writing column statistics
 add 01f06cf  ARROW-9778: [Rust] [DataFusion] Implement Expr.nullable() and make consistent between logical and physical plans
 add 3cb0bd8  ARROW-9760: [Rust] [DataFusion] Added DataFrame::explain
 add f0f02c6  ARROW-9784: [Rust][DataFusion] Make running TPCH benchmark repeatable
 add 9e73081  ARROW-9733: [Rust] [DataFusion] Added support for COUNT/MIN/MAX on string columns
 add 25b0b1b  ARROW-9790: [Rust][Parquet] Fix PrimitiveArrayReader boundary conditions
 add c90ad63  ARROW-9532: [Python][Doc] Use Python3_EXECUTABLE instead of PYTHON_EXECUTABLE for finding Python executable
 add de8bfdd  ARROW-9808: [Python] Update read_table doc string
 add 60987f5  ARROW-8773: [Python] Preserve nullability of fields in schema.empty_table()
 add cb7d1c1  ARROW-9388: [C++] Division kernels
 add 0576da6  ARROW-9768: [Python] Check overflow in conversion of datetime objects to nanosecond timestamps
 add 5d9ccb7  ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows
 add 36d267b  [MINOR] Fix typo and use more concise word in README.md
 add 597a26e  ARROW-9807: [R] News update/version bump post-1.0.1
 add 5e7be07  ARROW-9678: [Rust] [DataFusion] Improve projection push down to remove unused columns
 add f98de24  ARROW-9815 [Rust] [DataFusion] Fixed deadlock caused by accessing the scalar functions' registry.
 add 085b44d  ARROW-9490: [Python][C++] Bug in pa.array when input mixes int8 with float
 add 0a698c0  ARROW-9831: [Rust][DataFusion] Fixed compilation error
 add 2e8fcd4  ARROW-9762: [Rust] [DataFusion] ExecutionContext::sql now returns DataFrame
 add 85f4324  ARROW-9819: [C++] Bump mimalloc to 1.6.4
 add 735c870  ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.
 add 657b3d3  ARROW-9833: [Rust] [DataFusion] TableProvider.scan now returns ExecutionPlan
 add d1d85db  ARROW-9464: [Rust] 

[arrow] branch master updated (77a9933 -> d201b13)

2020-09-14 Thread apitrou
This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 77a9933  ARROW-9465: [Python] Improve ergonomics of compute module
 add d201b13  ARROW-9859: [C++] Decode username and password in URIs

No new revisions were added by this update.

Summary of changes:
 cpp/src/arrow/filesystem/s3fs.cc      | 21 ++---
 cpp/src/arrow/filesystem/s3fs_test.cc |  3 +++
 cpp/src/arrow/util/uri.cc             | 17 +
 cpp/src/arrow/util/uri_test.cc        | 13 +
 4 files changed, 39 insertions(+), 15 deletions(-)



[arrow] branch master updated (90e474d -> 77a9933)

2020-09-14 Thread apitrou
This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 90e474d  ARROW-5123: [Rust] Parquet derive for simple structs
 add 77a9933  ARROW-9465: [Python] Improve ergonomics of compute module

No new revisions were added by this update.

Summary of changes:
 cpp/src/arrow/compute/api_aggregate.h              |   2 +-
 cpp/src/arrow/compute/exec.cc                      |   3 -
 cpp/src/arrow/compute/function.cc                  |   6 +
 cpp/src/arrow/compute/function.h                   |   6 +-
 cpp/src/arrow/compute/kernel.cc                    |   5 +
 cpp/src/arrow/compute/kernel_test.cc               |   9 +
 .../compute/kernels/aggregate_basic_internal.h     |   6 +-
 cpp/src/arrow/compute/kernels/aggregate_test.cc    |   6 +-
 cpp/src/arrow/compute/kernels/scalar_set_lookup.cc |  31 +--
 docs/source/cpp/compute.rst                        |   2 +-
 python/pyarrow/_compute.pyx                        | 191 +++--
 python/pyarrow/compute.py                          | 229 ++---
 python/pyarrow/includes/libarrow.pxd               |  11 +
 python/pyarrow/tests/test_compute.py               | 187 +++--
 r/src/compute.cpp                                  |   2 +-
 15 files changed, 563 insertions(+), 133 deletions(-)



[arrow] branch master updated: ARROW-5123: [Rust] Parquet derive for simple structs

2020-09-14 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 90e474d  ARROW-5123: [Rust] Parquet derive for simple structs
90e474d is described below

commit 90e474d8ab845115f23675e6b6f6aec73a429af4
Author: Xavier Lange 
AuthorDate: Mon Sep 14 13:49:33 2020 +0200

ARROW-5123: [Rust] Parquet derive for simple structs

A rebase and significant rewrite of 
https://github.com/sunchao/parquet-rs/pull/197

Big improvement: I now use a more natural nested enum style, which helps break 
out what patterns of data types are . The rest of the broad strokes still apply.

Goal
===

Writing many columns to a file is a chore. If you can put your values into 
a struct which mirrors the schema of your file, this 
`derive(ParquetRecordWriter)` will write out all the fields, in the order in 
which they are defined, to a row_group.

How to Use
===

```
extern crate parquet;
#[macro_use] extern crate parquet_derive;

#[derive(ParquetRecordWriter)]
struct ACompleteRecord<'a> {
  pub a_bool: bool,
  pub a_str: &'a str,
}
```

RecordWriter trait
===

This is the new trait which `parquet_derive` will implement for your 
structs.

```
use super::RowGroupWriter;

pub trait RecordWriter<T> {
  fn write_to_row_group(&self, row_group_writer: &mut Box<RowGroupWriter>);
}
```
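To make the trait's job concrete, here is a minimal, std-only sketch of a hand-written implementation. It is an illustration under stated assumptions, not the crate's code: `MockRowGroupWriter`, `Column`, and the `Reading` record are hypothetical stand-ins for parquet's `RowGroupWriter`, its column writers, and a user struct. Following the expanded `DumbRecord` output later in this message, the impl targets a slice of records, and each field becomes one column, written in declaration order.

```rust
// Hypothetical std-only stand-ins: `Column` and `MockRowGroupWriter` replace
// parquet's ColumnWriter/RowGroupWriter so this sketch compiles without the crate.
#[derive(Debug, PartialEq)]
enum Column {
    Bool(Vec<bool>),
    Int64(Vec<i64>),
}

#[derive(Debug, Default)]
struct MockRowGroupWriter {
    columns: Vec<Column>,
}

impl MockRowGroupWriter {
    // Appends one finished column, like writing and then closing a column writer.
    fn write_column(&mut self, column: Column) {
        self.columns.push(column);
    }
}

// Same shape as the trait above, but taking the mock writer.
trait RecordWriter<T> {
    fn write_to_row_group(&self, row_group_writer: &mut MockRowGroupWriter);
}

// Hypothetical user record.
struct Reading {
    valid: bool,
    value: i64,
}

// Hand-written equivalent of a derived impl: one block per field, in
// declaration order, each gathering that field across all records.
impl RecordWriter<Reading> for &[Reading] {
    fn write_to_row_group(&self, row_group_writer: &mut MockRowGroupWriter) {
        {
            let vals: Vec<bool> = self.iter().map(|r| r.valid).collect();
            row_group_writer.write_column(Column::Bool(vals));
        }
        {
            let vals: Vec<i64> = self.iter().map(|r| r.value).collect();
            row_group_writer.write_column(Column::Int64(vals));
        }
    }
}
```

Calling `records.as_slice().write_to_row_group(&mut writer)` on two `Reading` values leaves one `Bool` column followed by one `Int64` column in `writer.columns`, i.e. the rows have been turned into columns.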

How does it work?
===

The `parquet_derive` crate adds code-generating functionality to the Rust 
compiler: the code generation takes Rust syntax and emits additional syntax. 
This macro expansion works on stable Rust 1.15+. It is a dynamic plugin, 
loaded by the machinery in cargo. Users don't need any special `build.rs` 
steps or anything like that; it happens automatically once `parquet_derive` is 
included in their project. The `parquet_derive/Cargo.toml` has a section saying as much:

```
[lib]
proc-macro = true
```

The rust struct tagged with `#[derive(ParquetRecordWriter)]` is provided to 
the `parquet_record_writer` function in `parquet_derive/src/lib.rs`. The `syn` 
crate parses the struct from a string representation into an AST (a recursive enum 
value). The AST contains all the values I care about when generating a 
`RecordWriter` impl:

 - the name of the struct
 - the lifetime variables of the struct
 - the fields of the struct

The fields of the struct are translated from AST to a flat `FieldInfo` 
struct. It has the bits I care about for writing a column: `field_name`, 
`field_lifetime`, `field_type`, `is_option`, `column_writer_variant`.
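The AST-to-`FieldInfo` step can be sketched with std alone. This is a hypothetical simplification, not the crate's implementation: the real code inspects `syn` AST nodes, whereas here a field's type arrives as a string, `field_lifetime` is left out, and the type-to-variant table is an assumption for illustration. The variant names are real `parquet::column::writer::ColumnWriter` variants, such as the `BoolColumnWriter` that appears in the expanded code further down.

```rust
// Hypothetical, simplified per-field metadata (field_lifetime omitted).
#[derive(Debug, PartialEq)]
struct FieldInfo {
    field_name: String,
    field_type: String,
    is_option: bool,
    column_writer_variant: &'static str,
}

// Assumed mapping from a non-optional Rust type name to a ColumnWriter variant.
fn variant_for(ty: &str) -> Option<&'static str> {
    match ty {
        "bool" => Some("BoolColumnWriter"),
        "i32" => Some("Int32ColumnWriter"),
        "i64" => Some("Int64ColumnWriter"),
        "f32" => Some("FloatColumnWriter"),
        "f64" => Some("DoubleColumnWriter"),
        "String" => Some("ByteArrayColumnWriter"),
        // `&str` and `&'a str` are both written as byte arrays.
        t if t.starts_with('&') && t.ends_with("str") => Some("ByteArrayColumnWriter"),
        _ => None,
    }
}

// Flattens one field; returns None for types this sketch does not support.
fn field_info(field_name: &str, field_type: &str) -> Option<FieldInfo> {
    let ty = field_type.trim();
    // Unwrap `Option<...>` and remember that the column is nullable.
    let (inner, is_option) = match ty.strip_prefix("Option<").and_then(|t| t.strip_suffix('>')) {
        Some(inner) => (inner.trim(), true),
        None => (ty, false),
    };
    let column_writer_variant = variant_for(inner)?;
    Some(FieldInfo {
        field_name: field_name.to_string(),
        field_type: inner.to_string(),
        is_option,
        column_writer_variant,
    })
}
```

For example, `field_info("maybe_count", "Option<i64>")` yields an optional field of type `i64` that would be written with `Int64ColumnWriter`.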

The code then does the equivalent of templating to build the `RecordWriter` 
implementation. The templating functionality is provided by the `quote` crate. 
At a high level, the template for `RecordWriter` looks like:

```
impl RecordWriter for $struct_name {
  fn write_to_row_group(..) {
    $(
      {
        $column_writer_snippet
      }
    )*
  }
}
```
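The same template idea can be shown without `quote` by building the impl's source text with `format!`. This swaps a plain `String` for the token stream that `quote` actually produces, and `render_record_writer` is a hypothetical helper; it only illustrates how one block per field, in declaration order, is stitched into a `RecordWriter` impl.

```rust
// Hypothetical string-based stand-in for the quote template: each field is a
// (name, Rust type, ColumnWriter variant) triple, rendered in the given order.
fn render_record_writer(struct_name: &str, fields: &[(&str, &str, &str)]) -> String {
    // One `{ ... };` block per field, like `$column_writer_snippet` above.
    let mut blocks = String::new();
    for (name, rust_type, variant) in fields {
        blocks.push_str(&format!(
            "        {{\n            let vals: Vec<{ty}> = self.iter().map(|x| x.{name}).collect();\n            // write `vals` with ColumnWriter::{variant}\n        }};\n",
            ty = rust_type,
            name = name,
            variant = variant,
        ));
    }
    // Wrap the blocks in the impl, mirroring `$struct_name` in the template.
    format!(
        "impl RecordWriter<{s}> for &[{s}] {{\n    fn write_to_row_group(&self, row_group_writer: &mut Box<RowGroupWriter>) {{\n{blocks}    }}\n}}\n",
        s = struct_name,
        blocks = blocks,
    )
}
```

For `DumbRecord`'s two `bool` fields, the rendered text starts with `impl RecordWriter<DumbRecord> for &[DumbRecord] {` and contains the `a_bool` block before the `a2_bool` block.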

This template is then added under the struct definition, ending up as 
something like:

```
struct MyStruct {
}
impl RecordWriter for MyStruct {
  fn write_to_row_group(..) {
    {
      write_col_1();
    };
    {
      write_col_2();
    }
  }
}
```

And finally, _THIS_ is the code passed to rustc. It's just code now, fully 
expanded and standalone. If a user ever changes their `struct MyStruct` 
definition, the `ParquetRecordWriter` impl will be regenerated. There are no 
intermediate values to version control or worry about.

Viewing the Derived Code
===

To see the generated code before it's compiled, install `cargo expand` 
([more info on GitHub](https://github.com/dtolnay/cargo-expand)) and run:

```
cargo install cargo-expand
cd $WORK_DIR/parquet-rs/parquet_derive_test
cargo expand --lib > ../temp.rs
```

Then dump the contents of `../temp.rs`:

```
struct DumbRecord {
    pub a_bool: bool,
    pub a2_bool: bool,
}
impl RecordWriter<DumbRecord> for &[DumbRecord] {
    fn write_to_row_group(
        &self,
        row_group_writer: &mut Box<RowGroupWriter>,
    ) {
        let mut row_group_writer = row_group_writer;
        {
            let vals: Vec<bool> = self.iter().map(|x| x.a_bool).collect();
            let mut column_writer =
                row_group_writer.next_column().unwrap().unwrap();
            if let parquet::column::writer::ColumnWriter::BoolColumnWriter(ref mut typed) =
                column_writer
            {
                typed.write_batch(&vals[..], None, None).unwrap();
            }
            row_group_writer.close_column(column_writer).unwrap();
        };
        {
            let vals: Vec
```

[arrow] branch master updated (cfa2363 -> 68921d1)

2020-09-14 Thread nevime
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from cfa2363  ARROW-9737: [C++][Gandiva] Add bitwise_xor() for integers
 add 68921d1  ARROW-9984: [Rust] [DataFusion] Minor cleanup DRY

No new revisions were added by this update.

Summary of changes:
 rust/datafusion/src/logical_plan/mod.rs | 69 -
 1 file changed, 24 insertions(+), 45 deletions(-)