[arrow] branch decimal256 updated (4e06c1e -> d201b13)
This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a change to branch decimal256 in repository https://gitbox.apache.org/repos/asf/arrow.git.

from 4e06c1e ARROW-9711: [Rust] Add new benchmark derived from TPC-H
add e553b73 ARROW-9743: [R] Sanitize paths in open_dataset
add 2dcc9a1 ARROW-9654: [Rust][DataFusion] Add `EXPLAIN ` statement
add 5677f9e ARROW-8581: [C#] Accept and return DateTime from DateXXArray
add 3941b66 ARROW-9739: [CI][Ruby] Don't install gem documents
add 222859d ARROW-9358: [Integration] remove generated_large_batch.json
add 0d0a0cf ARROW-9377: [Java] Support unsigned dictionary indices
add 5d88f10 ARROW-8402: [Java] Support ValidateFull methods in Java
add afa3eed ARROW-9729: [Java] Disable Error Prone when project is imported into …
add 597ad62 ARROW-9617: [Rust] [DataFusion] Add length of string array
add 613ab4a ARROW-9742: [Rust] [DataFusion] Improved DataFrame trait (formerly known as the Table trait)
add 2c58141 ARROW-9758: [Rust] [DataFusion] Allow physical planner to be replaced
add a94f2b3 ARROW-9673: [Rust] [DataFusion] Add a param "dialect" for DFParser::parse_sql
add 58b38a6 ARROW-9618: [Rust] [DataFusion] Made it easier to write optimizers
add 2e3d7ec ARROW-9528: [Python] Honor tzinfo when converting from datetime
add 9bd3d50 ARROW-9759: [Rust] [DataFusion] Implement DataFrame.sort()
add 51e574f ARROW-9764: [CI][Java] Fix wrong image name for push
add 4d836ef ARROW-9757: [Rust] [DataFusion] Add prelude.rs
add 7593c9a ARROW-9556: [Python][C++] Segfaults in UnionArray with null values
add 1018a4f ARROW-9517: [C++/Python] Add support for temporary credentials to S3Options
add 18181fe ARROW-9768 [Rust] [DataFusion] Rename PhysicalPlannerImpl to DefaultPhysicalPlanner
add c4f8436 ARROW-9495: [C++] Equality assertions don't handle Inf / -Inf properly
add 2f98d1e ARROW-9710: [C++] Improve performance of Decimal128::ToString by 10x, and make the implementation reusable for Decimal256.
add 8a0db9e ARROW-9783: [Rust] [DataFusion] Remove aggregate expression data type
add 59dbe54 ARROW-9785: [Python] Fix excessively slow S3 options test
add d61c8a6 ARROW-9744: [Python] Fix build failure on aarch64
add ae60bad ARROW-9789: [C++] Don't install jemalloc in parallel
add 197f903 ARROW-9619: [Rust] [DataFusion] Add predicate push-down
add fa4b8d4 ARROW-9781: [C++] Fix valgrind uninitialized value warnings
add 4db4859 ARROW-9670: [C++][FlightRPC] don't hang if Close and Read called simultaneously
add 0cced8f ARROW-9793: [Rust] [DataFusion] Fixed unit tests
add 41fa221 ARROW-9792: [Rust] [DataFusion] Aggregate expression functions should not return result
add 5abe72f ARROW-9788: [Rust] [DataFusion] Rename SelectionExec to FilterExec
add 2ebde1c ARROW-9800: [Rust][Parquet] Remove println! when writing column statistics
add 01f06cf ARROW-9778: [Rust] [DataFusion] Implement Expr.nullable() and make consistent between logical and physical plans
add 3cb0bd8 ARROW-9760: [Rust] [DataFusion] Added DataFrame::explain
add f0f02c6 ARROW-9784: [Rust][DataFusion] Make running TPCH benchmark repeatable
add 9e73081 ARROW-9733: [Rust] [DataFusion] Added support for COUNT/MIN/MAX on string columns
add 25b0b1b ARROW-9790: [Rust][Parquet] Fix PrimitiveArrayReader boundary conditions
add c90ad63 ARROW-9532: [Python][Doc] Use Python3_EXECUTABLE instead of PYTHON_EXECUTABLE for finding Python executable
add de8bfdd ARROW-9808: [Python] Update read_table doc string
add 60987f5 ARROW-8773: [Python] Preserve nullability of fields in schema.empty_table()
add cb7d1c1 ARROW-9388: [C++] Division kernels
add 0576da6 ARROW-9768: [Python] Check overflow in conversion of datetime objects to nanosecond timestamps
add 5d9ccb7 ARROW-6437: [R] Add AWS SDK to system dependencies for macOS and Windows
add 36d267b [MINOR] Fix typo and use more concise word in README.md
add 597a26e ARROW-9807: [R] News update/version bump post-1.0.1
add 5e7be07 ARROW-9678: [Rust] [DataFusion] Improve projection push down to remove unused columns
add f98de24 ARROW-9815 [Rust] [DataFusion] Fixed deadlock caused by accessing the scalar functions' registry.
add 085b44d ARROW-9490: [Python][C++] Bug in pa.array when input mixes int8 with float
add 0a698c0 ARROW-9831: [Rust][DataFusion] Fixed compilation error
add 2e8fcd4 ARROW-9762: [Rust] [DataFusion] ExecutionContext::sql now returns DataFrame
add 85f4324 ARROW-9819: [C++] Bump mimalloc to 1.6.4
add 735c870 ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.
add 657b3d3 ARROW-9833: [Rust] [DataFusion] TableProvider.scan now returns ExecutionPlan
add d1d85db ARROW-9464: [Rust]
[arrow] branch master updated (77a9933 -> d201b13)
This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git.

from 77a9933 ARROW-9465: [Python] Improve ergonomics of compute module
add d201b13 ARROW-9859: [C++] Decode username and password in URIs

No new revisions were added by this update.

Summary of changes:
cpp/src/arrow/filesystem/s3fs.cc      | 21 ++---
cpp/src/arrow/filesystem/s3fs_test.cc |  3 +++
cpp/src/arrow/util/uri.cc             | 17 +
cpp/src/arrow/util/uri_test.cc        | 13 +
4 files changed, 39 insertions(+), 15 deletions(-)
[arrow] branch master updated (90e474d -> 77a9933)
This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git.

from 90e474d ARROW-5123: [Rust] Parquet derive for simple structs
add 77a9933 ARROW-9465: [Python] Improve ergonomics of compute module

No new revisions were added by this update.

Summary of changes:
cpp/src/arrow/compute/api_aggregate.h              |   2 +-
cpp/src/arrow/compute/exec.cc                      |   3 -
cpp/src/arrow/compute/function.cc                  |   6 +
cpp/src/arrow/compute/function.h                   |   6 +-
cpp/src/arrow/compute/kernel.cc                    |   5 +
cpp/src/arrow/compute/kernel_test.cc               |   9 +
.../compute/kernels/aggregate_basic_internal.h     |   6 +-
cpp/src/arrow/compute/kernels/aggregate_test.cc    |   6 +-
cpp/src/arrow/compute/kernels/scalar_set_lookup.cc |  31 +--
docs/source/cpp/compute.rst                        |   2 +-
python/pyarrow/_compute.pyx                        | 191 +++--
python/pyarrow/compute.py                          | 229 ++---
python/pyarrow/includes/libarrow.pxd               |  11 +
python/pyarrow/tests/test_compute.py               | 187 +++--
r/src/compute.cpp                                  |   2 +-
15 files changed, 563 insertions(+), 133 deletions(-)
[arrow] branch master updated: ARROW-5123: [Rust] Parquet derive for simple structs
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
new 90e474d ARROW-5123: [Rust] Parquet derive for simple structs

90e474d is described below

commit 90e474d8ab845115f23675e6b6f6aec73a429af4
Author: Xavier Lange
AuthorDate: Mon Sep 14 13:49:33 2020 +0200

ARROW-5123: [Rust] Parquet derive for simple structs

A rebase and significant rewrite of https://github.com/sunchao/parquet-rs/pull/197

Big improvement: I now use a more natural nested enum style; it helps break out what patterns of data types are . The rest of the broad strokes still apply.

Goal
===

Writing many columns to a file is a chore. If you can put your values into a struct which mirrors the schema of your file, this `derive(ParquetRecordWriter)` will write out all the fields, in the order in which they are defined, to a row_group.

How to Use
===

```
extern crate parquet;
#[macro_use] extern crate parquet_derive;

#[derive(ParquetRecordWriter)]
struct ACompleteRecord<'a> {
    pub a_bool: bool,
    pub a_str: &'a str,
}
```

RecordWriter trait
===

This is the new trait which `parquet_derive` will implement for your structs.

```
use super::RowGroupWriter;

pub trait RecordWriter {
    fn write_to_row_group(&self, row_group_writer: Box<dyn RowGroupWriter>);
}
```

How does it work?
===

The `parquet_derive` crate adds code-generating functionality to the Rust compiler. The code generation takes Rust syntax and emits additional syntax. This macro expansion works on Rust 1.15+ stable. This is a dynamic plugin, loaded by the machinery in cargo. Users don't have to do any special `build.rs` steps or anything like that; it's automatic once `parquet_derive` is included in their project.
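The core of what a derived `RecordWriter` does is transpose a slice of row structs into one column per field, written in declaration order. This is a minimal, dependency-free sketch of that idea, not the `parquet` crate's API: `MockRowGroupWriter`, `ColumnValues` and the `&mut MockRowGroupWriter` parameter are hypothetical stand-ins for the real `RowGroupWriter` and typed column writers.

```rust
// Hypothetical stand-ins: NOT the parquet crate API. They only model
// "a row group receives one column of values per struct field".

/// One column's worth of values, tagged by physical type.
#[derive(Debug, PartialEq)]
pub enum ColumnValues {
    Bool(Vec<bool>),
    ByteArray(Vec<String>),
}

/// Records columns in the order they are written, like a row group would.
#[derive(Debug, Default)]
pub struct MockRowGroupWriter {
    pub columns: Vec<(String, ColumnValues)>,
}

impl MockRowGroupWriter {
    fn write_column(&mut self, name: &str, values: ColumnValues) {
        self.columns.push((name.to_string(), values));
    }
}

/// Simplified analogue of the `RecordWriter` trait described above.
pub trait RecordWriter {
    fn write_to_row_group(&self, row_group_writer: &mut MockRowGroupWriter);
}

pub struct ACompleteRecord<'a> {
    pub a_bool: bool,
    pub a_str: &'a str,
}

// Hand-written version of what the derive would generate: one block per
// field, in declaration order, each collecting that field across all rows.
impl<'a> RecordWriter for &[ACompleteRecord<'a>] {
    fn write_to_row_group(&self, row_group_writer: &mut MockRowGroupWriter) {
        {
            let vals: Vec<bool> = self.iter().map(|x| x.a_bool).collect();
            row_group_writer.write_column("a_bool", ColumnValues::Bool(vals));
        }
        {
            let vals: Vec<String> = self.iter().map(|x| x.a_str.to_string()).collect();
            row_group_writer.write_column("a_str", ColumnValues::ByteArray(vals));
        }
    }
}
```

Implementing the trait for `&[T]` instead of `T` lets one call write a whole batch of rows as a single row group. That is the same shape as the `impl RecordWriter for &[DumbRecord]` in the expanded output of this commit.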
The `parquet_derive/src/Cargo.toml` marks the crate as a procedural macro:

```
[lib]
proc-macro = true
```

The Rust struct tagged with `#[derive(ParquetRecordWriter)]` is provided to the `parquet_record_writer` function in `parquet_derive/src/lib.rs`. The `syn` crate parses the struct from a string representation into an AST (a recursive enum value). The AST contains all the values I care about when generating a `RecordWriter` impl:

- the name of the struct
- the lifetime variables of the struct
- the fields of the struct

The fields of the struct are translated from the AST to a flat `FieldInfo` struct. It has the bits I care about for writing a column: `field_name`, `field_lifetime`, `field_type`, `is_option`, `column_writer_variant`.

The code then does the equivalent of templating to build the `RecordWriter` implementation. The templating functionality is provided by the `quote` crate. At a high level, the template for `RecordWriter` looks like:

```
impl RecordWriter for $struct_name {
    fn write_to_row_group(..) {
        $({
            $column_writer_snippet
        })
    }
}
```

This template is then added under the struct definition, ending up as something like:

```
struct MyStruct {
}
impl RecordWriter for MyStruct {
    fn write_to_row_group(..) {
        {
            write_col_1();
        };
        {
            write_col_2();
        }
    }
}
```

and finally _THIS_ is the code passed to rustc. It's just code now, fully expanded and standalone. If a user ever changes their `struct MyStruct` definition, the `ParquetRecordWriter` impl will be regenerated. There are no intermediate values to version control or worry about.
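The two steps just described are (1) flattening each field into a `FieldInfo` and (2) emitting one `{ ... }` block per field into the impl template. They can be illustrated without `syn` or `quote`. This is a hypothetical sketch under simplifying assumptions:

* field types arrive as plain strings rather than a `syn` AST;
* `format!` stands in for `quote!`;
* `field_lifetime` is dropped;
* `write_col` in the generated text is an invented placeholder.

The `*ColumnWriter` names are real variants of the parquet crate's `ColumnWriter` enum, but the type mapping shown is only a subset.

```rust
// Hypothetical sketch: strings instead of a syn AST, format! instead of quote!.

#[derive(Debug, PartialEq)]
pub struct FieldInfo {
    pub field_name: String,
    pub field_type: String, // inner type, with any Option<...> removed
    pub is_option: bool,
    pub column_writer_variant: &'static str,
}

/// Map a Rust type name to the ColumnWriter variant used to write it.
fn writer_variant(ty: &str) -> Option<&'static str> {
    match ty {
        "bool" => Some("BoolColumnWriter"),
        "i32" => Some("Int32ColumnWriter"),
        "i64" => Some("Int64ColumnWriter"),
        "f32" => Some("FloatColumnWriter"),
        "f64" => Some("DoubleColumnWriter"),
        "String" => Some("ByteArrayColumnWriter"),
        // `&str` or `&'a str`: strings are stored as byte arrays
        t if t.starts_with('&') && t.ends_with("str") => Some("ByteArrayColumnWriter"),
        _ => None,
    }
}

/// Flatten one field into a FieldInfo; None for unsupported types.
pub fn field_info(name: &str, ty: &str) -> Option<FieldInfo> {
    let ty = ty.trim();
    // Detect `Option<T>` and unwrap it to the inner type.
    let (is_option, inner) = match ty.strip_prefix("Option<").and_then(|t| t.strip_suffix('>')) {
        Some(inner) => (true, inner.trim()),
        None => (false, ty),
    };
    Some(FieldInfo {
        field_name: name.to_string(),
        field_type: inner.to_string(),
        is_option,
        column_writer_variant: writer_variant(inner)?,
    })
}

/// Stand-in for the quote! template: one `{ ... }` block per field,
/// in declaration order, inside the generated impl.
pub fn render_impl(struct_name: &str, fields: &[FieldInfo]) -> String {
    let blocks: Vec<String> = fields
        .iter()
        .map(|f| format!("{{ write_col::<{}>(\"{}\"); }}", f.column_writer_variant, f.field_name))
        .collect();
    format!(
        "impl RecordWriter for &[{}] {{ fn write_to_row_group(..) {{ {} }} }}",
        struct_name,
        blocks.join(" ")
    )
}
```

For example, `field_info("a_str", "Option<&'a str>")` yields `is_option: true` with `ByteArrayColumnWriter`. A real generator would also branch on `is_option` to emit definition levels for nullable columns; this sketch omits that.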
Viewing the Derived Code
===

To see the generated code before it's compiled, one very useful bit is to install `cargo expand` ([more info on gh](https://github.com/dtolnay/cargo-expand)). Then you can do:

```
cd $WORK_DIR/parquet-rs/parquet_derive_test
cargo expand --lib > ../temp.rs
```

then you can dump the contents:

```
struct DumbRecord {
    pub a_bool: bool,
    pub a2_bool: bool,
}
impl RecordWriter for &[DumbRecord] {
    fn write_to_row_group(
        &self,
        row_group_writer: Box<dyn RowGroupWriter>,
    ) {
        let mut row_group_writer = row_group_writer;
        {
            let vals: Vec<bool> = self.iter().map(|x| x.a_bool).collect();
            let mut column_writer = row_group_writer.next_column().unwrap().unwrap();
            if let parquet::column::writer::ColumnWriter::BoolColumnWriter(ref mut typed) =
                column_writer
            {
                typed.write_batch(&vals[..], None, None).unwrap();
            }
            row_group_writer.close_column(column_writer).unwrap();
        };
        {
            let vals: Vec
```
[arrow] branch master updated (cfa2363 -> 68921d1)
This is an automated email from the ASF dual-hosted git repository.

nevime pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git.

from cfa2363 ARROW-9737: [C++][Gandiva] Add bitwise_xor() for integers
add 68921d1 ARROW-9984: [Rust] [DataFusion] Minor cleanup DRY

No new revisions were added by this update.

Summary of changes:
rust/datafusion/src/logical_plan/mod.rs | 69 -
1 file changed, 24 insertions(+), 45 deletions(-)