[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8018: ARROW-9815 [Rust] [DataFusion] Fixed deadlock caused by accessing the scalar functions' registry.

2020-08-22 Thread GitBox
jorgecarleitao edited a comment on pull request #8018: URL: https://github.com/apache/arrow/pull/8018#issuecomment-678609613 Hey both, I think I implemented what we discussed, but the devil is always in the details. The main changes are: * `ExecutionContextState::datasources`

[GitHub] [arrow] thamht4190 opened a new pull request #8023: Arrow 9318 encryption key management

2020-08-22 Thread GitBox
thamht4190 opened a new pull request #8023: URL: https://github.com/apache/arrow/pull/8023 This PR is a C++ implementation of the Parquet key tool, based on [the Java implementation](https://github.com/apache/parquet-mr/pull/615) and [the design

[GitHub] [arrow] jorgecarleitao commented on pull request #8018: ARROW-9815 [Rust] [DataFusion] Fixed deadlock caused by accessing the scalar functions' registry.

2020-08-22 Thread GitBox
jorgecarleitao commented on pull request #8018: URL: https://github.com/apache/arrow/pull/8018#issuecomment-678609613 Hey both, I think I implemented what we discussed, but the devil is always in the details. The main changes are: * `ExecutionContextState::datasources` and

[GitHub] [arrow] github-actions[bot] commented on pull request #8023: Arrow 9318 encryption key management

2020-08-22 Thread GitBox
github-actions[bot] commented on pull request #8023: URL: https://github.com/apache/arrow/pull/8023#issuecomment-678610345 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] github-actions[bot] commented on pull request #8023: ARROW-9318: [C++] Parquet encryption key management

2020-08-22 Thread GitBox
github-actions[bot] commented on pull request #8023: URL: https://github.com/apache/arrow/pull/8023#issuecomment-678611290 https://issues.apache.org/jira/browse/ARROW-9318 This is an automated message from the Apache Git

[GitHub] [arrow] wesm closed issue #7968: VectorSchemaRoot reuse when using dictionary encoded field

2020-08-22 Thread GitBox
wesm closed issue #7968: URL: https://github.com/apache/arrow/issues/7968

[GitHub] [arrow] kszucs commented on a change in pull request #8008: ARROW-9369: [Python] Support conversion from python sequence to dictionary type

2020-08-22 Thread GitBox
kszucs commented on a change in pull request #8008: URL: https://github.com/apache/arrow/pull/8008#discussion_r475098380 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -1123,6 +1168,50 @@ class DecimalConverter : public TypedConverter decimal_type_; };

[GitHub] [arrow] jorgecarleitao opened a new pull request #8024: ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.

2020-08-22 Thread GitBox
jorgecarleitao opened a new pull request #8024: URL: https://github.com/apache/arrow/pull/8024 This commit makes all type coercion happen on the physical plane instead of the logical plane and fixes the supertype function. This keeps field names from changing due to coercion rules, better

[GitHub] [arrow] jorgecarleitao commented on pull request #7919: ARROW-9678: [Rust] [DataFusion] Improve projection push down to remove unused columns

2020-08-22 Thread GitBox
jorgecarleitao commented on pull request #7919: URL: https://github.com/apache/arrow/pull/7919#issuecomment-678659297 @andygrove , is there anything we need to work this further?

[GitHub] [arrow] andygrove commented on a change in pull request #8024: ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.

2020-08-22 Thread GitBox
andygrove commented on a change in pull request #8024: URL: https://github.com/apache/arrow/pull/8024#discussion_r475106389 ## File path: rust/datafusion/src/execution/physical_plan/expressions.rs ## @@ -991,30 +991,236 @@ impl fmt::Display for BinaryExpr { } } +//

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8024: ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.

2020-08-22 Thread GitBox
jorgecarleitao commented on a change in pull request #8024: URL: https://github.com/apache/arrow/pull/8024#discussion_r475106741 ## File path: rust/datafusion/src/execution/physical_plan/expressions.rs ## @@ -991,30 +991,236 @@ impl fmt::Display for BinaryExpr { } }

[GitHub] [arrow] andygrove commented on pull request #7919: ARROW-9678: [Rust] [DataFusion] Improve projection push down to remove unused columns

2020-08-22 Thread GitBox
andygrove commented on pull request #7919: URL: https://github.com/apache/arrow/pull/7919#issuecomment-678662753 I do have a nagging concern that the logic may not work if the query plan contains aliases that rename columns, but we can address that as a follow up if/when that becomes an

[GitHub] [arrow] andygrove closed pull request #7919: ARROW-9678: [Rust] [DataFusion] Improve projection push down to remove unused columns

2020-08-22 Thread GitBox
andygrove closed pull request #7919: URL: https://github.com/apache/arrow/pull/7919

[GitHub] [arrow] losze1cj commented on issue #8025: ParquetWriter creates bad files when passed pyarrow schema from pyarrow table?

2020-08-22 Thread GitBox
losze1cj commented on issue #8025: URL: https://github.com/apache/arrow/issues/8025#issuecomment-678663628 Using pyarrow==0.15.1 pandas==0.25.3

[GitHub] [arrow] jorgecarleitao opened a new pull request #8026: ARROW-9831: [Rust][DataFusion] Fixed compilation error

2020-08-22 Thread GitBox
jorgecarleitao opened a new pull request #8026: URL: https://github.com/apache/arrow/pull/8026 @andygrove , the commit that we just merged to master was not aligned with master, causing the code to fail to compile :( This fixes the compilation, but there are still failing tests. I

[GitHub] [arrow] github-actions[bot] commented on pull request #8026: ARROW-9831: [Rust][DataFusion] Fixed compilation error

2020-08-22 Thread GitBox
github-actions[bot] commented on pull request #8026: URL: https://github.com/apache/arrow/pull/8026#issuecomment-67813 https://issues.apache.org/jira/browse/ARROW-9831

[GitHub] [arrow] andygrove commented on pull request #8026: ARROW-9831: [Rust][DataFusion] Fixed compilation error

2020-08-22 Thread GitBox
andygrove commented on pull request #8026: URL: https://github.com/apache/arrow/pull/8026#issuecomment-678666746 maybe ignore the tests in this PR so we can merge, then follow up with fix?

[GitHub] [arrow] github-actions[bot] commented on pull request #8024: ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.

2020-08-22 Thread GitBox
github-actions[bot] commented on pull request #8024: URL: https://github.com/apache/arrow/pull/8024#issuecomment-678658740 https://issues.apache.org/jira/browse/ARROW-9809

[GitHub] [arrow] losze1cj opened a new issue #8025: ParquetWriter creates bad files when passed pyarrow schema from pyarrow table?

2020-08-22 Thread GitBox
losze1cj opened a new issue #8025: URL: https://github.com/apache/arrow/issues/8025 It seems that the ParquetWriter doesn't behave as expected when I am passing a pyarrow schema that comes out of a pyarrow table. Approaching a problem in two ways, I notice unexpected behavior. If

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8024: ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.

2020-08-22 Thread GitBox
jorgecarleitao commented on a change in pull request #8024: URL: https://github.com/apache/arrow/pull/8024#discussion_r475109199 ## File path: rust/datafusion/src/execution/physical_plan/expressions.rs ## @@ -991,30 +991,236 @@ impl fmt::Display for BinaryExpr { } }

[GitHub] [arrow] andygrove closed pull request #8026: ARROW-9831: [Rust][DataFusion] Fixed compilation error

2020-08-22 Thread GitBox
andygrove closed pull request #8026: URL: https://github.com/apache/arrow/pull/8026

[GitHub] [arrow] jorgecarleitao commented on pull request #7919: ARROW-9678: [Rust] [DataFusion] Improve projection push down to remove unused columns

2020-08-22 Thread GitBox
jorgecarleitao commented on pull request #7919: URL: https://github.com/apache/arrow/pull/7919#issuecomment-678665344 Thank you, @andygrove ! I encapsulated that thought on ARROW-9830, with issue type "Test". :)

[GitHub] [arrow] jorgecarleitao commented on pull request #8026: ARROW-9831: [Rust][DataFusion] Fixed compilation error

2020-08-22 Thread GitBox
jorgecarleitao commented on pull request #8026: URL: https://github.com/apache/arrow/pull/8026#issuecomment-678667466 I disabled some tests. Let's see if this passes. I am really sorry about that. It is not very clear from GitHub's UI that the PR is not rebased on top of master and

[GitHub] [arrow] jorgecarleitao commented on pull request #8026: ARROW-9831: [Rust][DataFusion] Fixed compilation error

2020-08-22 Thread GitBox
jorgecarleitao commented on pull request #8026: URL: https://github.com/apache/arrow/pull/8026#issuecomment-678669613 @andygrove clippy failed, but meanwhile I pushed a fix and re-enabled them. It was a bug on the projection push down that was only shown after you added that check

[GitHub] [arrow] wesm closed pull request #8017: ARROW-9490: [Python][C++] Bug in pa.array when input mixes int8 with float

2020-08-22 Thread GitBox
wesm closed pull request #8017: URL: https://github.com/apache/arrow/pull/8017

[GitHub] [arrow] emkornfield commented on issue #8025: ParquetWriter creates bad files when passed pyarrow schema from pyarrow table?

2020-08-22 Thread GitBox
emkornfield commented on issue #8025: URL: https://github.com/apache/arrow/issues/8025#issuecomment-678675831 Thank you for the report. A few questions. Does parquet tools indicate if the file is readable? Can you read it back with pyarrow? Do you still see the issue with arrow

[GitHub] [arrow] github-actions[bot] commented on pull request #8027: ARROW-9762: [Rust] [DataFusion] ExecutionContext::sql now returns DataFrame

2020-08-22 Thread GitBox
github-actions[bot] commented on pull request #8027: URL: https://github.com/apache/arrow/pull/8027#issuecomment-678679471 https://issues.apache.org/jira/browse/ARROW-9762

[GitHub] [arrow] andygrove closed pull request #8018: ARROW-9815 [Rust] [DataFusion] Fixed deadlock caused by accessing the scalar functions' registry.

2020-08-22 Thread GitBox
andygrove closed pull request #8018: URL: https://github.com/apache/arrow/pull/8018

[GitHub] [arrow] andygrove commented on pull request #8027: ARROW-9762: [Rust] [DataFusion] ExecutionContext::sql now returns DataFrame

2020-08-22 Thread GitBox
andygrove commented on pull request #8027: URL: https://github.com/apache/arrow/pull/8027#issuecomment-678677523 @jorgecarleitao @alamb fyi

[GitHub] [arrow] andygrove opened a new pull request #8027: ARROW-9762: [Rust] [DataFusion] ExecutionContext::sql now returns DataFrame

2020-08-22 Thread GitBox
andygrove opened a new pull request #8027: URL: https://github.com/apache/arrow/pull/8027 I need this change so that I can have Ballista use the DataFusion DataFrame trait and start testing the extension points for the physical planner.

[GitHub] [arrow] andygrove closed pull request #8027: ARROW-9762: [Rust] [DataFusion] ExecutionContext::sql now returns DataFrame

2020-08-22 Thread GitBox
andygrove closed pull request #8027: URL: https://github.com/apache/arrow/pull/8027

[GitHub] [arrow] arw2019 commented on pull request #8017: ARROW-9490: [Python][C++] Bug in pa.array when input mixes int8 with float

2020-08-22 Thread GitBox
arw2019 commented on pull request #8017: URL: https://github.com/apache/arrow/pull/8017#issuecomment-678686874 thanks @wesm for reviewing!

[GitHub] [arrow] kou closed pull request #8019: ARROW-9819: [C++] Bump mimalloc to 1.6.4

2020-08-22 Thread GitBox
kou closed pull request #8019: URL: https://github.com/apache/arrow/pull/8019

[GitHub] [arrow] github-actions[bot] commented on pull request #8028: ARROW-9833: [Rust] [DataFusion] TableProvider.scan now returns ExecutionPlan

2020-08-22 Thread GitBox
github-actions[bot] commented on pull request #8028: URL: https://github.com/apache/arrow/pull/8028#issuecomment-678692312 https://issues.apache.org/jira/browse/ARROW-9833

[GitHub] [arrow] andygrove opened a new pull request #8028: ARROW-9833: [Rust] [DataFusion] TableProvider.scan now returns ExecutionPlan

2020-08-22 Thread GitBox
andygrove opened a new pull request #8028: URL: https://github.com/apache/arrow/pull/8028 `TableProvider.scan()` now returns `ExecutionPlan` instead of `Vec`. This is a step towards removing the `Partition` trait and passing the `partition_id` to `ExecutionPlan.execute()` instead.

[GitHub] [arrow] andygrove commented on pull request #8028: ARROW-9833: [Rust] [DataFusion] TableProvider.scan now returns ExecutionPlan

2020-08-22 Thread GitBox
andygrove commented on pull request #8028: URL: https://github.com/apache/arrow/pull/8028#issuecomment-678692380 @jorgecarleitao @alamb

[GitHub] [arrow] andygrove opened a new pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-22 Thread GitBox
andygrove opened a new pull request #8029: URL: https://github.com/apache/arrow/pull/8029 This follows on from https://github.com/apache/arrow/pull/8028. - Removes `Partition` trait, which was really redundant. - `ExecutionPlan.execute()` now takes a partition index. - Removed
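
As a rough illustration of the API shape described in #8028 and #8029 (not the actual DataFusion code), the sketch below shows a plan whose `execute` method takes a partition index, so no separate `Partition` trait is needed. Everything here is a simplified assumption made for illustration: the trait body, the `num_partitions` method, the `MemoryExec` type, the `collect` helper, and the use of `Vec<i32>` in place of record batches.

```rust
/// Hypothetical, simplified stand-in for a physical plan node.
/// The real DataFusion trait has a different, richer signature.
trait ExecutionPlan {
    /// Number of partitions this plan produces (assumed method name).
    fn num_partitions(&self) -> usize;
    /// Execute a single partition, identified by its index.
    fn execute(&self, partition: usize) -> Result<Vec<i32>, String>;
}

/// Hypothetical in-memory plan: each inner Vec is one partition.
struct MemoryExec {
    partitions: Vec<Vec<i32>>,
}

impl ExecutionPlan for MemoryExec {
    fn num_partitions(&self) -> usize {
        self.partitions.len()
    }

    fn execute(&self, partition: usize) -> Result<Vec<i32>, String> {
        // Look the partition up by index instead of handing out
        // separate partition objects.
        self.partitions
            .get(partition)
            .cloned()
            .ok_or_else(|| format!("invalid partition index {}", partition))
    }
}

/// Runs every partition in turn and concatenates the results.
fn collect(plan: &dyn ExecutionPlan) -> Result<Vec<i32>, String> {
    let mut out = Vec::new();
    for i in 0..plan.num_partitions() {
        out.extend(plan.execute(i)?);
    }
    Ok(out)
}
```

With this shape, a caller only needs the plan and an index, for example `plan.execute(0)`, and an out-of-range index surfaces as an error rather than a missing object.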

[GitHub] [arrow] andygrove commented on pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-22 Thread GitBox
andygrove commented on pull request #8029: URL: https://github.com/apache/arrow/pull/8029#issuecomment-678704604 @jorgecarleitao @alamb Sorry, this is a bit of a larger change than usual.

[GitHub] [arrow] github-actions[bot] commented on pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-22 Thread GitBox
github-actions[bot] commented on pull request #8029: URL: https://github.com/apache/arrow/pull/8029#issuecomment-678704633 https://issues.apache.org/jira/browse/ARROW-9464

[GitHub] [arrow] andygrove commented on pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-22 Thread GitBox
andygrove commented on pull request #8029: URL: https://github.com/apache/arrow/pull/8029#issuecomment-678728434 Yes, exactly. We could do more efficient things in the future such as perform sorts in parallel and then do a sort-merge join on the results. On Sat, Aug 22, 2020,
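
The idea mentioned above, sorting partitions in parallel and then merging the sorted results, can be sketched with the standard library alone. This is only a minimal illustration under stated assumptions, not DataFusion's `SortExec`: the function name `parallel_sort_merge` is hypothetical, partitions are plain `Vec<i32>` values, and the final step is a k-way merge of sorted runs.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::thread;

/// Sorts each partition on its own thread, then merges the sorted runs.
/// Hypothetical helper for illustration only.
fn parallel_sort_merge(partitions: Vec<Vec<i32>>) -> Vec<i32> {
    // Phase 1: sort every partition in parallel, one thread per partition.
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|mut p| {
            thread::spawn(move || {
                p.sort();
                p
            })
        })
        .collect();
    let runs: Vec<Vec<i32>> = handles
        .into_iter()
        .map(|h| h.join().expect("sort thread panicked"))
        .collect();

    // Phase 2: k-way merge using a min-heap of (value, run index, offset).
    let mut heap = BinaryHeap::new();
    for (run, values) in runs.iter().enumerate() {
        if let Some(&v) = values.first() {
            heap.push(Reverse((v, run, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    while let Some(Reverse((v, run, pos))) = heap.pop() {
        out.push(v);
        // Refill the heap with the next value from the same run, if any.
        if let Some(&next) = runs[run].get(pos + 1) {
            heap.push(Reverse((next, run, pos + 1)));
        }
    }
    out
}
```

The merge phase touches each element once and keeps at most one heap entry per partition, so its cost grows with the number of partitions only logarithmically per element.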

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-22 Thread GitBox
jorgecarleitao commented on a change in pull request #8029: URL: https://github.com/apache/arrow/pull/8029#discussion_r475168077 ## File path: rust/datafusion/src/execution/physical_plan/sort.rs ## @@ -61,44 +61,28 @@ impl ExecutionPlan for SortExec {