[GitHub] [arrow] alamb commented on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
alamb commented on pull request #7967: URL: https://github.com/apache/arrow/pull/7967#issuecomment-678766915 @jorgecarleitao -- another thing I can think of would be to postpone the UDF resolution until the type coercion logical optimizer pass. So in other words, when converting

[GitHub] [arrow] alamb commented on pull request #8031: ARROW-9815: [Rust][DataFusion] Add a trait for looking up scalar functions by name

2020-08-23 Thread GitBox
alamb commented on pull request #8031: URL: https://github.com/apache/arrow/pull/8031#issuecomment-678779883 FYI @jorgecarleitao and @andygrove This is an automated message from the Apache Git Service. To respond to the

[GitHub] [arrow] alamb commented on a change in pull request #8031: ARROW-9815: [Rust][DataFusion] Add a trait for looking up scalar functions by name

2020-08-23 Thread GitBox
alamb commented on a change in pull request #8031: URL: https://github.com/apache/arrow/pull/8031#discussion_r475224655 ## File path: rust/datafusion/src/execution/context.rs ## @@ -189,16 +190,7 @@ impl ExecutionContext { /// Register a scalar UDF pub fn

[GitHub] [arrow] andygrove closed pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-23 Thread GitBox
andygrove closed pull request #8029: URL: https://github.com/apache/arrow/pull/8029 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] github-actions[bot] commented on pull request #8033: ARROW-9837: [Rust][DataFusion] Add provider for variable

2020-08-23 Thread GitBox
github-actions[bot] commented on pull request #8033: URL: https://github.com/apache/arrow/pull/8033#issuecomment-678790001 https://issues.apache.org/jira/browse/ARROW-9837 This is an automated message from the Apache Git

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8034: ARROW-9464: [Rust] [DataFusion] Physical plan optimization rule to insert MergeExec when needed

2020-08-23 Thread GitBox
jorgecarleitao commented on a change in pull request #8034: URL: https://github.com/apache/arrow/pull/8034#discussion_r475253287 ## File path: rust/datafusion/src/execution/physical_plan/mod.rs ## @@ -74,6 +85,15 @@ impl Partitioning { } } +/// Distribution schemes

[GitHub] [arrow] github-actions[bot] commented on pull request #8030: ARROW-9835: [Rust][DataFusion] Removed FunctionMeta and FunctionType

2020-08-23 Thread GitBox
github-actions[bot] commented on pull request #8030: URL: https://github.com/apache/arrow/pull/8030#issuecomment-678764184 https://issues.apache.org/jira/browse/ARROW-9835 This is an automated message from the Apache Git

[GitHub] [arrow] andygrove commented on a change in pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-23 Thread GitBox
andygrove commented on a change in pull request #8029: URL: https://github.com/apache/arrow/pull/8029#discussion_r475225864 ## File path: rust/datafusion/src/execution/physical_plan/sort.rs ## @@ -61,44 +61,28 @@ impl ExecutionPlan for SortExec {

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8031: ARROW-9815: [Rust][DataFusion] Add a trait for looking up scalar functions by name

2020-08-23 Thread GitBox
jorgecarleitao commented on a change in pull request #8031: URL: https://github.com/apache/arrow/pull/8031#discussion_r475229968 ## File path: rust/datafusion/src/execution/context.rs ## @@ -477,9 +466,9 @@ impl ExecutionConfig { /// Execution context for registering data

[GitHub] [arrow] wqc200 opened a new pull request #8033: ARROW-9837: [Rust][DataFusion] Add provider for variable

2020-08-23 Thread GitBox
wqc200 opened a new pull request #8033: URL: https://github.com/apache/arrow/pull/8033 Select @@version; @@version is a variable, and if we want to get its value, we should get it from outside the system, This is an

[GitHub] [arrow] alamb commented on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
alamb commented on pull request #7967: URL: https://github.com/apache/arrow/pull/7967#issuecomment-678780754 > When you mean data_type you mean the arguments' types or the return_type? I was referring to `Expr::ScalarFunction::data_type`:

[GitHub] [arrow] andygrove commented on pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-23 Thread GitBox
andygrove commented on pull request #8029: URL: https://github.com/apache/arrow/pull/8029#issuecomment-678784408 > This is an impressive simplification and improvement. Really great work, @andygrove ! > > I went through it and could not find any issue with it, only benefits. >

[GitHub] [arrow] github-actions[bot] commented on pull request #8032: ARROW-9836: [Rust][DataFusion] Improve API for usage of UDFs

2020-08-23 Thread GitBox
github-actions[bot] commented on pull request #8032: URL: https://github.com/apache/arrow/pull/8032#issuecomment-678788344 https://issues.apache.org/jira/browse/ARROW-9836 This is an automated message from the Apache Git

[GitHub] [arrow] alamb commented on a change in pull request #8029: ARROW-9464: [Rust] [DataFusion] Remove Partition trait

2020-08-23 Thread GitBox
alamb commented on a change in pull request #8029: URL: https://github.com/apache/arrow/pull/8029#discussion_r475205751 ## File path: rust/datafusion/src/execution/context.rs ## @@ -350,72 +350,64 @@ impl ExecutionContext { } /// Execute a physical plan and collect

[GitHub] [arrow] jorgecarleitao commented on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
jorgecarleitao commented on pull request #7967: URL: https://github.com/apache/arrow/pull/7967#issuecomment-678768666 When you mean `data_type` you mean the arguments' types or the `return_type`? This is an automated message

[GitHub] [arrow] github-actions[bot] commented on pull request #8031: ARROW-9815: [Rust][DataFusion] Add a trait for looking up scalar functions by name

2020-08-23 Thread GitBox
github-actions[bot] commented on pull request #8031: URL: https://github.com/apache/arrow/pull/8031#issuecomment-678780263 https://issues.apache.org/jira/browse/ARROW-9815 This is an automated message from the Apache Git

[GitHub] [arrow] alamb commented on a change in pull request #8030: ARROW-9835: [Rust][DataFusion] Removed FunctionMeta and FunctionType

2020-08-23 Thread GitBox
alamb commented on a change in pull request #8030: URL: https://github.com/apache/arrow/pull/8030#discussion_r475225101 ## File path: rust/datafusion/src/execution/context.rs ## @@ -489,15 +489,13 @@ impl SchemaProvider for ExecutionContextState {

[GitHub] [arrow] alamb opened a new pull request #8031: ARROW-9815: [Rust][DataFusion] Add a trait for looking up scalar functions by name

2020-08-23 Thread GitBox
alamb opened a new pull request #8031: URL: https://github.com/apache/arrow/pull/8031 Inspired by the conversation on https://github.com/apache/arrow/pull/8018/files, I have been bothered by the use of Arc/Mutex and the resulting code complication in ExecutionContext and LogicalPlanning.

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8030: ARROW-9835: [Rust][DataFusion] Removed FunctionMeta and FunctionType

2020-08-23 Thread GitBox
jorgecarleitao commented on a change in pull request #8030: URL: https://github.com/apache/arrow/pull/8030#discussion_r475227407 ## File path: rust/datafusion/src/execution/context.rs ## @@ -489,15 +489,13 @@ impl SchemaProvider for ExecutionContextState {

[GitHub] [arrow] jorgecarleitao commented on pull request #8034: ARROW-9464: [Rust] [DataFusion] Physical plan optimization rule to insert MergeExec when needed

2020-08-23 Thread GitBox
jorgecarleitao commented on pull request #8034: URL: https://github.com/apache/arrow/pull/8034#issuecomment-678807425 It looks fantastic! Super excited to see this! Gave it a quick look, and my understanding so far: * Physical nodes have requirements

[GitHub] [arrow] jorgecarleitao commented on pull request #8027: ARROW-9762: [Rust] [DataFusion] ExecutionContext::sql now returns DataFrame

2020-08-23 Thread GitBox
jorgecarleitao commented on pull request #8027: URL: https://github.com/apache/arrow/pull/8027#issuecomment-678760761 Well, thank you to @andygrove , that took the initiative and did the hard work! This is an automated

[GitHub] [arrow] jorgecarleitao commented on pull request #8024: ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.

2020-08-23 Thread GitBox
jorgecarleitao commented on pull request #8024: URL: https://github.com/apache/arrow/pull/8024#issuecomment-678770336 @alamb , thanks a lot for that insight. I may have been using the wrong notation here. I think that we have each columns' type during logical planning: the

[GitHub] [arrow] andygrove commented on pull request #8034: ARROW-9464: [Rust] [DataFusion] Physical plan optimization rule to insert MergeExec when needed

2020-08-23 Thread GitBox
andygrove commented on pull request #8034: URL: https://github.com/apache/arrow/pull/8034#issuecomment-678809178 > It looks fantastic! Super excited to see this! > > Gave it a quick look, and my understanding so far: > > * Physical nodes have requirements

[GitHub] [arrow] jorgecarleitao opened a new pull request #8030: ARROW-9835: [Rust][DataFusion] Removed FunctionMeta and FunctionType

2020-08-23 Thread GitBox
jorgecarleitao opened a new pull request #8030: URL: https://github.com/apache/arrow/pull/8030 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [arrow] andygrove closed pull request #8028: ARROW-9833: [Rust] [DataFusion] TableProvider.scan now returns ExecutionPlan

2020-08-23 Thread GitBox
andygrove closed pull request #8028: URL: https://github.com/apache/arrow/pull/8028 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] andygrove closed pull request #8024: ARROW-9809: [Rust][DataFusion] Fixed type coercion, supertypes and type checking.

2020-08-23 Thread GitBox
andygrove closed pull request #8024: URL: https://github.com/apache/arrow/pull/8024 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] github-actions[bot] commented on pull request #8034: ARROW-9464: [Rust] [DataFusion] Physical plan optimization rule to insert MergeExec when needed

2020-08-23 Thread GitBox
github-actions[bot] commented on pull request #8034: URL: https://github.com/apache/arrow/pull/8034#issuecomment-678805666 https://issues.apache.org/jira/browse/ARROW-9464 This is an automated message from the Apache Git

[GitHub] [arrow] andygrove opened a new pull request #8034: ARROW-9464: [Rust] [DataFusion] Physical plan optimization rule to insert MergeExec when needed

2020-08-23 Thread GitBox
andygrove opened a new pull request #8034: URL: https://github.com/apache/arrow/pull/8034 This PR adds the first physical optimization rule, to insert explicit MergeExec nodes into the physical plan when operators require a single partition of input (such as GlobalLimitExec, SortExec,

[GitHub] [arrow] andygrove commented on pull request #8034: ARROW-9464: [Rust] [DataFusion] Physical plan optimization rule to insert MergeExec when needed

2020-08-23 Thread GitBox
andygrove commented on pull request #8034: URL: https://github.com/apache/arrow/pull/8034#issuecomment-678805566 @alamb @jorgecarleitao I'm pretty excited about this PR. This is a good example of how we can write optimizer rules against a trait-based plan.

[GitHub] [arrow] jorgecarleitao commented on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
jorgecarleitao commented on pull request #7967: URL: https://github.com/apache/arrow/pull/7967#issuecomment-678781518 The code you pointed to reads `return_type: DataType`. I will assume you mean the return type declared in `Expr::ScalarFunctions`. Two minds thinking alike: I was

[GitHub] [arrow] jorgecarleitao opened a new pull request #8032: ARROW-9836: [Rust][DataFusion] Improve API for usage of UDFs

2020-08-23 Thread GitBox
jorgecarleitao opened a new pull request #8032: URL: https://github.com/apache/arrow/pull/8032 See associated issue and document for details. The gist is that currently, users call UDFs through ``` df.select(scalar_functions(“sqrt”, vec![col(“a”)], DataType::Float64))

[GitHub] [arrow] jorgecarleitao commented on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
jorgecarleitao commented on pull request #7967: URL: https://github.com/apache/arrow/pull/7967#issuecomment-678739758 This is currently failing for an interesting reason, and I need some help in decision making. TL;DR options: 1. wait for #8024 and stop using the type

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8033: ARROW-9837: [Rust][DataFusion] Add provider for variable

2020-08-23 Thread GitBox
jorgecarleitao commented on a change in pull request #8033: URL: https://github.com/apache/arrow/pull/8033#discussion_r475270454 ## File path: rust/datafusion/src/variable/system.rs ## @@ -0,0 +1,18 @@ +use crate::logicalplan::ScalarValue; +use crate::error::Result; +use

[GitHub] [arrow] liyafan82 commented on pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-08-23 Thread GitBox
liyafan82 commented on pull request #7837: URL: https://github.com/apache/arrow/pull/7837#issuecomment-678870459 Merging this. Thanks to all reviews for the good comments. This is an automated message from the Apache Git

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
jorgecarleitao edited a comment on pull request #7967: URL: https://github.com/apache/arrow/pull/7967#issuecomment-678825604 I pushed a new commit for this. Essentially, the new commit moves the type coercion of UDFs to the physical plan, thus aligning this code base to the current master

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8033: ARROW-9837: [Rust][DataFusion] Add provider for variable

2020-08-23 Thread GitBox
jorgecarleitao commented on a change in pull request #8033: URL: https://github.com/apache/arrow/pull/8033#discussion_r475270612 ## File path: rust/datafusion/src/logicalplan.rs ## @@ -713,6 +718,9 @@ impl fmt::Debug for Expr { match self {

[GitHub] [arrow] sagnikc-dremio opened a new pull request #8035: ARROW-9641: [C++][Gandiva] Implement round() for floating point and double floating point numbers

2020-08-23 Thread GitBox
sagnikc-dremio opened a new pull request #8035: URL: https://github.com/apache/arrow/pull/8035 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [arrow] jorgecarleitao commented on pull request #7984: ARROW-9779: [Rust] [DataFusion] Increase stability of average accumulator

2020-08-23 Thread GitBox
jorgecarleitao commented on pull request #7984: URL: https://github.com/apache/arrow/pull/7984#issuecomment-678829586 Closing as wont fix. This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] jorgecarleitao closed pull request #7984: ARROW-9779: [Rust] [DataFusion] Increase stability of average accumulator

2020-08-23 Thread GitBox
jorgecarleitao closed pull request #7984: URL: https://github.com/apache/arrow/pull/7984 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [arrow] liyafan82 closed pull request #7837: ARROW-9554: [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result

2020-08-23 Thread GitBox
liyafan82 closed pull request #7837: URL: https://github.com/apache/arrow/pull/7837 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] emkornfield commented on a change in pull request #8023: ARROW-9318: [C++] Parquet encryption key management

2020-08-23 Thread GitBox
emkornfield commented on a change in pull request #8023: URL: https://github.com/apache/arrow/pull/8023#discussion_r475337736 ## File path: cpp/src/parquet/encryption.h ## @@ -47,15 +47,15 @@ using ColumnPathToEncryptionPropertiesMap = class PARQUET_EXPORT

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
jorgecarleitao commented on a change in pull request #7967: URL: https://github.com/apache/arrow/pull/7967#discussion_r475266986 ## File path: rust/datafusion/src/execution/physical_plan/planner.rs ## @@ -306,32 +305,25 @@ impl DefaultPhysicalPlanner {

[GitHub] [arrow] zhztheplayer edited a comment on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-08-23 Thread GitBox
zhztheplayer edited a comment on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-678839259 @emkornfield Sorry for the late reply. And yes I was planing to try adding a `Bits.java` based implementation to this PR, but I may not be able to work on it

[GitHub] [arrow] zhztheplayer commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-08-23 Thread GitBox
zhztheplayer commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-678839259 @emkornfield Sorry for the late reply. And yes I was planing to try adding a `Bits.java` based implementation to this PR, but I may not be able to working on it instantly

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
jorgecarleitao commented on a change in pull request #7967: URL: https://github.com/apache/arrow/pull/7967#discussion_r475266854 ## File path: rust/datafusion/src/execution/physical_plan/math_expressions.rs ## @@ -103,56 +103,3 @@ pub fn scalar_functions() -> Vec {

[GitHub] [arrow] jorgecarleitao commented on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

2020-08-23 Thread GitBox
jorgecarleitao commented on pull request #7967: URL: https://github.com/apache/arrow/pull/7967#issuecomment-678825604 I pushed a new commit for this. Essentially, the new commit moves the type coercion of UDFs to the physical plan, thus aligning this code base to the current master after

[GitHub] [arrow] github-actions[bot] commented on pull request #8035: ARROW-9641: [C++][Gandiva] Implement round() for floating point and double floating point numbers

2020-08-23 Thread GitBox
github-actions[bot] commented on pull request #8035: URL: https://github.com/apache/arrow/pull/8035#issuecomment-678885179 https://issues.apache.org/jira/browse/ARROW-9641 This is an automated message from the Apache Git

[GitHub] [arrow] emkornfield commented on a change in pull request #8023: ARROW-9318: [C++] Parquet encryption key management

2020-08-23 Thread GitBox
emkornfield commented on a change in pull request #8023: URL: https://github.com/apache/arrow/pull/8023#discussion_r475351570 ## File path: cpp/src/parquet/test_encryption_util.h ## @@ -65,5 +66,36 @@ inline std::string data_file(const char* file) { return ss.str(); }