[PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
prashantksharma opened a new pull request, #415: URL: https://github.com/apache/datafusion-comet/pull/415 ## Which issue does this PR close? Closes #361 ## Rationale for this change The updates to `expr.proto` use enums instead of strings, which prevents potent
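A minimal Rust sketch of the general idea behind replacing a string field with an enum, not the PR's actual code. The names `EvalMode`, `LEGACY` and `TRY` appear in the proto diff quoted elsewhere in this listing; the `Ansi` variant and the helpers `parse_eval_mode` and `label` are hypothetical and used only for illustration.

```rust
/// Evaluation mode as a closed set of values.
/// `Ansi` is an assumed third variant; the quoted proto diff is truncated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvalMode {
    Legacy,
    Try,
    Ansi,
}

/// Hypothetical boundary function: strings are validated once, here,
/// so a misspelled mode is rejected instead of flowing through the planner.
fn parse_eval_mode(s: &str) -> Result<EvalMode, String> {
    match s {
        "LEGACY" => Ok(EvalMode::Legacy),
        "TRY" => Ok(EvalMode::Try),
        "ANSI" => Ok(EvalMode::Ansi),
        other => Err(format!("unknown eval mode: {other}")),
    }
}

/// Hypothetical consumer: the match has no catch-all arm, so adding a
/// new variant later becomes a compile error here until it is handled.
fn label(mode: EvalMode) -> &'static str {
    match mode {
        EvalMode::Legacy => "legacy",
        EvalMode::Try => "try",
        EvalMode::Ansi => "ansi",
    }
}
```

With a string field, every consumer would need its own fallback branch for unexpected values; with the enum, only the parsing boundary does.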

Re: [I] chore: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
prashantksharma commented on issue #361: URL: https://github.com/apache/datafusion-comet/issues/361#issuecomment-2105614143 @andygrove cc: @viirya I have opened a draft PR. I have tested the changes using - `make test-rust` - `make test-jvm` Details on PR messa

Re: [PR] fix: Unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105618196 CI failed because of a connection timeout: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] fix: Unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105628167 > thanks @leoluan2009 appreciate if you could add the unit test to prevent regression @comphead can you give me an example? Thanks

Re: [I] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on issue #10293: URL: https://github.com/apache/datafusion/issues/10293#issuecomment-2105629456 May I take a look? Thanks.

[I] Apply guarantee rewriter to sql workflow [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 opened a new issue, #10456: URL: https://github.com/apache/datafusion/issues/10456 ### Is your feature request related to a problem or challenge? While deprecating `Expr::GetIndexedField`, I found there are many test cases that are not covered in sqllogictest, for example `

Re: [I] Use `min_value` and `max_value` on statistics to avoid `ExecutionPlan.execute` [datafusion]

2024-05-11 Thread via GitHub
samuelcolvin commented on issue #10400: URL: https://github.com/apache/datafusion/issues/10400#issuecomment-2105649193 @alamb using `PruningPredicate` makes sense, but please can you point me at where I need to make changes to add this functionality?

[PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
NoeB opened a new pull request, #10457: URL: https://github.com/apache/datafusion/pull/10457 ## Which issue does this PR close? part of #10384 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested? ##

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
b41sh commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597413786 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

Re: [PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
Jefffrey commented on code in PR #10457: URL: https://github.com/apache/datafusion/pull/10457#discussion_r1597413465 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -2285,6 +2285,201 @@ ORDER BY tag 33 11 NULL 33 11 NULL 33 11 NULL B +# bit_and_i32 +statement ok

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-11 Thread via GitHub
peter-toth commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105664520 > Are there any potential issues with simply using the existing `Hash` implementation of `Expr` to create `HashSet`s? > > Several other optimization passes use string

Re: [PR] Minor: Add usecase to comments in `LogicalPlan::recompute_schema` [datafusion]

2024-05-11 Thread via GitHub
alamb merged PR #10443: URL: https://github.com/apache/datafusion/pull/10443

Re: [PR] Minor: Add usecase to comments in `LogicalPlan::recompute_schema` [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10443: URL: https://github.com/apache/datafusion/pull/10443#issuecomment-2105669432 Thanks for the review @Jefffrey and @yyy1000

[PR] doc: fix old master branch references to main [datafusion]

2024-05-11 Thread via GitHub
Jefffrey opened a new pull request, #10458: URL: https://github.com/apache/datafusion/pull/10458 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
NoeB commented on PR #10457: URL: https://github.com/apache/datafusion/pull/10457#issuecomment-2105674257 @Jefffrey Thank you for the review, I added a new commit which adds the missing distinct for the type

Re: [PR] doc: fix old master branch references to main [datafusion]

2024-05-11 Thread via GitHub
alamb merged PR #10458: URL: https://github.com/apache/datafusion/pull/10458

Re: [PR] refactor: use Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10454: URL: https://github.com/apache/datafusion/pull/10454#discussion_r1597426402 ## datafusion/expr/src/expr.rs: ## @@ -1654,28 +1654,42 @@ fn fmt_function( write!(f, "{}({}{})", fun, distinct_str, args.join(", ")) } -fn create_function_n

Re: [I] Stop copying `Expr`s and LogicalPlans so much during Common Subexpression Elimination [datafusion]

2024-05-11 Thread via GitHub
alamb commented on issue #9873: URL: https://github.com/apache/datafusion/issues/9873#issuecomment-2105680216 > UPDATE: It looks like Expr already derives Hash. Is there a reason we're not using that instead of string keys? I believe CSE may predate the Hash impl for Expr
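The question quoted above (hashing expressions directly versus keying on their string form) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not DataFusion code: `ToyExpr` is a hypothetical stand-in for `Expr`, and both counting helpers are invented for the example.

```rust
use std::collections::HashSet;

/// Hypothetical miniature expression type that derives `Hash` and `Eq`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ToyExpr {
    Column(String),
    Literal(i64),
    Add(Box<ToyExpr>, Box<ToyExpr>),
}

/// String-keyed approach: every expression is rendered to a new String
/// before it can be compared, which allocates once per expression.
fn count_distinct_by_string(exprs: &[ToyExpr]) -> usize {
    let keys: HashSet<String> = exprs.iter().map(|e| format!("{e:?}")).collect();
    keys.len()
}

/// Hash-keyed approach: the set stores references and relies on the
/// derived `Hash`/`Eq`, so no intermediate strings are built.
fn count_distinct_by_hash(exprs: &[ToyExpr]) -> usize {
    let keys: HashSet<&ToyExpr> = exprs.iter().collect();
    keys.len()
}
```

Both helpers agree on which expressions are duplicates; the difference is only in how much temporary data is built along the way. Note that the derived hash still walks the whole expression tree, so the sketch says nothing about the cost of repeatedly hashing deep subtrees.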

Re: [PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10439: URL: https://github.com/apache/datafusion/pull/10439#discussion_r1597427699 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -20,23 +20,124 @@ use std::sync::Arc; use crate::signature::{ ArrayFunctionSignature, FIXED_SIZE_LIST_

Re: [I] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-11 Thread via GitHub
alamb commented on issue #10293: URL: https://github.com/apache/datafusion/issues/10293#issuecomment-2105681808 Of course, thank you @ClSlaid -- just be aware these PRs have tended to be tricky. I recommend doing it incrementally if possible. Let me know if you need some help 🙏

Re: [I] Stop copying `Expr`s and LogicalPlans so much during Common Subexpression Elimination [datafusion]

2024-05-11 Thread via GitHub
peter-toth commented on issue #9873: URL: https://github.com/apache/datafusion/issues/9873#issuecomment-2105681913 > UPDATE: It looks like Expr already derives Hash. Is there a reason we're not using that instead of string keys? I forgot to comment on this thread that my detailed answ

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597428939 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

[PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-11 Thread via GitHub
vaibhawvipul opened a new pull request, #416: URL: https://github.com/apache/datafusion-comet/pull/416 ## Which issue does this PR close? Closes #374 . ## Rationale for this change ## What changes are included in this PR? ## How are these ch

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
b41sh commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597432978 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
b41sh commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597433886 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-11 Thread via GitHub
kazuyukitanimura commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2105690563 @viirya @andygrove passed all the tests on my personal GitHub Actions

Re: [PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on code in PR #10439: URL: https://github.com/apache/datafusion/pull/10439#discussion_r1597436309 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -20,23 +20,124 @@ use std::sync::Arc; use crate::signature::{ ArrayFunctionSignature, FIXED_SIZE_

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597437435 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols );

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597438771 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols );

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-11 Thread via GitHub
vidyasankarv commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2105729144 hi @parthchandra @andygrove made changes as suggested ported the date parsing logic [from SparkDateTimeUtils](https://github.com/apache/spark/blob/9d79ab42b127d1a12164cec260

Re: [PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on PR #10457: URL: https://github.com/apache/datafusion/pull/10457#issuecomment-2105730542 Thanks @NoeB and @Jefffrey

Re: [PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 merged PR #10457: URL: https://github.com/apache/datafusion/pull/10457

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-11 Thread via GitHub
erratic-pattern commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105737013 Thanks for the detailed write up @peter-toth . Though I did mention `HashSet` specifically, my suggestion more generally goes along the lines of using the `Hash` implem

Re: [PR] Introduce user-defined signature [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 merged PR #10439: URL: https://github.com/apache/datafusion/pull/10439

Re: [I] Support "User defined coercion" rules [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 closed issue #10423: Support "User defined coercion" rules URL: https://github.com/apache/datafusion/issues/10423

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on code in PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#discussion_r1597445680 ## core/src/execution/datafusion/planner.rs: ## @@ -346,10 +346,10 @@ impl PhysicalPlanner { let child = self.create_expr(expr.child.as_ref().

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on code in PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#discussion_r1597445759 ## core/src/execution/proto/expr.proto: ## @@ -233,12 +233,20 @@ message Remainder { DataType return_type = 4; } +enum EvalMode { + LEGACY = 0; + TRY =

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on code in PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#discussion_r1597446255 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1207,7 +1223,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on code in PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#discussion_r1597446354 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -525,6 +527,18 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-11 Thread via GitHub
andygrove commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105743297 > I think this is more a user experience problem, how should we design it is discussable. It is more than a user experience issue. The current API is causing test failu

Re: [I] Will Comet support closed-source forks of Apache Spark (e.g. CSP versions)? [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on issue #414: URL: https://github.com/apache/datafusion-comet/issues/414#issuecomment-2105743676 I plan on creating a PR to update our documentation to make it clear that we only support Apache Spark and not other Spark implementations.

Re: [PR] refactor: use Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-11 Thread via GitHub
erratic-pattern commented on code in PR #10454: URL: https://github.com/apache/datafusion/pull/10454#discussion_r1597448118 ## datafusion/expr/src/expr.rs: ## @@ -1654,28 +1654,42 @@ fn fmt_function( write!(f, "{}({}{})", fun, distinct_str, args.join(", ")) } -fn create_

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-11 Thread via GitHub
peter-toth commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105745741 > I like the idea of generalizing the `(u64, &Expr)` struct into something reuseable across optimizations. Honestly, I don't know those referenced use cases, but I f

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
b41sh commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597465482 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

[PR] fix: make `columnize_expr` resistant to display_name collisions [datafusion]

2024-05-11 Thread via GitHub
jonahgao opened a new pull request, #10459: URL: https://github.com/apache/datafusion/pull/10459 ## Which issue does this PR close? Closes #10413. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [PR] Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` [datafusion]

2024-05-11 Thread via GitHub
ozankabak commented on PR #10404: URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2105942784 Thank you. We will address your review feedback and then merge afterwards. 🚀

Re: [PR] Remove `AggregateFunctionDefinition::Name` [datafusion]

2024-05-11 Thread via GitHub
comphead merged PR #10441: URL: https://github.com/apache/datafusion/pull/10441

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-11 Thread via GitHub
comphead merged PR #10430: URL: https://github.com/apache/datafusion/pull/10430

Re: [I] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on issue #10293: URL: https://github.com/apache/datafusion/issues/10293#issuecomment-2105950929 hi, @alamb, I'd like to know if those two expressions differ: ```rust let aggr_expr = vec![Expr::AggregateFunction(AggregateFunction::new(

[PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid opened a new pull request, #10460: URL: https://github.com/apache/datafusion/pull/10460 ## Which issue does this PR close? Closes #10293. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-11 Thread via GitHub
Michael-J-Ward commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105957139 Since the `regexp_*` UDFs have the same problem, I suspect that `array_slice` was just our first encounter w/ the underlying issue: any "inner" UDF implementation that be

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597478628 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597478938 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597480063 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] fix: Unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-11 Thread via GitHub
viirya commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105965411 > CI failed because of connection timeout I will re-trigger failed pipelines.

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-11 Thread via GitHub
viirya commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2105965894 Triggered CI pipelines.

Re: [PR] Fix Docs [datafusion-python]

2024-05-11 Thread via GitHub
andygrove merged PR #676: URL: https://github.com/apache/datafusion-python/pull/676

Re: [I] doc builds are broken [datafusion-python]

2024-05-11 Thread via GitHub
andygrove closed issue #675: doc builds are broken URL: https://github.com/apache/datafusion-python/issues/675

Re: [PR] refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10454: URL: https://github.com/apache/datafusion/pull/10454#issuecomment-2105992517 Thanks @erratic-pattern -- I took the liberty of merging the branch up from main to resolve a merge conflict as well

Re: [I] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-11 Thread via GitHub
alamb commented on issue #10293: URL: https://github.com/apache/datafusion/issues/10293#issuecomment-2105992692 > hi, @alamb, I'd like to know if those two expressions are semantically identical: Yes, I believe they are the same

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597493524 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597493578 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10430: URL: https://github.com/apache/datafusion/pull/10430#issuecomment-2105995932 Thanks for the review @comphead 🚀

[PR] build(deps): bump datafusion from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #680: URL: https://github.com/apache/datafusion-python/pull/680 Bumps [datafusion](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed7568f655dd

[PR] build(deps): bump datafusion-common from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #679: URL: https://github.com/apache/datafusion-python/pull/679 Bumps [datafusion-common](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed756

[PR] build(deps): bump datafusion-optimizer from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #681: URL: https://github.com/apache/datafusion-python/pull/681 Bumps [datafusion-optimizer](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed

[PR] build(deps): bump syn from 2.0.60 to 2.0.63 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #682: URL: https://github.com/apache/datafusion-python/pull/682 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.60 to 2.0.63. Release notes Sourced from syn's releases (https://github.com/dtolnay/syn/releases). 2.0.63 Pa

Re: [PR] build(deps): bump syn from 2.0.48 to 2.0.58 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] commented on PR #631: URL: https://github.com/apache/datafusion-python/pull/631#issuecomment-2105997735 Superseded by #682.

Re: [PR] build(deps): bump syn from 2.0.48 to 2.0.58 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] closed pull request #631: build(deps): bump syn from 2.0.48 to 2.0.58 URL: https://github.com/apache/datafusion-python/pull/631

Re: [PR] Introduce user-defined signature [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10439: URL: https://github.com/apache/datafusion/pull/10439#issuecomment-2105998161 🎉

[PR] build(deps): bump datafusion-substrait from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #683: URL: https://github.com/apache/datafusion-python/pull/683 Bumps [datafusion-substrait](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed

[PR] build(deps): bump datafusion-functions-array from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #684: URL: https://github.com/apache/datafusion-python/pull/684 Bumps [datafusion-functions-array](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c

[PR] build(deps): bump datafusion-sql from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #685: URL: https://github.com/apache/datafusion-python/pull/685 Bumps [datafusion-sql](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed7568f6

[PR] build(deps): bump datafusion-expr from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #686: URL: https://github.com/apache/datafusion-python/pull/686 Bumps [datafusion-expr](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed7568f

Re: [PR] refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10454: URL: https://github.com/apache/datafusion/pull/10454#issuecomment-2106006470 Wow -- according to my benchmarks this change makes a non-trivial difference in performance. We just keep driving these numbers down ``` group
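The technique named in the PR title (using `write!` instead of `format!`) can be sketched as below. This is an illustrative sketch, not the PR's code: `name_with_format` and `write_name` are hypothetical helpers, and the output shape `fun(arg1, arg2)` is only modeled on the `fmt_function` line quoted in the review snippets.

```rust
use std::fmt::Write;

/// Allocation-heavy version: `join` builds one temporary String and
/// `format!` builds a second one for the final result.
fn name_with_format(fun: &str, args: &[&str]) -> String {
    format!("{}({})", fun, args.join(", "))
}

/// Buffer-writing version: all pieces are appended to a caller-supplied
/// writer, so no intermediate Strings are created.
fn write_name<W: Write>(w: &mut W, fun: &str, args: &[&str]) -> std::fmt::Result {
    write!(w, "{fun}(")?;
    for (i, arg) in args.iter().enumerate() {
        if i > 0 {
            w.write_str(", ")?;
        }
        w.write_str(arg)?;
    }
    w.write_char(')')
}
```

A caller would create one `String`, pass `&mut` to `write_name` for each nested piece, and reuse the same buffer throughout; the saving grows with how deeply names are nested, since each level otherwise allocates its own temporary.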

[I] Add to_date function to scalar functions doc [datafusion]

2024-05-11 Thread via GitHub
Omega359 opened a new issue, #10461: URL: https://github.com/apache/datafusion/issues/10461 ### Describe the bug The to_date function is missing from the scalar functions doc in the user guide. ### To Reproduce See https://datafusion.apache.org/user-guide/sql/scalar_fun

[I] Add to_unixtime function to scalar functions doc [datafusion]

2024-05-11 Thread via GitHub
Omega359 opened a new issue, #10462: URL: https://github.com/apache/datafusion/issues/10462 ### Describe the bug The to_unixtime function is missing from the scalar functions doc in the user guide. ### To Reproduce See https://datafusion.apache.org/user-guide/sql/scalar

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2106058219 Do you mean df-python always panics if `stride` is not given? ``` #[pyfunction] #[pyo3(signature = (array, begin, end, stride = 1))] fn array_slice(array: PyExp

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597524103 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols );

[PR] Add simplifier additional between [datafusion]

2024-05-11 Thread via GitHub
yyy1000 opened a new pull request, #10463: URL: https://github.com/apache/datafusion/pull/10463 ## Which issue does this PR close? Closes #10456. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [PR] Add simplifier additional between [datafusion]

2024-05-11 Thread via GitHub
yyy1000 commented on PR #10463: URL: https://github.com/apache/datafusion/pull/10463#issuecomment-2106065020 The example in #10456 in this PR like below ``` > create table t (c int) as values (1), (3), (5); 0 row(s) fetched. Elapsed 0.031 seconds. > explain verbose sel

Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [datafusion]

2024-05-11 Thread via GitHub
comphead commented on code in PR #10356: URL: https://github.com/apache/datafusion/pull/10356#discussion_r1597526491 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -66,26 +67,28 @@ impl AnalyzerRule for TypeCoercion { } fn analyze(&self, plan: LogicalPl

Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [datafusion]

2024-05-11 Thread via GitHub
comphead commented on code in PR #10356: URL: https://github.com/apache/datafusion/pull/10356#discussion_r1597526532 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -98,25 +101,75 @@ fn analyze_internal( // select t2.c2 from t1 where t1.c1 in (select t2.c1 from

Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [datafusion]

2024-05-11 Thread via GitHub
comphead commented on code in PR #10356: URL: https://github.com/apache/datafusion/pull/10356#discussion_r1597527103 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -98,25 +101,75 @@ fn analyze_internal( // select t2.c2 from t1 where t1.c1 in (select t2.c1 from

Re: [PR] Add simplifier additional between [datafusion]

2024-05-11 Thread via GitHub
yyy1000 closed pull request #10463: Add simplifier additional between URL: https://github.com/apache/datafusion/pull/10463 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [I] Apply guarantee rewriter to sql workflow [datafusion]

2024-05-11 Thread via GitHub
yyy1000 commented on issue #10456: URL: https://github.com/apache/datafusion/issues/10456#issuecomment-2106074548 On a second glance, I feel it's difficult. 😥 When simplifying a `LogicalPlan`, it seems impossible to get the underlying data which could make `guarantees`. -- This is an

Re: [PR] UpdateD pool.rs [datafusion]

2024-05-11 Thread via GitHub
github-actions[bot] closed pull request #6943: UpdateD pool.rs URL: https://github.com/apache/datafusion/pull/6943 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscrib

Re: [PR] Add `async` UDF example [datafusion]

2024-05-11 Thread via GitHub
github-actions[bot] closed pull request #6713: Add `async` UDF example URL: https://github.com/apache/datafusion/pull/6713 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [PR] Improve round-robin repartitioning [datafusion]

2024-05-11 Thread via GitHub
github-actions[bot] commented on PR #6047: URL: https://github.com/apache/datafusion/pull/6047#issuecomment-2106085595 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

[I] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 opened a new issue, #417: URL: https://github.com/apache/datafusion-comet/issues/417 ### What is the problem the feature request solves? some config name of columnar shuffle is not consistent and refine it ### Describe the potential solution _No response_

[I] Update the DataFusion in Python website [datafusion-python]

2024-05-11 Thread via GitHub
Weijun-H opened a new issue, #687: URL: https://github.com/apache/datafusion-python/issues/687 **Describe the bug** The logo and the links on the website are outdated and deprecated. We should check and update them. -- This is an automated message from the Apache Git Service. To re

[PR] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 opened a new pull request, #418: URL: https://github.com/apache/datafusion-comet/pull/418 ## Which issue does this PR close? Closes #417. ## Rationale for this change ## What changes are included in this PR? ## How are these chan

Re: [PR] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 commented on PR #418: URL: https://github.com/apache/datafusion-comet/pull/418#issuecomment-2106099066 @viirya Help to start CI, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597540519 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597542555 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,
