Re: [PR] Add Optimizer Sanity Checker, improve sortedness equivalence properties [datafusion]

2024-07-03 Thread via GitHub
mustafasrepo commented on code in PR #11196: URL: https://github.com/apache/datafusion/pull/11196#discussion_r1663629379 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -3188,18 +3191,16 @@ logical_plan physical_plan 01)SortPreservingMergeExec: [rn1@5 ASC NULLS LAST] 0

Re: [PR] Add Optimizer Sanity Checker, improve sortedness equivalence properties [datafusion]

2024-07-03 Thread via GitHub
mustafasrepo commented on code in PR #11196: URL: https://github.com/apache/datafusion/pull/11196#discussion_r1663629379 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -3188,18 +3191,16 @@ logical_plan physical_plan 01)SortPreservingMergeExec: [rn1@5 ASC NULLS LAST] 0

[I] Implement user defined planner for `sql_compound_identifier_to_expr` [datafusion]

2024-07-03 Thread via GitHub
dharanad opened a new issue, #11244: URL: https://github.com/apache/datafusion/issues/11244 ### Is your feature request related to a problem or challenge? As discussed in https://github.com/apache/datafusion/issues/11207 The idea is to move hardcoded the `sql_compound_identifier_to_ex

[I] Implement user defined planner for `sql_substring_to_expr ` [datafusion]

2024-07-03 Thread via GitHub
dharanad opened a new issue, #11245: URL: https://github.com/apache/datafusion/issues/11245 ### Is your feature request related to a problem or challenge? As discussed in https://github.com/apache/datafusion/issues/11207 The idea is to move hardcoded the `sql_substring_to_expr` operat

[I] Implement user defined planner for `sql_position_to_expr` [datafusion]

2024-07-03 Thread via GitHub
dharanad opened a new issue, #11246: URL: https://github.com/apache/datafusion/issues/11246 ### Is your feature request related to a problem or challenge? As discussed in https://github.com/apache/datafusion/issues/11207 The idea is to move hardcoded the `sql_position_to_expr` operato

Re: [I] [Epic] Complete pulling out special SQL planning from the Sql Parser [datafusion]

2024-07-03 Thread via GitHub
dharanad commented on issue #11207: URL: https://github.com/apache/datafusion/issues/11207#issuecomment-2205268590 Create issues for the remaining tasks, tried adding a description based on my understanding of the issue. Also update the same for the older ones * `sql_compound_identifi

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
jayzhan211 commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205272844 > What I'm trying to get at is that there's currently unnecessary computation happening within `BinaryExpr` (perhaps in many cases, just calculating the `left` side would be

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-03 Thread via GitHub
jayzhan211 commented on PR #11229: URL: https://github.com/apache/datafusion/pull/11229#issuecomment-2205306156 I can think of two alternative approaches 1. We can just add `wildcard` to count if we found that args is empty ``` select_exprs: [AggregateFunction(AggregateFunction {

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
acking-you commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205337547 > > What I'm trying to get at is that there's currently unnecessary computation happening within `BinaryExpr` (perhaps in many cases, just calculating the `left` side would b

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
acking-you commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205396425 > > possibility that left might already be false and op is And, or that left might be true and op is Or. > > In general I think there is a tradeoff between doing short

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
eejbyfeldt commented on PR #11232: URL: https://github.com/apache/datafusion/pull/11232#issuecomment-2205396789 > The fix looks good. It'd be better if we can add some e2e tests in sqllogictest. @viirya Good suggestion. I tried added some sqllogictest, they work. But as far as I can

[PR] [draft] add shot circuit in BinaryExpr [datafusion]

2024-07-03 Thread via GitHub
acking-you opened a new pull request, #11247: URL: https://github.com/apache/datafusion/pull/11247 ## Which issue does this PR close? Closes #11212 ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

[I] `where` clause incorrectly reject `NULL` literal (by SQLancer-NoREC) [datafusion]

2024-07-03 Thread via GitHub
2010YOUY01 opened a new issue, #11248: URL: https://github.com/apache/datafusion/issues/11248 ### Describe the bug ```shell DataFusion CLI v39.0.0 > select 1 where null; Error during planning: Cannot create filter with non-boolean predicate 'NULL' returning Null ``` Thi

Re: [I] Implement SQLancer (a end-to-end SQL fuzz testing library) [datafusion]

2024-07-03 Thread via GitHub
2010YOUY01 commented on issue #11030: URL: https://github.com/apache/datafusion/issues/11030#issuecomment-2205424485 This is the first interesting bug found: https://github.com/apache/datafusion/issues/11248: It did not crash the DataFusion engine, instead it silently returned an incorre

[I] Make error message better when `bitwise_*` operator takes wrong argument type [datafusion]

2024-07-03 Thread via GitHub
2010YOUY01 opened a new issue, #11249: URL: https://github.com/apache/datafusion/issues/11249 ### Describe the bug This is not a bug, but the error message can be better: The following queries' problem is just `bitwise_*` operator takes the not-supported argument type, so it should

[I] Improve error message for wrong argument type in operators [datafusion]

2024-07-03 Thread via GitHub
2010YOUY01 opened a new issue, #11250: URL: https://github.com/apache/datafusion/issues/11250 ### Is your feature request related to a problem or challenge? When input arguments of unsupported type, scalar functions will return user-friendly error message: ``` > select abs(1,2,3

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
Blizzara commented on code in PR #11224: URL: https://github.com/apache/datafusion/pull/11224#discussion_r1663844385 ## datafusion/common/src/scalar/mod.rs: ## @@ -3399,6 +3451,43 @@ impl fmt::Display for ScalarValue { .join(",") )?

[PR] M`arrow_cast::parse_data_type` public [datafusion]

2024-07-03 Thread via GitHub
samuelcolvin opened a new pull request, #11251: URL: https://github.com/apache/datafusion/pull/11251 ## Rationale for this change Until https://github.com/apache/arrow-rs/issues/3821 gets closed and `parse_data_type` gets moved to arrow-rs, it would be extremely helpful for us if `pa

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
Blizzara commented on code in PR #11224: URL: https://github.com/apache/datafusion/pull/11224#discussion_r1663863910 ## datafusion/common/src/scalar/mod.rs: ## @@ -3399,6 +3451,43 @@ impl fmt::Display for ScalarValue { .join(",") )?

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
Blizzara commented on code in PR #11224: URL: https://github.com/apache/datafusion/pull/11224#discussion_r1663876483 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -1246,6 +1246,30 @@ fn round_trip_scalar_values() { ), ]))) .unwra

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
Blizzara commented on PR #11224: URL: https://github.com/apache/datafusion/pull/11224#issuecomment-2205557166 Thanks, looks good - I left couple notes but nothing major! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[I] Error evaluating clause `where COL_BIGINT < 1e100` (Found by SQLancer-NoREC) [datafusion]

2024-07-03 Thread via GitHub
2010YOUY01 opened a new issue, #11252: URL: https://github.com/apache/datafusion/issues/11252 ### Describe the bug When a `BIGINT` typed column is comparing with a large `DOUBLE` literal, it's not evaluated correct if it's in where clause ``` > create table t0(v0 BIGINT); 0 r

Re: [PR] Make `arrow_cast::parse_data_type` public [datafusion]

2024-07-03 Thread via GitHub
samuelcolvin commented on PR #11251: URL: https://github.com/apache/datafusion/pull/11251#issuecomment-2205572291 I've assuming this will be refused due to the outstanding issue, so I've implemented https://github.com/apache/arrow-rs/pull/5994. -- This is an automated message from the Apa

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
Dandandan commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205591055 Thanks for providing some benchmarks / example @acking-you ! I think the missing optimizations seem to be: * Checking for array.true_count() to be all zero / all true.

Re: [PR] Add Optimizer Sanity Checker, improve sortedness equivalence properties [datafusion]

2024-07-03 Thread via GitHub
alamb commented on code in PR #11196: URL: https://github.com/apache/datafusion/pull/11196#discussion_r1663937416 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -173,6 +174,12 @@ impl EquivalenceProperties { self.oeq_class.clear(); } +/// Re

Re: [PR] Add Optimizer Sanity Checker, improve sortedness equivalence properties [datafusion]

2024-07-03 Thread via GitHub
alamb merged PR #11196: URL: https://github.com/apache/datafusion/pull/11196 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add Optimizer Sanity Checker, improve sortedness equivalence properties [datafusion]

2024-07-03 Thread via GitHub
alamb commented on PR #11196: URL: https://github.com/apache/datafusion/pull/11196#issuecomment-2205650075 Thanks @mustafasrepo ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Make `arrow_cast::parse_data_type` public [datafusion]

2024-07-03 Thread via GitHub
samuelcolvin closed pull request #11251: Make `arrow_cast::parse_data_type` public URL: https://github.com/apache/datafusion/pull/11251 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] Make `arrow_cast::parse_data_type` public [datafusion]

2024-07-03 Thread via GitHub
samuelcolvin commented on PR #11251: URL: https://github.com/apache/datafusion/pull/11251#issuecomment-2205654320 Closing in favor of apache/arrow-rs#5994. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] Support FixedSizedBinaryArray Parquet Data Page Statistics [datafusion]

2024-07-03 Thread via GitHub
dharanad commented on code in PR #11200: URL: https://github.com/apache/datafusion/pull/11200#discussion_r1663943923 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -903,6 +917,21 @@ macro_rules! get_data_page_statistics {

Re: [I] perf: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-03 Thread via GitHub
vaibhawvipul commented on issue #504: URL: https://github.com/apache/datafusion-comet/issues/504#issuecomment-2205742130 I would like to take this up, if no one else is working on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [I] Stop changing the case for COPY TO option values [datafusion]

2024-07-03 Thread via GitHub
berkaysynnada commented on issue #10853: URL: https://github.com/apache/datafusion/issues/10853#issuecomment-2205769997 If no one else is working on this issue, I'll take it. My plan is to remove the automatic conversion of values to lowercase and to create a lookup table for keys. This tab

Re: [I] DataFusion weekly project plan (Andrew Lamb) - July 1, 2024 [datafusion]

2024-07-03 Thread via GitHub
samuelcolvin commented on issue #11190: URL: https://github.com/apache/datafusion/issues/11190#issuecomment-2205816686 I'd love to get https://github.com/apache/arrow-rs/pull/5994 merged and released fairly soon if at all possible 🤞 🙏 . -- This is an automated message from the Apache Git

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
goldmedal commented on code in PR #11224: URL: https://github.com/apache/datafusion/pull/11224#discussion_r1664044352 ## datafusion/common/src/scalar/mod.rs: ## @@ -3399,6 +3451,43 @@ impl fmt::Display for ScalarValue { .join(",") )?

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
goldmedal commented on code in PR #11224: URL: https://github.com/apache/datafusion/pull/11224#discussion_r1664044770 ## datafusion/common/src/scalar/mod.rs: ## @@ -3471,9 +3560,6 @@ impl fmt::Debug for ScalarValue { ScalarValue::List(_) => write!(f, "List({self})")

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
acking-you commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205869619 > Checking for array.true_count() to be all zero / all true. I think this is an optimization that might be better to implement in the arrow-rs and/or (and_kleene, or_kleene)

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
goldmedal commented on code in PR #11224: URL: https://github.com/apache/datafusion/pull/11224#discussion_r1664051732 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -1246,6 +1246,30 @@ fn round_trip_scalar_values() { ), ]))) .unwr

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
jayzhan211 commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205885405 > Thanks for providing some benchmarks / example @acking-you ! > > I think the missing optimizations seem to be: > > * Checking for array.true_count() to be all z

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
Dandandan commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205894485 > I'm not sure if modifying and / or kleene logic help, since it needs two boolean array, but in this case we expect to short circuit if the left hand side has boolean array w

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
goldmedal commented on code in PR #11224: URL: https://github.com/apache/datafusion/pull/11224#discussion_r1664071342 ## datafusion/common/src/scalar/mod.rs: ## @@ -3399,6 +3451,43 @@ impl fmt::Display for ScalarValue { .join(",") )?

Re: [PR] fix: Incorrect LEFT JOIN evaluation result on OR conditions [datafusion]

2024-07-03 Thread via GitHub
ozankabak commented on PR #11203: URL: https://github.com/apache/datafusion/pull/11203#issuecomment-2205906700 Should we go ahead and merge? Are we waiting for more feedback? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
jayzhan211 commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2205933735 > > I'm not sure if modifying and / or kleene logic help, since it needs two boolean array, but in this case we expect to short circuit if the left hand side has boolean arra

[PR] Register SQL planners in `SessionState` constructor [datafusion]

2024-07-03 Thread via GitHub
dharanad opened a new pull request, #11253: URL: https://github.com/apache/datafusion/pull/11253 ## Which issue does this PR close? Closes #11216 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested? Since

Re: [PR] feat: Add support for CreateNamedStruct [datafusion-comet]

2024-07-03 Thread via GitHub
eejbyfeldt commented on code in PR #620: URL: https://github.com/apache/datafusion-comet/pull/620#discussion_r1664128552 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2141,6 +2141,25 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] [draft] Add `LogicalType`, try to support user-defined types [datafusion]

2024-07-03 Thread via GitHub
notfilippo commented on code in PR #11160: URL: https://github.com/apache/datafusion/pull/11160#discussion_r1662578276 ## datafusion/common/src/scalar/mod.rs: ## @@ -3270,6 +3271,118 @@ impl TryFrom<&DataType> for ScalarValue { } } + +impl TryFrom for ScalarValue { +

Re: [PR] Add LogicalPlanSignature and use in the optimizer loop [datafusion]

2024-07-03 Thread via GitHub
findepi commented on code in PR #5623: URL: https://github.com/apache/datafusion/pull/5623#discussion_r1664186593 ## datafusion/optimizer/src/optimizer.rs: ## @@ -330,15 +324,14 @@ impl Optimizer { } log_plan(&format!("Optimized plan (pass {i})"), &new_

Re: [PR] Implement user defined planner for position [datafusion]

2024-07-03 Thread via GitHub
xinlifoobar closed pull request #11243: Implement user defined planner for position URL: https://github.com/apache/datafusion/pull/11243 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] add shot circuit in BinaryExpr [datafusion]

2024-07-03 Thread via GitHub
dharanad commented on code in PR #11247: URL: https://github.com/apache/datafusion/pull/11247#discussion_r1664226218 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -257,7 +257,56 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) ->

Re: [I] Replace `println!` with `assert!` if possible in DataFusion examples [datafusion]

2024-07-03 Thread via GitHub
ShreyasSN commented on issue #11230: URL: https://github.com/apache/datafusion/issues/11230#issuecomment-2206129117 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [draft] Add `LogicalType`, try to support user-defined types [datafusion]

2024-07-03 Thread via GitHub
alamb commented on code in PR #11160: URL: https://github.com/apache/datafusion/pull/11160#discussion_r1664110285 ## datafusion-examples/examples/logical_type.rs: ## @@ -0,0 +1,108 @@ +use datafusion::datasource::TableProvider; +use datafusion::execution::context::SessionState;

Re: [PR] Register SQL planners in `SessionState` constructor [datafusion]

2024-07-03 Thread via GitHub
dharanad commented on code in PR #11253: URL: https://github.com/apache/datafusion/pull/11253#discussion_r1664269641 ## datafusion/core/src/execution/session_state.rs: ## @@ -595,7 +609,9 @@ impl SessionState { let mut provider = SessionContextProvider {

Re: [I] `BinaryExpr` evaluate lacks optimization for `Or` and `And` scenarios [datafusion]

2024-07-03 Thread via GitHub
Dandandan commented on issue #11212: URL: https://github.com/apache/datafusion/issues/11212#issuecomment-2206250962 Hm I get where you're going at, there is some more optimization possible if we add it to `BinaryExpr` as well -- This is an automated message from the Apache Git Service. To

Re: [PR] add shot circuit in BinaryExpr [datafusion]

2024-07-03 Thread via GitHub
Dandandan commented on code in PR #11247: URL: https://github.com/apache/datafusion/pull/11247#discussion_r1664274493 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -257,7 +257,56 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) ->

Re: [PR] add shot circuit in BinaryExpr [datafusion]

2024-07-03 Thread via GitHub
Dandandan commented on code in PR #11247: URL: https://github.com/apache/datafusion/pull/11247#discussion_r1664281688 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -257,7 +257,56 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) ->

Re: [I] Error evaluating clause `where COL_BIGINT < 1e100` (Found by SQLancer-NoREC) [datafusion]

2024-07-03 Thread via GitHub
Lordworms commented on issue #11252: URL: https://github.com/apache/datafusion/issues/11252#issuecomment-2206370890 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] [draft] Add `LogicalType`, try to support user-defined types [datafusion]

2024-07-03 Thread via GitHub
notfilippo commented on code in PR #11160: URL: https://github.com/apache/datafusion/pull/11160#discussion_r1664325625 ## datafusion-examples/examples/logical_type.rs: ## @@ -0,0 +1,108 @@ +use datafusion::datasource::TableProvider; +use datafusion::execution::context::SessionSt

Re: [PR] [draft] Add `LogicalType`, try to support user-defined types [datafusion]

2024-07-03 Thread via GitHub
notfilippo commented on PR #11160: URL: https://github.com/apache/datafusion/pull/11160#issuecomment-2206428554 > The biggest challenge of making this kind of change I think will be to manage the rollout and migration with downstream crates / make the transition as smooth as possible.

Re: [I] Incorrect LEFT JOIN evaluation result on OR conditions [datafusion]

2024-07-03 Thread via GitHub
alamb closed issue #10881: Incorrect LEFT JOIN evaluation result on OR conditions URL: https://github.com/apache/datafusion/issues/10881 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] fix: Incorrect LEFT JOIN evaluation result on OR conditions [datafusion]

2024-07-03 Thread via GitHub
alamb commented on PR #11203: URL: https://github.com/apache/datafusion/pull/11203#issuecomment-2206442466 🚀 I agree it was time to merge Thanks again everyone -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] fix: Incorrect LEFT JOIN evaluation result on OR conditions [datafusion]

2024-07-03 Thread via GitHub
alamb merged PR #11203: URL: https://github.com/apache/datafusion/pull/11203 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Count distinct errors with "... is declared as non-nullable but contains null values" [datafusion]

2024-07-03 Thread via GitHub
findepi commented on issue #4828: URL: https://github.com/apache/datafusion/issues/4828#issuecomment-2206505843 @jonmmease is this still reproducible, or should we close (ie we're updated)? -- This is an automated message from the Apache Git Service. To respond to the message, please log o

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
comphead commented on code in PR #11232: URL: https://github.com/apache/datafusion/pull/11232#discussion_r1664385720 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -3844,6 +3851,103 @@ mod tests { Ok(()) } +pub fn build_table_struct( +struc

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
comphead commented on code in PR #11232: URL: https://github.com/apache/datafusion/pull/11232#discussion_r1664385720 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -3844,6 +3851,103 @@ mod tests { Ok(()) } +pub fn build_table_struct( +struc

Re: [I] Implement user defined planner for `date_part` [datafusion]

2024-07-03 Thread via GitHub
alamb closed issue #11220: Implement user defined planner for `date_part` URL: https://github.com/apache/datafusion/issues/11220 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
comphead commented on code in PR #11232: URL: https://github.com/apache/datafusion/pull/11232#discussion_r1664388623 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1336,6 +1350,29 @@ physical_plan 10)--RepartitionExec: partitioning=RoundRobinBatch(2), input_pa

Re: [PR] Use upstream DataType::from_str in arrow-cast [datafusion]

2024-07-03 Thread via GitHub
alamb commented on code in PR #11254: URL: https://github.com/apache/datafusion/pull/11254#discussion_r1664408993 ## datafusion/functions/src/core/arrow_cast.rs: ## @@ -139,767 +137,9 @@ fn data_type_from_args(args: &[Expr]) -> Result { &args[1] ); };

Re: [PR] Implement user defined planner for extract [datafusion]

2024-07-03 Thread via GitHub
alamb commented on PR #11215: URL: https://github.com/apache/datafusion/pull/11215#issuecomment-2206578988 Thanks again @xinlifoobar -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Implement user defined planner for extract [datafusion]

2024-07-03 Thread via GitHub
alamb merged PR #11215: URL: https://github.com/apache/datafusion/pull/11215 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] [Epic] Complete pulling out special SQL planning from the Sql Parser [datafusion]

2024-07-03 Thread via GitHub
samuelcolvin commented on issue #11207: URL: https://github.com/apache/datafusion/issues/11207#issuecomment-2206608686 Given how much `UserDefinedSQLPlanner` is being used for existing stuff within datafusion, perhaps it should be called just `SQLPlanner` or `CustomSQLPlanner`? -- This i

[PR] Use upstream DataType::from_str in arrow-cast [datafusion]

2024-07-03 Thread via GitHub
alamb opened a new pull request, #11254: URL: https://github.com/apache/datafusion/pull/11254 Draft until https://github.com/apache/arrow-rs/pull/5994 is released (likely `52.2.0`) ## Which issue does this PR close? N/A ## Rationale for this change @samuelcolvin

Re: [PR] Support DuckDB style stuct syntax [datafusion]

2024-07-03 Thread via GitHub
jonahgao commented on code in PR #11214: URL: https://github.com/apache/datafusion/pull/11214#discussion_r1664436314 ## datafusion/sql/src/expr/mod.rs: ## @@ -619,10 +620,45 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } },

Re: [PR] Add LogicalPlanSignature and use in the optimizer loop [datafusion]

2024-07-03 Thread via GitHub
mslapek commented on code in PR #5623: URL: https://github.com/apache/datafusion/pull/5623#discussion_r1664441817 ## datafusion/optimizer/src/optimizer.rs: ## @@ -330,15 +324,14 @@ impl Optimizer { } log_plan(&format!("Optimized plan (pass {i})"), &new_

Re: [PR] Register SQL planners in `SessionState` constructor [datafusion]

2024-07-03 Thread via GitHub
dharanad commented on code in PR #11253: URL: https://github.com/apache/datafusion/pull/11253#discussion_r1664269641 ## datafusion/core/src/execution/session_state.rs: ## @@ -595,7 +609,9 @@ impl SessionState { let mut provider = SessionContextProvider {

Re: [I] Register SQL planners in `SessionState::new()` [datafusion]

2024-07-03 Thread via GitHub
dharanad commented on issue #11216: URL: https://github.com/apache/datafusion/issues/11216#issuecomment-2206731237 > > hello @jayzhan211 , i would like to work on this. QQ: Is this issue any straight as it sounds or there is more to it? > > A little more than it. Ideally we need to up

[PR] Remove unmaintained python pre-commit configuration [datafusion]

2024-07-03 Thread via GitHub
findepi opened a new pull request, #11255: URL: https://github.com/apache/datafusion/pull/11255 The file seems to be copied over from apache-arrow. The contained commands do not seem applicable to datafusion. -- This is an automated message from the Apache Git Service. To respond to t

Re: [I] Improve performance of TPC-DS q72 [datafusion-comet]

2024-07-03 Thread via GitHub
parthchandra commented on issue #622: URL: https://github.com/apache/datafusion-comet/issues/622#issuecomment-2206748349 Spark produces the worst possible query plan for q72 which amplifies the difference in performance. The C2R overhead for comet is amplified because the conversion happen

[PR] chore: Rename shuffle write metric [datafusion-comet]

2024-07-03 Thread via GitHub
andygrove opened a new pull request, #624: URL: https://github.com/apache/datafusion-comet/pull/624 ## Which issue does this PR close? N/A ## Rationale for this change I was comparing Spark vs Comet in Spark UI and found the metrics confusing to compare.

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-07-03 Thread via GitHub
findepi commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2206748672 I probably missed some prior discussion. Do we want `date_bin` syntax? Can we support more standard `date_trunc('day', timestamp_tz)`? -- This is an automated message from t

[PR] Infer count() aggregation is not null [datafusion]

2024-07-03 Thread via GitHub
findepi opened a new pull request, #11256: URL: https://github.com/apache/datafusion/pull/11256 `count([DISTINCT] [expr])` aggregate function never returns null. Infer non-nullness of such aggregate expression. This allows elimination of the HAVING filter for a query such as SELE

Re: [PR] Infer count() aggregation is not null [datafusion]

2024-07-03 Thread via GitHub
findepi commented on PR #11256: URL: https://github.com/apache/datafusion/pull/11256#issuecomment-2206765177 this is a draft because there are no tests in this PR. still review feedback would be very welcomed -- am i going in the right direction? -- This is an automated message

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
viirya commented on PR #11232: URL: https://github.com/apache/datafusion/pull/11232#issuecomment-2206779271 > @viirya Good suggestion. I tried added some sqllogictest, they work. But as far as I can tell they are not using HashJoinExec (even if the exact same query joining on only field is)

Re: [PR] feat: Use unified allocator for execution iterators [datafusion-comet]

2024-07-03 Thread via GitHub
andygrove commented on code in PR #613: URL: https://github.com/apache/datafusion-comet/pull/613#discussion_r1664488466 ## spark/src/test/scala/org/apache/spark/sql/CometTPCDSQuerySuite.scala: ## @@ -158,6 +158,11 @@ class CometTPCDSQuerySuite conf.set(CometConf.COMET_EXEC_

Re: [PR] feat: Use unified allocator for execution iterators [datafusion-comet]

2024-07-03 Thread via GitHub
viirya commented on code in PR #613: URL: https://github.com/apache/datafusion-comet/pull/613#discussion_r1664503124 ## spark/src/test/scala/org/apache/spark/sql/CometTPCDSQuerySuite.scala: ## @@ -158,6 +158,11 @@ class CometTPCDSQuerySuite conf.set(CometConf.COMET_EXEC_ALL

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-03 Thread via GitHub
tshauck commented on PR #11229: URL: https://github.com/apache/datafusion/pull/11229#issuecomment-2206904296 @jayzhan211 thanks for the feedback. I like the idea because it would seem to avoid having to update the count function to accept zero args. The current approach is nice at least in

Re: [I] Spark-compatible CAST operation [datafusion]

2024-07-03 Thread via GitHub
Blizzara commented on issue #11201: URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2206915698 There's definitely difference in some functions as well, but those have been mostly minor so far (I haven't gotten to regexps or date/time stuff yet), like returning null vs th

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
eejbyfeldt commented on PR #11232: URL: https://github.com/apache/datafusion/pull/11232#issuecomment-2206921868 > > @viirya Good suggestion. I tried added some sqllogictest, they work. But as far as I can tell they are not using HashJoinExec (even if the exact same query joining on only fie

Re: [PR] Add LogicalPlanSignature and use in the optimizer loop [datafusion]

2024-07-03 Thread via GitHub
findepi commented on code in PR #5623: URL: https://github.com/apache/datafusion/pull/5623#discussion_r1664596130 ## datafusion/optimizer/src/optimizer.rs: ## @@ -330,15 +324,14 @@ impl Optimizer { } log_plan(&format!("Optimized plan (pass {i})"), &new_

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
eejbyfeldt commented on code in PR #11232: URL: https://github.com/apache/datafusion/pull/11232#discussion_r1664602748 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -3844,6 +3851,103 @@ mod tests { Ok(()) } +pub fn build_table_struct( +str

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
eejbyfeldt commented on code in PR #11232: URL: https://github.com/apache/datafusion/pull/11232#discussion_r1664603111 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1336,6 +1350,29 @@ physical_plan 10)--RepartitionExec: partitioning=RoundRobinBatch(2), input_

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
eejbyfeldt commented on PR #11232: URL: https://github.com/apache/datafusion/pull/11232#issuecomment-2206966811 Reading the explain more closely I see that the reason for it not using HashJoin seems to be that it fails to extract an equi-join condition when it is joining on structs. -- T

Re: [PR] Move basic SQL query examples to user guide [datafusion]

2024-07-03 Thread via GitHub
alamb commented on PR #11217: URL: https://github.com/apache/datafusion/pull/11217#issuecomment-2206972653 Thank you for the review @jonahgao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Move basic SQL query examples to user guide [datafusion]

2024-07-03 Thread via GitHub
alamb merged PR #11217: URL: https://github.com/apache/datafusion/pull/11217 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] datafusion-examples CI run is failing: `final link failed: No space left on device` [datafusion]

2024-07-03 Thread via GitHub
alamb closed issue #11210: datafusion-examples CI run is failing: `final link failed: No space left on device` URL: https://github.com/apache/datafusion/issues/11210 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Support FixedSizedBinaryArray Parquet Data Page Statistics [datafusion]

2024-07-03 Thread via GitHub
alamb merged PR #11200: URL: https://github.com/apache/datafusion/pull/11200 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Support `FixedSizedBinaryArray` Parquet Data Page Statistics [datafusion]

2024-07-03 Thread via GitHub
alamb closed issue #11184: Support `FixedSizedBinaryArray` Parquet Data Page Statistics URL: https://github.com/apache/datafusion/issues/11184 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] Fix hash join for nested types [datafusion]

2024-07-03 Thread via GitHub
eejbyfeldt commented on PR #11232: URL: https://github.com/apache/datafusion/pull/11232#issuecomment-2206985818 Seems like we need to add struct here: https://github.com/eejbyfeldt/datafusion/blob/2b851c5bbdaef7d1d78d91833d2a7e8d94809bc8/datafusion/expr/src/utils.rs#L830-L833 for hash join

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
alamb merged PR #11224: URL: https://github.com/apache/datafusion/pull/11224 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Support Map as a ScalarValue [datafusion]

2024-07-03 Thread via GitHub
alamb closed issue #11128: Support Map as a ScalarValue URL: https://github.com/apache/datafusion/issues/11128 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e

Re: [PR] Implement ScalarValue::Map [datafusion]

2024-07-03 Thread via GitHub
alamb commented on PR #11224: URL: https://github.com/apache/datafusion/pull/11224#issuecomment-2206989059 Thanks @goldmedal and @Blizzara 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [I] [Epic] Complete pulling out special SQL planning from the Sql Parser [datafusion]

2024-07-03 Thread via GitHub
alamb commented on issue #11207: URL: https://github.com/apache/datafusion/issues/11207#issuecomment-2206991217 > Given how much `UserDefinedSQLPlanner` is being used for existing stuff within datafusion, perhaps it should be called just `SQLPlanner` or `CustomSQLPlanner`? I agree

  1   2   >