Re: [PR] Enable Projection Pushdown Optimization for Recursive CTEs [datafusion]

2025-08-07 Thread via GitHub
kosiew commented on code in PR #16696: URL: https://github.com/apache/datafusion/pull/16696#discussion_r2262061331 ## datafusion/optimizer/src/optimize_projections/mod.rs: ## @@ -826,6 +840,34 @@ pub fn is_projection_unnecessary( )) } +fn plan_contains_subquery_alias(pla

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-08-07 Thread via GitHub
berkaysynnada commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-3166735927 > We found a regression related to this PR for `string_agg` not respecting `ORDER BY` > > * [`string_agg` does not respect `ORDER BY` on `49.0.0`  #17011](https://github.

Re: [I] Re-export `object_store` crate in DataFusion for easier downstream consumption [datafusion]

2025-08-07 Thread via GitHub
kosiew closed issue #17066: Re-export `object_store` crate in DataFusion for easier downstream consumption URL: https://github.com/apache/datafusion/issues/17066 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] Re-export `object_store` crate via DataFusion Core and Common [datafusion]

2025-08-07 Thread via GitHub
kosiew merged PR #17070: URL: https://github.com/apache/datafusion/pull/17070

Re: [PR] Re-export `object_store` crate via DataFusion Core and Common [datafusion]

2025-08-07 Thread via GitHub
kosiew commented on PR #17070: URL: https://github.com/apache/datafusion/pull/17070#issuecomment-3166702525 Thanks @findepi, @alamb for your review.

Re: [I] Incorrect implementation of `PartialOrd` for `ScalarUDF`, `WindowUDF` and `AggregateUDF` [datafusion]

2025-08-07 Thread via GitHub
findepi commented on issue #17064: URL: https://github.com/apache/datafusion/issues/17064#issuecomment-3166680653 Removing the `PartialOrd` would be the best way to go. This would probably mean removing PartialOrd from Expr, perhaps reverting - https://github.com/apache/datafusion/pull

Re: [PR] Add PartialOrd for the DF subfields/structs for the WindowFunction expr [datafusion]

2025-08-07 Thread via GitHub
findepi commented on PR #12421: URL: https://github.com/apache/datafusion/pull/12421#issuecomment-3166678656 > Can we not implement PartialOrd for function? I don't think it make sense to have ordering concept for the function. I agree. What's worse, the current implementation

Re: [PR] Deprecate ScalarUDF::is_nullable [datafusion]

2025-08-07 Thread via GitHub
findepi merged PR #17074: URL: https://github.com/apache/datafusion/pull/17074

Re: [PR] Derive UDAF equality from Eq, Hash [datafusion]

2025-08-07 Thread via GitHub
findepi commented on code in PR #17067: URL: https://github.com/apache/datafusion/pull/17067#discussion_r2261960564 ## datafusion/ffi/src/udaf/mod.rs: ## @@ -384,6 +385,19 @@ pub struct ForeignAggregateUDF { unsafe impl Send for ForeignAggregateUDF {} unsafe impl Sync for Fore

Re: [PR] Deprecate ScalarUDF::is_nullable [datafusion]

2025-08-07 Thread via GitHub
findepi commented on PR #17074: URL: https://github.com/apache/datafusion/pull/17074#issuecomment-316368 > the new function does not by default delegate to the previously-internally-used function Typically the new function delegates to the old one and the old one gets deprecated.

Re: [PR] Remove elements deprecated since v 45 [datafusion]

2025-08-07 Thread via GitHub
findepi merged PR #17075: URL: https://github.com/apache/datafusion/pull/17075

Re: [PR] Derive UDAF equality from Eq, Hash [datafusion]

2025-08-07 Thread via GitHub
findepi commented on code in PR #17067: URL: https://github.com/apache/datafusion/pull/17067#discussion_r2261999283 ## datafusion/expr/src/udf_eq.rs: ## @@ -0,0 +1,181 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [I] Derive UDAF (`AggregateUDFImpl`) equality from PartialEq, Hash [datafusion]

2025-08-07 Thread via GitHub
findepi closed issue #16866: Derive UDAF (`AggregateUDFImpl`) equality from PartialEq, Hash URL: https://github.com/apache/datafusion/issues/16866

Re: [PR] Derive UDAF equality from Eq, Hash [datafusion]

2025-08-07 Thread via GitHub
findepi merged PR #17067: URL: https://github.com/apache/datafusion/pull/17067

Re: [PR] Add support for `UPDATE ... LIMIT ...` [datafusion-sqlparser-rs]

2025-08-07 Thread via GitHub
iffyio commented on PR #1991: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1991#issuecomment-3166640940 @xtuc could you merge in the latest from main in order to resolve the CI lint error?

Re: [PR] Improve MySQL `CREATE TRIGGER` parsing [datafusion-sqlparser-rs]

2025-08-07 Thread via GitHub
iffyio merged PR #1998: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1998

Re: [PR] Validate the memory consumption in SPM created by multi level merge [datafusion]

2025-08-07 Thread via GitHub
ding-young commented on PR #17029: URL: https://github.com/apache/datafusion/pull/17029#issuecomment-3166548954 > How much is the difference for the fuzz tests I added that check memory constrained envs? as it only tests couple of simple columns that are easier to reason about. I als

Re: [PR] Validate the memory consumption in SPM created by multi level merge [datafusion]

2025-08-07 Thread via GitHub
ding-young commented on PR #17029: URL: https://github.com/apache/datafusion/pull/17029#issuecomment-3166544088 Update: I took an alternative approach similar to what @2010YOUY01 suggested. > I have an alternative idea to make this validation more fine-grained: Let's say there are 3 spills

Re: [PR] fix: [iceberg] more fixes for Iceberg integration APIs. [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra commented on code in PR #2078: URL: https://github.com/apache/datafusion-comet/pull/2078#discussion_r2261908326 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -453,4 +457,60 @@ private static LogicalTypeAnnotation reconstructLogicalType(

Re: [PR] Pass `batch_size` directly when creating file opener [datafusion]

2025-08-07 Thread via GitHub
adriangb commented on code in PR #17076: URL: https://github.com/apache/datafusion/pull/17076#discussion_r2261897103 ## datafusion-examples/examples/csv_json_opener.rs: ## @@ -68,10 +68,9 @@ async fn csv_opener() -> Result<()> { let config = CsvSource::new(true, b',', b'"')

Re: [PR] Pass `batch_size` directly when creating file opener [datafusion]

2025-08-07 Thread via GitHub
adriangb commented on PR #17076: URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3166507414 I think this is a positive change. The fact that there is an `Option` that _always_ gets `expect()`ed seems like a smell that something is wrong. The diff is also (currently) +35 /

Re: [PR] Pass `batch_size` directly when creating file opener [datafusion]

2025-08-07 Thread via GitHub
adriangb commented on PR #17076: URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3166512899 @blaginin would love your input on this change / maybe a bit of the whole plan!

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-08-07 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-3166494580 Fixed @xudong963, thanks so much for your review! I will leave this up until early next week to see if we get any more feedback.

Re: [PR] Chore: refactor datetime related expressions out of QueryPlanSerde [datafusion-comet]

2025-08-07 Thread via GitHub
codecov-commenter commented on PR #2085: URL: https://github.com/apache/datafusion-comet/pull/2085#issuecomment-3166492901 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2085?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-07 Thread via GitHub
coderfender commented on code in PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#discussion_r2261864821 ## native/core/src/execution/planner.rs: ## @@ -878,6 +879,7 @@ impl PhysicalPlanner { return_type: Option<&spark_expression::DataType>,

Re: [PR] chore: Refactor string expression serde, part 2 [datafusion-comet]

2025-08-07 Thread via GitHub
codecov-commenter commented on PR #2097: URL: https://github.com/apache/datafusion-comet/pull/2097#issuecomment-3166479464 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2097?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on code in PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#discussion_r2261863173 ## native/core/src/execution/planner.rs: ## @@ -878,6 +879,7 @@ impl PhysicalPlanner { return_type: Option<&spark_expression::DataType>, op

Re: [PR] Chore: refactor datetime related expressions out of QueryPlanSerde [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on PR #2085: URL: https://github.com/apache/datafusion-comet/pull/2085#issuecomment-3166458797 > @CuteChuanChuan I tried fixing a merge conflict through the GitHub UI but introduced an error: > > ``` > Error: ] /__w/datafusion-comet/datafusion-comet/spark/src/

Re: [PR] chore: Refactor string expression serde [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove closed pull request #2065: chore: Refactor string expression serde URL: https://github.com/apache/datafusion-comet/pull/2065

Re: [PR] Chore: refactor datetime related expressions out of QueryPlanSerde [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on PR #2085: URL: https://github.com/apache/datafusion-comet/pull/2085#issuecomment-3166456871 @CuteChuanChuan I tried fixing a merge conflict through the GitHub UI but introduced an error: ``` Error: ] /__w/datafusion-comet/datafusion-comet/spark/src/main/sca

[PR] chore: Refactor string serde, part 2 [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove opened a new pull request, #2097: URL: https://github.com/apache/datafusion-comet/pull/2097 ## Which issue does this PR close? N/A ## Rationale for this change Make code easier to maintain. ## What changes are included in this PR?

Re: [PR] feat: include end token in ALTER TABLE statement for span calculation [datafusion-sqlparser-rs]

2025-08-07 Thread via GitHub
IndexSeek commented on PR #1999: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1999#issuecomment-3166372942 I'm not sure about the linting violations; I addressed one, but it seemed unrelated to the change.

[PR] feat: include end token in ALTER TABLE statement for span calculation [datafusion-sqlparser-rs]

2025-08-07 Thread via GitHub
IndexSeek opened a new pull request, #1999: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1999 Resolves #1858 in span calculation for `ALTER TABLE` statements. The span for `ALTER TABLE` statements was ending too early and not including the semicolon.

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#issuecomment-3166312205 @coderfender I took a first pass through this, and I think this is looking good :+1:

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on code in PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#discussion_r2261759137 ## native/spark-expr/src/math_funcs/checked_arithmetic.rs: ## @@ -0,0 +1,129 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on code in PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#discussion_r2261755864 ## native/core/src/execution/planner.rs: ## @@ -231,7 +231,7 @@ impl PhysicalPlanner { ) -> Result, ExecutionError> { match spark_expr.expr_str

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#issuecomment-3166305440 @coderfender looks like there are some clippy issues to be resolved

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-07 Thread via GitHub
coderfender commented on PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#issuecomment-3166295484 @andygrove, here is the summary of changes: 1. Spark's Try eval mode returns NULL in case there is a computation failure. Note that this is only supported / useful
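
As a rough illustration of the "return NULL on computation failure" behaviour described in the summary above, here is a minimal sketch in plain Rust. It is not the code from PR #2073: the helper names `try_add` and `try_div` are hypothetical, and `Option<i32>` stands in for a nullable integer column value.

```rust
/// Hypothetical helper (not from the PR): add two nullable values with
/// "try" semantics. Overflow yields None (NULL) instead of an error.
fn try_add(a: Option<i32>, b: Option<i32>) -> Option<i32> {
    match (a, b) {
        // checked_add returns None when the sum overflows i32.
        (Some(x), Some(y)) => x.checked_add(y),
        // A NULL input propagates to a NULL output.
        _ => None,
    }
}

/// Hypothetical helper (not from the PR): divide with "try" semantics.
/// Division by zero and i32::MIN / -1 both yield None (NULL).
fn try_div(a: Option<i32>, b: Option<i32>) -> Option<i32> {
    match (a, b) {
        (Some(x), Some(y)) => x.checked_div(y),
        _ => None,
    }
}

fn main() {
    let left = [Some(1), Some(i32::MAX), None];
    let right = [Some(2), Some(1), Some(5)];

    // Apply the operation row by row, as a columnar kernel would.
    let sums: Vec<Option<i32>> = left
        .iter()
        .zip(right.iter())
        .map(|(a, b)| try_add(*a, *b))
        .collect();

    println!("{:?}", sums); // prints [Some(3), None, None]
    println!("{:?}", try_div(Some(10), Some(0))); // prints None
}
```

The standard library's `checked_*` methods already encode "failure as `None`", which is why they map naturally onto NULL-on-failure semantics in this sketch.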

[I] rpad expression panics if length input is not a literal value [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove opened a new issue, #2096: URL: https://github.com/apache/datafusion-comet/issues/2096 ### Describe the bug The rpad expression panics if the `length` input is not a literal value. For example, `rpad(string_column, 10)` works, but `rpad(string_column, int_column)` pan

Re: [PR] minor: CometBuffer code cleanup [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove merged PR #2090: URL: https://github.com/apache/datafusion-comet/pull/2090

Re: [PR] feat: Use Cached Metadata for ListingTable Statistics [datafusion]

2025-08-07 Thread via GitHub
jonathanc-n commented on PR #17022: URL: https://github.com/apache/datafusion/pull/17022#issuecomment-3166227299 Some changes might be needed after the removal of `cache_metadata` in #17062. They should be small: we just no longer need to do the if check on cache metadata.

Re: [I] Tracking PR to update useDecimal128 in Iceberg [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra commented on issue #2095: URL: https://github.com/apache/datafusion-comet/issues/2095#issuecomment-3166216056 @hsiang-c fyi

[I] Tracking PR to update useDecimal128 in Iceberg [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra opened a new issue, #2095: URL: https://github.com/apache/datafusion-comet/issues/2095 Tracking https://github.com/apache/iceberg/pull/13665

Re: [PR] Fix equality of parametrizable ArrayAgg function [datafusion]

2025-08-07 Thread via GitHub
alamb commented on PR #17065: URL: https://github.com/apache/datafusion/pull/17065#issuecomment-3165553080 Thanks @findepi

Re: [I] Tracking PR to update Iceberg to enable Comet native execution with Iceberg [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra commented on issue #2094: URL: https://github.com/apache/datafusion-comet/issues/2094#issuecomment-3166107295 @huang-hsiang, fyi

[I] Tracking PR to update Iceberg to enable Comet native execution with Iceberg [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra opened a new issue, #2094: URL: https://github.com/apache/datafusion-comet/issues/2094 Some of the changes in https://github.com/apache/iceberg/pull/13378 are to enable Comet native execution with Iceberg. These changes should be in an independent PR.

[I] Tracking PR to update Iceberg for Parquet shading issues [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra opened a new issue, #2093: URL: https://github.com/apache/datafusion-comet/issues/2093 We need a follow-up PR to https://github.com/apache/iceberg/pull/13378 that updates Iceberg after the changes introduced in https://github.com/apache/datafusion-comet/pull/2078

Re: [I] Comet iceberg requires lazy materialization to be turned off [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra commented on issue #2091: URL: https://github.com/apache/datafusion-comet/issues/2091#issuecomment-3166091812 Also see https://github.com/apache/datafusion-comet/issues/2092

[I] Document configuration flags needed for Comet Iceberg to work correctly [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra opened a new issue, #2092: URL: https://github.com/apache/datafusion-comet/issues/2092 We need to set - ``` "spark.comet.use.lazyMaterialization" -> "false" "spark.comet.schemaEvolution.enabled" -> "true" "spark.sql.iceberg.parquet.reader-type" -> "COMET" ``` T

[I] Comet iceberg requires lazy materialization to be turned off [datafusion-comet]

2025-08-07 Thread via GitHub
parthchandra opened a new issue, #2091: URL: https://github.com/apache/datafusion-comet/issues/2091 ### Describe the bug Running Iceberg integration tests with Comet enabled causes several tests to hit a NullPointerException in `CometDelegateVector.getUTF8String`. This is because the

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-07 Thread via GitHub
adamreeve commented on PR #16779: URL: https://github.com/apache/datafusion/pull/16779#issuecomment-3166069291 @mbutrovich I'm curious if you can share any details about how you want to use this. Do you need compatibility with PyArrow or Spark, and would the [parquet-key-management](https:/

Re: [PR] feat: Support Array Literal [datafusion-comet]

2025-08-07 Thread via GitHub
comphead commented on code in PR #2057: URL: https://github.com/apache/datafusion-comet/pull/2057#discussion_r2261309699 ## native/proto/src/proto/types.proto: ## @@ -0,0 +1,41 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] feat: Add comprehensive safety improvements to CometBuffer [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on PR #2089: URL: https://github.com/apache/datafusion-comet/pull/2089#issuecomment-3165388990 I am running benchmarks to see if these changes impact performance

Re: [PR] Derive UDAF equality from Eq, Hash [datafusion]

2025-08-07 Thread via GitHub
alamb commented on code in PR #17067: URL: https://github.com/apache/datafusion/pull/17067#discussion_r2261298736 ## datafusion/expr/src/udf_eq.rs: ## @@ -0,0 +1,181 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. Se

Re: [PR] feat: Support Array Literal [datafusion-comet]

2025-08-07 Thread via GitHub
comphead commented on code in PR #2057: URL: https://github.com/apache/datafusion-comet/pull/2057#discussion_r2254659597 ## native/core/src/execution/planner.rs: ## @@ -474,6 +480,240 @@ impl PhysicalPlanner { )))

Re: [PR] Deprecate ScalarUDF::is_nullable [datafusion]

2025-08-07 Thread via GitHub
Blizzara commented on PR #17074: URL: https://github.com/apache/datafusion/pull/17074#issuecomment-3166003379 Thanks! The deprecation is sure an improvement, but I wonder if overall this is_nullable -> return type from args change suffers from the same issue as the earlier ones, namely that

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-08-07 Thread via GitHub
BlakeOrth commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3165993099 I've been investigating performance when using the `ListingTable` with remote storage, and since `datafusion-cli` ultimately uses the `ListingTable` I'm curious if my findings

Re: [PR] Derive UDAF equality from Eq, Hash [datafusion]

2025-08-07 Thread via GitHub
findepi commented on code in PR #17067: URL: https://github.com/apache/datafusion/pull/17067#discussion_r2261325924 ## datafusion/expr/src/udf_eq.rs: ## @@ -0,0 +1,181 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] (feat) add support for ArrayMin scalar function [datafusion-comet]

2025-08-07 Thread via GitHub
comphead commented on PR #1944: URL: https://github.com/apache/datafusion-comet/pull/1944#issuecomment-3165889517 I think this PR is good, @dharanad please modify `expression.md` to emphasize that this function is supported

Re: [I] [iceberg] testCopyOnWriteMergeWithoutShufflesWithPredicate failure (MutableBuffer LayoutError) [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on issue #2087: URL: https://github.com/apache/datafusion-comet/issues/2087#issuecomment-3165880428 I suspect the root cause is trying to create a mutable buffer with a size greater than `isize::MAX`. From `arrow-buffer` `MutableBuffer`: ```rust pub
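
For context on the suspected root cause above, the following is a minimal sketch of how a size above `isize::MAX` produces a layout error in Rust's standard allocator API. It assumes the allocation goes through `std::alloc::Layout`, which the `LayoutError` in the issue title suggests; the `arrow-buffer` code itself is truncated in this listing and is not reproduced here. The helper name `can_build_layout` and the alignment of 64 are illustrative assumptions.

```rust
use std::alloc::Layout;

/// Hypothetical helper: report whether a (size, align) pair forms a valid
/// allocation layout. `Layout::from_size_align` returns Err(LayoutError)
/// when the size, rounded up to the alignment, exceeds isize::MAX.
fn can_build_layout(size: usize, align: usize) -> bool {
    Layout::from_size_align(size, align).is_ok()
}

fn main() {
    // An ordinary request is accepted.
    println!("{}", can_build_layout(1024, 64));

    // One byte past isize::MAX is rejected before any memory is requested.
    println!("{}", can_build_layout(isize::MAX as usize + 1, 64));

    // The same holds for the largest possible usize.
    println!("{}", can_build_layout(usize::MAX, 64));
}
```

Checking the requested capacity against `isize::MAX` before building the layout is one way a caller could turn such a failure into a recoverable error rather than a panic.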

Re: [PR] feat: Use Cached Metadata for ListingTable Statistics [datafusion]

2025-08-07 Thread via GitHub
alamb commented on PR #17022: URL: https://github.com/apache/datafusion/pull/17022#issuecomment-3165841759 @nuno-faria and @jonathanc-n -- do you have time to review this PR as well?

Re: [PR] Docs: Add Tuning Guide for larger-than-memory queries [datafusion]

2025-08-07 Thread via GitHub
alamb commented on code in PR #17069: URL: https://github.com/apache/datafusion/pull/17069#discussion_r2261433216 ## dev/update_config_docs.sh: ## @@ -149,6 +149,37 @@ SET datafusion.execution.target_partitions = '1'; [`ListingTable`]: https://docs.rs/datafusion/latest/dataf

Re: [PR] Add ExecutionPlan::reset_state [datafusion]

2025-08-07 Thread via GitHub
alamb merged PR #17028: URL: https://github.com/apache/datafusion/pull/17028

Re: [PR] feat: Support Array Literal [datafusion-comet]

2025-08-07 Thread via GitHub
comphead commented on code in PR #2057: URL: https://github.com/apache/datafusion-comet/pull/2057#discussion_r2261455745 ## native/core/src/execution/planner.rs: ## @@ -474,6 +480,240 @@ impl PhysicalPlanner { )))

Re: [PR] Update README to state that this project is no longer maintained [datafusion-ray]

2025-08-07 Thread via GitHub
andygrove merged PR #88: URL: https://github.com/apache/datafusion-ray/pull/88

Re: [PR] fix: [iceberg] more fixes for Iceberg integration APIs. [datafusion-comet]

2025-08-07 Thread via GitHub
huaxingao commented on code in PR #2078: URL: https://github.com/apache/datafusion-comet/pull/2078#discussion_r2261441438 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -453,4 +457,60 @@ private static LogicalTypeAnnotation reconstructLogicalType( t

[PR] minor: CometBuffer code cleanup [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove opened a new pull request, #2090: URL: https://github.com/apache/datafusion-comet/pull/2090 ## Which issue does this PR close? N/A ## Rationale for this change ## What changes are included in this PR? - move test code to test modul

Re: [I] Shared `DynamicFilterPhysicalExpr` causes recursive queries to fail [datafusion]

2025-08-07 Thread via GitHub
alamb closed issue #16998: Shared `DynamicFilterPhysicalExpr` causes recursive queries to fail URL: https://github.com/apache/datafusion/issues/16998

Re: [PR] Add ExecutionPlan::reset_state [datafusion]

2025-08-07 Thread via GitHub
alamb commented on PR #17028: URL: https://github.com/apache/datafusion/pull/17028#issuecomment-3165772813 > done! #17060 > > should we wait for anything else before merging this? Nope, I think we are good -- let's merge it!

Re: [PR] feat: add `datafusion-physical-adapter`, implement predicate adaptation missing fields of structs [datafusion]

2025-08-07 Thread via GitHub
alamb commented on code in PR #16589: URL: https://github.com/apache/datafusion/pull/16589#discussion_r2261411619 ## datafusion/physical-expr-adapter/README.md: ## @@ -0,0 +1,14 @@ +# DataFusion Physical Expression Adapter + +This crate provides physical expression schema adapta

Re: [PR] Test and fix for issue #16998: SortExec shares DynamicFilterPhysicalExpr across multiple executions [datafusion]

2025-08-07 Thread via GitHub
alamb closed pull request #17016: Test and fix for issue #16998: SortExec shares DynamicFilterPhysicalExpr across multiple executions URL: https://github.com/apache/datafusion/pull/17016

Re: [PR] minor: CometBuffer code cleanup [datafusion-comet]

2025-08-07 Thread via GitHub
codecov-commenter commented on PR #2090: URL: https://github.com/apache/datafusion-comet/pull/2090#issuecomment-3165729700 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2090?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] minor: CometBuffer code cleanup [datafusion-comet]

2025-08-07 Thread via GitHub
mbutrovich commented on code in PR #2090: URL: https://github.com/apache/datafusion-comet/pull/2090#discussion_r2261391774 ## native/core/src/common/buffer.rs: ## @@ -202,7 +165,7 @@ impl CometBuffer { /// with 0. if `new_capacity` is less than the current capacity of this

Re: [PR] minor: CometBuffer improvements [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove closed pull request #2089: minor: CometBuffer improvements URL: https://github.com/apache/datafusion-comet/pull/2089

Re: [PR] fix(parquet): write single file if option is set [datafusion]

2025-08-07 Thread via GitHub
alamb commented on PR #17009: URL: https://github.com/apache/datafusion/pull/17009#issuecomment-3165618492 I restarted the checks

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-07 Thread via GitHub
adamreeve commented on code in PR #16779: URL: https://github.com/apache/datafusion/pull/16779#discussion_r2250195720 ## datafusion-examples/examples/parquet_encrypted_with_kms.rs: ## @@ -0,0 +1,288 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] Implement Spark `url` function `parse_url` [datafusion]

2025-08-07 Thread via GitHub
alamb merged PR #16937: URL: https://github.com/apache/datafusion/pull/16937

Re: [PR] feat: Support Array Literal [datafusion-comet]

2025-08-07 Thread via GitHub
mbutrovich commented on code in PR #2057: URL: https://github.com/apache/datafusion-comet/pull/2057#discussion_r2261258597 ## native/core/src/execution/planner.rs: ## @@ -474,6 +480,240 @@ impl PhysicalPlanner { )))

Re: [PR] feat: Use Cached Metadata for ListingTable Statistics [datafusion]

2025-08-07 Thread via GitHub
alamb commented on code in PR #17022: URL: https://github.com/apache/datafusion/pull/17022#discussion_r2261307237 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -376,41 +391,123 @@ mod tests { ))); let (meta, _files) = store_parquet(vec![batch1

Re: [I] Support Extension Types / User Defined Types in DataFusion [datafusion]

2025-08-07 Thread via GitHub
alamb commented on issue #12644: URL: https://github.com/apache/datafusion/issues/12644#issuecomment-3165632416 > Is this the end state we want to achieve? In my mind it is better than what is currently possible. I agree we can come up with other better designs too > I agree th

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-07 Thread via GitHub
mbutrovich commented on code in PR #16779: URL: https://github.com/apache/datafusion/pull/16779#discussion_r2261383521 ## datafusion-examples/examples/parquet_encrypted_with_kms.rs: ## @@ -0,0 +1,301 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] Derive UDAF equality from Eq, Hash [datafusion]

2025-08-07 Thread via GitHub
timsaucer commented on code in PR #17067: URL: https://github.com/apache/datafusion/pull/17067#discussion_r2261360132 ## datafusion/ffi/src/udaf/mod.rs: ## @@ -384,6 +385,19 @@ pub struct ForeignAggregateUDF { unsafe impl Send for ForeignAggregateUDF {} unsafe impl Sync for Fo

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-07 Thread via GitHub
alamb commented on code in PR #16779: URL: https://github.com/apache/datafusion/pull/16779#discussion_r2261359928 ## datafusion-examples/examples/parquet_encrypted_with_kms.rs: ## @@ -0,0 +1,301 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] minor: CometBuffer improvements [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on PR #2089: URL: https://github.com/apache/datafusion-comet/pull/2089#issuecomment-3165656691 I will create a new simpler version of this

Re: [PR] Derive UDAF equality from Eq, Hash [datafusion]

2025-08-07 Thread via GitHub
alamb commented on code in PR #17067: URL: https://github.com/apache/datafusion/pull/17067#discussion_r2261336715 ## datafusion/expr/src/udf_eq.rs: ## @@ -0,0 +1,181 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. Se

Re: [PR] Update Scalar_functions.md [datafusion]

2025-08-07 Thread via GitHub
alamb commented on PR #17018: URL: https://github.com/apache/datafusion/pull/17018#issuecomment-3165625411 > Not working , Perhaps you need to run `apt-get update` first to update the package registry

Re: [PR] Fix equality of parametrizable ArrayAgg function [datafusion]

2025-08-07 Thread via GitHub
findepi merged PR #17065: URL: https://github.com/apache/datafusion/pull/17065

Re: [I] Equivalence properties lost if projection is pushed down in FileScanConfig [datafusion]

2025-08-07 Thread via GitHub
friendlymatthew commented on issue #17077: URL: https://github.com/apache/datafusion/issues/17077#issuecomment-3165593467 take

Re: [PR] feat: Add comprehensive safety improvements to CometBuffer [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on code in PR #2089: URL: https://github.com/apache/datafusion-comet/pull/2089#discussion_r2261314132 ## native/core/src/common/buffer.rs: ## @@ -31,16 +31,15 @@ use std::{ /// the unique owner for the memory it wraps. The holder of this buffer can read or

Re: [I] [Parquet Metadata Cache] remove `datafusion.execution.parquet.cache_metadata` config [datafusion]

2025-08-07 Thread via GitHub
alamb closed issue #17047: [Parquet Metadata Cache] remove `datafusion.execution.parquet.cache_metadata` config URL: https://github.com/apache/datafusion/issues/17047

Re: [PR] feat: Support `PiecewiseMergeJoin` to speed up single range predicate joins [datafusion]

2025-08-07 Thread via GitHub
comphead commented on PR #16660: URL: https://github.com/apache/datafusion/pull/16660#issuecomment-316417 Hey, sorry I missed this. This join is quite an interesting concept. I'm planning to finish reviewing #16996 this week and switch to this PR next.

Re: [PR] fix: Remove `datafusion.execution.parquet.cache_metadata` config [datafusion]

2025-08-07 Thread via GitHub
alamb merged PR #17062: URL: https://github.com/apache/datafusion/pull/17062

Re: [PR] Fix equality of parametrizable ArrayAgg function [datafusion]

2025-08-07 Thread via GitHub
alamb commented on code in PR #17065: URL: https://github.com/apache/datafusion/pull/17065#discussion_r2261291705 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -227,6 +227,8 @@ impl AggregateUDFImpl for ArrayAgg { fn documentation(&self) -> Option<&Documentation

Re: [PR] feat: Add comprehensive safety improvements to CometBuffer [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on PR #2089: URL: https://github.com/apache/datafusion-comet/pull/2089#issuecomment-3165524853 I ran TPC-H and confirmed that there is no performance regression

Re: [PR] feat: Support `PiecewiseMergeJoin` to speed up single range predicate joins [datafusion]

2025-08-07 Thread via GitHub
alamb commented on PR #16660: URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3165541527 Thanks @jonathanc-n -- unfortunately I am not likely to have time to review this as my focus hasn't been on the join implementation @comphead do you know anyone who is more focus

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-08-07 Thread via GitHub
alamb commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-3165532117 We found a regression related to this PR for `string_agg` not respecting `ORDER BY` - https://github.com/apache/datafusion/issues/17011 @nuno-faria and @findepi have prepared

Re: [PR] feat: Add comprehensive safety improvements to CometBuffer [datafusion-comet]

2025-08-07 Thread via GitHub
andygrove commented on code in PR #2089: URL: https://github.com/apache/datafusion-comet/pull/2089#discussion_r2261272582 ## native/core/src/common/buffer.rs: ## @@ -71,25 +71,6 @@ impl CometBuffer { } } -pub fn from_ptr(ptr: *const u8, len: usize, capacity:
