Re: [I] sqllogictest tests fail when run locally [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 closed issue #16250: sqllogictest tests fail when run locally URL: https://github.com/apache/datafusion/issues/16250 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] feat(small): Add `BaselineMetrics` to `generate_series()` table function [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 opened a new pull request, #16255: URL: https://github.com/apache/datafusion/pull/16255 ## Which issue does this PR close? - Closes #. ## Rationale for this change Currently `generate_series()` does not track metrics, see the result in `datafusion-

Re: [I] Inconsistency with count distinct on NaN values [datafusion]

2025-06-04 Thread via GitHub
chenkovsky commented on issue #16254: URL: https://github.com/apache/datafusion/issues/16254#issuecomment-2942717364 In spark, it seems that it's hardcoded for NaN https://github.com/apache/spark/blob/4c69d0df96cb379cdf3f63331eae6ebfece008bd/docs/sql-ref-datatypes.md?plain=1#L297 -- This

Re: [PR] Adjust slttest to pass without RUST_BACKTRACE enabled [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 merged PR #16251: URL: https://github.com/apache/datafusion/pull/16251 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [I] Inconsistency with count distinct on NaN values [datafusion]

2025-06-04 Thread via GitHub
chenkovsky commented on issue #16254: URL: https://github.com/apache/datafusion/issues/16254#issuecomment-2942703718 after debugging, I found that the root cause is here https://github.com/apache/datafusion/blob/448c985ebbfbb24b0fdba5b9f18a701a6275188a/datafusion/physical-plan/src/aggregates

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2942862579 > > I am onboard with the approach in this PR, and it looks good to me overall. Just needs some finishing touches: > > > > * To avoid such a large diff (which is mostly pla

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2128014148 ## datafusion/physical-plan/src/memory.rs: ## @@ -139,13 +140,18 @@ pub trait LazyBatchGenerator: Send + Sync + fmt::Debug + fmt::Display { /// /// This pla

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2128017451 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4702,7 +4702,8 @@ physical_plan 01)CrossJoinExec 02)--DataSourceExec: partitions=1, partition_sizes=[

[PR] fix: NaN semantics in GROUP BY [datafusion]

2025-06-04 Thread via GitHub
chenkovsky opened a new pull request, #16256: URL: https://github.com/apache/datafusion/pull/16256 ## Which issue does this PR close? - Closes #16254. ## Rationale for this change NaN != NaN ## What changes are included in this PR? Use arrow is_eq to test eq

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2128017451 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4702,7 +4702,8 @@ physical_plan 01)CrossJoinExec 02)--DataSourceExec: partitions=1, partition_sizes=[

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2128032920 ## datafusion/physical-optimizer/src/wrap_leaves_cancellation.rs: ## @@ -0,0 +1,138 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [I] Reduce page metadata loading to only what is necessary for query execution in ParquetOpen [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on issue #16200: URL: https://github.com/apache/datafusion/issues/16200#issuecomment-2938913313 Thank you @alamb @adriangb @etseidl , i agree, so first step may be to not copying the page index values into the rust structures for columns we don't need in the query.

Re: [I] Add `output_bytes` metrics to Explain Analyze [datafusion]

2025-06-04 Thread via GitHub
2010YOUY01 commented on issue #16244: URL: https://github.com/apache/datafusion/issues/16244#issuecomment-2938981804 > I'd be interested in working on this, but I might need a little guidance since I'm new to the project. Thank you! Here are some additional info Each operator h

[I] Query execution time difference between deltatable QueryBuilder and using DataFusion directly. [datafusion-python]

2025-06-04 Thread via GitHub
debajyoti-truefoundry opened a new issue, #1140: URL: https://github.com/apache/datafusion-python/issues/1140 **Describe the bug** **What happened**: There are two ways of querying a Delta Table using DataFusion. 1. [Using DataFusion directly.](https://github.com/delta-io/delta-rs/b

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-04 Thread via GitHub
krishvishal commented on PR #16203: URL: https://github.com/apache/datafusion/pull/16203#issuecomment-2939080331 @comphead I've pushed additional fixes you've asked. I think the PR is ready. Let me know if it is okay. -- This is an automated message from the Apache Git Service. To respond

[PR] chore(deps): bump sysinfo from 0.35.1 to 0.35.2 [datafusion]

2025-06-04 Thread via GitHub
dependabot[bot] opened a new pull request, #16247: URL: https://github.com/apache/datafusion/pull/16247 Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.35.1 to 0.35.2. Changelog Sourced from https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md";>sysi

[PR] Improve async Python integration with better error handling, signal checks, and KeyboardInterrupt support in query execution [datafusion-python]

2025-06-04 Thread via GitHub
kosiew opened a new pull request, #1141: URL: https://github.com/apache/datafusion-python/pull/1141 ## Which issue does this PR close? - Closes #1136. ## Rationale for this change This PR improves Python bindings for the DataFusion integration by addressing several short

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-04 Thread via GitHub
LiaCastaneda commented on code in PR #16195: URL: https://github.com/apache/datafusion/pull/16195#discussion_r2126024222 ## datafusion/physical-plan/src/metrics/value.rs: ## @@ -443,6 +528,9 @@ impl MetricValue { .and_then(|ts| ts.timestamp_nanos_opt())

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-06-04 Thread via GitHub
rluvaton closed pull request #1793: chore: add assertion that not using comet scan but using native scan URL: https://github.com/apache/datafusion-comet/pull/1793 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939442458 I am onboard with the approach in this PR, and it looks good to me overall. Just needs some finishing touches: - To avoid such a large diff (which is mostly plans in tests), I su

Re: [PR] Simplify FileSource / SchemaAdapterFactory API [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16214: URL: https://github.com/apache/datafusion/pull/16214#issuecomment-2939472367 Thank you @xudong963 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[I] [Epic] A collection of Substrait conversion issues [datafusion]

2025-06-04 Thread via GitHub
gabotechs opened a new issue, #16248: URL: https://github.com/apache/datafusion/issues/16248 ### Is your feature request related to a problem or challenge? TODO: attach all the issues replicating Substrait conversion issues here ### Describe the solution you'd like _No re

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-04 Thread via GitHub
LiaCastaneda commented on code in PR #16195: URL: https://github.com/apache/datafusion/pull/16195#discussion_r2126024222 ## datafusion/physical-plan/src/metrics/value.rs: ## @@ -443,6 +528,9 @@ impl MetricValue { .and_then(|ts| ts.timestamp_nanos_opt())

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939494008 > @zhuqi-lucas great work. I've continued playing around with alternative structures in the meantime, but I keep coming back to your `YieldStream` as the most elegant solution. It's se

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-04 Thread via GitHub
alamb commented on code in PR #16195: URL: https://github.com/apache/datafusion/pull/16195#discussion_r2126279454 ## datafusion/physical-plan/src/metrics/value.rs: ## @@ -344,7 +344,7 @@ impl Drop for ScopedTimerGuard<'_> { /// Among other differences, the metric types have dif

Re: [PR] Add `--substrait-round-trip` option in sqllogictests [datafusion]

2025-06-04 Thread via GitHub
gabotechs commented on code in PR #16183: URL: https://github.com/apache/datafusion/pull/16183#discussion_r2126283562 ## datafusion/sqllogictest/bin/sqllogictests.rs: ## @@ -102,6 +103,11 @@ async fn run_tests() -> Result<()> { // to stdout and return OK so they can con

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-04 Thread via GitHub
alamb commented on code in PR #16195: URL: https://github.com/apache/datafusion/pull/16195#discussion_r2126288404 ## datafusion/physical-plan/src/metrics/value.rs: ## @@ -443,6 +528,9 @@ impl MetricValue { .and_then(|ts| ts.timestamp_nanos_opt())

Re: [I] Query execution time difference between deltatable QueryBuilder and using DataFusion directly. [datafusion-python]

2025-06-04 Thread via GitHub
timsaucer commented on issue #1140: URL: https://github.com/apache/datafusion-python/issues/1140#issuecomment-2939558230 For whoever picks this up, it may be worth first investigating if it's due to the table provider or the ffi interface. I would probably try to reproduce in a pure rust e

Re: [PR] Perf: load default Utf8View for CSV datatype [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on PR #16243: URL: https://github.com/apache/datafusion/pull/16243#issuecomment-2939581266 > > It looks like no performance improvement for h2o_window benchmark result... > > Now that I think about it, the h2o benchmark may not have any string columns 🤔 >

Re: [PR] Perf: load default Utf8View for CSV datatype [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16243: URL: https://github.com/apache/datafusion/pull/16243#issuecomment-2939540830 > It looks like no performance improvement for h2o_window benchmark result... Now that I think about it, the h2o benchmark may not have any string columns 🤔 Do the TPCH b

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939607781 @alamb @zhuqi-lucas I've made an alternative sketch of a possible 'intrusive' approach at https://github.com/pepijnve/datafusion/commit/9c74748b7fc1033f29ce6053d898f269e5fb4b90 W

Re: [I] Incorrect count `null` in dict values [datafusion]

2025-06-04 Thread via GitHub
alamb commented on issue #16228: URL: https://github.com/apache/datafusion/issues/16228#issuecomment-2939646176 I vaguely remember there being some question about what a DictionaryArray where the index was non null but pointed to a null in the underlying values field. I can't remembe

Re: [PR] Add dicts to aggregation fuzz testing [datafusion]

2025-06-04 Thread via GitHub
alamb commented on code in PR #16232: URL: https://github.com/apache/datafusion/pull/16232#discussion_r2126344160 ## datafusion/core/tests/fuzz_cases/record_batch_generator.rs: ## @@ -656,11 +713,110 @@ impl RecordBatchGenerator { self,

Re: [PR] Add dicts to aggregation fuzz testing [datafusion]

2025-06-04 Thread via GitHub
blaginin commented on PR #16232: URL: https://github.com/apache/datafusion/pull/16232#issuecomment-2939655137 Thanks for the review! > Will this test fail intermittently? It won't fail because of `0` here https://github.com/apache/datafusion/pull/16232/files#diff-08d7a

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939662555 > @alamb @zhuqi-lucas I've made an alternative sketch of a possible 'intrusive' approach at https://github.com/pepijnve/datafusion/commit/9c74748b7fc1033f29ce6053d898f269e5fb4b90 Wo

Re: [PR] Add dicts to aggregation fuzz testing [datafusion]

2025-06-04 Thread via GitHub
alamb commented on code in PR #16232: URL: https://github.com/apache/datafusion/pull/16232#discussion_r2126366811 ## datafusion/core/tests/fuzz_cases/record_batch_generator.rs: ## @@ -656,11 +713,110 @@ impl RecordBatchGenerator { self,

Re: [PR] Add dicts to aggregation fuzz testing [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16232: URL: https://github.com/apache/datafusion/pull/16232#issuecomment-2939665729 > Once https://github.com/apache/datafusion/issues/16228 is fixed, we'll be able to generate null values in both keys and values (will extend the ticket once this PR is merged)

[PR] feat: sort by with direction for hive [datafusion-sqlparser-rs]

2025-06-04 Thread via GitHub
chenkovsky opened a new pull request, #1873: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1873 In Hive SQL, we can add ASC, DESC after sort by columns. but currently this parser doesn't support it. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy.

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939714394 > I think it looks good, but it also seems like it doesn't fundamentally change @ozankabak 's concern of having to annotate all streams with cancel aware behavior 🤔 That's co

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939732997 > > @alamb @zhuqi-lucas I've made an alternative sketch of a possible 'intrusive' approach at [pepijnve@9c74748](https://github.com/pepijnve/datafusion/commit/9c74748b7fc1033f29c

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939817555 @alamb, I don't see how your two points materialize, but maybe I'm missing something. Let's analyze each one by one. >1. Only ExecutionPlans that don't have some other yield

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939913792 > Why do you think this is the case? The rule only inserts YieldExecs if the plan involves pipeline-breaking, and only once at leaves. I think the penalty/overhead will always be stric

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2939939329 Exactly 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Add dicts to aggregation fuzz testing [datafusion]

2025-06-04 Thread via GitHub
blaginin merged PR #16232: URL: https://github.com/apache/datafusion/pull/16232 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] Add dicts to aggregation fuzz testing [datafusion]

2025-06-04 Thread via GitHub
blaginin commented on code in PR #16232: URL: https://github.com/apache/datafusion/pull/16232#discussion_r2126542789 ## datafusion/core/tests/fuzz_cases/record_batch_generator.rs: ## @@ -656,11 +713,110 @@ impl RecordBatchGenerator { self,

Re: [PR] Always add parentheses when formatting `BinaryExpr` with `SchemaDisplay` [datafusion]

2025-06-04 Thread via GitHub
hendrikmakait commented on PR #16209: URL: https://github.com/apache/datafusion/pull/16209#issuecomment-2939950601 @xudong963: CI is green now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] fix: Fix incorrect logic in CometExecRule aggregate handling [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove closed pull request #1841: fix: Fix incorrect logic in CometExecRule aggregate handling URL: https://github.com/apache/datafusion-comet/pull/1841 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2940030602 > The rule only inserts YieldExecs if the plan involves pipeline-breaking, and only once at leaves. The assumption this design makes is that a `Pending` result produced at a

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2940060922 Thank you folks, i will forward this solution and do next clean up and optimize. For the interleave and SortPreservingMergeExec, etc corner cases, i agree that we can add

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-04 Thread via GitHub
sfluor commented on PR #16195: URL: https://github.com/apache/datafusion/pull/16195#issuecomment-2940051045 Thanks for the review @alamb ! I removed the panic and went with a default value of 0 -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
andygrove commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940101271 > 48.0.0 rc1 is ready: https://github.com/apache/datafusion/tree/48.0.0-rc1 Thanks @xudong963. I'll start testing this with Comet. I don't seem to have received a

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-04 Thread via GitHub
suibianwanwank commented on code in PR #16234: URL: https://github.com/apache/datafusion/pull/16234#discussion_r2126673547 ## datafusion/physical-expr/src/window/window_expr.rs: ## @@ -186,6 +186,10 @@ pub trait AggregateWindowExpr: WindowExpr { accumulator: &mut Box,

[PR] chore: Upgrade to DataFusion 48.0.0-rc1 [datafusion-comet]

2025-06-04 Thread via GitHub
andygrove opened a new pull request, #1842: URL: https://github.com/apache/datafusion-comet/pull/1842 ## Which issue does this PR close? N/A ## Rationale for this change Test the release candidate. ## What changes are included in this PR?

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2940167521 > This assumption turns out to be quite easy to break as in the interleave example. I would say it is not *that* easy to break, took you some effort to contrive an example.

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940197051 > I don't seem to have received an email about the start of the vote for this release candidate. I was thinking of starting the vote after testing, maybe Friday. WDYT?

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126721050 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -159,38 +200,17 @@ impl Display for PhysicalSortExpr { } } -impl PhysicalSortExpr { -/// e

Re: [I] `sort_fuzz` testing DX improvements [datafusion]

2025-06-04 Thread via GitHub
blaginin commented on issue #16233: URL: https://github.com/apache/datafusion/issues/16233#issuecomment-2940217723 hey @jonathanc-n > set an environment variable to the seed I think it's a good idea to print the seed used, but not sure we should implicitly set the env variable.

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126725189 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -334,177 +313,142 @@ fn to_str(options: &SortOptions) -> &str { } } -///`LexOrdering` contains

[PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-04 Thread via GitHub
alamb opened a new pull request, #16249: URL: https://github.com/apache/datafusion/pull/16249 ## Which issue does this PR close? - Related to https://github.com/apache/arrow-rs/issues/6692 - Related to #3463 ## Rationale for this change I am trying to move the coa

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126733166 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -334,177 +313,142 @@ fn to_str(options: &SortOptions) -> &str { } } -///`LexOrdering` contains

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
andygrove commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940237225 > was thinking of starting the vote after testing, maybe Friday. WDYT? It would be better for the whole community to have the opportunity to participate in the testing,

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126736424 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -334,177 +313,142 @@ fn to_str(options: &SortOptions) -> &str { } } -///`LexOrdering` contains

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
andygrove commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940253204 Also, the vote can run for longer than three days if we need more time for testing. Three days is just the minimum time. -- This is an automated message from the Apache Git

[I] sqllogictest tests fail when run locally [datafusion]

2025-06-04 Thread via GitHub
alamb opened a new issue, #16250: URL: https://github.com/apache/datafusion/issues/16250 ### Describe the bug When I run sqllogictests locally on my laptop it fails. It appears to pass on CI This is annoying as it makes it harder to evaluate changes locally ### To Reprod

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126760915 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -129,23 +138,55 @@ impl PhysicalSortExpr { to_str(&self.options) ) } -} -

Re: [PR] Add `--substrait-round-trip` option in sqllogictests [datafusion]

2025-06-04 Thread via GitHub
gabotechs commented on PR #16183: URL: https://github.com/apache/datafusion/pull/16183#issuecomment-2939578353 > I suggest adding a CI test that exercises this option with a passing .slt file Something like this https://github.com/apache/datafusion/pull/16183/commits/1a05ba290752813

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-04 Thread via GitHub
alamb commented on code in PR #16249: URL: https://github.com/apache/datafusion/pull/16249#discussion_r2126736135 ## datafusion/physical-plan/src/coalesce/mod.rs: ## @@ -488,110 +329,6 @@ mod tests { .unwrap() } -#[test] Review Comment: all moved upstream

Re: [PR] Always add parentheses when formatting `BinaryExpr` with `SchemaDisplay` [datafusion]

2025-06-04 Thread via GitHub
alamb commented on code in PR #16209: URL: https://github.com/apache/datafusion/pull/16209#discussion_r2126757732 ## datafusion-examples/examples/date_time_functions.rs: ## @@ -122,15 +122,15 @@ async fn query_make_date() -> Result<()> { let df = ctx.sql("select make_date(y

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2940311662 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Always add parentheses when formatting `BinaryExpr` with `SchemaDisplay` [datafusion]

2025-06-04 Thread via GitHub
xudong963 commented on PR #16209: URL: https://github.com/apache/datafusion/pull/16209#issuecomment-2940212614 > @xudong963: CI is green now. cc @alamb for another look -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [I] Migrate `core` tests to `insta` [datafusion]

2025-06-04 Thread via GitHub
Chen-Yuan-Lai commented on issue #15791: URL: https://github.com/apache/datafusion/issues/15791#issuecomment-2940292401 @blaginin Thanks for your suggestion, I will try it soon. > Also, feel free to split this ticket into several PRs - I think that way you'll get feedback faster :)

Re: [PR] chore: Upgrade to DataFusion 48.0.0-rc1 [datafusion-comet]

2025-06-04 Thread via GitHub
codecov-commenter commented on PR #1842: URL: https://github.com/apache/datafusion-comet/pull/1842#issuecomment-2940327336 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1842?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126810121 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -516,162 +460,240 @@ impl Display for LexOrdering { } } -impl FromIterator for LexOrdering { -

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126812465 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -516,162 +460,240 @@ impl Display for LexOrdering { } } -impl FromIterator for LexOrdering { -

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-04 Thread via GitHub
paleolimbot commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2126762149 ## datafusion/expr/src/expr.rs: ## @@ -274,16 +275,16 @@ use sqlparser::ast::{ /// assert!(rewritten.transformed); /// // to 42 = 5 AND b = 6 /// assert_eq!

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126826372 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -516,162 +460,240 @@ impl Display for LexOrdering { } } -impl FromIterator for LexOrdering { -

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126827741 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -516,162 +460,240 @@ impl Display for LexOrdering { } } -impl FromIterator for LexOrdering { -

Re: [PR] chore(deps): bump sysinfo from 0.35.1 to 0.35.2 [datafusion]

2025-06-04 Thread via GitHub
comphead merged PR #16247: URL: https://github.com/apache/datafusion/pull/16247 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126830302 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -782,14 +756,92 @@ where } } } + +#[derive(Debug)] Review Comment: Right -- will do

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940255974 > It would be better for the whole community to have the opportunity to participate in the testing, which is part of the voting process. My concern is that otherwise, it may l

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126840903 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -16,115 +16,76 @@ // under the License. use std::fmt::Display; -use std::hash::Hash; +use std:

[I] Spark Test fails `vectorized reader: missing all struct fields` [datafusion-comet]

2025-06-04 Thread via GitHub
comphead opened a new issue, #1843: URL: https://github.com/apache/datafusion-comet/issues/1843 ### Describe the bug when reading with the specified schema ``` test("vectorized reader: missing all struct fields") { Seq(true, false).foreach { offheapEnabled => wi

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126871133 ## datafusion/physical-expr/src/equivalence/ordering.rs: ## @@ -164,51 +124,40 @@ impl OrderingEquivalenceClass { /// For example, if `orderings[idx]` is `[

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126835220 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -1195,13 +1234,90 @@ impl AggregateUDFImpl for LastValue { } } +#[derive(Debug)] +pub struct T

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2940446605 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_upstream_coalesce Benchmark clickbench_extended.json -

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940459402 ``` 1. statement is expected to fail with error: (multiline) DataFusion error: Error during planning: Failed to coerce arguments to satisfy a call to 'approx_per

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126883202 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -173,9 +173,6 @@ SELECT approx_percentile_cont(c12) WITHIN GROUP (ORDER BY c12) FROM aggregate_te st

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940461831 It seems to be related to https://github.com/apache/datafusion/issues/16250 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
timsaucer commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940466020 I really want to get in these two PRs: - https://github.com/apache/datafusion/pull/14775 - https://github.com/apache/datafusion/pull/16170 I'm going to focus on get

Re: [PR] Draft: Use upstream arrow `coalesce` kernel in DataFusion [datafusion]

2025-06-04 Thread via GitHub
alamb commented on PR #16249: URL: https://github.com/apache/datafusion/pull/16249#issuecomment-2940467360 TLDR is that the performance looks good. I'll fixup the tests shortly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-04 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2940467339 > I am onboard with the approach in this PR, and it looks good to me overall. Just needs some finishing touches: > > * To avoid such a large diff (which is mostly plans in

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126891646 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1053,25 +1051,9 @@ impl TableProvider for ListingTable { file_extension: self.options().f

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126895689 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -3208,10 +3215,11 @@ fn do_not_preserve_ordering_through_repartition() -> Result<()>

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126896630 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -210,24 +175,27 @@ async fn test_remove_unnecessary_sort5() -> Result<()> { let left_s

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-04 Thread via GitHub
mbutrovich commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2940482148 It looks like DataFusion is currently using a mix of Arrow-rs 55.0 and 55.1.0: ``` arrow = { version = "55.1.0", features = [ "prettyprint", "chrono-tz",

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126879246 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -2581,7 +2581,7 @@ async fn test_count_wildcard_on_sort() -> Result<()> { | | TableSc

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-04 Thread via GitHub
ozankabak commented on code in PR #16217: URL: https://github.com/apache/datafusion/pull/16217#discussion_r2126889241 ## datafusion/sqllogictest/test_files/topk.slt: ## @@ -370,7 +370,7 @@ query TT explain select number, letter, age, number as column4, letter as column5 from p

[I] CometColumnarExchange throws exception when reading Delta table [datafusion-comet]

2025-06-04 Thread via GitHub
Kontinuation opened a new issue, #1844: URL: https://github.com/apache/datafusion-comet/issues/1844 ### Describe the bug Reading Delta table with Comet enabled fails with the following error: ``` 25/05/30 09:53:47 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)

Re: [I] CometColumnarExchange throws exception when reading Delta table [datafusion-comet]

2025-06-04 Thread via GitHub
mbutrovich commented on issue #1844: URL: https://github.com/apache/datafusion-comet/issues/1844#issuecomment-2940502662 Possibly related to #1823 (i.e., not necessarily delta specific)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [I] CometColumnarExchange throws exception when reading Delta table [datafusion-comet]

2025-06-04 Thread via GitHub
Kontinuation commented on issue #1844: URL: https://github.com/apache/datafusion-comet/issues/1844#issuecomment-2940507433 > Possibly related to [#1823](https://github.com/apache/datafusion-comet/issues/1823) (i.e., not necessarily delta specific)? Probably, the stack trace and erro

  1   2   >