Re: [D] DISCUSSION: Anyone around for the Databricks Data & AI Summit in San Francisco June 9–12 2025 [datafusion]

2025-06-30 Thread via GitHub
GitHub user alamb closed the discussion with a comment: DISCUSSION: Anyone around for the Databricks Data & AI Summit in San Francisco June 9–12 2025 Here are the slide links for the talks. I have raw video but it isn't processed into coherent videos -- if anyone is interested in doing that le

Re: [PR] chore(deps): bump apache-avro from 0.17.0 to 0.18.0 [datafusion]

2025-06-30 Thread via GitHub
dependabot[bot] commented on PR #16092: URL: https://github.com/apache/datafusion/pull/16092#issuecomment-3018758042 A newer version of apache-avro exists, but since this PR has been edited by someone other than Dependabot I haven't updated it. You'll get a PR for the updated version as nor

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-30 Thread via GitHub
alamb commented on code in PR #75: URL: https://github.com/apache/datafusion-site/pull/75#discussion_r2174839499 ## content/blog/2025-06-30-cancellation.md: ## @@ -0,0 +1,490 @@ +--- +layout: post +title: Using Rust async for Execution and Cancelling Long-Running Queries Review

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-30 Thread via GitHub
alamb merged PR #75: URL: https://github.com/apache/datafusion-site/pull/75 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusio

Re: [I] Blog post about DataFusion Async / Stream execution model / cancellation [datafusion]

2025-06-30 Thread via GitHub
alamb closed issue #16396: Blog post about DataFusion Async / Stream execution model / cancellation URL: https://github.com/apache/datafusion/issues/16396

Re: [I] [Discuss] Release cadence / patch releases / Long Term Supported (lts) minor releases [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #5269: URL: https://github.com/apache/datafusion/issues/5269#issuecomment-3018641436 I think the upgrade situation is better than it was previously. For example we now document major upgrades and * https://datafusion.apache.org/library-user-guide/upgrading.htm

[PR] chore(deps): bump the arrow-parquet group with 7 updates [datafusion]

2025-06-30 Thread via GitHub
dependabot[bot] opened a new pull request, #16621: URL: https://github.com/apache/datafusion/pull/16621 Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]:

Re: [PR] Add microbenchmark for spilling with compression [datafusion]

2025-06-30 Thread via GitHub
ding-young commented on PR #16512: URL: https://github.com/apache/datafusion/pull/16512#issuecomment-3018735400 On my machine, avg bandwidth (throughput) is TBD... And, when I ran `strace -c -e trace=write,read cargo bench --bench spill_io compression` only for plain encoding

[I] [DISCUSS] DataFusion minor releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
alamb opened a new issue, #16622: URL: https://github.com/apache/datafusion/issues/16622 ### Is your feature request related to a problem or challenge? One of the dreams of the composable data ecosystem is to quickly assemble a system from various components (DataFusion, data formats

Re: [I] [EPIC] A collection of items to improve DataFusion stability (reduce effort required to upgrade) [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #13648: URL: https://github.com/apache/datafusion/issues/13648#issuecomment-3018739845 I added a ticket to discuss less frequent major releases - https://github.com/apache/datafusion/issues/16622

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-30 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-3018782433 The blog is live! https://datafusion.apache.org/blog/2025/06/30/cancellation/

Re: [PR] Update to arrow/parquet 55.2.0 [datafusion]

2025-06-30 Thread via GitHub
zhuqi-lucas commented on PR #16575: URL: https://github.com/apache/datafusion/pull/16575#issuecomment-3018792068 > An update here is I am working on some scripts to start measuring performance over time of datafusion so we can get a better handle on how performance is changing over time

Re: [PR] Update to arrow/parquet 55.2.0 [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16575: URL: https://github.com/apache/datafusion/pull/16575#issuecomment-3019026179 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_arrow_55.2.0_upgrade_real Benchmark clickbench_1.json -

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
timsaucer commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3019038058 I tend to favor option 1. I am currently in this same position where each version of DF requires updates to Lance and datafusion-python. In the last update, within a week of g

Re: [I] Unnest logical plan lacks decent projection push down [datafusion]

2025-06-30 Thread via GitHub
bert-beyondloops commented on issue #16623: URL: https://github.com/apache/datafusion/issues/16623#issuecomment-3019062655 Experimenting with a potential fix by treating the unnest logical plan the same as a limit plan seems to solve this issue? : optimize_projections/mod.rs :

Re: [PR] Convert Option> to Vec [datafusion]

2025-06-30 Thread via GitHub
crepererum commented on code in PR #16615: URL: https://github.com/apache/datafusion/pull/16615#discussion_r2174591239 ## datafusion/substrait/src/logical_plan/consumer/rel/aggregate_rel.rs: ## @@ -89,12 +89,9 @@ pub async fn from_aggregate_rel( _ => fal

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3018826509 > Perhaps part of a solution could be to invert dependencies between the project. For example, Iceberg-DataFusion and Delta-DataFusion integrations could live in this repo.

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
findepi commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3018759835 > I would like downstream libraries to have more time and schedule flexibility when upgrading DataFusion and other dependent crates, so that it is easier to construct a system f

Re: [PR] fix: support scalar function nested in get_field [datafusion]

2025-06-30 Thread via GitHub
goldmedal commented on code in PR #16610: URL: https://github.com/apache/datafusion/pull/16610#discussion_r2174859892 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -2596,3 +2600,36 @@ fn test_not_ilike_filter_with_escape() { @"SELECT person.first_name FROM person

Re: [PR] Add microbenchmark for spilling with compression [datafusion]

2025-06-30 Thread via GitHub
ding-young commented on code in PR #16512: URL: https://github.com/apache/datafusion/pull/16512#discussion_r2174808708 ## datafusion/physical-plan/benches/spill_io.rs: ## @@ -119,5 +127,450 @@ fn bench_spill_io(c: &mut Criterion) { group.finish(); } -criterion_group!(ben

Re: [D] DISCUSSION: Anyone around for the Databricks Data & AI Summit in San Francisco June 9–12 2025 [datafusion]

2025-06-30 Thread via GitHub
GitHub user phillipleblanc closed a discussion: DISCUSSION: Anyone around for the Databricks Data & AI Summit in San Francisco June 9–12 2025 I'll be traveling to San Francisco to attend the Databricks Data & AI Summit in San Francisco this June. I can't commit to hosting a full meetup, but i

[PR] Avoid treating incomparable scalars as equal [datafusion]

2025-06-30 Thread via GitHub
findepi opened a new pull request, #16624: URL: https://github.com/apache/datafusion/pull/16624 Fix calls to `ScalarValue::partial_cmp` that treat `None` return value as equivalent to equality. The `partial_cmp` returns `None` when values cannot be compared, for example they have incompatib
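The bug pattern this PR describes can be illustrated with plain `f64`, whose `PartialOrd::partial_cmp` also returns `None` for incomparable values (NaN). This is a minimal sketch that assumes `f64` as a stand-in for `ScalarValue` (which, per the PR, returns `None` for cases such as incompatible types); the helper names `equal_buggy` and `equal_fixed` are hypothetical and not taken from DataFusion.

```rust
use std::cmp::Ordering;

/// Buggy pattern (hypothetical helper): an incomparable pair, i.e. `None`,
/// is silently treated as if the values were equal.
fn equal_buggy(a: f64, b: f64) -> bool {
    a.partial_cmp(&b).unwrap_or(Ordering::Equal) == Ordering::Equal
}

/// Fixed pattern (hypothetical helper): only an explicit
/// `Some(Ordering::Equal)` counts as equality.
fn equal_fixed(a: f64, b: f64) -> bool {
    a.partial_cmp(&b) == Some(Ordering::Equal)
}

fn main() {
    let nan = f64::NAN;
    // NaN is not comparable to anything, so partial_cmp returns None.
    assert_eq!(nan.partial_cmp(&1.0), None);
    println!("buggy: {}", equal_buggy(nan, 1.0)); // prints "buggy: true" (wrong)
    println!("fixed: {}", equal_fixed(nan, 1.0)); // prints "fixed: false"
}
```

The design point is that `None` carries information ("cannot be compared") that `unwrap_or(Ordering::Equal)` throws away.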

Re: [PR] Update to arrow/parquet 55.2.0 [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16575: URL: https://github.com/apache/datafusion/pull/16575#issuecomment-3018996087 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_arrow_55.2.0_upgrade_real Benchmark clickbench_extended.json --

Re: [PR] Update to arrow/parquet 55.2.0 [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16575: URL: https://github.com/apache/datafusion/pull/16575#issuecomment-3018996214 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-30 Thread via GitHub
findepi commented on PR #16539: URL: https://github.com/apache/datafusion/pull/16539#issuecomment-3018609552 > The #15486 was merged fairly recently and didn't change the non-scalar code path. I think we should restore the decimal cast behavior to as it was before #15486. @alamb

Re: [I] Aggregation fuzz testing [datafusion]

2025-06-30 Thread via GitHub
alamb closed issue #12114: Aggregation fuzz testing URL: https://github.com/apache/datafusion/issues/12114

Re: [I] Aggregation fuzz testing [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #12114: URL: https://github.com/apache/datafusion/issues/12114#issuecomment-3018622566 I am closing this as I think we have pretty good fuzz testing now and a framework to extend coverage as needed. Thanks, everyone.

Re: [PR] fix: Incorrect memory accounting in `array_agg` function [datafusion]

2025-06-30 Thread via GitHub
sfluor commented on code in PR #16519: URL: https://github.com/apache/datafusion/pull/16519#discussion_r2174894499 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -341,12 +341,20 @@ impl Accumulator for ArrayAggAccumulator { Some(values) => {

Re: [PR] Update to arrow/parquet 55.2.0 [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16575: URL: https://github.com/apache/datafusion/pull/16575#issuecomment-3018876114 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
milenkovicm commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3018885402 IMHO, option 1 may be better; it is not as frequent as the current release cycle but still frequent enough to take breaking changes in relatively small and digestible chunks.

Re: [PR] fix: The inconsistency between scalar and array on the cast of timestamp [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16539: URL: https://github.com/apache/datafusion/pull/16539#issuecomment-3018897937 > > The #15486 was merged fairly recently and didn't change the non-scalar code path. I think we should restore the decimal cast behavior to as it was before #15486. > > @alamb

[I] Unnest logical plan lacks decent projection push down [datafusion]

2025-06-30 Thread via GitHub
bert-beyondloops opened a new issue, #16623: URL: https://github.com/apache/datafusion/issues/16623 ### Describe the bug When creating the unnest logical plan directly, the projection push down optimiser does not eliminate unused columns. Example plan : CREATE TABLE foo

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3018906835 > I agree, not so sure either :) though I was also thinking about other areas where we might do casting in filter expression and therefore limit the pushdown usefulness. Needs some exa

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3018912318 Instead of merging this PR, I would like to propose an alternative (just add comments): - https://github.com/apache/datafusion/pull/16605

Re: [PR] Fix wrong domain push down [datafusion]

2025-06-30 Thread via GitHub
duongcongtoai closed pull request #16611: Fix wrong domain push down URL: https://github.com/apache/datafusion/pull/16611

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3018926449 And to be clear, in my mind option 1 still has us doing monthly releases, we just restrict which releases have breaking API changes

[I] Performance of `distinct on (columns)` [datafusion]

2025-06-30 Thread via GitHub
debajyoti-truefoundry opened a new issue, #16620: URL: https://github.com/apache/datafusion/issues/16620 ### Describe the bug The query filter selects `492435` rows. As there may be duplicates, I need to execute a distinct query on a column. Then order by timestamp, and retrieve the

Re: [PR] Support remaining pipe operators [datafusion-sqlparser-rs]

2025-06-30 Thread via GitHub
simonvandel commented on code in PR #1879: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1879#discussion_r2174449622 ## src/parser/mod.rs: ## @@ -11200,6 +11255,117 @@ impl<'a> Parser<'a> { let sample = self.parse_table_sample(TableSampleModi

Re: [PR] Add microbenchmark for spilling with compression [datafusion]

2025-06-30 Thread via GitHub
ding-young commented on PR #16512: URL: https://github.com/apache/datafusion/pull/16512#issuecomment-3018158695 @2010YOUY01 Thank you for the detailed review! The bandwidth result is quite interesting. Btw, I'd like to measure the bandwidth, but it looks like Criterion doesn’t expose the m

[I] Inconsistent `time_elapsed_scanning_total` result across different queries scanning the same data [datafusion]

2025-06-30 Thread via GitHub
debajyoti-truefoundry opened a new issue, #16619: URL: https://github.com/apache/datafusion/issues/16619 ### Describe the bug Initially, I thought `time_elapsed_scanning_total` was just the amount of CPU time spent fetching and decoding the Parquet files, as described below: htt

Re: [PR] feat: python based catalog and schema provider [datafusion-python]

2025-06-30 Thread via GitHub
renato2099 commented on PR #1156: URL: https://github.com/apache/datafusion-python/pull/1156#issuecomment-3018187257 Hi @timsaucer , I am sorry I wasn't able to complete this in time, but I had it still on my radar. I pushed my version yesterday after the holidays :) https://github.

Re: [PR] Support remaining pipe operators [datafusion-sqlparser-rs]

2025-06-30 Thread via GitHub
iffyio commented on code in PR #1879: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1879#discussion_r2174369941 ## src/parser/mod.rs: ## @@ -11200,6 +11255,117 @@ impl<'a> Parser<'a> { let sample = self.parse_table_sample(TableSampleModifier:

Re: [I] Format for Value renders incorrect escaping of quote characters in BigQuery [datafusion-sqlparser-rs]

2025-06-30 Thread via GitHub
brunal commented on issue #1695: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1695#issuecomment-3018526700 Another issue with `impl Display for EscapeQuotedString` is that `'\'''` correctly gets parsed as 1 backslash and 1 single-quote, but displaying it returns `'\''` which is
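For the literal to round-trip, the display side has to double every single quote (including one that follows a backslash) and emit backslashes unchanged, so a value of one backslash plus one quote renders back as `'\'''`. A minimal sketch under that assumption; `QuotedString` is a hypothetical wrapper, not sqlparser-rs's `EscapeQuotedString`, and it only models the quote-doubling rule, not any dialect's backslash handling.

```rust
use std::fmt;

/// Hypothetical wrapper that renders a value as a single-quoted SQL literal,
/// escaping quotes by doubling them and leaving backslashes untouched.
struct QuotedString<'a>(&'a str);

impl fmt::Display for QuotedString<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "'")?;
        for c in self.0.chars() {
            if c == '\'' {
                // Every quote is doubled, even when it follows a backslash.
                write!(f, "''")?;
            } else {
                write!(f, "{}", c)?;
            }
        }
        write!(f, "'")
    }
}

fn main() {
    // The value from the comment: one backslash followed by one single quote.
    let value = "\\'";
    println!("{}", QuotedString(value)); // prints '\'''
}
```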

Re: [PR] chore: update datafusion to 48 [datafusion-ballista]

2025-06-30 Thread via GitHub
milenkovicm merged PR #1270: URL: https://github.com/apache/datafusion-ballista/pull/1270

Re: [PR] docs: Apply method chaining in example [datafusion-ballista]

2025-06-30 Thread via GitHub
milenkovicm merged PR #1276: URL: https://github.com/apache/datafusion-ballista/pull/1276

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
findepi commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-301918 If we look at the problem through the lens of project consumers, the expectation to never break anything is natural. If we look at the problem of feature implementers, t

Re: [I] [EPIC] A collection of items to improve DataFusion stability (reduce effort required to upgrade) [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #13648: URL: https://github.com/apache/datafusion/issues/13648#issuecomment-3019110593 That is a great idea @jonmmease -- I wonder if you have seen the new upgrade notes for the last few releases - https://datafusion.apache.org/library-user-guide/upgrading.html

Re: [I] [EPIC] A collection of items to improve DataFusion stability (reduce effort required to upgrade) [datafusion]

2025-06-30 Thread via GitHub
jonmmease commented on issue #13648: URL: https://github.com/apache/datafusion/issues/13648#issuecomment-3019095705 As I've come to use Claude Code more for development, I've found that it's pretty good at performing version updates for Rust crates using Release Note + Compiler Errors + Tes

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3019119788 This is a pretty sweet idea from @jonmmease about making upgrades easier (use LLM agents): https://github.com/apache/datafusion/issues/13648#issuecomment-3019095705 (it is

Re: [PR] Allow usage of table functions in relations [datafusion]

2025-06-30 Thread via GitHub
alamb merged PR #16571: URL: https://github.com/apache/datafusion/pull/16571

Re: [I] Allow table function in joins [datafusion]

2025-06-30 Thread via GitHub
alamb closed issue #16568: Allow table function in joins URL: https://github.com/apache/datafusion/issues/16568

Re: [PR] Allow usage of table functions in relations [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16571: URL: https://github.com/apache/datafusion/pull/16571#issuecomment-3019132989 Thank you @osipovartem

Re: [PR] fix: support scalar function nested in get_field [datafusion]

2025-06-30 Thread via GitHub
chenkovsky commented on code in PR #16610: URL: https://github.com/apache/datafusion/pull/16610#discussion_r2175074960 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -2596,3 +2600,36 @@ fn test_not_ilike_filter_with_escape() { @"SELECT person.first_name FROM person

Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

2025-06-30 Thread via GitHub
findepi commented on PR #16625: URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3019616554 thanks for your review @ozankabak > * `FIRST_VALUE` and `LAST_VALUE` implementations use `requirement_satisfied` and `with_requirement_satisfied` names for this thing, IMO it

Re: [I] Support multiple order aware aggregate functions in a query [datafusion]

2025-06-30 Thread via GitHub
ozankabak commented on issue #8582: URL: https://github.com/apache/datafusion/issues/8582#issuecomment-3019612464 Almost, left some pointers in the relevant PR

[PR] chore: Start 0.10.0 development [datafusion-comet]

2025-06-30 Thread via GitHub
andygrove opened a new pull request, #1958: URL: https://github.com/apache/datafusion-comet/pull/1958 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

2025-06-30 Thread via GitHub
findepi commented on PR #16625: URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3019675013 > will align for array_agg Sorry, with `requirement_satisfied` I am afraid it's easy not to know what exact requirement is guaranteed to be satisfied. I propose that we chan

Re: [I] Flaky SLT test union_by_name.slt:343 [datafusion]

2025-06-30 Thread via GitHub
findepi commented on issue #16585: URL: https://github.com/apache/datafusion/issues/16585#issuecomment-3019317553 This makes running `cargo test` locally less friendly.

Re: [PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]

2025-06-30 Thread via GitHub
ahmed-mez commented on PR #16506: URL: https://github.com/apache/datafusion/pull/16506#issuecomment-3019500939 Apologies for the delay. I added a commit with a [reproducer test case](https://github.com/apache/datafusion/pull/16506/commits/1a9584b64349acfea70b6e82dbaabbdeaa625758) as

Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

2025-06-30 Thread via GitHub
findepi commented on PR #16625: URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3020324723 I think global array_agg is not a very interesting scenario and a grouped array_agg no longer requires global sorting. Can we agree the latter is an improvement? Perhaps a big onc

[PR] Redshift utf8 idents [datafusion-sqlparser-rs]

2025-06-30 Thread via GitHub
yoavcloud opened a new pull request, #1915: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1915 Added support for UTF-8 multibyte chars in Redshift identifiers as described here: https://docs.aws.amazon.com/redshift/latest/dg/r_names.html#r_names-standard-identifiers
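Paraphrasing the linked AWS page, a standard identifier starts with an ASCII letter, an underscore, or a UTF-8 multibyte character (2 to 4 bytes), and continues with ASCII alphanumerics, underscores, dollar signs, or multibyte characters, up to 127 bytes. The sketch below encodes those rules as hypothetical free functions shaped like sqlparser-rs's `Dialect::is_identifier_start` / `is_identifier_part`; it is an illustration under that reading of the docs, not the code in this PR, and it skips the reserved-keyword check.

```rust
/// Hypothetical start-character rule: ASCII letter, underscore,
/// or any character whose UTF-8 encoding is 2-4 bytes long.
fn is_ident_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_' || c.len_utf8() >= 2
}

/// Hypothetical continuation rule: adds ASCII digits and `$`.
fn is_ident_part(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '$' || c.len_utf8() >= 2
}

/// Checks a whole unquoted identifier, including the 127-byte limit.
/// Reserved keywords are not checked in this sketch.
fn is_standard_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) if is_ident_start(first) => s.len() <= 127 && chars.all(is_ident_part),
        _ => false,
    }
}

fn main() {
    for s in ["café_1", "日本語", "_tmp$", "1abc", "a b"] {
        println!("{:?}: {}", s, is_standard_identifier(s));
    }
}
```

Because `char::len_utf8` is always between 1 and 4, `len_utf8() >= 2` is the same as "any non-ASCII character".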

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3020365227 So for Version 2 (a LTS branch) do we have any proposal for the cadence of releases? Like would we do 2 releases each month now? 1. LTS release 2. Major release from

Re: [PR] chore(deps): bump the arrow-parquet group with 7 updates [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16621: URL: https://github.com/apache/datafusion/pull/16621#issuecomment-3020390335 Dupe of https://github.com/apache/datafusion/pull/16575

Re: [PR] Update to arrow/parquet 55.2.0 [datafusion]

2025-06-30 Thread via GitHub
alamb merged PR #16575: URL: https://github.com/apache/datafusion/pull/16575

Re: [PR] chore(deps): bump the arrow-parquet group with 7 updates [datafusion]

2025-06-30 Thread via GitHub
dependabot[bot] commented on PR #16621: URL: https://github.com/apache/datafusion/pull/16621#issuecomment-3020390716 This pull request was built based on a group rule. Closing it will not ignore any of these versions in future pull requests. To ignore these dependencies, configure [ig

Re: [PR] chore(deps): bump the arrow-parquet group with 7 updates [datafusion]

2025-06-30 Thread via GitHub
alamb closed pull request #16621: chore(deps): bump the arrow-parquet group with 7 updates URL: https://github.com/apache/datafusion/pull/16621

Re: [PR] Migrate core test to insta, part 2 [datafusion]

2025-06-30 Thread via GitHub
alamb commented on code in PR #16617: URL: https://github.com/apache/datafusion/pull/16617#discussion_r2175721562 ## datafusion/core/tests/physical_optimizer/combine_partial_final_agg.rs: ## @@ -43,22 +44,18 @@ use datafusion_physical_plan::ExecutionPlan; /// Runs the Combine

Re: [PR] Update to arrow/parquet 55.2.0 [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16575: URL: https://github.com/apache/datafusion/pull/16575#issuecomment-3020387121 🚀

Re: [PR] Fix spurious failure in convert_batches test helper [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16627: URL: https://github.com/apache/datafusion/pull/16627#issuecomment-3020392989 THANK YOU

Re: [PR] Fix: make datafusion-cli running inconsistent with clickbench benchma… [datafusion]

2025-06-30 Thread via GitHub
adriangb commented on PR #16599: URL: https://github.com/apache/datafusion/pull/16599#issuecomment-3019162455 > @adriangb is also thinking about / working on a more general predicate optimization in > > * [Add PhysicalExpr optimizer and cast unwrapping #16530](https://github.com/apac

[PR] Aggregate UDF cleanup [datafusion]

2025-06-30 Thread via GitHub
findepi opened a new pull request, #16628: URL: https://github.com/apache/datafusion/pull/16628 (no comment)

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
andygrove commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3019338793 I favor option 2 (as already pointed out in the original issue). I don't think that we should artificially slow down development against the `main` branch. We already h

Re: [I] Continue optimizing the CursorValues compare for StringViewArray [datafusion]

2025-06-30 Thread via GitHub
zhuqi-lucas commented on issue #16629: URL: https://github.com/apache/datafusion/issues/16629#issuecomment-3019593203 take

[I] Continue optimizing the CursorValues compare for StringViewArray [datafusion]

2025-06-30 Thread via GitHub
zhuqi-lucas opened a new issue, #16629: URL: https://github.com/apache/datafusion/issues/16629 ### Is your feature request related to a problem or challenge? The arrow-rs compare improvement has been merged, so we can apply it to the CursorValues compare for StringViewArray in DataFusion.

[PR] Perf: fast CursorValues compare for StringViewArray using inline_key_… [datafusion]

2025-06-30 Thread via GitHub
zhuqi-lucas opened a new pull request, #16630: URL: https://github.com/apache/datafusion/pull/16630 …fast ## Which issue does this PR close? The arrow-rs compare improvement has been merged, so we can apply it to the CursorValues compare for StringViewArray in DataFusion. Relat
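In the Arrow view layout, strings of at most 12 bytes are stored inline in the 16-byte view. One way to compare such short strings with a single integer comparison (assumed here for illustration; the actual arrow-rs change referenced in the truncated PR title may differ in detail) is to pack the bytes big-endian and zero-padded into the high 96 bits of a `u128` and the length into the low 32 bits. The length breaks ties when one string equals the other's zero-padded prefix. `inline_key` is a hypothetical helper, not an arrow-rs API.

```rust
/// Hypothetical helper: packs a string of at most 12 bytes into a u128 whose
/// unsigned integer ordering matches lexicographic byte ordering.
fn inline_key(s: &[u8]) -> u128 {
    assert!(s.len() <= 12, "only strings short enough to be inlined are supported");
    let mut buf = [0u8; 16];
    // Bytes 0..12: string data, zero-padded; byte 0 becomes the most significant.
    buf[..s.len()].copy_from_slice(s);
    // Bytes 12..16: length as big-endian u32, used as the tie-breaker.
    buf[12..].copy_from_slice(&(s.len() as u32).to_be_bytes());
    u128::from_be_bytes(buf)
}

fn main() {
    let words: [&[u8]; 5] = [b"apple", b"app", b"apple\0", b"b", b""];
    for a in words {
        for b in words {
            // The integer comparison must agree with byte-wise slice comparison.
            assert_eq!(inline_key(a).cmp(&inline_key(b)), a.cmp(b));
        }
    }
    println!("all comparisons agree");
}
```

The padding alone is not enough: `b"ab"` and `b"ab\0"` pad to identical data bytes, and only the length field orders them correctly.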

Re: [I] [EPIC] A collection of items to improve DataFusion stability (reduce effort required to upgrade) [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #13648: URL: https://github.com/apache/datafusion/issues/13648#issuecomment-3019252931 We should probably update the changelog generator script to include a link to the upgrade guide 🤔

[PR] Support multiple ordered array_agg [datafusion]

2025-06-30 Thread via GitHub
findepi opened a new pull request, #16625: URL: https://github.com/apache/datafusion/pull/16625 ## Which issue does this PR close? None. https://github.com/apache/datafusion/issues/8582 is related. ## Rationale for this change Before the change, `array_agg` with ordering
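When one grouped query contains several `array_agg(... ORDER BY ...)` calls with different orderings, no single sort of the input satisfies all of them. One general technique, sketched here as an assumption and not as this PR's implementation, is for each accumulator to buffer `(order key, value)` pairs for its group and sort only its own buffer when producing the result, so no global sort is required. `OrderedArrayAgg` and the `i64`/`String` types are hypothetical simplifications.

```rust
use std::collections::BTreeMap;

/// Hypothetical accumulator for `array_agg(value ORDER BY key)`:
/// it buffers (key, value) pairs and sorts only its own buffer at the end.
#[derive(Default)]
struct OrderedArrayAgg {
    rows: Vec<(i64, String)>,
}

impl OrderedArrayAgg {
    fn update(&mut self, key: i64, value: &str) {
        self.rows.push((key, value.to_string()));
    }

    fn evaluate(mut self) -> Vec<String> {
        // Stable sort on the ORDER BY key; ties keep their input order.
        self.rows.sort_by_key(|(key, _)| *key);
        self.rows.into_iter().map(|(_, v)| v).collect()
    }
}

fn main() {
    // Rows of (group, a, b, value); two aggregates order by different columns.
    let input = [("g1", 3, 1, "x"), ("g1", 1, 3, "y"), ("g2", 2, 2, "z"), ("g1", 2, 2, "w")];
    // Per group: one accumulator ordered by `a`, one ordered by `b`.
    let mut groups: BTreeMap<&str, (OrderedArrayAgg, OrderedArrayAgg)> = BTreeMap::new();
    for (g, a, b, v) in input {
        let (by_a, by_b) = groups.entry(g).or_default();
        by_a.update(a, v);
        by_b.update(b, v);
    }
    for (g, (by_a, by_b)) in groups {
        println!("{}: by a = {:?}, by b = {:?}", g, by_a.evaluate(), by_b.evaluate());
    }
}
```

The trade-off is memory: each accumulator holds all of its group's rows until evaluation, in exchange for not needing the input pre-sorted.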

Re: [PR] chore: [branch-0.9] Prepare 0.9.0 release [datafusion-comet]

2025-06-30 Thread via GitHub
andygrove merged PR #1956: URL: https://github.com/apache/datafusion-comet/pull/1956

Re: [I] [EPIC] A collection of items to improve DataFusion stability (reduce effort required to upgrade) [datafusion]

2025-06-30 Thread via GitHub
jonmmease commented on issue #13648: URL: https://github.com/apache/datafusion/issues/13648#issuecomment-3019224293 Thanks, no, I hadn't come across these, and they look great! My habit was just to search for the [CHANGELOG](https://github.com/apache/datafusion/blob/main/dev/changelog/48.0.0

[I] update the changelog generator script to include a link to the upgrade guide 🤔 [datafusion]

2025-06-30 Thread via GitHub
alamb opened a new issue, #16626: URL: https://github.com/apache/datafusion/issues/16626 @jonmmease pointed out that the upgrade guide was not included in the CHANGELOG so he did not know about it. > We should probably update the changelog generator script to include a link to the

Re: [I] Flaky SLT test union_by_name.slt:343 [datafusion]

2025-06-30 Thread via GitHub
findepi commented on issue #16585: URL: https://github.com/apache/datafusion/issues/16585#issuecomment-3019386943 This reproduces locally quite well with:

```bash
for i in $(seq 100); do
  cargo test --test sqllogictests -- union_by_name
done
```

[PR] Improve field naming in first_value, last_value implementation [datafusion]

2025-06-30 Thread via GitHub
findepi opened a new pull request, #16631: URL: https://github.com/apache/datafusion/pull/16631 ## Which issue does this PR close? None ## Rationale for this change The naming is good, but can be perfected. Relates to https://github.com/apache/datafusion/pull/166

Re: [I] Run DataFusion benchmarks regularly and track performance history over time [datafusion]

2025-06-30 Thread via GitHub
alamb commented on issue #5504: URL: https://github.com/apache/datafusion/issues/5504#issuecomment-3020218014 With the help of Claude / Copilot agent I have made some pretty good progress here. The code is here: https://github.com/alamb/datafusion-benchmarking The rendered output

Re: [PR] chore: Drop support for RightSemi and RightAnti join types [datafusion-comet]

2025-06-30 Thread via GitHub
parthchandra commented on PR #1935: URL: https://github.com/apache/datafusion-comet/pull/1935#issuecomment-3020266365 > > I was surprised to find that Spark does not have Right Semi and Right Anti. :) > > I found it surprising, too. What must have limited the decision to not incorpo

Re: [PR] Update to arrow/parquet 55.2.0 [datafusion]

2025-06-30 Thread via GitHub
alamb commented on PR #16575: URL: https://github.com/apache/datafusion/pull/16575#issuecomment-3020294645 I am going to merge this PR in to unblock other things (like the PR for external parquet indexes). I am not really sure about the performance impact, as I see some non-trivial flu

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
comphead commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3019727409 More inclined toward version 2 with slight modifications, similar to the rustc flow https://web.mit.edu/rust-lang_v1.25/arch/amd64_ubuntu1404/share/doc/rust/html/book/second-edition/ch

Re: [I] Support SQL pipe operator [datafusion-sqlparser-rs]

2025-06-30 Thread via GitHub
iffyio closed issue #1758: Support SQL pipe operator URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1758

Re: [PR] Support remaining pipe operators [datafusion-sqlparser-rs]

2025-06-30 Thread via GitHub
iffyio merged PR #1879: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1879

Re: [PR] Support multiple ordered array_agg aggregations [datafusion]

2025-06-30 Thread via GitHub
ozankabak commented on PR #16625: URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3019757986 > Sorry, with `requirement_satisfied` I am afraid it's easy not to know which exact requirement is guaranteed to be satisfied. I propose that we change the FIRST_VALUE and LAST_V

[PR] Fix spurious failure in convert_batches test helper [datafusion]

2025-06-30 Thread via GitHub
findepi opened a new pull request, #16627: URL: https://github.com/apache/datafusion/pull/16627 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16585 ## Rationale for this change When a query involves, e.g., UNION ALL, it may pr
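The snippet above is cut off, so the actual change in PR #16627 is not visible here. As general background only: SQL does not guarantee row order for a UNION ALL without ORDER BY, so a test helper that compares results row by row can fail intermittently. A minimal sketch of one common remedy, sorting both sides before comparing, is below; it assumes rows are already rendered as strings, and `assert_same_rows` is a hypothetical name, not DataFusion's `convert_batches` helper.

```rust
/// Hypothetical helper (not DataFusion's API): compare two result sets
/// while ignoring row order. Rows are assumed to be pre-rendered strings.
fn assert_same_rows(mut expected: Vec<String>, mut actual: Vec<String>) {
    // Sorting (rather than collecting into a set) keeps duplicate rows,
    // which matters for UNION ALL, while removing order dependence.
    expected.sort();
    actual.sort();
    assert_eq!(expected, actual, "result sets differ (ignoring order)");
}

fn main() {
    let expected = vec!["1|a".to_string(), "2|b".to_string()];
    // Same rows in a different order, as a UNION ALL might emit them.
    let actual = vec!["2|b".to_string(), "1|a".to_string()];
    assert_same_rows(expected, actual);
    println!("rows match");
}
```

Because sorting preserves multiplicity, a result with a missing or extra duplicate row still fails the check.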

Re: [I] [DISCUSS] DataFusion less frequent major / breaking releases (ease using multiple third-party extensions (like delta, or iceberg) ) [datafusion]

2025-06-30 Thread via GitHub
andygrove commented on issue #16622: URL: https://github.com/apache/datafusion/issues/16622#issuecomment-3019408850 The responsibility for creating the PRs to backport fixes to the release branch should fall to the downstream users who are waiting on those fixes. There is additional work fo

Re: [PR] chore: Start 0.10.0 development [datafusion-comet]

2025-06-30 Thread via GitHub
codecov-commenter commented on PR #1958: URL: https://github.com/apache/datafusion-comet/pull/1958#issuecomment-3019835072 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1958?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Avoid treating incomparable scalars as equal [datafusion]

2025-06-30 Thread via GitHub
findepi commented on code in PR #16624: URL: https://github.com/apache/datafusion/pull/16624#discussion_r2175762054 ## datafusion/common/src/scalar/mod.rs: ## @@ -3343,6 +3339,14 @@ impl ScalarValue { arr1 == &right } +/// Compare `self` with `other` and retu
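The PR title describes a general pitfall: when two values have no defined order (for example, scalars of different types), collapsing "incomparable" into "equal" hides the problem. A minimal sketch of the two patterns follows, using a hypothetical, simplified `Scalar` enum rather than DataFusion's real `ScalarValue`, whose actual comparison API is what the PR changes.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified scalar type; not DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int(i64),
    Utf8(String),
}

impl PartialOrd for Scalar {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        match (self, other) {
            (Scalar::Int(a), Scalar::Int(b)) => a.partial_cmp(b),
            (Scalar::Utf8(a), Scalar::Utf8(b)) => a.partial_cmp(b),
            // Values of different types have no meaningful order.
            _ => None,
        }
    }
}

/// Problematic pattern: an incomparable pair silently becomes "equal".
fn cmp_lossy(a: &Scalar, b: &Scalar) -> Ordering {
    a.partial_cmp(b).unwrap_or(Ordering::Equal)
}

/// Safer pattern: surface incomparability to the caller as `None`.
fn try_cmp(a: &Scalar, b: &Scalar) -> Option<Ordering> {
    a.partial_cmp(b)
}

fn main() {
    let a = Scalar::Int(1);
    let b = Scalar::Utf8("1".to_string());
    println!("lossy: {:?}", cmp_lossy(&a, &b));
    println!("checked: {:?}", try_cmp(&a, &b));
}
```

Returning `Option<Ordering>` forces each caller to decide what an incomparable pair should mean, instead of inheriting an arbitrary "equal".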

[PR] fix: DuckDB accepts 2nd characters argument to TRIM [datafusion-sqlparser-rs]

2025-06-30 Thread via GitHub
ryanschneider opened a new pull request, #1916: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1916 Like BigQuery and Snowflake, DuckDB also supports the 2nd `characters` argument to `TRIM`: https://duckdb.org/docs/stable/sql/functions/text.html#trimstring-characters - D

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-30 Thread via GitHub
andygrove commented on PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#issuecomment-3019807679 The 4.0.0 diff will now need to be updated to reflect the changes made to 4.0.0-preview1 in https://github.com/apache/datafusion-comet/pull/1936 I suggest that we don't

Re: [I] Unnest logical plan lacks decent projection push down [datafusion]

2025-06-30 Thread via GitHub
bert-beyondloops commented on issue #16623: URL: https://github.com/apache/datafusion/issues/16623#issuecomment-3019808661 The PR serves as an initial implementation. All feedback welcome.

[PR] Fix: optimize projections for unnest logical plan. [datafusion]

2025-06-30 Thread via GitHub
bert-beyondloops opened a new pull request, #16632: URL: https://github.com/apache/datafusion/pull/16632 ## Which issue does this PR close? - Closes #16623. ## Rationale for this change See issue ## What changes are included in this PR? ## Are these changes

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-30 Thread via GitHub
andygrove commented on code in PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2175499403 ## .github/actions/java-test/action.yaml: ## @@ -68,7 +68,7 @@ runs: env: COMET_PARQUET_SCAN_IMPL: ${{ inputs.scan_impl }} run: | -

Re: [PR] Avoid treating incomparable scalars as equal [datafusion]

2025-06-30 Thread via GitHub
alamb commented on code in PR #16624: URL: https://github.com/apache/datafusion/pull/16624#discussion_r2175729099 ## datafusion/common/src/scalar/mod.rs: ## @@ -3343,6 +3339,14 @@ impl ScalarValue { arr1 == &right } +/// Compare `self` with `other` and return

[PR] docs: Update benchmark results for 0.9.0 [datafusion-comet]

2025-06-30 Thread via GitHub
andygrove opened a new pull request, #1959: URL: https://github.com/apache/datafusion-comet/pull/1959 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes
