[PR] Doc: Update upgrade guide for the rewritten NLJ operator [datafusion]

2025-08-15 Thread via GitHub
2010YOUY01 opened a new pull request, #17202: URL: https://github.com/apache/datafusion/pull/17202 ## Which issue does this PR close? - Closes #. ## Rationale for this change https://github.com/apache/datafusion/pull/16996 did a complete rewrite for Nested Lo

[PR] chore(deps): bump async-trait from 0.1.88 to 0.1.89 [datafusion]

2025-08-15 Thread via GitHub
dependabot[bot] opened a new pull request, #17203: URL: https://github.com/apache/datafusion/pull/17203 Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.88 to 0.1.89. Release notes Sourced from https://github.com/dtolnay/async-trait/releases async-trait's rel

Re: [PR] Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs [datafusion]

2025-08-15 Thread via GitHub
kosiew commented on PR #17090: URL: https://github.com/apache/datafusion/pull/17090#issuecomment-3191074728 @adriangb PR is ready for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] Fix HashJoinExec sideways information passing for partitioned queries [datafusion]

2025-08-15 Thread via GitHub
nuno-faria commented on PR #17197: URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3191090428 Thanks @adriangb for looking into it. I found some issues with the fix. 1. It appears to be causing a regression with the regular dynamic filter pushdown. Now more ro

Re: [I] Dynamic Filter Pushdown is being applied to the wrong table [datafusion]

2025-08-15 Thread via GitHub
nuno-faria commented on issue #17196: URL: https://github.com/apache/datafusion/issues/17196#issuecomment-3191093292 > So, the issue is the same as the one [#17188](https://github.com/apache/datafusion/issues/17188)? We need to consider the partitioned mode I don't think so, in this

Re: [PR] Fix dynamic filter pushdown in HashJoinExec::swap_inputs [datafusion]

2025-08-15 Thread via GitHub
nuno-faria commented on code in PR #17201: URL: https://github.com/apache/datafusion/pull/17201#discussion_r2278692439 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -804,8 +805,8 @@ impl ExecutionPlan for HashJoinExec { self.mode, self.null_

Re: [PR] Fix dynamic filter pushdown in HashJoinExec::swap_inputs [datafusion]

2025-08-15 Thread via GitHub
xudong963 commented on code in PR #17201: URL: https://github.com/apache/datafusion/pull/17201#discussion_r2278705941 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -804,8 +805,8 @@ impl ExecutionPlan for HashJoinExec { self.mode, self.null_e

Re: [PR] Improve GitHub actions/python workflows [datafusion-ballista]

2025-08-15 Thread via GitHub
milenkovicm commented on PR #1289: URL: https://github.com/apache/datafusion-ballista/pull/1289#issuecomment-3191393556 @Huy1Ng would it be too big a change if we have this merged without MSRV and rust edition changes?

Re: [I] Disproportionate memory use for `DISTINCT ON` query [datafusion]

2025-08-15 Thread via GitHub
Dandandan commented on issue #17169: URL: https://github.com/apache/datafusion/issues/17169#issuecomment-3191395841 > I.e. (very imprecise) calculate a key for each row as it's visited. Remember just the latest key. Skip a row if the key matches the latest or update the latest key if not (pl

Re: [I] Blog post about using external indexes with Parquet [datafusion]

2025-08-15 Thread via GitHub
alamb commented on issue #17010: URL: https://github.com/apache/datafusion/issues/17010#issuecomment-3191436457 And the post is live: - https://datafusion.apache.org/blog/2025/08/15/external-parquet-indexes/

Re: [PR] feat: add `datafusion-physical-adapter`, implement predicate adaptation missing fields of structs [datafusion]

2025-08-15 Thread via GitHub
adriangb merged PR #16589: URL: https://github.com/apache/datafusion/pull/16589

Re: [PR] chore: Simplify approach to avoiding memory corruption due to buffer reuse [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove commented on PR #2156: URL: https://github.com/apache/datafusion-comet/pull/2156#issuecomment-3191484295 > Looks good. I don't see this changing the behavior of native_iceberg_compat though; I assume that is covered by the previous PR? I haven't made any changes specific to

Re: [PR] chore: CometExecRule code cleanup [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove commented on code in PR #2159: URL: https://github.com/apache/datafusion-comet/pull/2159#discussion_r2279000583 ## spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala: ## @@ -543,20 +527,7 @@ case class CometExecRule(session: SparkSession) extends Rule[Spa

Re: [PR] docs: Update to support try arithmetic functions [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove merged PR #2143: URL: https://github.com/apache/datafusion-comet/pull/2143

Re: [PR] chore: Simplify approach to avoiding memory corruption due to buffer reuse [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove commented on PR #2156: URL: https://github.com/apache/datafusion-comet/pull/2156#issuecomment-3191530617 @mbutrovich could you review? See https://github.com/apache/datafusion-comet/pull/2156#issuecomment-3191484295 for an explanation of what other PRs were already merged and wha

Re: [PR] Refactor: Do not silently ignore errors in `stats_projection` [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #17154: URL: https://github.com/apache/datafusion/pull/17154#issuecomment-319151 Thank you for the review @xudong963

Re: [PR] Refactor: Do not silently ignore errors in `stats_projection` [datafusion]

2025-08-15 Thread via GitHub
alamb merged PR #17154: URL: https://github.com/apache/datafusion/pull/17154

Re: [PR] feat: Make parquet_encryption a non-default feature [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #17137: URL: https://github.com/apache/datafusion/pull/17137#issuecomment-3191571473 > Should we also enable `parquet_encryption` for the `cargo test (macos-aarch64)` here? > > https://github.com/apache/datafusion/blob/5c370fa620eb05d07ad9ef70b5a8a959c46cefe6/.g

Re: [PR] feat: implement partition_statistics for HashJoinExec [datafusion]

2025-08-15 Thread via GitHub
xudong963 commented on code in PR #16956: URL: https://github.com/apache/datafusion/pull/16956#discussion_r2278540331 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -892,21 +892,56 @@ impl ExecutionPlan for HashJoinExec { } fn partition_statistics(&self, p

Re: [PR] feat: implement partition_statistics for HashJoinExec [datafusion]

2025-08-15 Thread via GitHub
xudong963 commented on code in PR #16956: URL: https://github.com/apache/datafusion/pull/16956#discussion_r2278539946 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -4752,4 +4787,96 @@ mod tests { fn columns(schema: &Schema) -> Vec { schema.fields().iter

Re: [I] Implement `partition_statistics` API for more operators [datafusion]

2025-08-15 Thread via GitHub
xudong963 commented on issue #15873: URL: https://github.com/apache/datafusion/issues/15873#issuecomment-3190957279 Thanks for all your help, the issue has made great progress!

Re: [PR] fix: respect inexact flags in row group metadata [datafusion]

2025-08-15 Thread via GitHub
CookiePieWw commented on PR #16412: URL: https://github.com/apache/datafusion/pull/16412#issuecomment-3190871170 Hi @alamb, this pr tried to extract the exactness flags in row group metadata, could you please take a look :)

Re: [PR] Fix: Show backtrace for ArrowError [datafusion]

2025-08-15 Thread via GitHub
2010YOUY01 commented on code in PR #17204: URL: https://github.com/apache/datafusion/pull/17204#discussion_r2278557812 ## datafusion-cli/tests/cli_integration.rs: ## @@ -332,3 +332,31 @@ SELECT COUNT(*) FROM hits; .env_remove("AWS_ENDPOINT") .pass_stdin(input))

[PR] Fix: Show backtrace for ArrowError [datafusion]

2025-08-15 Thread via GitHub
2010YOUY01 opened a new pull request, #17204: URL: https://github.com/apache/datafusion/pull/17204 ## Which issue does this PR close? - Closes #. ## Rationale for this change When an error is encountered inside DataFusion, the `backtrace` feature can be enabl

Re: [I] RFC: What table provider features would be helpful in an example? [datafusion]

2025-08-15 Thread via GitHub
timsaucer commented on issue #16821: URL: https://github.com/apache/datafusion/issues/16821#issuecomment-3191270083 Also a discussion of table schema vs projection per this thread: https://the-asf.slack.com/archives/C04RJ0C85UZ/p1755228833949099?thread_ts=1755228436.402809&cid=C04RJ0C85UZ

Re: [PR] fix: respect inexact flags in row group metadata [datafusion]

2025-08-15 Thread via GitHub
CookiePieWw commented on code in PR #16412: URL: https://github.com/apache/datafusion/pull/16412#discussion_r2278845368 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1967,6 +2011,31 @@ fn create_max_min_accs( (max_values, min_values) } +/// Checks if any oc

Re: [PR] Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs [datafusion]

2025-08-15 Thread via GitHub
adriangb commented on PR #17090: URL: https://github.com/apache/datafusion/pull/17090#issuecomment-3191433426 This is a monumental piece of work, I’m astounded! Thank you so much for working on this. I’ll try to review it but I immediately will ask if we can somehow split it up into

Re: [I] Add a way to get what takes memory [datafusion]

2025-08-15 Thread via GitHub
wiedld commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3192306155 @alamb -- as for more documentation, do you mean adding code examples with the [TrackConsumersPool](https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.Tra

Re: [I] Blog post about using external indexes with Parquet [datafusion]

2025-08-15 Thread via GitHub
alamb commented on issue #17010: URL: https://github.com/apache/datafusion/issues/17010#issuecomment-3192311229 Yes, of course. I am very sorry about that. Do you mean this one? I tried really hard to copy/paste from github and I don't see the typo. I must be missing something http

Re: [PR] Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs [datafusion]

2025-08-15 Thread via GitHub
adriangb commented on code in PR #17090: URL: https://github.com/apache/datafusion/pull/17090#discussion_r2279678882 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -217,8 +234,25 @@ impl DynamicFilterPhysicalExpr { current.expr = new_expr;

Re: [PR] Fix dynamic filter pushdown in HashJoinExec [datafusion]

2025-08-15 Thread via GitHub
adriangb commented on PR #17201: URL: https://github.com/apache/datafusion/pull/17201#issuecomment-3192506155 Hmm, MSRV is failing for 1.85.1, but if I try locally: ``` ❯ cargo "+1.85.1" check Blocking waiting for file lock on package cache error: rustc 1.85.1 is not

Re: [PR] Fix: ListingTableFactory hive column detection [datafusion]

2025-08-15 Thread via GitHub
BlakeOrth commented on PR #17050: URL: https://github.com/apache/datafusion/pull/17050#issuecomment-3193044063 @alamb Since #17049 and #17212 are now separate issues, would you like me to close this PR and split the fixes into new PRs so the PRs are more directly aligned with the issues?

[I] Enable the `ListFilesCache` to be available for partitioned tables [datafusion]

2025-08-15 Thread via GitHub
BlakeOrth opened a new issue, #17211: URL: https://github.com/apache/datafusion/issues/17211 ### Is your feature request related to a problem or challenge? When using "high latency" storage (e.g. remote object stores, such as AWS S3) listing objects and collecting object metadata can

Re: [PR] Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs [datafusion]

2025-08-15 Thread via GitHub
adriangb commented on PR #17090: URL: https://github.com/apache/datafusion/pull/17090#issuecomment-3192892953 > Testing (with nulls) is especially important too Yes a LOT of specific tests + fuzz tests are going to be needed to be certain we don't introduce bugs

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-08-15 Thread via GitHub
BlakeOrth commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3192913887 > I believe the most valuable thing you can do is write up the usecases / file tickets @alamb done, I've filed #17211 which describes my use case and some of my current

[I] `ListingTableFactory` fails to read data when the final path element contains a `.` [datafusion]

2025-08-15 Thread via GitHub
BlakeOrth opened a new issue, #17212: URL: https://github.com/apache/datafusion/issues/17212 ### Describe the bug If a path to an external table contains a `.` in the final path element (i.e. folders are named with `.` delimited versioning) tables created via the `ListingTableFactory

Re: [PR] Fix: ListingTableFactory hive column detection [datafusion]

2025-08-15 Thread via GitHub
BlakeOrth commented on PR #17050: URL: https://github.com/apache/datafusion/pull/17050#issuecomment-3193006060 closes #17212

Re: [PR] feat: implement_ansi_eval_mode_arithmetic [datafusion-comet]

2025-08-15 Thread via GitHub
kazuyukitanimura commented on PR #2136: URL: https://github.com/apache/datafusion-comet/pull/2136#issuecomment-3192287131 > Change the error message to match exactly with Spark @coderfender I agree. Do you plan to change `native/spark-expr/src/error.rs` ? Hopefully we can make a mes

Re: [I] Disproportionate memory use for `DISTINCT ON` query [datafusion]

2025-08-15 Thread via GitHub
alamb commented on issue #17169: URL: https://github.com/apache/datafusion/issues/17169#issuecomment-3192288176 > Also I saw that The memory usage is only from the accumulators in our case `first` Do you know if the memory usage is from needing a single row but the accumulator holdin

Re: [I] [datafusion-cli] Add a way to see what object store requests are made [datafusion]

2025-08-15 Thread via GitHub
alamb commented on issue #17207: URL: https://github.com/apache/datafusion/issues/17207#issuecomment-3192293792 > As I was reading through this I was feeling somewhat concerned about how easy it might be to accidentally miss calls that should be instrumented, however, I think the implementa

Re: [PR] perf: Only perform deep copies for Parquet scans [experiment] [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove closed pull request #2158: perf: Only perform deep copies for Parquet scans [experiment] URL: https://github.com/apache/datafusion-comet/pull/2158

Re: [PR] Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #17090: URL: https://github.com/apache/datafusion/pull/17090#issuecomment-3192672292 > Amazing work overall! A lot of the diff is updating debug outputs / slt tests. I think it will help a lot to split this up into multiple PRs so that e.g. that can be reviewed separat

Re: [PR] chore: fix typos [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #17135: URL: https://github.com/apache/datafusion/pull/17135#issuecomment-3192673736 Is this one ready to merge? It looks like there are some conflicts to resolve and some unresolved comments

Re: [PR] Docs: Consolidate feature proposal content into roadmap [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #17156: URL: https://github.com/apache/datafusion/pull/17156#issuecomment-3192674402 Thank you @comphead and @xudong963

Re: [PR] Docs: Consolidate feature proposal content into roadmap [datafusion]

2025-08-15 Thread via GitHub
alamb merged PR #17156: URL: https://github.com/apache/datafusion/pull/17156

Re: [I] CometBatchIterator undefined behavior [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove commented on issue #2162: URL: https://github.com/apache/datafusion-comet/issues/2162#issuecomment-3192677819 There isn't really an issue after all. This was mostly about my lack of understanding of how ownership works with FFI. I attempt to explain this in https://github.com/apa

Re: [I] CometBatchIterator undefined behavior [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove closed issue #2162: CometBatchIterator undefined behavior URL: https://github.com/apache/datafusion-comet/issues/2162

Re: [I] [datafusion-cli] Add a way to see what object store requests are made [datafusion]

2025-08-15 Thread via GitHub
alamb commented on issue #17207: URL: https://github.com/apache/datafusion/issues/17207#issuecomment-3192683132 > Do you think the metrics would be exposed through the API, or just the CLI? If I'm somewhat selfish here, I would personally like to see the API side of this data be exposed in

Re: [PR] Testing: Try test optimize performance for coalesce [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #17193: URL: https://github.com/apache/datafusion/pull/17193#issuecomment-3192686724 > I will polish code and doc if we think this is the right direction. Sounds good to me. I am sorry I have somewhat lost track of the current status Shall we polish

Re: [I] CI failure on main at Cargo.lock:3676 [datafusion]

2025-08-15 Thread via GitHub
mbutrovich closed issue #17208: CI failure on main at Cargo.lock:3676 URL: https://github.com/apache/datafusion/issues/17208

Re: [PR] fix(ci): update `datafusion-physical-expr-adapter` version to 49.0.1 in Cargo.lock [datafusion]

2025-08-15 Thread via GitHub
mbutrovich merged PR #17209: URL: https://github.com/apache/datafusion/pull/17209

Re: [PR] Testing: Try test optimize performance for coalesce [datafusion]

2025-08-15 Thread via GitHub
alamb commented on code in PR #17193: URL: https://github.com/apache/datafusion/pull/17193#discussion_r2279854645 ## datafusion/physical-plan/src/coalesce/mod.rs: ## @@ -15,290 +15,158 @@ // specific language governing permissions and limitations // under the License. -use a

Re: [PR] fix: [branch-0.9] Backport FFI fix [datafusion-comet]

2025-08-15 Thread via GitHub
codecov-commenter commented on PR #2164: URL: https://github.com/apache/datafusion-comet/pull/2164#issuecomment-3192688564 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2164?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: implement QUALIFY clause [datafusion]

2025-08-15 Thread via GitHub
alamb commented on code in PR #16933: URL: https://github.com/apache/datafusion/pull/16933#discussion_r2279861071 ## docs/source/user-guide/sql/select.md: ## @@ -261,6 +262,14 @@ Example: SELECT a, b, MAX(c) FROM table GROUP BY a, b HAVING MAX(c) > 10 ``` +## QUALIFY clause

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-08-15 Thread via GitHub
alamb commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3191747706 > Perhaps we can take inspiration from [@kosiew](https://github.com/kosiew) and [#17021](https://github.com/apache/datafusion/pull/17021) and add some way to monitor what is happe

Re: [PR] docs: Add Arrow FFI documentation [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove closed pull request #2140: docs: Add Arrow FFI documentation URL: https://github.com/apache/datafusion-comet/pull/2140

Re: [I] Disproportionate memory use for `DISTINCT ON` query [datafusion]

2025-08-15 Thread via GitHub
alamb commented on issue #17169: URL: https://github.com/apache/datafusion/issues/17169#issuecomment-3191783825 Indeed -- if you already know the data is sorted on the `DISTINCT ON` keys, you can do deduplication with a single pass through the data with minimal memory requirements, followin
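
The single-pass idea discussed in this thread (remember only the latest key, skip rows whose key matches it) can be sketched roughly as follows. This is a minimal illustration, assuming the input is already sorted on the `DISTINCT ON` key; the function name `distinct_on_sorted` and the `(key, value)` row shape are hypothetical and are not DataFusion's implementation or API.

```rust
/// Hypothetical helper: keep the first row seen for each key.
/// Assumes `rows` is already sorted (or at least grouped) by key,
/// so only the most recent key has to be remembered.
fn distinct_on_sorted<K: PartialEq + Clone, V: Clone>(rows: &[(K, V)]) -> Vec<(K, V)> {
    let mut out = Vec::new();
    // The only state carried between rows is a reference to the latest key.
    let mut latest: Option<&K> = None;
    for (key, value) in rows {
        if latest == Some(key) {
            // Same key as the previous row: skip the duplicate.
            continue;
        }
        // New key: remember it and emit this row.
        latest = Some(key);
        out.push((key.clone(), value.clone()));
    }
    out
}
```

Calling `distinct_on_sorted(&[(1, "a"), (1, "b"), (2, "c")])` keeps one row per key. Apart from the output itself, the extra memory is a single key reference, which is the point made in the comment above; if the input is not sorted on the key, this sketch would emit repeated keys.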

Re: [I] Disproportionate memory use for `DISTINCT ON` query [datafusion]

2025-08-15 Thread via GitHub
rluvaton commented on issue #17169: URL: https://github.com/apache/datafusion/issues/17169#issuecomment-3192021448 Implementing GroupAccumulators for First for every type using Rows should improve memory (I implemented it and saw benefit)

[I] Implement low-level debugging to show flow of batches through the system [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove opened a new issue, #2161: URL: https://github.com/apache/datafusion-comet/issues/2161 ### What is the problem the feature request solves? Debugging Comet is difficult 😞 I would like to be able to see the flow of batches through the system when some debug flag (or co

Re: [PR] Remove redundant `plan` from extension's check_invariants [datafusion]

2025-08-15 Thread via GitHub
alamb commented on code in PR #17199: URL: https://github.com/apache/datafusion/pull/17199#discussion_r2279420009 ## datafusion/expr/src/logical_plan/invariants.rs: ## @@ -74,7 +74,7 @@ pub fn assert_executable_invariants(plan: &LogicalPlan) -> Result<()> { fn assert_valid_ext

Re: [PR] Remove redundant `plan` from extension's check_invariants [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #17199: URL: https://github.com/apache/datafusion/pull/17199#issuecomment-3192074463 I worry that this is a backwards incompatible API and if someone has used this API in their downstream application, there is no way to update their code to get the same behavior (aka g

Re: [PR] Blog: Limit max width [datafusion-site]

2025-08-15 Thread via GitHub
alamb merged PR #101: URL: https://github.com/apache/datafusion-site/pull/101

Re: [I] Set max width on blog to make it easier to read in larger displays [datafusion-site]

2025-08-15 Thread via GitHub
alamb closed issue #100: Set max width on blog to make it easier to read in larger displays URL: https://github.com/apache/datafusion-site/issues/100

Re: [PR] Add drop behavior to DROP PRIMARY/FOREIGN KEY [datafusion-sqlparser-rs]

2025-08-15 Thread via GitHub
mvzink commented on code in PR #2002: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2002#discussion_r2279791312 ## src/ast/ddl.rs: ## @@ -658,8 +659,31 @@ impl fmt::Display for AlterTableOperation { } ) } -

Re: [PR] feat: support `Utf8View` for more args of `regexp_replace` [datafusion]

2025-08-15 Thread via GitHub
mbutrovich commented on code in PR #17195: URL: https://github.com/apache/datafusion/pull/17195#discussion_r2279783095 ## datafusion/functions/src/regex/regexpreplace.rs: ## @@ -238,15 +258,17 @@ fn regex_replace_posix_groups(replacement: &str) -> String { /// # Ok(()) /// #

Re: [PR] trivial: remove unnecessary clone() [datafusion-comet]

2025-08-15 Thread via GitHub
parthchandra commented on PR #2066: URL: https://github.com/apache/datafusion-comet/pull/2066#issuecomment-3192596908 @isimluk could you rebase this on main?

[PR] fix: [branch-0.9] Backport FFI fix [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove opened a new pull request, #2164: URL: https://github.com/apache/datafusion-comet/pull/2164 ## Which issue does this PR close? N/A ## Rationale for this change Backport recent FFI fixes that were discovered during Iceberg integration work.

Re: [PR] feat: implement QUALIFY clause [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #16933: URL: https://github.com/apache/datafusion/pull/16933#issuecomment-3192721835 > Hi @Vedin , thanks for bringing up these two use cases for `qualify`. I wasn’t aware of them before. However, I’ve been quite busy lately and won’t be able to include them in this PR

Re: [PR] feat: implement QUALIFY clause [datafusion]

2025-08-15 Thread via GitHub
alamb commented on PR #16933: URL: https://github.com/apache/datafusion/pull/16933#issuecomment-3192722107 Thanks again @haohuaijin

Re: [I] QUALIFY clause [datafusion]

2025-08-15 Thread via GitHub
alamb closed issue #15485: QUALIFY clause URL: https://github.com/apache/datafusion/issues/15485

Re: [PR] feat: implement QUALIFY clause [datafusion]

2025-08-15 Thread via GitHub
alamb merged PR #16933: URL: https://github.com/apache/datafusion/pull/16933

[I] Support aggregates and constant filters in `QUALIFY` [datafusion]

2025-08-15 Thread via GitHub
alamb opened a new issue, #17210: URL: https://github.com/apache/datafusion/issues/17210 @haohuaijin added support for `QUALIFY` in - https://github.com/apache/datafusion/pull/16933 @Vedin has pointed out some follow on work here Hi @haohuaijin, I accidentally worked on

Re: [PR] feat: support `Utf8View` for more args of `regexp_replace` [datafusion]

2025-08-15 Thread via GitHub
mbutrovich commented on PR #17195: URL: https://github.com/apache/datafusion/pull/17195#issuecomment-3192718767 Tested clickbench q28 which has `regexp_replace` and no regression on the fast-path: main ``` SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1')

Re: [I] Bug auto detecting partitions with `ListingTableFactory` on Hive partitioned datasets [datafusion]

2025-08-15 Thread via GitHub
alamb commented on issue #17049: URL: https://github.com/apache/datafusion/issues/17049#issuecomment-3192732752 > [@alamb](https://github.com/alamb) I think the updated title is more descriptive. It does lose the notion that, even in non-partitioned datasets, paths with a `.` in the final e

Re: [PR] chore: CometExecRule code cleanup [datafusion-comet]

2025-08-15 Thread via GitHub
comphead commented on code in PR #2159: URL: https://github.com/apache/datafusion-comet/pull/2159#discussion_r2279878315 ## spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala: ## @@ -818,31 +816,43 @@ case class CometExecRule(session: SparkSession) extends Rule[Spa

Re: [PR] chore: CometExecRule code cleanup [datafusion-comet]

2025-08-15 Thread via GitHub
comphead commented on code in PR #2159: URL: https://github.com/apache/datafusion-comet/pull/2159#discussion_r2279877744 ## spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala: ## @@ -818,31 +816,43 @@ case class CometExecRule(session: SparkSession) extends Rule[Spa

Re: [PR] chore: CometExecRule code cleanup [datafusion-comet]

2025-08-15 Thread via GitHub
comphead commented on code in PR #2159: URL: https://github.com/apache/datafusion-comet/pull/2159#discussion_r2279876687 ## spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala: ## @@ -794,11 +779,24 @@ case class CometExecRule(session: SparkSession) extends Rule[Spa

Re: [PR] Add Memory Profiling Support to DataFusion CLI [datafusion]

2025-08-15 Thread via GitHub
alamb commented on code in PR #17021: URL: https://github.com/apache/datafusion/pull/17021#discussion_r2279414621 ## datafusion-cli/README.md: ## @@ -30,3 +30,33 @@ DataFusion CLI (`datafusion-cli`) is a small command line utility that runs SQL ## Where can I find more informa

Re: [I] Blog post about using external indexes with Parquet [datafusion]

2025-08-15 Thread via GitHub
JigaoLuo commented on issue #17010: URL: https://github.com/apache/datafusion/issues/17010#issuecomment-3192251977 Hi @alamb Thanks so much for inviting me and for sharing the post on LinkedIn! I really appreciate it. https://www.linkedin.com/posts/apache-datafusion_using-external-

[PR] fix(ci): update `datafusion-physical-expr-adapter` version to 49.0.1 in Cargo.lock [datafusion]

2025-08-15 Thread via GitHub
miroim opened a new pull request, #17209: URL: https://github.com/apache/datafusion/pull/17209 ## Which issue does this PR close? - Closes #17208. ## Rationale for this change https://github.com/apache/datafusion/blob/0a024a2f0e64194042c2965804dce20669047113/Cargo.lo

Re: [I] Bug auto detecting partitions with `ListingTableFactory` on Hive partitioned datasets [datafusion]

2025-08-15 Thread via GitHub
BlakeOrth commented on issue #17049: URL: https://github.com/apache/datafusion/issues/17049#issuecomment-3192236107 @alamb I think the updated title is more descriptive. It does lose the notion that, even in non-partitioned datasets, paths with a `.` in the final element could fail to disco

Re: [PR] feat: support `Utf8View` for more args of `regexp_replace` [datafusion]

2025-08-15 Thread via GitHub
mbutrovich commented on PR #17195: URL: https://github.com/apache/datafusion/pull/17195#issuecomment-3192332664 I'm afraid I might have lost the plot with all of those string args. I looked at other string functions to see if a huge `match` is how we handle this, but the worst I could find

Re: [PR] Fix dynamic filter pushdown in HashJoinExec::swap_inputs [datafusion]

2025-08-15 Thread via GitHub
adriangb commented on code in PR #17201: URL: https://github.com/apache/datafusion/pull/17201#discussion_r2279613226 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -804,8 +805,8 @@ impl ExecutionPlan for HashJoinExec { self.mode, self.null_eq

Re: [I] Disproportionate memory use for `DISTINCT ON` query [datafusion]

2025-08-15 Thread via GitHub
rluvaton commented on issue #17169: URL: https://github.com/apache/datafusion/issues/17169#issuecomment-3192027129 Also, I saw that the memory usage comes only from the accumulators, in our case `first`.

Re: [I] [`datafusion-cli`] Add a way to see what object store requests are made [datafusion]

2025-08-15 Thread via GitHub
BlakeOrth commented on issue #17207: URL: https://github.com/apache/datafusion/issues/17207#issuecomment-3192198520 As I was reading through this I was feeling somewhat concerned about how easy it might be to accidentally miss calls that should be instrumented, however, I think the implemen

[PR] Add drop behavior to DROP PRIMARY/FOREIGN KEY [datafusion-sqlparser-rs]

2025-08-15 Thread via GitHub
yoavcloud opened a new pull request, #2002: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2002 (no comment)

[PR] chore: Improve Arrow FFI documentation [datafusion-comet]

2025-08-15 Thread via GitHub
andygrove opened a new pull request, #2163: URL: https://github.com/apache/datafusion-comet/pull/2163 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] feat: support `Utf8View` for more args of `regexp_replace` [datafusion]

2025-08-15 Thread via GitHub
mbutrovich commented on PR #17195: URL: https://github.com/apache/datafusion/pull/17195#issuecomment-3192570690 I'm not proud of the readability of the change, but at least there doesn't seem to be a meaningful performance regression: main ``` regexp_replace_1000 time: [1.

Re: [PR] Improve GitHub actions/python workflows [datafusion-ballista]

2025-08-15 Thread via GitHub
Huy1Ng commented on PR #1289: URL: https://github.com/apache/datafusion-ballista/pull/1289#issuecomment-3193094229 I kept the Python portion only. The later Rust update seems to improve compile time, so it would be nice to have that as well.

Re: [PR] Track peak_mem_used in ExternalSorter [datafusion]

2025-08-15 Thread via GitHub
github-actions[bot] closed pull request #16192: Track peak_mem_used in ExternalSorter URL: https://github.com/apache/datafusion/pull/16192

Re: [PR] chore(deps): Update sqlparser to 0.58 [datafusion]

2025-08-15 Thread via GitHub
Jefffrey commented on PR #16456: URL: https://github.com/apache/datafusion/pull/16456#issuecomment-3193271158 Moving to draft as I work to build upon this to update to 0.58

Re: [I] [iceberg] Iceberg data files not found when Comet vectorized reader is enabled [datafusion-comet]

2025-08-15 Thread via GitHub
hsiang-c commented on issue #2116: URL: https://github.com/apache/datafusion-comet/issues/2116#issuecomment-3193385318 Fixed by `spark.comet.exec.broadcastExchange.enabled=false`

[PR] chore: add docs for how to use TrackConsumersPool [datafusion]

2025-08-15 Thread via GitHub
wiedld opened a new pull request, #17213: URL: https://github.com/apache/datafusion/pull/17213 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/16904 ## Rationale for this change Expand docs, with code examples, for the current fu

[PR] docs: Update confs to bypass Iceberg Spark issues [datafusion-comet]

2025-08-15 Thread via GitHub
hsiang-c opened a new pull request, #2166: URL: https://github.com/apache/datafusion-comet/pull/2166 - Document current limitation ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR?

Re: [PR] fix: respect inexact flags in row group metadata [datafusion]

2025-08-15 Thread via GitHub
xudong963 commented on code in PR #16412: URL: https://github.com/apache/datafusion/pull/16412#discussion_r2278570400 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1967,6 +2011,31 @@ fn create_max_min_accs( (max_values, min_values) } +/// Checks if any occu

Re: [PR] feat: add `datafusion-physical-adapter`, implement predicate adaptation missing fields of structs [datafusion]

2025-08-15 Thread via GitHub
kosiew commented on PR #16589: URL: https://github.com/apache/datafusion/pull/16589#issuecomment-3191016336 @alamb, @adriangb, Yes, PR is ok with me.

[PR] Minor: Fix compiler warning when compiling `datafusion-cli` [datafusion]

2025-08-15 Thread via GitHub
2010YOUY01 opened a new pull request, #17205: URL: https://github.com/apache/datafusion/pull/17205 ## Which issue does this PR close? - Closes #. ## Rationale for this change When compiling `datafusion-cli` with `cargo run`, there is a compiler warning: ``

Re: [PR] Fix HashJoinExec sideways information passing for partitioned queries [datafusion]

2025-08-15 Thread via GitHub
nuno-faria commented on code in PR #17197: URL: https://github.com/apache/datafusion/pull/17197#discussion_r2278631368 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -288,3 +288,33 @@ physical_plan DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafu

[I] Support Postgres' `JSON_OBJECT` function `RETURNING` clause [datafusion-sqlparser-rs]

2025-08-15 Thread via GitHub
adamchainz opened a new issue, #2000: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2000 [`json_object()`](https://www.postgresql.org/docs/current/functions-json.html#:~:text=%20json_object%20() supports quite broad syntax: ``` json_object ( [ { key_expression { VA
