Re: [I] Cast from DATE to VARCHAR fails [datafusion]

2025-09-12 Thread via GitHub
findepi commented on issue #17533: URL: https://github.com/apache/datafusion/issues/17533#issuecomment-3284008393 `arrow_cast` to `Utf8` works `arrow_cast` to `Utf8View` fails -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [I] Cast from DATE to VARCHAR fails [datafusion]

2025-09-12 Thread via GitHub
findepi commented on issue #17533: URL: https://github.com/apache/datafusion/issues/17533#issuecomment-3284075814 We probably need to fix `can_cast_types` too - https://github.com/apache/arrow-rs/pull/8328

Re: [PR] Prepare for Merge Queue [datafusion]

2025-09-12 Thread via GitHub
Jefffrey commented on PR #17183: URL: https://github.com/apache/datafusion/pull/17183#issuecomment-3284090122 > Merge Queue only waits for the required steps - us all checks are important hence I made them all required. It'll become impossible to merge a PR if some of the checks are not p

[I] Panic when cast from `DATE` to `TIMESTAMP` overflows [datafusion]

2025-09-12 Thread via GitHub
findepi opened a new issue, #17534: URL: https://github.com/apache/datafusion/issues/17534 ### Describe the bug Casting a date to timestamp involves multiplication and may overflow. DataFusion should report query error in such case, rather than panicking. ### To Reproduce

Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-12 Thread via GitHub
KR-bluejay commented on code in PR #1314: URL: https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2336556200 ## ballista/executor/src/execution_loop.rs: ## @@ -88,8 +90,29 @@ pub async fn poll_loop match poll_work_result { Ok(result) =>

Re: [PR] feat(spark): implement Spark `try_parse_url` function [datafusion]

2025-09-12 Thread via GitHub
Jefffrey commented on code in PR #17485: URL: https://github.com/apache/datafusion/pull/17485#discussion_r2340394000 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -47,23 +46,7 @@ impl Default for ParseUrl { impl ParseUrl { pub fn new() -> Self { Self {

Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-12 Thread via GitHub
milenkovicm commented on code in PR #1314: URL: https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2336689475 ## ballista/executor/src/execution_loop.rs: ## @@ -88,8 +90,29 @@ pub async fn poll_loop match poll_work_result { Ok(result) =>

[I] Cast from DATE to VARCHAR fails [datafusion]

2025-09-12 Thread via GitHub
findepi opened a new issue, #17533: URL: https://github.com/apache/datafusion/issues/17533 ### Describe the bug ``` DataFusion CLI v50.0.0 > SELECT CAST(DATE '-12-31' AS varchar); This feature is not implemented: Unsupported CAST from Date32 to Utf8View ``` ### T

Re: [PR] Prepare for Merge Queue [datafusion]

2025-09-12 Thread via GitHub
blaginin commented on PR #17183: URL: https://github.com/apache/datafusion/pull/17183#issuecomment-3284132932 Will resolve now

Re: [I] Panic when cast from `DATE` to `TIMESTAMP` overflows [datafusion]

2025-09-12 Thread via GitHub
findepi commented on issue #17534: URL: https://github.com/apache/datafusion/issues/17534#issuecomment-3284162376 "attempt to multiply with overflow" is a standard Rust check not present in release builds.: ``` datafusion main$ cargo run --release --bin datafusion-cli -- --command

[PR] Support `CAST` from temporal to `Utf8View` [datafusion]

2025-09-12 Thread via GitHub
findepi opened a new pull request, #17535: URL: https://github.com/apache/datafusion/pull/17535 - fixes https://github.com/apache/datafusion/issues/17533 - requires https://github.com/apache/arrow-rs/pull/8328

Re: [I] Push down entire hash table from HashJoinExec into scans [datafusion]

2025-09-12 Thread via GitHub
LiaCastaneda commented on issue #17171: URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3284143582 > I think building an IN LIST in the dynamic filter (in addition to min/max) allows us to leverage the existing machinery that builds literal guarantees in PruningPredicate

Re: [PR] feat: pass the ordering information to native Scan [datafusion-comet]

2025-09-12 Thread via GitHub
codecov-commenter commented on PR #2375: URL: https://github.com/apache/datafusion-comet/pull/2375#issuecomment-3276620996 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2375?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] fix: Change `OuterReferenceColumn` to contain the entire outer field to prevent metadata loss [datafusion]

2025-09-12 Thread via GitHub
Kontinuation opened a new pull request, #17524: URL: https://github.com/apache/datafusion/pull/17524 ## Which issue does this PR close? - Closes #17422. ## Rationale for this change As reported by #17422, extension metadata was dropped in some queries involving subquerie

[PR] Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API [datafusion]

2025-09-12 Thread via GitHub
Jefffrey opened a new pull request, #17536: URL: https://github.com/apache/datafusion/pull/17536 ## Which issue does this PR close? - Closes #2409 and #2407 ## Rationale for this change So we can use these distinct aggregates via DataFrames ## What

Re: [PR] fix: synchronize partition bounds reporting in HashJoin [datafusion]

2025-09-12 Thread via GitHub
adriangb commented on PR #17452: URL: https://github.com/apache/datafusion/pull/17452#issuecomment-3271833195 > @adriangb I think there is opportunity to simplify the bounds collection for each partition. That is, we can probably just track the min/max across all partitions and build a sing

Re: [PR] Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API [datafusion]

2025-09-12 Thread via GitHub
Jefffrey commented on code in PR #17536: URL: https://github.com/apache/datafusion/pull/17536#discussion_r2343365209 ## datafusion/functions-aggregate/src/average.rs: ## @@ -62,6 +62,17 @@ make_udaf_expr_and_func!( avg_udaf ); +pub fn avg_distinct(expr: Expr) -> Expr { +

Re: [I] High CPU during dynamic filter bound computation: min_batch/max_batch [datafusion]

2025-09-12 Thread via GitHub
LiaCastaneda commented on issue #17486: URL: https://github.com/apache/datafusion/issues/17486#issuecomment-3270213815 cc @adriangb -- sorry for the direct ping. Since you did most of the dynamic filtering work, do you know if this behavior is expected? 🙇‍♀️

Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-12 Thread via GitHub
KR-bluejay commented on code in PR #1314: URL: https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2336699580 ## ballista/executor/src/execution_loop.rs: ## @@ -88,8 +90,29 @@ pub async fn poll_loop match poll_work_result { Ok(result) =>

Re: [PR] Refactor HashJoinExec to progressively accumulate dynamic filter bounds instead of computing them after data is accumulated [datafusion]

2025-09-12 Thread via GitHub
adriangb commented on PR #17444: URL: https://github.com/apache/datafusion/pull/17444#issuecomment-3271240054 > Benchmark clickbench_extended.json > │ QQuery 1 │ 1216.87 ms │ 1446.94 ms │ 1.19x slower │ I think this is noise, Q1 is: https://github.com/apache

Re: [I] Refactor and improve job data cleanup logic [datafusion-ballista]

2025-09-12 Thread via GitHub
milenkovicm commented on issue #1316: URL: https://github.com/apache/datafusion-ballista/issues/1316#issuecomment-3279895972 I think you could make an epic in this area There are few things I would like to propose regarding to files: 1. shuffle files are absolute paths, i woul

Re: [I] Refactor and improve job data cleanup logic [datafusion-ballista]

2025-09-12 Thread via GitHub
KR-bluejay commented on issue #1316: URL: https://github.com/apache/datafusion-ballista/issues/1316#issuecomment-3279848468 I’d be happy to work on this as a follow-up, if this direction sounds good.

[PR] Disable `required_status_checks` for now [datafusion]

2025-09-12 Thread via GitHub
blaginin opened a new pull request, #17537: URL: https://github.com/apache/datafusion/pull/17537 Follow up https://github.com/apache/datafusion/pull/17183#issuecomment-3284090122 As a hotfix, disabled the checks to unblock @Jefffrey

[PR] Update Bug issue template to use Bug issue type [datafusion]

2025-09-12 Thread via GitHub
findepi opened a new pull request, #17540: URL: https://github.com/apache/datafusion/pull/17540 Same for Feature issue template.

[I] Consumer receives duplicate predicates when join mode is CollectLeft [datafusion]

2025-09-12 Thread via GitHub
LiaCastaneda opened a new issue, #17541: URL: https://github.com/apache/datafusion/issues/17541 ### Describe the bug I see duplicated OR clauses on the DynamicPhysicalExpr I get in the consumer for an execution plan like this: ``` ProjectionExec: expr=[c0@0 as c0, c1@1

Re: [PR] Blog: Add table of contents to blog article [datafusion-site]

2025-09-12 Thread via GitHub
alamb merged PR #107: URL: https://github.com/apache/datafusion-site/pull/107

Re: [PR] Disable `required_status_checks` for now [datafusion]

2025-09-12 Thread via GitHub
blaginin merged PR #17537: URL: https://github.com/apache/datafusion/pull/17537

Re: [PR] Rename Blaze to Auron [datafusion]

2025-09-12 Thread via GitHub
Jefffrey merged PR #17532: URL: https://github.com/apache/datafusion/pull/17532

Re: [PR] Update Bug issue template to use Bug issue type [datafusion]

2025-09-12 Thread via GitHub
Jefffrey merged PR #17540: URL: https://github.com/apache/datafusion/pull/17540

Re: [PR] Blog: Add table of contents to blog article [datafusion-site]

2025-09-12 Thread via GitHub
nuno-faria commented on PR #107: URL: https://github.com/apache/datafusion-site/pull/107#issuecomment-3285025510 > Once we merge this PR, should we add TOC's to the older posts as well? Yeah I think so.

Re: [PR] Fix predicate simplification for incompatible types in push_down_filter [datafusion]

2025-09-12 Thread via GitHub
alamb commented on code in PR #17521: URL: https://github.com/apache/datafusion/pull/17521#discussion_r2343741787 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -395,3 +395,28 @@ order by t1.k, t2.v; 1 1 1 1000 1000 1000 + +# Regression tes

Re: [I] Add Binary/LargeBinary/BinaryView/FixedSizeBinary to join_fuzz [datafusion]

2025-09-12 Thread via GitHub
alamb closed issue #17447: Add Binary/LargeBinary/BinaryView/FixedSizeBinary to join_fuzz URL: https://github.com/apache/datafusion/issues/17447

Re: [PR] feat: Add binary to `join_fuzz` testing [datafusion]

2025-09-12 Thread via GitHub
alamb merged PR #17497: URL: https://github.com/apache/datafusion/pull/17497

Re: [PR] feat: Add binary to `join_fuzz` testing [datafusion]

2025-09-12 Thread via GitHub
alamb commented on PR #17497: URL: https://github.com/apache/datafusion/pull/17497#issuecomment-3285061792 Thank you @jonathanc-n and @findepi I agree it might be worth figuring out some way to expand coverage that doesn't require so many individual tests to be added / updated 🤔

Re: [I] application of simple optimizer rule produces incorrect results (DF 49 regression) [datafusion]

2025-09-12 Thread via GitHub
alamb commented on issue #17510: URL: https://github.com/apache/datafusion/issues/17510#issuecomment-3285068318 I think we can close this ticket as not a bug, is that ok @wkalt ?

Re: [I] Nested Loop Join: Performance Regression in DataFusion 50 for Suboptimal Join Orderings [datafusion]

2025-09-12 Thread via GitHub
alamb closed issue #17488: Nested Loop Join: Performance Regression in DataFusion 50 for Suboptimal Join Orderings URL: https://github.com/apache/datafusion/issues/17488

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-12 Thread via GitHub
xudong963 commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3284639539 Hi @comphead, I have a severe headache today and I'll be off. We may need to start the voting process today, could you please help me do it? Thanks!

Re: [PR] Refactor TableProvider::scan into TableProvider::scan_with_args [datafusion]

2025-09-12 Thread via GitHub
alamb commented on code in PR #17336: URL: https://github.com/apache/datafusion/pull/17336#discussion_r2344142167 ## datafusion/catalog/src/table.rs: ## @@ -299,6 +334,119 @@ pub trait TableProvider: Debug + Sync + Send { } } +/// Arguments for scanning a table with [`Ta

Re: [I] Nested Loop Join: Performance Regression in DataFusion 50 for Suboptimal Join Orderings [datafusion]

2025-09-12 Thread via GitHub
alamb commented on issue #17488: URL: https://github.com/apache/datafusion/issues/17488#issuecomment-3285079247 Per the comments above, let's close this ticket. @tobixdev / @2010YOUY01 do you think it is worth tracking a potential performance improvement as a separate ticket, or is

Re: [PR] Introduce wildcard const for FixedSizeBinary type signature [datafusion]

2025-09-12 Thread via GitHub
findepi commented on PR #17531: URL: https://github.com/apache/datafusion/pull/17531#issuecomment-3285251362 I realized we may get away without introducing a generic type concept. The existing `TypeSignature::Coercible`'s `TypeSignatureClass` seems to be more or less it. did you try to

Re: [PR] Fix predicate simplification for incompatible types in push_down_filter [datafusion]

2025-09-12 Thread via GitHub
findepi merged PR #17521: URL: https://github.com/apache/datafusion/pull/17521

Re: [I] Logical optimizer pushdown_filters rule fails with relatively simple query [datafusion]

2025-09-12 Thread via GitHub
findepi closed issue #17512: Logical optimizer pushdown_filters rule fails with relatively simple query URL: https://github.com/apache/datafusion/issues/17512

Re: [PR] Add assertion that ScalarUDFImpl implementation is consistent with declared return type [datafusion]

2025-09-12 Thread via GitHub
findepi commented on code in PR #17515: URL: https://github.com/apache/datafusion/pull/17515#discussion_r2344236784 ## datafusion/expr/src/udf.rs: ## @@ -233,7 +233,25 @@ impl ScalarUDF { /// /// See [`ScalarUDFImpl::invoke_with_args`] for details. pub fn invoke_w

Re: [PR] add a ci job for typo checking [datafusion]

2025-09-12 Thread via GitHub
findepi commented on PR #17339: URL: https://github.com/apache/datafusion/pull/17339#issuecomment-3285281075 This new check just got a stupid typo i made in a hurry. Thank you for catching me!

[PR] Using `encode_arrow_schema` from arrow-rs. [datafusion]

2025-09-12 Thread via GitHub
samueleresca opened a new pull request, #17543: URL: https://github.com/apache/datafusion/pull/17543 ## Which issue does this PR close? - Closes #17542 ## What changes are included in this PR? - Removing the custom `encode_arrow_schema` implementation. - Removing

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-12 Thread via GitHub
alamb commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3285476190 I have not forgotten about this PR -- I am just busy with other projects now. I will come back to this soon (TM)

Re: [I] [Epic]: Google Summer of Code 2025 Correlated Subquery Support [datafusion]

2025-09-12 Thread via GitHub
alamb commented on issue #16059: URL: https://github.com/apache/datafusion/issues/16059#issuecomment-3285523017 the PR to add correlated subquery support is here: - https://github.com/apache/datafusion/pull/17110 I think with that let's claim this project is done. Thank you

Re: [I] [Epic]: Google Summer of Code 2025 Correlated Subquery Support [datafusion]

2025-09-12 Thread via GitHub
alamb closed issue #16059: [Epic]: Google Summer of Code 2025 Correlated Subquery Support URL: https://github.com/apache/datafusion/issues/16059

Re: [PR] feat: [1941-Part2]: Introduce map_to_list scalar function [datafusion-comet]

2025-09-12 Thread via GitHub
comphead commented on PR #2312: URL: https://github.com/apache/datafusion-comet/pull/2312#issuecomment-3285609973 Thanks @rishvin appreciate if you can take #2388 fill up the context and we can go through PRs once again

Re: [PR] Revert #17295 (Support from-first SQL syntax) [datafusion]

2025-09-12 Thread via GitHub
adriangb commented on PR #17520: URL: https://github.com/apache/datafusion/pull/17520#issuecomment-3285732843 > Need to revert doc changes too: done!

[I] Numeric overflow should result in query error [datafusion]

2025-09-12 Thread via GitHub
findepi opened a new issue, #17539: URL: https://github.com/apache/datafusion/issues/17539 ### Describe the bug In programming languages it's rather typical that numeric overflows are not a runtime error. In SQL, it's rather typical that numeric overflows are identified during

[I] TPC-DS query #88 fails with disabled AQE [datafusion-comet]

2025-09-12 Thread via GitHub
and124578963 opened a new issue, #2389: URL: https://github.com/apache/datafusion-comet/issues/2389 ### Describe the bug When running the [TPC-DS query 88](https://github.com/apache/doris/blob/master/tools/tpcds-tools/queries/sf1000/query88.sql) with **"spark.sql.adaptive.enabled": "

Re: [PR] Revert #17295 (Support from-first SQL syntax) [datafusion]

2025-09-12 Thread via GitHub
alamb commented on PR #17520: URL: https://github.com/apache/datafusion/pull/17520#issuecomment-3285898307 Thank you @adriangb

[I] [iceberg] Tracking PR for deleted rows support [datafusion-comet]

2025-09-12 Thread via GitHub
parthchandra opened a new issue, #2390: URL: https://github.com/apache/datafusion-comet/issues/2390 Tracking https://github.com/apache/iceberg/pull/14062

Re: [PR] fix: Preserves field metadata when creating logical plan for VALUES expression [datafusion]

2025-09-12 Thread via GitHub
Kontinuation commented on PR #17525: URL: https://github.com/apache/datafusion/pull/17525#issuecomment-3286053055 CC @paleolimbot

[I] Implement CometInMemoryTableScanExec [datafusion-comet]

2025-09-12 Thread via GitHub
andygrove opened a new issue, #2391: URL: https://github.com/apache/datafusion-comet/issues/2391 ### What is the problem the feature request solves? In queries with InMemoryTableScanExec, we have to perform a CometColumnarToRowExec to convert to row format, and then the rest of the qu

Re: [I] Only compute bounds/ dynamic filters if consumer asks for it [datafusion]

2025-09-12 Thread via GitHub
adriangb commented on issue #17527: URL: https://github.com/apache/datafusion/issues/17527#issuecomment-3285593615 I think the pro of the subscriber option is: 1. Like you say it can be applied to only some filters and not others. 2. It avoids making the general filter pushdown API more

Re: [PR] Blog: Add table of contents to blog article [datafusion-site]

2025-09-12 Thread via GitHub
alamb commented on PR #107: URL: https://github.com/apache/datafusion-site/pull/107#issuecomment-3284773235 Once we merge this PR, should we add TOC's to the older posts as well?

[I] Removing ad-hoc implementation of `encode_arrow_schema` [datafusion]

2025-09-12 Thread via GitHub
samueleresca opened a new issue, #17542: URL: https://github.com/apache/datafusion/issues/17542 ### Is your feature request related to a problem or challenge? The implementation of `encode_arrow_schema` is now included and exposed by arrow-rs (see: https://github.com/apache/arrow-rs/p

Re: [PR] Fix predicate simplification for incompatible types in push_down_filter [datafusion]

2025-09-12 Thread via GitHub
alamb commented on code in PR #17521: URL: https://github.com/apache/datafusion/pull/17521#discussion_r2344321562 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -395,3 +395,28 @@ order by t1.k, t2.v; 1 1 1 1000 1000 1000 + +# Regression tes

Re: [PR] fix: output_ordering converted to Vec> [datafusion]

2025-09-12 Thread via GitHub
destrex271 commented on code in PR #17439: URL: https://github.com/apache/datafusion/pull/17439#discussion_r2345062474 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -383,7 +383,7 @@ impl FileScanConfigBuilder { /// Set the output ordering of the files pub f

Re: [PR] feat: feature specific tests [datafusion-comet]

2025-09-12 Thread via GitHub
codecov-commenter commented on PR #2372: URL: https://github.com/apache/datafusion-comet/pull/2372#issuecomment-3286531985 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2372?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Regression: projection pushdown doesn't work as expected in DF50 [datafusion]

2025-09-12 Thread via GitHub
alamb closed issue #17513: Regression: projection pushdown doesn't work as expected in DF50 URL: https://github.com/apache/datafusion/issues/17513

Re: [PR] Revert #17295 (Support from-first SQL syntax) [datafusion]

2025-09-12 Thread via GitHub
alamb commented on PR #17520: URL: https://github.com/apache/datafusion/pull/17520#issuecomment-3286659694 This one looks good thanks everyone! I also made a backport PR to `branch-50` branch so we can include it in the RC - https://github.com/apache/datafusion/pull/17544

[PR] [branch-50] fix: Implement AggregateUDFImpl::reverse_expr for StringAgg (#17165) (#17473) [datafusion]

2025-09-12 Thread via GitHub
alamb opened a new pull request, #17544: URL: https://github.com/apache/datafusion/pull/17544 ## Which issue does this PR close? - related to https://github.com/apache/datafusion/issues/17513 - related to #16799 ## Rationale for this change We fixed a bug found in

[I] API Suggestion match casing for isnan and is_null [datafusion-python]

2025-09-12 Thread via GitHub
ntjohnson1 opened a new issue, #1235: URL: https://github.com/apache/datafusion-python/issues/1235 Maybe this is more of a question than a suggestion. Is there a reason they both aren't `is_nan` and `is_null`? It looks like this is a carry over from the underlying rust and it isn't clear to

[PR] Trying cargo machete to prune unused deps. [datafusion]

2025-09-12 Thread via GitHub
samueleresca opened a new pull request, #17545: URL: https://github.com/apache/datafusion/pull/17545 Very much a test with `cargo machete` to understand all the false positives detected by the tooling.

Re: [PR] Using `encode_arrow_schema` from arrow-rs. [datafusion]

2025-09-12 Thread via GitHub
alamb merged PR #17543: URL: https://github.com/apache/datafusion/pull/17543

Re: [PR] Revert #17295 (Support from-first SQL syntax) [datafusion]

2025-09-12 Thread via GitHub
adriangb commented on PR #17520: URL: https://github.com/apache/datafusion/pull/17520#issuecomment-3286713964 @simonvandel I'm sorry we had to revert your contribution. I hope you're able to contribute again and now that we have the regression test and clarity about the behavior it should b

Re: [I] Removing ad-hoc implementation of `encode_arrow_schema` [datafusion]

2025-09-12 Thread via GitHub
alamb closed issue #17542: Removing ad-hoc implementation of `encode_arrow_schema` URL: https://github.com/apache/datafusion/issues/17542

Re: [PR] Add `TableProvider::scan_with_args` [datafusion]

2025-09-12 Thread via GitHub
adriangb commented on PR #17336: URL: https://github.com/apache/datafusion/pull/17336#issuecomment-3286697680 @alamb I believe I've addressed your feedback, thank you for the review

Re: [PR] Add assertion that ScalarUDFImpl implementation is consistent with declared return type [datafusion]

2025-09-12 Thread via GitHub
alamb commented on code in PR #17515: URL: https://github.com/apache/datafusion/pull/17515#discussion_r2343788618 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -222,7 +223,7 @@ fn spark_parse_url(args: &[ArrayRef]) -> Result { ) }

Re: [PR] use or instead of and for hash join filter pushdown [datafusion]

2025-09-12 Thread via GitHub
adriangb closed pull request #17461: use or instead of and for hash join filter pushdown URL: https://github.com/apache/datafusion/pull/17461

Re: [PR] Add `TableProvider::scan_with_args` [datafusion]

2025-09-12 Thread via GitHub
alamb commented on code in PR #17336: URL: https://github.com/apache/datafusion/pull/17336#discussion_r2345346409 ## datafusion/catalog/src/table.rs: ## @@ -171,6 +171,38 @@ pub trait TableProvider: Debug + Sync + Send { limit: Option, ) -> Result>; +/// Crea

Re: [PR] Fix predicate simplification for incompatible types in push_down_filter [datafusion]

2025-09-12 Thread via GitHub
findepi commented on code in PR #17521: URL: https://github.com/apache/datafusion/pull/17521#discussion_r2344231059 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -395,3 +395,28 @@ order by t1.k, t2.v; 1 1 1 1000 1000 1000 + +# Regression t

[PR] fix: ignore non-existent columns when adding filter equivalence info in `FileScanConfig` [datafusion]

2025-09-12 Thread via GitHub
rkrishn7 opened a new pull request, #17546: URL: https://github.com/apache/datafusion/pull/17546 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/17511 ## Rationale for this change When building equal conditions in a data sourc

Re: [PR] fix: ignore non-existent columns when adding filter equivalence info in `FileScanConfig` [datafusion]

2025-09-12 Thread via GitHub
rkrishn7 commented on PR #17546: URL: https://github.com/apache/datafusion/pull/17546#issuecomment-3286735863 The fact that `reassign_predicate_columns` can return invalid column expressions seems like a big footgun. I didn't change within this PR due to usage elsewhere but we might want to

Re: [PR] fix: ignore non-existent columns when adding filter equivalence info in `FileScanConfig` [datafusion]

2025-09-12 Thread via GitHub
rkrishn7 commented on PR #17546: URL: https://github.com/apache/datafusion/pull/17546#issuecomment-3286768072 Re-ran TPCH benchmark with the same configuration as the referenced issue and all the tests pass now. Will add a regression test here in a bit!

Re: [PR] docs: Update documentation on Epics and Sponsoring Maintainers [datafusion]

2025-09-12 Thread via GitHub
comphead commented on PR #17505: URL: https://github.com/apache/datafusion/pull/17505#issuecomment-3285580676 > > Is it worth changing the term "sponsoring" to something like "driving" or "championing", to avoid any confusion with the monetary meaning of sponsoring? > > That is a rea

[PR] refactor: Scala hygiene - remove `scala.collection.JavaConverters` [datafusion-comet]

2025-09-12 Thread via GitHub
hsiang-c opened a new pull request, #2393: URL: https://github.com/apache/datafusion-comet/pull/2393 ## Which issue does this PR close? Partially closes #. https://github.com/apache/datafusion-comet/issues/2255 ## Rationale for this change - The scope of

Re: [I] Enable feature specific unit tests [datafusion-comet]

2025-09-12 Thread via GitHub
parthchandra commented on issue #2360: URL: https://github.com/apache/datafusion-comet/issues/2360#issuecomment-3286886609 @andygrove @wForget this is ready for review. The test referred to above passes with `hdfs-opendal`

Re: [PR] refactor: Scala hygiene - remove `scala.collection.JavaConverters` [datafusion-comet]

2025-09-12 Thread via GitHub
codecov-commenter commented on PR #2393: URL: https://github.com/apache/datafusion-comet/pull/2393#issuecomment-3286897970 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2393?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Enable the `ListFilesCache` to be available for partitioned tables [datafusion]

2025-09-12 Thread via GitHub
BlakeOrth commented on issue #17211: URL: https://github.com/apache/datafusion/issues/17211#issuecomment-3286924997 @alamb I've found the in-review documentation on process updates in: - https://github.com/apache/datafusion/pull/17505 In an effort to follow the project's processes

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA - Sep 15, 2025 [datafusion]

2025-09-12 Thread via GitHub
GitHub user alamb added a comment to the discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA - Sep 15, 2025 Looking forward to next week! GitHub link: https://github.com/apache/datafusion/discussions/16265#discussioncomment-14388367 This is an automatically sent email for git

Re: [I] Release sqlparser-rs version `0.59.0` around 2025-09-15 [datafusion-sqlparser-rs]

2025-09-12 Thread via GitHub
alamb commented on issue #1956: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1956#issuecomment-3286957810 @iffyio -- I am thinking of making a release next week -- is there anything in particular you think we should wait for? -- This is an automated message from the Apa

Re: [I] Slow aggregrate query with `array_agg`, Polars is 4 times faster for equal query [datafusion]

2025-09-12 Thread via GitHub
Omega359 commented on issue #17446: URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3286959549 There was a previous attempt at a GroupsAccumulator for ArrayAgg (https://github.com/apache/datafusion/pull/11096) but the performance of that PR was mixed. -- This is an au

Re: [PR] refactor: Scala hygiene - remove `scala.collection.JavaConverters` [datafusion-comet]

2025-09-12 Thread via GitHub
hsiang-c commented on PR #2393: URL: https://github.com/apache/datafusion-comet/pull/2393#issuecomment-3286990162 SparkSQL tests failed b/c of `java.lang.NoClassDefFoundError` ``` java.lang.NoClassDefFoundError: scala/jdk/javaapi/CollectionConverters [info] at org.apache.

Re: [PR] chore: Improve Initcap test and docs [datafusion-comet]

2025-09-12 Thread via GitHub
andygrove merged PR #2387: URL: https://github.com/apache/datafusion-comet/pull/2387 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] Panic happens when adding a decimal256 to a float (SQLancer) [datafusion]

2025-09-12 Thread via GitHub
Jefffrey closed issue #16689: Panic happens when adding a decimal256 to a float (SQLancer) URL: https://github.com/apache/datafusion/issues/16689 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] fix: implement lazy evaluation in Coalesce function [datafusion-comet]

2025-09-12 Thread via GitHub
parthchandra commented on code in PR #2270: URL: https://github.com/apache/datafusion-comet/pull/2270#discussion_r2345625780 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -394,6 +394,20 @@ class CometExpressionSuite extends CometTestBase with Adapti

Re: [PR] feat: feature specific tests [datafusion-comet]

2025-09-12 Thread via GitHub
parthchandra commented on PR #2372: URL: https://github.com/apache/datafusion-comet/pull/2372#issuecomment-3287165660 @wForget ptal -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] Add test for decimal256 and float math [datafusion]

2025-09-12 Thread via GitHub
Jefffrey merged PR #17530: URL: https://github.com/apache/datafusion/pull/17530 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] [ANSI] Include original SQL in error messages [datafusion-comet]

2025-09-12 Thread via GitHub
parthchandra commented on issue #2215: URL: https://github.com/apache/datafusion-comet/issues/2215#issuecomment-3287216142 I think a different approach is required here. The error in this example comes from [QueryExecutionErrors](https://github.com/apache/spark/blob/master/sql/catalyst/src

Re: [I] [native_iceberg_compat] Add support for custom S3 endpoints [datafusion-comet]

2025-09-12 Thread via GitHub
parthchandra commented on issue #2261: URL: https://github.com/apache/datafusion-comet/issues/2261#issuecomment-3287221701 @andygrove I don't think this is a problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] feat: Support more date part expressions [datafusion-comet]

2025-09-12 Thread via GitHub
parthchandra commented on code in PR #2316: URL: https://github.com/apache/datafusion-comet/pull/2316#discussion_r2345685485 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1681,14 +1681,17 @@ class CometExpressionSuite extends CometTestBase with Ada

Re: [PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-12 Thread via GitHub
parthchandra commented on PR #2350: URL: https://github.com/apache/datafusion-comet/pull/2350#issuecomment-3287286359 oops. Yes, I did mean #2372. Let's get that merged and update the test in this PR. -- This is an automated message from the Apache Git Service. To respond to the message

[PR] Always run CI checks [datafusion]

2025-09-12 Thread via GitHub
blaginin opened a new pull request, #17538: URL: https://github.com/apache/datafusion/pull/17538 Follow up on https://github.com/apache/datafusion/pull/17183 There are two ways to fix the "steps not run" problem: - 1: always run them. CI will become slower but we'll have merge queue

Re: [PR] feat: feature specific tests [datafusion-comet]

2025-09-12 Thread via GitHub
wForget commented on code in PR #2372: URL: https://github.com/apache/datafusion-comet/pull/2372#discussion_r2345712676 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadFromFakeHadoopFsSuite.scala: ## @@ -74,7 +74,18 @@ class ParquetReadFromFakeHadoopFsSuite extends C

Re: [PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-12 Thread via GitHub
wForget commented on PR #2350: URL: https://github.com/apache/datafusion-comet/pull/2350#issuecomment-3287309576 > Should we also be adding the test suites to pr_build_macos.yml ? Thanks, I will add it -- This is an automated message from the Apache Git Service. To respond to the m

Re: [PR] feat: Support more date part expressions [datafusion-comet]

2025-09-12 Thread via GitHub
wForget commented on code in PR #2316: URL: https://github.com/apache/datafusion-comet/pull/2316#discussion_r2345737849 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1681,14 +1681,17 @@ class CometExpressionSuite extends CometTestBase with Adaptive
