Re: [PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-11 Thread via GitHub
wForget commented on code in PR #2350: URL: https://github.com/apache/datafusion-comet/pull/2350#discussion_r2339283125 ## .github/workflows/pr_build_linux.yml: ## @@ -169,4 +169,39 @@ jobs: suites: ${{ matrix.suite.value }} maven_opts: ${{ matrix.profile.m

Re: [I] `datafusion-cli` tests fails locally [datafusion]

2025-09-11 Thread via GitHub
Weijun-H commented on issue #17458: URL: https://github.com/apache/datafusion/issues/17458#issuecomment-3279083802 > I tried the latest commit [351675d](https://github.com/apache/datafusion/commit/351675ddc27c42684a079b3a89fe2dee581d89a2), and it's still failing on my machine. Perhaps you h

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-11 Thread via GitHub
xudong963 commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3279091025 > add a test for projection pushdown regression? Do you mean adding a test to the extended test suite? -- This is an automated message from the Apache Git Service. To

Re: [I] Regression: projection pushdown doesn't work as expected in DF50 [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on issue #17513: URL: https://github.com/apache/datafusion/issues/17513#issuecomment-3280143135 I ran a bisect and found that https://github.com/apache/datafusion/pull/17295 introduced this bug.

[PR] chore: Revert "chore: [1941-Part1]: Introduce `map_sort` scalar function (#2… [datafusion-comet]

2025-09-11 Thread via GitHub
comphead opened a new pull request, #2381: URL: https://github.com/apache/datafusion-comet/pull/2381 …262)" This reverts commit f2baf95e9a1653991e21846d7e89e77ee656ba4b. The function `map_sort` doesn't exist in Spark; `array_sort` does, but `map_sort` does not. This might be useful

Re: [PR] Fix predicate simplification for incompatible types in push_down_filter [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on PR #17521: URL: https://github.com/apache/datafusion/pull/17521#issuecomment-3281483577 @findepi, thank you so much for reviewing! I've addressed your comments and added more unit-style tests.

Re: [PR] Revert #17295 (Support from-first SQL syntax) [datafusion]

2025-09-11 Thread via GitHub
comphead commented on code in PR #17520: URL: https://github.com/apache/datafusion/pull/17520#discussion_r2341523858 ## datafusion/core/tests/sql/select.rs: ## @@ -344,3 +344,27 @@ async fn test_version_function() { assert_eq!(version.value(0), expected_version); } + +//

Re: [PR] Revert #17295 (Support from-first SQL syntax) [datafusion]

2025-09-11 Thread via GitHub
comphead commented on code in PR #17520: URL: https://github.com/apache/datafusion/pull/17520#discussion_r2341521150 ## datafusion/sqllogictest/test_files/projection.slt: ## @@ -252,3 +252,30 @@ physical_plan statement ok drop table t; + +# Regression test for 17513 Review

Re: [PR] Support from-first SQL syntax [datafusion]

2025-09-11 Thread via GitHub
alamb commented on PR #17295: URL: https://github.com/apache/datafusion/pull/17295#issuecomment-3281536219 > So maybe the answer is that we should make `select from t1;` equivalent to "select no columns from t1" while keeping `from t1;` equivalent to `select * from t1;`? I think we s
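The proposal above depends on the planner knowing whether the `SELECT` keyword was actually written. Below is a minimal sketch of that rule. `ParsedQuery`, its `has_select_keyword` flag, and `resolve_projection` are hypothetical names used only for illustration; they are not DataFusion or sqlparser types.

```rust
/// Hypothetical parsed query shape. `has_select_keyword` stands in for the
/// extra bit of information needed to tell `SELECT FROM t1` apart from `FROM t1`.
struct ParsedQuery {
    has_select_keyword: bool,
    projection: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Projection {
    AllColumns,
    Columns(Vec<String>),
}

/// Applies the proposed semantics (a sketch, not DataFusion's planner logic).
fn resolve_projection(q: &ParsedQuery) -> Projection {
    match (q.has_select_keyword, q.projection.is_empty()) {
        // `FROM t1;` is from-first shorthand for `SELECT * FROM t1;`.
        (false, true) => Projection::AllColumns,
        // `SELECT FROM t1;` selects no columns (rows are kept, zero columns).
        (true, true) => Projection::Columns(vec![]),
        // Any explicit projection list is used as written.
        _ => Projection::Columns(q.projection.clone()),
    }
}
```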

Re: [PR] fix: correct typos in `CONTRIBUTING.md` [datafusion]

2025-09-11 Thread via GitHub
alamb merged PR #17507: URL: https://github.com/apache/datafusion/pull/17507

Re: [I] job data cleanup does not work if `pull-staged` strategy selected [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm commented on issue #1219: URL: https://github.com/apache/datafusion-ballista/issues/1219#issuecomment-3279558490 I believe this issue is related to #602.

Re: [PR] Check function return value's type [datafusion]

2025-09-11 Thread via GitHub
findepi commented on PR #17515: URL: https://github.com/apache/datafusion/pull/17515#issuecomment-3279567818 ``` External error: 8 errors in file /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/array.slt 1. query failed: DataFusion error: Exe

Re: [I] Could not deserialize ballista_core::serde::generated::ballista::JobStatus (invalid wire type) [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm commented on issue #417: URL: https://github.com/apache/datafusion-ballista/issues/417#issuecomment-3279567271 I believe this is a stale issue; please re-open if I'm wrong.

Re: [I] Could not deserialize ballista_core::serde::generated::ballista::JobStatus (invalid wire type) [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm closed issue #417: Could not deserialize ballista_core::serde::generated::ballista::JobStatus (invalid wire type) URL: https://github.com/apache/datafusion-ballista/issues/417

Re: [I] Support DataFrame aggregate functions [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm commented on issue #620: URL: https://github.com/apache/datafusion-ballista/issues/620#issuecomment-3279550220 I believe this is supported; closing this issue.

Re: [PR] feat(spark): implement spark hash function murmur/xxhash64 [datafusion]

2025-09-11 Thread via GitHub
chenkovsky commented on code in PR #17093: URL: https://github.com/apache/datafusion/pull/17093#discussion_r2341000595 ## datafusion/spark/src/function/hash/murmur3.rs: ## @@ -0,0 +1,152 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] chore: Refactor Literal serde [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove commented on code in PR #2377: URL: https://github.com/apache/datafusion-comet/pull/2377#discussion_r2341014477 ## spark/src/main/scala/org/apache/comet/serde/literals.scala: ## @@ -0,0 +1,190 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] feat(spark): implement Spark `try_parse_url` function [datafusion]

2025-09-11 Thread via GitHub
rafafrdz commented on code in PR #17485: URL: https://github.com/apache/datafusion/pull/17485#discussion_r2341015483 ## datafusion/sqllogictest/test_files/spark/url/try_parse_url.slt: ## @@ -0,0 +1,72 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more co

Re: [PR] Switch from xz2 to liblzma to reduce duplicate dependencies [datafusion]

2025-09-11 Thread via GitHub
timsaucer commented on PR #17509: URL: https://github.com/apache/datafusion/pull/17509#issuecomment-3280880292 It looks like upstream is on a 2-month release cadence, so we'll see whether that lands first or the `arrow-avro`

Re: [PR] feat: Support comet native log level conf [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove commented on PR #2379: URL: https://github.com/apache/datafusion-comet/pull/2379#issuecomment-3280901691 @wForget Could you add a note to the [debugging guide](https://datafusion.apache.org/comet/contributor-guide/debugging.html)?

Re: [PR] feat: Support comet native log level conf [datafusion-comet]

2025-09-11 Thread via GitHub
codecov-commenter commented on PR #2379: URL: https://github.com/apache/datafusion-comet/pull/2379#issuecomment-3280847027 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2379?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat(spark): implement Spark `try_parse_url` function [datafusion]

2025-09-11 Thread via GitHub
rafafrdz commented on code in PR #17485: URL: https://github.com/apache/datafusion/pull/17485#discussion_r2341044098 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -47,23 +46,7 @@ impl Default for ParseUrl { impl ParseUrl { pub fn new() -> Self { Self {

Re: [I] [Epic]: Google Summer of Code 2025 Correlated Subquery Support [datafusion]

2025-09-11 Thread via GitHub
irenjj commented on issue #16059: URL: https://github.com/apache/datafusion/issues/16059#issuecomment-3280949927 Here is a brief summary of the project: COMPLETED WORK Simple implementation of a decorrelation framework - algorithm implementation for converting correlated subqueries

Re: [PR] Refactor TableProvider::scan into TableProvider::scan_with_args [datafusion]

2025-09-11 Thread via GitHub
alamb commented on PR #17336: URL: https://github.com/apache/datafusion/pull/17336#issuecomment-3281002276 > @alamb could you give this a look? I don't think this is blocked by anything and we already discussed it a bit offline. I will try and review it either later today or tomorrow

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-11 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3280060456 🤖: Benchmark completed Details ``` Comparing HEAD and issue_17411 Benchmark clickbench_extended.json

[PR] fix: Add AWS environment variable checks for S3 tests [datafusion]

2025-09-11 Thread via GitHub
Weijun-H opened a new pull request, #17519: URL: https://github.com/apache/datafusion/pull/17519 ## Which issue does this PR close? - Closes #17458 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te
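The PR title suggests that the S3 tests should check for AWS environment variables before they run. Below is a minimal sketch of such a guard. The required variable list and the helper names `s3_test_env_available` / `s3_test_env_from_process` are assumptions made for illustration, not the code in PR #17519. The environment lookup is passed in as a closure, so the check can be exercised without modifying the real process environment.

```rust
/// Hypothetical set of variables an S3 integration test might require;
/// the actual PR may check a different list.
const REQUIRED_AWS_VARS: [&str; 2] = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"];

/// Returns true only if every required variable is present and non-empty.
/// `lookup` abstracts over `std::env::var` so the logic is testable.
fn s3_test_env_available<F>(lookup: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    REQUIRED_AWS_VARS
        .iter()
        .all(|name| lookup(name).map_or(false, |v| !v.is_empty()))
}

/// Convenience wrapper that reads the real process environment.
fn s3_test_env_from_process() -> bool {
    s3_test_env_available(|name| std::env::var(name).ok())
}
```

A test would call `s3_test_env_from_process()` first and return early (effectively skipping) when it yields `false`.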

Re: [I] DictionaryKeyOverflowError on DataFrame.write_parquet [datafusion]

2025-09-11 Thread via GitHub
valkum commented on issue #17445: URL: https://github.com/apache/datafusion/issues/17445#issuecomment-3280075902 But the example only uses 10 distinct keys. If the writer is using a smaller Dict (Uint16 vs. Int8), then a dict somewhere is treating values as distinct when they aren't.
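The reasoning in this comment is that a deduplicating dictionary never needs more keys than there are distinct values. If the dictionary overflows with only 10 distinct values, something is probably appending repeated values as new entries. The toy encoders below make that concrete. They are hypothetical, use `i8` keys, and are unrelated to arrow-rs' dictionary builders.

```rust
use std::collections::HashMap;

/// Deduplicating encoder: each distinct value gets exactly one key.
/// Returns None if more distinct values arrive than an i8 key can index.
fn encode_dedup(values: &[&str]) -> Option<(Vec<i8>, Vec<String>)> {
    let mut index: HashMap<&str, i8> = HashMap::new();
    let mut dict: Vec<String> = Vec::new();
    let mut keys = Vec::with_capacity(values.len());
    for &v in values {
        let key = match index.get(v) {
            Some(k) => *k,
            None => {
                // try_from fails once key 128 is needed: the dictionary overflows.
                let k = i8::try_from(dict.len()).ok()?;
                dict.push(v.to_string());
                index.insert(v, k);
                k
            }
        };
        keys.push(key);
    }
    Some((keys, dict))
}

/// Non-deduplicating encoder: appends every value as if it were new.
fn encode_no_dedup(values: &[&str]) -> Option<(Vec<i8>, Vec<String>)> {
    let mut dict = Vec::new();
    let mut keys = Vec::with_capacity(values.len());
    for &v in values {
        let k = i8::try_from(dict.len()).ok()?;
        dict.push(v.to_string());
        keys.push(k);
    }
    Some((keys, dict))
}
```

Suppose 10 distinct values are repeated to make 1,000 rows. The deduplicating encoder needs only 10 keys, while the non-deduplicating one overflows at the 129th row.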

[PR] Fix regression in SELECT FROM syntax with WHERE clause [datafusion]

2025-09-11 Thread via GitHub
adriangb opened a new pull request, #17520: URL: https://github.com/apache/datafusion/pull/17520 ## Summary - Fixes #17513 - Fixes regression introduced by #17295 where `SELECT FROM table WHERE condition` incorrectly returned all columns instead of empty projection ## Problem

Re: [I] High CPU during dynamic filter bound computation: min_batch/max_batch [datafusion]

2025-09-11 Thread via GitHub
LiaCastaneda commented on issue #17486: URL: https://github.com/apache/datafusion/issues/17486#issuecomment-3280252533 There are a couple of other things I noticed while using the API that I think are worth mentioning, but I can file them as separate issues. I’m also happy to take a stab at

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-11 Thread via GitHub
tobixdev commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3281151036 > It seems to me like we have two known blocking issues now for 50.0.0 > > * [ ] [Regression: projection pushdown doesn't work as expected in DF50 #17513](https://gi

Re: [PR] Fix predicate simplification for incompatible types in push_down_filter [datafusion]

2025-09-11 Thread via GitHub
findepi commented on code in PR #17521: URL: https://github.com/apache/datafusion/pull/17521#discussion_r2341196453 ## datafusion/optimizer/src/simplify_expressions/simplify_predicates.rs: ## @@ -204,7 +204,13 @@ fn find_most_restrictive_predicate( if let Some(sca

Re: [PR] feat: Support comet native log level conf [datafusion-comet]

2025-09-11 Thread via GitHub
wForget commented on PR #2379: URL: https://github.com/apache/datafusion-comet/pull/2379#issuecomment-3281141190 > @wForget Could you add a note to the [debugging guide](https://datafusion.apache.org/comet/contributor-guide/debugging.html)? Sure. Also, can I change these log properties t

Re: [PR] Blog: Add table of contents to blog article [datafusion-site]

2025-09-11 Thread via GitHub
alamb commented on code in PR #107: URL: https://github.com/apache/datafusion-site/pull/107#discussion_r2341210098 ## plugins/extract_toc/README.md: ## @@ -0,0 +1,137 @@ +Extract Table of Content + + +A Pelican plugin to extract table of contents (ToC) fr

[PR] Check function return value's type [datafusion]

2025-09-11 Thread via GitHub
findepi opened a new pull request, #17515: URL: https://github.com/apache/datafusion/pull/17515 The planner takes into account the return type a function promises to return. It even passes it back on invoke as a reminder/convenience. Verify that each function delivers on the promise.
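The invariant this PR describes can be sketched as a wrapper that compares the type a function actually produced with the type it declared to the planner. `DataType`, `Column`, and `invoke_checked` are simplified hypothetical stand-ins, not DataFusion's `ScalarUDFImpl` API or arrow's `DataType`.

```rust
/// Simplified stand-in for a logical data type (hypothetical; not arrow's DataType).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Null,
}

/// Simplified stand-in for a function's output column.
#[derive(Debug)]
struct Column {
    data_type: DataType,
    len: usize,
}

/// Invokes a function body and verifies that its output has the type
/// the planner was promised (`declared`); otherwise returns an error.
fn invoke_checked<F>(name: &str, declared: &DataType, f: F) -> Result<Column, String>
where
    F: FnOnce() -> Column,
{
    let out = f();
    if &out.data_type != declared {
        return Err(format!(
            "function '{name}' declared return type {declared:?} but returned {:?}",
            out.data_type
        ));
    }
    Ok(out)
}
```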

Re: [I] Refactor and improve job data cleanup logic [datafusion-ballista]

2025-09-11 Thread via GitHub
KR-bluejay commented on issue #1316: URL: https://github.com/apache/datafusion-ballista/issues/1316#issuecomment-3280417513 Got it, thanks for clarifying! — that makes more sense. Splitting responsibilities so that the scheduler focuses on job operations, while Flight store owns its lo

Re: [PR] Fix regression: SELECT FROM syntax should return empty projection [datafusion]

2025-09-11 Thread via GitHub
findepi commented on code in PR #17520: URL: https://github.com/apache/datafusion/pull/17520#discussion_r2341308087 ## datafusion/sqllogictest/test_files/projection.slt: ## @@ -252,3 +252,30 @@ physical_plan statement ok drop table t; + +# Regression test for 17513 + +query

Re: [I] application of simple optimizer rule produces incorrect results (DF 49 regression) [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on issue #17510: URL: https://github.com/apache/datafusion/issues/17510#issuecomment-3281271880 I don't think this has anything to do with dynamic filters. I reduced the MRE to: ```rust #[tokio::test] async fn test_regression() { use arrow::{

Re: [PR] Fix regression: SELECT FROM syntax should return empty projection [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on PR #17520: URL: https://github.com/apache/datafusion/pull/17520#issuecomment-3281326431 Yeah, it seems like what this comes down to is: - We can't differentiate between `SELECT FROM t;` and `FROM t;` - We want the opposite behavior between them (agreed?) - Thus we

Re: [PR] feat: [1941-Part3]: Introduce map_from_list scalar function [datafusion-comet]

2025-09-11 Thread via GitHub
comphead commented on PR #2328: URL: https://github.com/apache/datafusion-comet/pull/2328#issuecomment-3281329374 I'm also reviewing this function in DF, which might solve the same problem: https://github.com/apache/datafusion/pull/17456

Re: [PR] docs: Update documentation on Epics and Sponsoring Maintainers [datafusion]

2025-09-11 Thread via GitHub
alamb commented on code in PR #17505: URL: https://github.com/apache/datafusion/pull/17505#discussion_r2340023611 ## docs/source/contributor-guide/roadmap.md: ## @@ -62,14 +63,35 @@ For more information: ### Discussing New Features If you plan to work on a new feature that d

Re: [PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-11 Thread via GitHub
comphead commented on code in PR #2350: URL: https://github.com/apache/datafusion-comet/pull/2350#discussion_r2341395637 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadFromFakeHadoopFsSuite.scala: ## @@ -74,7 +74,9 @@ class ParquetReadFromFakeHadoopFsSuite extends C

Re: [I] Add pretty printing to more sql constructs [datafusion-sqlparser-rs]

2025-09-11 Thread via GitHub
hengfeiyang commented on issue #1850: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1850#issuecomment-3281365296 For this SQL, the formatting is also not correct: ```sql SELECT cid,COUNT(*)AS cnt FROM tbl WHERE hostname IN('www.google.com','www.abc.com')AND cid IN(SELECT DIST

Re: [PR] bug: Support null as argument to to_local_time [datafusion]

2025-09-11 Thread via GitHub
petern48 commented on code in PR #17491: URL: https://github.com/apache/datafusion/pull/17491#discussion_r2341431611 ## datafusion/functions/src/datetime/to_local_time.rs: ## @@ -119,6 +119,7 @@ impl ToLocalTimeFunc { let arg_type = time_value.data_type(); ma

Re: [PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-11 Thread via GitHub
wForget commented on code in PR #2350: URL: https://github.com/apache/datafusion-comet/pull/2350#discussion_r2341439806 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadFromFakeHadoopFsSuite.scala: ## @@ -74,7 +74,9 @@ class ParquetReadFromFakeHadoopFsSuite extends Co

Re: [PR] Switch from xz2 to liblzma to reduce duplicate dependencies [datafusion]

2025-09-11 Thread via GitHub
findepi commented on PR #17509: URL: https://github.com/apache/datafusion/pull/17509#issuecomment-3279408081 Is this "fixes https://github.com/apache/datafusion/issues/15342"?

Re: [PR] feat(spark): implement Spark `try_parse_url` function [datafusion]

2025-09-11 Thread via GitHub
rafafrdz commented on code in PR #17485: URL: https://github.com/apache/datafusion/pull/17485#discussion_r2340230080 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -47,23 +46,7 @@ impl Default for ParseUrl { impl ParseUrl { pub fn new() -> Self { Self {

Re: [I] Push down entire hash table from HashJoinExec into scans [datafusion]

2025-09-11 Thread via GitHub
Dandandan commented on issue #17171: URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3281103345 > Which yeah is exactly what you were saying about large hash tables and CPU cache. But fair enough yeah, I really don't know which implementation is going to be faster and if

Re: [PR] Update title for dynamic filters blog post [datafusion-site]

2025-09-11 Thread via GitHub
alamb merged PR #109: URL: https://github.com/apache/datafusion-site/pull/109

Re: [PR] feat(spark): implement Spark `try_parse_url` function [datafusion]

2025-09-11 Thread via GitHub
rafafrdz commented on code in PR #17485: URL: https://github.com/apache/datafusion/pull/17485#discussion_r2341172796 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -47,23 +46,7 @@ impl Default for ParseUrl { impl ParseUrl { pub fn new() -> Self { Self {

Re: [PR] Support from-first SQL syntax [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on PR #17295: URL: https://github.com/apache/datafusion/pull/17295#issuecomment-3281120327 @alamb, we discussed this a bit in https://github.com/apache/datafusion/pull/17520#issuecomment-3280334178; I'm not sure that this PR introduced a bug. If I understand correctly `select f

Re: [I] Invalid argument error: lz4 IPC decompression requires the lz4 feature [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm commented on issue #929: URL: https://github.com/apache/datafusion-ballista/issues/929#issuecomment-3279506204 I believe this can be closed; please re-open if I'm wrong.

Re: [I] Support for speculative execution tasks [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm closed issue #863: Support for speculative execution tasks URL: https://github.com/apache/datafusion-ballista/issues/863

Re: [I] Refactor and improve job data cleanup logic [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm commented on issue #1316: URL: https://github.com/apache/datafusion-ballista/issues/1316#issuecomment-3280478588 Please do start with the tasks you'd like to progress, like file paths and deduplication. I don't think #1315 will have a big merging impact.

Re: [I] TPC-DS query 72 slow on modest (10) scale factors [datafusion]

2025-09-11 Thread via GitHub
Dandandan commented on issue #17494: URL: https://github.com/apache/datafusion/issues/17494#issuecomment-3280535980 `chain_traverse` traverses the chain for all hashes matching on the build side matching the hash on the probe side. If the build side is larger and every probe value matches m
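Here is a sketch of the chained structure the comment describes. Build rows that share a hash are linked together, and probing walks the entire chain for the probe hash. The per-probe cost therefore grows with the number of build rows sharing that hash. This is a simplified illustration that assumes a head map plus a `next` array with 0 as the end-of-chain marker. It is not DataFusion's actual hash join implementation.

```rust
use std::collections::HashMap;

/// Simplified chained hash table for a hash join build side (illustrative only).
/// `head[hash]` holds (most recent row index + 1) for that hash;
/// `next[row]` holds (previous row index + 1) with the same hash; 0 ends a chain.
struct ChainedHashTable {
    head: HashMap<u64, usize>,
    next: Vec<usize>,
}

impl ChainedHashTable {
    fn build(build_hashes: &[u64]) -> Self {
        let mut head = HashMap::new();
        let mut next = vec![0; build_hashes.len()];
        for (row, &h) in build_hashes.iter().enumerate() {
            // Link this row in front of any earlier rows with the same hash.
            if let Some(prev) = head.insert(h, row + 1) {
                next[row] = prev;
            }
        }
        Self { head, next }
    }

    /// Walks the whole chain for one probe hash and returns candidate build rows
    /// (most recent first). Callers must still compare the join keys, since
    /// different keys can share a hash. Cost is proportional to chain length.
    fn chain_traverse(&self, probe_hash: u64) -> Vec<usize> {
        let mut out = Vec::new();
        let mut cur = self.head.get(&probe_hash).copied().unwrap_or(0);
        while cur != 0 {
            let row = cur - 1;
            out.push(row);
            cur = self.next[row];
        }
        out
    }
}
```

For build hashes `[7, 3, 7, 7, 5]`, probing hash `7` visits rows `3, 2, 0`. Every probe row with that hash pays for all three.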

Re: [PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-11 Thread via GitHub
wForget commented on code in PR #2350: URL: https://github.com/apache/datafusion-comet/pull/2350#discussion_r2339669982 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -1138,4 +1138,9 @@ abstract class CometTestBase usingDataSourceExec(conf) &&

Re: [PR] feat: Support log for Decimal128 and Decimal256 [datafusion]

2025-09-11 Thread via GitHub
Jefffrey commented on code in PR #17023: URL: https://github.com/apache/datafusion/pull/17023#discussion_r2340008443 ## datafusion/functions/src/math/log.rs: ## @@ -58,21 +64,91 @@ impl Default for LogFunc { impl LogFunc { pub fn new() -> Self { -use DataType::*;

Re: [I] High CPU during dynamic filter bound computation: min_batch/max_batch [datafusion]

2025-09-11 Thread via GitHub
LiaCastaneda commented on issue #17486: URL: https://github.com/apache/datafusion/issues/17486#issuecomment-3279760485 Yes, thanks for the fix! (even if it wasn’t meant to be one 😄)

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-11 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3279787468 🤖: Benchmark completed Details ``` group issue_17411 main -

Re: [I] High CPU during dynamic filter bound computation: min_batch/max_batch [datafusion]

2025-09-11 Thread via GitHub
alamb commented on issue #17486: URL: https://github.com/apache/datafusion/issues/17486#issuecomment-3279790910 > > [@LiaCastaneda](https://github.com/LiaCastaneda) could you try this on main now? I wonder if [#17444](https://github.com/apache/datafusion/pull/17444) made a difference. >

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-11 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3279787742 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubun

[I] Refactor and improve job data cleanup logic [datafusion-ballista]

2025-09-11 Thread via GitHub
KR-bluejay opened a new issue, #1316: URL: https://github.com/apache/datafusion-ballista/issues/1316 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** There are several improvement points in the current job-data deletion flow:

Re: [I] TPC-DS query 72 slow on modest (10) scale factors [datafusion]

2025-09-11 Thread via GitHub
Dandandan commented on issue #17494: URL: https://github.com/apache/datafusion/issues/17494#issuecomment-3280564775 But in this case, I agree the biggest improvement probably comes from a global join ordering algorithm rather than tweaks to the hash join algorithm.

[I] Move `tempfile` crate to dev dependencies in datafusion core crate [datafusion]

2025-09-11 Thread via GitHub
samueleresca opened a new issue, #17522: URL: https://github.com/apache/datafusion/issues/17522 ### Is your feature request related to a problem or challenge? `tempfile` crate is currently included as a normal dependency in the `datafusion` crate. It would be possible to move the depe

Re: [I] Refactor and improve job data cleanup logic [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm commented on issue #1316: URL: https://github.com/apache/datafusion-ballista/issues/1316#issuecomment-3280353118 > 3. **Flight as file owner (Actions)** >The idea is interesting, but I would not adopt it now. Ballista’s strength should come from **parallel/distributed

Re: [PR] Fix regression in SELECT FROM syntax with WHERE clause [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on PR #17520: URL: https://github.com/apache/datafusion/pull/17520#issuecomment-3280234425 @simonvandel and @Jefffrey, could you review, please?

Re: [PR] chore: Refactor UnaryMinus serde [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove merged PR #2378: URL: https://github.com/apache/datafusion-comet/pull/2378

Re: [I] `datafusion-cli` tests fails locally [datafusion]

2025-09-11 Thread via GitHub
2010YOUY01 commented on issue #17458: URL: https://github.com/apache/datafusion/issues/17458#issuecomment-3277612090 I tried the latest commit 351675ddc, and it's still failing on my machine. Perhaps you have those env vars set in your environment?

Re: [PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-11 Thread via GitHub
wForget commented on code in PR #2350: URL: https://github.com/apache/datafusion-comet/pull/2350#discussion_r2341428228 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadFromHdfsSuite.scala: ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

[PR] chore: generate changelog for ballista 49 [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm opened a new pull request, #1317: URL: https://github.com/apache/datafusion-ballista/pull/1317 # Which issue does this PR close? Closes #1305. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

Re: [PR] docs: Fix more comet versions in docs [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove merged PR #2374: URL: https://github.com/apache/datafusion-comet/pull/2374

[I] Support for where x in (...) dynamic filters in hash join [datafusion]

2025-09-11 Thread via GitHub
LiaCastaneda opened a new issue, #17523: URL: https://github.com/apache/datafusion/issues/17523 ### Is your feature request related to a problem or challenge? Currently, DataFusion only supports bounds dynamic filters, which work well when the data sources have metadata or are sorted
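To illustrate the gap the issue describes: a bounds filter only knows the build side's min and max, so it keeps every probe value in that range. An `x IN (...)` filter keeps only values that are actually present. `BoundsFilter` and `InListFilter` are hypothetical types for illustration, not DataFusion's dynamic filter API.

```rust
use std::collections::HashSet;

/// Bounds-style dynamic filter: only the min/max of the build-side join keys.
struct BoundsFilter {
    min: i64,
    max: i64,
}

/// Membership-style dynamic filter, i.e. `x IN (...)` over the build-side keys.
struct InListFilter {
    values: HashSet<i64>,
}

impl BoundsFilter {
    /// Returns None for an empty build side (no bounds to derive).
    fn from_build(keys: &[i64]) -> Option<Self> {
        let min = *keys.iter().min()?;
        let max = *keys.iter().max()?;
        Some(Self { min, max })
    }

    fn may_match(&self, x: i64) -> bool {
        x >= self.min && x <= self.max
    }
}

impl InListFilter {
    fn from_build(keys: &[i64]) -> Self {
        Self { values: keys.iter().copied().collect() }
    }

    fn may_match(&self, x: i64) -> bool {
        self.values.contains(&x)
    }
}
```

Take build keys `1` and `1000` and probe values `0..=1001`. The bounds filter keeps 1,000 probe values, while the IN-list filter keeps only 2.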

Re: [I] Expose Logical and Physical plan details in the REST API [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm commented on issue #1292: URL: https://github.com/apache/datafusion-ballista/issues/1292#issuecomment-3280512148 There is a REST API already in place. It provides some information, but it does not expose details about logical and physical plans, and stages. Have a look at the current ap

[PR] chore: Output `BaseAggregateExec` accurate unsupported names [datafusion-comet]

2025-09-11 Thread via GitHub
comphead opened a new pull request, #2383: URL: https://github.com/apache/datafusion-comet/pull/2383 ## Which issue does this PR close? Part of #2382 Closes #. ## Rationale for this change Currently the code doesn't give a clue what group expression is not su

Re: [PR] feat: Add dynamic `enabled` and `allowIncompat` configs for all supported expressions [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove commented on code in PR #2329: URL: https://github.com/apache/datafusion-comet/pull/2329#discussion_r2342366333 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -664,6 +664,29 @@ object CometConf extends ShimCometConf { .booleanConf .creat

[PR] feat: [branch-0.10] Add dynamic `enabled` and `allowIncompat` configs for all supported expressions (#2329) [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove opened a new pull request, #2385: URL: https://github.com/apache/datafusion-comet/pull/2385 …rted expressions (#2329) ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR?

Re: [PR] feat: pass the ordering information to native Scan [datafusion-comet]

2025-09-11 Thread via GitHub
comphead commented on code in PR #2375: URL: https://github.com/apache/datafusion-comet/pull/2375#discussion_r2342592238 ## native/core/src/execution/operators/scan.rs: ## @@ -434,6 +434,19 @@ impl ScanExec { Ok(selection_indices_arrays) } + +pub fn with_orde

Re: [PR] feat: [branch-0.10] Add dynamic `enabled` and `allowIncompat` configs for all supported expressions (#2329) [datafusion-comet]

2025-09-11 Thread via GitHub
codecov-commenter commented on PR #2385: URL: https://github.com/apache/datafusion-comet/pull/2385#issuecomment-3283003498 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2385?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Output `BaseAggregateExec` accurate unsupported names [datafusion-comet]

2025-09-11 Thread via GitHub
codecov-commenter commented on PR #2383: URL: https://github.com/apache/datafusion-comet/pull/2383#issuecomment-3283019805 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2383?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Add support for Spark 4.0.x + Iceberg 1.9.x [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove opened a new issue, #2380: URL: https://github.com/apache/datafusion-comet/issues/2380 ### What is the problem the feature request solves? _No response_ ### Describe the potential solution _No response_ ### Additional context _No response_

Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-11 Thread via GitHub
milenkovicm commented on code in PR #1314: URL: https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2339514980 ## ballista/executor/src/execution_loop.rs: ## @@ -88,8 +90,29 @@ pub async fn poll_loop match poll_work_result { Ok(result) =>

Re: [PR] chore: Revert "chore: [1941-Part1]: Introduce `map_sort` scalar function (#2… [datafusion-comet]

2025-09-11 Thread via GitHub
mbutrovich merged PR #2381: URL: https://github.com/apache/datafusion-comet/pull/2381

[I] Support length function with non-string input [datafusion-comet]

2025-09-11 Thread via GitHub
wForget opened a new issue, #2348: URL: https://github.com/apache/datafusion-comet/issues/2348 ### What is the problem the feature request solves? _No response_ ### Describe the potential solution _No response_ ### Additional context _No response_

Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-11 Thread via GitHub
hsiang-c commented on code in PR #2346: URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331586302 ## native/core/src/execution/operators/scan.rs: ## @@ -239,6 +239,87 @@ impl ScanExec { let mut timer = arrow_ffi_time.timer(); +// Check

Re: [I] Incorrect null literal handling for `to_local_time()` function (SQLancer) [datafusion]

2025-09-11 Thread via GitHub
alamb commented on issue #17472: URL: https://github.com/apache/datafusion/issues/17472#issuecomment-3270683608 I agree the correct result is NULL. I think it would be a matter of adding support for `DataType::Null` here: https://github.com/apache/datafusion/blob/1197454ae100ca4

Re: [PR] Support from-first SQL syntax [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on PR #17295: URL: https://github.com/apache/datafusion/pull/17295#issuecomment-3281124360 So maybe the answer is that we should make `select from t1;` equivalent to "select no columns from t1"

Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-11 Thread via GitHub
codecov-commenter commented on PR #2346: URL: https://github.com/apache/datafusion-comet/pull/2346#issuecomment-3268322308 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2346?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat(spark): implement Spark `map` function `map_from_arrays` [datafusion]

2025-09-11 Thread via GitHub
SparkApplicationMaster commented on code in PR #17456: URL: https://github.com/apache/datafusion/pull/17456#discussion_r2342821730 ## datafusion/spark/src/function/map/map_from_arrays.rs: ## @@ -0,0 +1,208 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] feat(spark): implement Spark `map` function `map_from_arrays` [datafusion]

2025-09-11 Thread via GitHub
SparkApplicationMaster commented on code in PR #17456: URL: https://github.com/apache/datafusion/pull/17456#discussion_r2342822437 ## datafusion/spark/src/function/map/map_from_arrays.rs: ## @@ -0,0 +1,207 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-11 Thread via GitHub
wForget commented on PR #2350: URL: https://github.com/apache/datafusion-comet/pull/2350#issuecomment-3283460321 > Thanks for this PR @wForget. Should we wait to merge #2327 before we merge this? Thanks for the review. Do you mean #2372? If so, I think either way is fine — the curre

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-11 Thread via GitHub
alamb commented on PR #17299: URL: https://github.com/apache/datafusion/pull/17299#issuecomment-3276245534 Thank you again @xanderbailey @LiaCastaneda @vbarua and @Blizzara

[PR] chore: [iceberg] support Iceberg 1.9.1 [datafusion-comet]

2025-09-11 Thread via GitHub
hsiang-c opened a new pull request, #2386: URL: https://github.com/apache/datafusion-comet/pull/2386 ## Which issue does this PR close? Partially closes #. https://github.com/apache/datafusion-comet/issues/2380 ## Rationale for this change - Support Iceberg `

Re: [I] Initcap behaves differently in Spark and in DataFusion (also Comet) [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove commented on issue #1052: URL: https://github.com/apache/datafusion-comet/issues/1052#issuecomment-3283049018 This expression is now documented as incompatible and disabled by default, but users can opt in. I will close this issue.

Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-11 Thread via GitHub
KR-bluejay commented on code in PR #1314: URL: https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2339937761 ## ballista/executor/src/execution_loop.rs: ## @@ -88,8 +90,29 @@ pub async fn poll_loop match poll_work_result { Ok(result) =>

Re: [PR] feat: [branch-0.10] Add dynamic `enabled` and `allowIncompat` configs for all supported expressions (#2329) [datafusion-comet]

2025-09-11 Thread via GitHub
andygrove commented on PR #2385: URL: https://github.com/apache/datafusion-comet/pull/2385#issuecomment-3282990957 > Assuming this is the same as #2329 Yes, I used `git cherry-pick`.

Re: [PR] feat(spark): implement Spark `map` function `map_from_arrays` [datafusion]

2025-09-11 Thread via GitHub
SparkApplicationMaster commented on code in PR #17456: URL: https://github.com/apache/datafusion/pull/17456#discussion_r2342584850 ## datafusion/spark/src/function/map/map_from_arrays.rs: ## @@ -0,0 +1,208 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] Document how schema projection works. [datafusion]

2025-09-11 Thread via GitHub
Jefffrey commented on code in PR #17250: URL: https://github.com/apache/datafusion/pull/17250#discussion_r2342968747 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2184,14 +2184,22 @@ impl Projection { /// will be computed. /// * `exprs`: A slice of `Expr` expressions r

Re: [PR] chore: Start 0.11.0 development [datafusion-comet]

2025-09-11 Thread via GitHub
mbutrovich merged PR #2365: URL: https://github.com/apache/datafusion-comet/pull/2365

Re: [I] Push down entire hash table from HashJoinExec into scans [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on issue #17171: URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3283565041 One note about a cool thing we could do with bloom filters: in theory you can compute an intersection in `O(size of the bloom filter)`. If we push down a bloom filter for col `
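The bloom-filter intersection idea in the comment above can be sketched as follows. This is a minimal illustration, assuming both filters were built with the same bit-array size and the same hash functions; the `BloomFilter` type, its fields, and the hashing scheme are hypothetical and are not DataFusion APIs. Under that assumption, intersecting two filters is a bitwise AND over their bit arrays, so the cost is proportional to the filter size rather than to the number of keys inserted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical fixed-size Bloom filter, for illustration only.
#[derive(Clone, Debug, PartialEq)]
struct BloomFilter {
    bits: Vec<u64>,  // bit array packed into 64-bit words
    num_hashes: u64, // number of hash functions (k)
}

impl BloomFilter {
    fn new(num_words: usize, num_hashes: u64) -> Self {
        Self { bits: vec![0; num_words], num_hashes }
    }

    fn num_bits(&self) -> u64 {
        (self.bits.len() as u64) * 64
    }

    /// Derive the i-th bit position by hashing (seed i, value).
    fn bit_index<T: Hash>(&self, value: &T, i: u64) -> u64 {
        let mut hasher = DefaultHasher::new();
        i.hash(&mut hasher);
        value.hash(&mut hasher);
        hasher.finish() % self.num_bits()
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        for i in 0..self.num_hashes {
            let idx = self.bit_index(value, i);
            self.bits[(idx / 64) as usize] |= 1u64 << (idx % 64);
        }
    }

    fn might_contain<T: Hash>(&self, value: &T) -> bool {
        (0..self.num_hashes).all(|i| {
            let idx = self.bit_index(value, i);
            self.bits[(idx / 64) as usize] & (1u64 << (idx % 64)) != 0
        })
    }

    /// AND the words of two compatible filters: O(size of the filter).
    /// Returns None if the sizes or hash counts differ.
    fn intersect(&self, other: &Self) -> Option<Self> {
        if self.bits.len() != other.bits.len() || self.num_hashes != other.num_hashes {
            return None;
        }
        let bits: Vec<u64> = self.bits.iter().zip(&other.bits).map(|(a, b)| a & b).collect();
        Some(Self { bits, num_hashes: self.num_hashes })
    }
}

fn main() {
    let mut left = BloomFilter::new(16, 3);
    let mut right = BloomFilter::new(16, 3);
    for k in [1, 2, 3, 4] {
        left.insert(&k);
    }
    for k in [3, 4, 5, 6] {
        right.insert(&k);
    }
    let both = left.intersect(&right).expect("compatible filters");
    // Keys present in both inputs are never reported as absent.
    assert!(both.might_contain(&3));
    assert!(both.might_contain(&4));
    println!("3 in intersection: {}", both.might_contain(&3));
}
```

The AND-ed filter has no false negatives for keys in both inputs, but its false-positive rate can be higher than that of a filter built directly from the intersected key set.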

Re: [I] Support for where x in (...) dynamic filters in hash join [datafusion]

2025-09-11 Thread via GitHub
adriangb commented on issue #17523: URL: https://github.com/apache/datafusion/issues/17523#issuecomment-3283567409 Let's continue discussion in https://github.com/apache/datafusion/issues/17171 and close this issue if that's okay with you @LiaCastaneda?
