Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-11 Thread via GitHub
xudong963 commented on issue #16383: URL: https://github.com/apache/datafusion/issues/16383#issuecomment-2965317151 After this is fixed, we can publish sqllogictest, because it depends on spark ♾ ``` cd datafusion/sqllogictest && cargo publish --allow-dirty Updating crates.io i

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-11 Thread via GitHub
berkaysynnada commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2965256342 I couldn't find much time on upstream issues lately @adriangb, sorry for that. But I'll certainly take a look at this asap -- This is an automated message from the Apache Git

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-11 Thread via GitHub
zhuqi-lucas commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2965191039 > Figured it out. File access is done via object store and object store uses `std::fs::File`, not `tokio::fs::File`. Even if it would, from browsing the code it doesn't look

Re: [PR] Document Table Constraint Enforcement Behavior in Custom Table Providers Guide [datafusion]

2025-06-11 Thread via GitHub
alamb merged PR #16340: URL: https://github.com/apache/datafusion/pull/16340

Re: [PR] Support `DISTINCT AS { STRUCT | VALUE }` for BigQuery [datafusion-sqlparser-rs]

2025-06-11 Thread via GitHub
alamb commented on PR #1880: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1880#issuecomment-2965080828 woohoo

Re: [PR] chore(deps): bump object_store from 0.12.1 to 0.12.2 [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16368: URL: https://github.com/apache/datafusion/pull/16368#issuecomment-2965079051 Sweet -- no updates required

Re: [PR] chore: refactor Substrait consumer's "rename_field" and implement the rest of types [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16345: URL: https://github.com/apache/datafusion/pull/16345#issuecomment-2965076495 It looks to me like this PR is ready to merge, though it does look like there are some potential follow-on ideas in https://github.com/apache/datafusion/pull/16345#discussion_r21393202

Re: [PR] chore: refactor Substrait consumer's "rename_field" and implement the rest of types [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16345: URL: https://github.com/apache/datafusion/pull/16345#issuecomment-2965076627 Thanks again everyone!

Re: [PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16322: URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965063028 Thanks again @pepijnve @Dandandan @ozankabak and @zhuqi-lucas

Re: [PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16322: URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2965062758 > I just checked, and SortExec is also setting up RecordBatchReceiverStream. The worst case scenario in terms of elapsed time in the poll_next call is that all 10k streams are ready in

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16383: URL: https://github.com/apache/datafusion/issues/16383#issuecomment-2965049836 I have a PR to fix it here (found while writing tests): - https://github.com/apache/datafusion/pull/16384

Re: [I] [datafusion-spark] Example of using Spark compatible function library [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #15915: URL: https://github.com/apache/datafusion/issues/15915#issuecomment-2965058111 I made a PR for this issue -- it was a bit trickier than I expected - https://github.com/apache/datafusion/pull/16384

[I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-11 Thread via GitHub
alamb opened a new issue, #16383: URL: https://github.com/apache/datafusion/issues/16383 ### Describe the bug Filing a ticket for something @xudong963 hit today (I can't remember where this was mentioned): When trying to publish the `datafusion-spark` crate it generates the f

Re: [PR] #5483 [datafusion]

2025-06-11 Thread via GitHub
github-actions[bot] closed pull request #15307: #5483 URL: https://github.com/apache/datafusion/pull/15307

Re: [I] Investigate performance tradeoff in compressing spill files [datafusion]

2025-06-11 Thread via GitHub
2010YOUY01 commented on issue #16367: URL: https://github.com/apache/datafusion/issues/16367#issuecomment-2964856125 > * CPU / I/O tradeoff when `zstd` or `lz4_frame` compression is enabled i.e. compression ratio, extra latency spent for compression It's worth doing some micro-benches

Re: [PR] Add compression option to SpillManager [datafusion]

2025-06-11 Thread via GitHub
2010YOUY01 commented on code in PR #16268: URL: https://github.com/apache/datafusion/pull/16268#discussion_r2141401180 ## datafusion/common/src/config.rs: ## @@ -274,6 +276,60 @@ config_namespace! { } } +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum SpillCompres

Re: [PR] Secure GitHub Actions by using specific SHA hashes [datafusion]

2025-06-11 Thread via GitHub
github-actions[bot] closed pull request #15306: Secure GitHub Actions by using specific SHA hashes URL: https://github.com/apache/datafusion/pull/15306

Re: [PR] Draft: Use take-in kernel in repartitioning [datafusion]

2025-06-11 Thread via GitHub
github-actions[bot] commented on PR #15392: URL: https://github.com/apache/datafusion/pull/15392#issuecomment-2964785768 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feat: implement GroupsAccumulator for `count(DISTINCT)` aggr [datafusion]

2025-06-11 Thread via GitHub
github-actions[bot] commented on PR #15324: URL: https://github.com/apache/datafusion/pull/15324#issuecomment-2964785938 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix: support read Struct by user schema [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove merged PR #1860: URL: https://github.com/apache/datafusion-comet/pull/1860

Re: [I] Spark Test fails `vectorized reader: missing all struct fields` [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove closed issue #1843: Spark Test fails `vectorized reader: missing all struct fields` URL: https://github.com/apache/datafusion-comet/issues/1843

Re: [I] Optimize `NestedLoopJoinExec` Memory Usage [datafusion]

2025-06-11 Thread via GitHub
jonathanc-n commented on issue #16364: URL: https://github.com/apache/datafusion/issues/16364#issuecomment-2964570048 Yeah this was much needed thanks for bringing this up @UBarney

[PR] Disable `datafusion-cli tests for hash_collision tets [datafusion]

2025-06-11 Thread via GitHub
alamb opened a new pull request, #16382: URL: https://github.com/apache/datafusion/pull/16382 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16378 ## Rationale for this change As described on https://github.com/apache/datafusion/is

Re: [I] CI failure on Datafusion extended tests / cargo test hash collisions (amd64) (push) [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16378: URL: https://github.com/apache/datafusion/issues/16378#issuecomment-2964559250 Here is a proposed fix: - https://github.com/apache/datafusion/pull/16382

Re: [PR] fix: map parquet field_id correctly (native_iceberg_compat) [datafusion-comet]

2025-06-11 Thread via GitHub
parthchandra commented on PR #1815: URL: https://github.com/apache/datafusion-comet/pull/1815#issuecomment-2964554260 Removed commented out code that is no longer needed. Thanks for the second review @andygrove. Will merge after ci completes

Re: [PR] Fix CI tests by disabling AWS metadata [datafusion]

2025-06-11 Thread via GitHub
alamb commented on code in PR #16381: URL: https://github.com/apache/datafusion/pull/16381#discussion_r2141267569 ## .github/workflows/extended.yml: ## @@ -100,7 +100,10 @@ jobs: df -h - name: Run tests (excluding doctests) env: - RUST_BACKTRA

Re: [PR] Fix CI tests by disabling AWS metadata [datafusion]

2025-06-11 Thread via GitHub
alamb closed pull request #16381: Fix CI tests by disabling AWS metadata URL: https://github.com/apache/datafusion/pull/16381

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-11 Thread via GitHub
alamb closed issue #15771: Release DataFusion `48.0.0` (June 2025) URL: https://github.com/apache/datafusion/issues/15771

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2964540689 We have released to crates.io so I think we are all good here now: https://github.com/apache/datafusion/issues/15915

Re: [PR] feat: support FixedSizeList for array_has [datafusion]

2025-06-11 Thread via GitHub
alamb merged PR #16333: URL: https://github.com/apache/datafusion/pull/16333

Re: [PR] fix: preserve null_equals_null flag in eliminate_cross_join rule [datafusion]

2025-06-11 Thread via GitHub
alamb merged PR #16356: URL: https://github.com/apache/datafusion/pull/16356

Re: [PR] Fix: datafusion-sqllogictest 48.0.0 can't be published [datafusion]

2025-06-11 Thread via GitHub
alamb merged PR #16376: URL: https://github.com/apache/datafusion/pull/16376

Re: [I] datafusion-sqllogictest 48.0.0 can't be published [datafusion]

2025-06-11 Thread via GitHub
alamb closed issue #16375: datafusion-sqllogictest 48.0.0 can't be published URL: https://github.com/apache/datafusion/issues/16375

[PR] Fix CI tests by disabling AWS metadata [datafusion]

2025-06-11 Thread via GitHub
alamb opened a new pull request, #16381: URL: https://github.com/apache/datafusion/pull/16381 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16378 ## Rationale for this change The way credentials are resolved on AWS builders ha

Re: [I] CI failure on Datafusion extended tests / cargo test hash collisions (amd64) (push) [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16378: URL: https://github.com/apache/datafusion/issues/16378#issuecomment-2964510793 I think the problem is that prior to https://github.com/apache/datafusion/pull/16300 we wouldn't actually fetch the credential provider until it was needed, which it is not for th

Re: [PR] fix: map parquet field_id correctly (native_iceberg_compat) [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove commented on code in PR #1815: URL: https://github.com/apache/datafusion-comet/pull/1815#discussion_r2141233325 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -424,6 +425,187 @@ public void init() throws Throwable { isInitialized =

Re: [PR] fix: map parquet field_id correctly (native_iceberg_compat) [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove commented on code in PR #1815: URL: https://github.com/apache/datafusion-comet/pull/1815#discussion_r2141233094 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -424,6 +425,187 @@ public void init() throws Throwable { isInitialized =

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-11 Thread via GitHub
jonathanc-n commented on code in PR #16083: URL: https://github.com/apache/datafusion/pull/16083#discussion_r2141219041 ## datafusion/optimizer/src/optimize_projections/mod.rs: ## @@ -704,7 +704,8 @@ fn split_join_requirements( | JoinType::Left | JoinType::Righ

Re: [PR] POC: Reduce `Arc` cloning on hashmap build side [datafusion]

2025-06-11 Thread via GitHub
jonathanc-n commented on code in PR #16380: URL: https://github.com/apache/datafusion/pull/16380#discussion_r2141169183 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -1372,15 +1407,16 @@ pub fn equal_rows_arr( // The results are then folded (combined) using the

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-11 Thread via GitHub
pepijnve commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2964427447 Figured it out. File access is done via object store and object store uses `std::fs::File`, not `tokio::fs::File`. Even if it would, from browsing the code it doesn't look to m

Re: [PR] POC: Reduce `Arc` cloning on hashmap build side [datafusion]

2025-06-11 Thread via GitHub
jonathanc-n commented on PR #16380: URL: https://github.com/apache/datafusion/pull/16380#issuecomment-2964423579 I've noticed that it is possible for `interleave` to perform worse than `take` despite the `Arc` clones from `take`. This happens twice as well for `equal_row_arr` and `build_bat

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-11 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2141190301 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [PR] docs: Update Iceberg docs [wip] [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove closed pull request #1872: docs: Update Iceberg docs [wip] URL: https://github.com/apache/datafusion-comet/pull/1872

Re: [I] Tests are failing in Miri CI workflow [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove closed issue #1871: Tests are failing in Miri CI workflow URL: https://github.com/apache/datafusion-comet/issues/1871

Re: [PR] build: Disable some rounding tests when miri is enabled [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove merged PR #1873: URL: https://github.com/apache/datafusion-comet/pull/1873

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2964398777 > We add a CI workflow that is triggered when Spark functions SLT files are changed, to make sure they are generated without unintended manual modification. I am not quite s

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2964396824 I just had a chat with @shehabgamin The current status is that we have not smoothed out the process to the point where contributors with minimal context can pick up a port

Re: [PR] fix: support read Struct by user schema [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove commented on PR #1860: URL: https://github.com/apache/datafusion-comet/pull/1860#issuecomment-2964357089 miri failure due to https://github.com/apache/datafusion-comet/issues/1871

Re: [PR] build: Disable some rounding tests when miri is enabled [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove commented on PR #1873: URL: https://github.com/apache/datafusion-comet/pull/1873#issuecomment-2964350953 @kazuyukitanimura @parthchandra @huaxingao could I get a committer review?

Re: [PR] POC: Reduce `Arc` cloning on hashmap build side [datafusion]

2025-06-11 Thread via GitHub
jonathanc-n commented on code in PR #16380: URL: https://github.com/apache/datafusion/pull/16380#discussion_r2141135830 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -95,9 +96,11 @@ struct JoinLeftData { /// The hash table with indices into `batch` hash_map

[PR] POC: Reduce `Arc` cloning on hashmap build side [datafusion]

2025-06-11 Thread via GitHub
jonathanc-n opened a new pull request, #16380: URL: https://github.com/apache/datafusion/pull/16380 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] CI failure on Datafusion extended tests / cargo test hash collisions (amd64) (push) [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16378: URL: https://github.com/apache/datafusion/issues/16378#issuecomment-2964249245 After adding more information in https://github.com/apache/datafusion/pull/16379 , we get: ``` exec::tests::copy_to_external_object_store_test stdout Err

Re: [PR] fix: Fix SparkSha2 to be compliant with Spark response and add support for Int32 [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16350: URL: https://github.com/apache/datafusion/pull/16350#issuecomment-2964195828 FYI @shehabgamin

Re: [PR] feat: support RangePartitioning with native shuffle [datafusion-comet]

2025-06-11 Thread via GitHub
mbutrovich commented on PR #1862: URL: https://github.com/apache/datafusion-comet/pull/1862#issuecomment-2964065141 Last thing I am waiting on is to do a new set of Spark diffs to turn off native RangePartitioning in the 3 bucketing-related tests. Because of the different random number gen

Re: [PR] Add more context to error message for datafusion-cli config failure [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16379: URL: https://github.com/apache/datafusion/pull/16379#issuecomment-2964036306 Let's get some data

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-11 Thread via GitHub
blaginin commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2141000796 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() { let formatted = arrow::util::pretty::pr

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-11 Thread via GitHub
pepijnve commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2964042212 Some more information to share and time to eat some humble pie for me. Google led me to a [withoutboats post](https://internals.rust-lang.org/t/runtime-agnostic-cooperative-tas

Re: [PR] Add more context to error message for datafusion-cli config failure [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16379: URL: https://github.com/apache/datafusion/pull/16379#issuecomment-2964037509 Let's get some data

Re: [PR] Add more context to error message for datafusion-cli config failure [datafusion]

2025-06-11 Thread via GitHub
alamb merged PR #16379: URL: https://github.com/apache/datafusion/pull/16379

Re: [PR] docs: Update Iceberg docs [wip] [datafusion-comet]

2025-06-11 Thread via GitHub
codecov-commenter commented on PR #1872: URL: https://github.com/apache/datafusion-comet/pull/1872#issuecomment-2964016404 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1872?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-11 Thread via GitHub
pepijnve commented on PR #16319: URL: https://github.com/apache/datafusion/pull/16319#issuecomment-2963849416 I did a second identical run because those results seemed just too good to be true to me. This is much closer to what I was expecting: more or less status quo. That does mean I'm bac

Re: [PR] Fix: Map functions crash on out of bounds cases [datafusion]

2025-06-11 Thread via GitHub
krishvishal commented on PR #16203: URL: https://github.com/apache/datafusion/pull/16203#issuecomment-2963817531 @comphead, can you please tell if there is something I can do to move this forward? Are there any relevant benches I could either run or adapt for this case?

Re: [I] June 2025 ASF Board Report [datafusion]

2025-06-11 Thread via GitHub
alamb closed issue #15182: June 2025 ASF Board Report URL: https://github.com/apache/datafusion/issues/15182

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2963580078 > Sorry, I usually do this around a white board. Yeah I agree doing this kind of thing with a whiteboard is easier. I am happy to set up a video call with the relevant part

Re: [PR] build: Disable some rounding tests when miri is enabled [datafusion-comet]

2025-06-11 Thread via GitHub
codecov-commenter commented on PR #1873: URL: https://github.com/apache/datafusion-comet/pull/1873#issuecomment-2963656751 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1873?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] CI failure on Datafusion extended tests / cargo test hash collisions (amd64) (push) [datafusion]

2025-06-11 Thread via GitHub
alamb opened a new issue, #16378: URL: https://github.com/apache/datafusion/issues/16378 ### Describe the bug CI is failing on main after merging - https://github.com/apache/datafusion/pull/16300 Example failure: ```text failures: exec::tests::copy_to_extern

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-11 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2140683717 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() { let formatted = arrow::util::prett

[PR] Fix: datafusion-sqllogictest 48.0.0 can't be published [datafusion]

2025-06-11 Thread via GitHub
xudong963 opened a new pull request, #16376: URL: https://github.com/apache/datafusion/pull/16376 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16375 ## Rationale for this change ## What changes are included in thi

Re: [I] Tests are failing in Miri CI workflow [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove commented on issue #1871: URL: https://github.com/apache/datafusion-comet/issues/1871#issuecomment-2963580260 Possibly relevant info: https://github.com/rust-lang/miri/issues/4208

Re: [I] June 2025 ASF Board Report [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #15182: URL: https://github.com/apache/datafusion/issues/15182#issuecomment-2963675796 Report was submitted. Here is the final copy: ``` ## Description: The mission of Apache DataFusion is the creation and maintenance of software related to an extensi

Re: [PR] chore: refactor Substrait consumer's "rename_field" and implement the rest of types [datafusion]

2025-06-11 Thread via GitHub
Blizzara commented on code in PR #16345: URL: https://github.com/apache/datafusion/pull/16345#discussion_r2140656890 ## datafusion/substrait/src/logical_plan/consumer/utils.rs: ## @@ -81,98 +81,167 @@ pub(super) fn next_struct_field_name( } } -pub(super) fn rename_field(

Re: [PR] feat: use spawned tasks to reduce call stack depth and avoid busy waiting [datafusion]

2025-06-11 Thread via GitHub
pepijnve commented on PR #16319: URL: https://github.com/apache/datafusion/pull/16319#issuecomment-2963861479 🤦‍♂️ that's not very useful now is it. I need a better machine to test on min/avg/max ``` 1887.14 / 2104.13 ±982.87 / 8971.27 ms │ 1893.19 / 1949.39 ±31.69 / 2057.5

Re: [PR] Add more context to error message for datafusion-cli config failure [datafusion]

2025-06-11 Thread via GitHub
blaginin commented on PR #16379: URL: https://github.com/apache/datafusion/pull/16379#issuecomment-2963679090 So strange! Passing for me locally as well, maybe some env variables..

[PR] build: Disable some rounding tests when miri is enabled [datafusion-comet]

2025-06-11 Thread via GitHub
andygrove opened a new pull request, #1873: URL: https://github.com/apache/datafusion-comet/pull/1873 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Support datafusion-cli access to public S3 buckets that do not require authentication [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16300: URL: https://github.com/apache/datafusion/pull/16300#issuecomment-2963654832 There appears to be some problem with this code in PR: - https://github.com/apache/datafusion/issues/16378 I have another PR up to add some better error messages to debug: - ht

Re: [I] Request to update crates.io ownership [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16323: URL: https://github.com/apache/datafusion/issues/16323#issuecomment-2963432937 FYI @xudong963 -- perhaps you can send invites to @andygrove and myself when that is published

Re: [I] Request to update crates.io ownership [datafusion]

2025-06-11 Thread via GitHub
xudong963 commented on issue #16323: URL: https://github.com/apache/datafusion/issues/16323#issuecomment-2963445724 Some errors: ``` Packaging datafusion-spark v48.0.0 (/Users/xudong/opensource/datafusion/datafusion/spark) Updating crates.io index Packaged 38 files, 14

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-11 Thread via GitHub
alamb commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2140697002 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() { let formatted = arrow::util::pretty::prett

Re: [I] Request to update crates.io ownership [datafusion]

2025-06-11 Thread via GitHub
xudong963 commented on issue #16323: URL: https://github.com/apache/datafusion/issues/16323#issuecomment-2963519484 Okay, I'll do it tomorrow, with sqllogictest crate, (a bit late now)

Re: [I] Request to update crates.io ownership [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16323: URL: https://github.com/apache/datafusion/issues/16323#issuecomment-2963482791 > Some errors when I tried to publish the spark crate: It looks like the datafusion-spark crate is missing some features of the `datafusion_functions` crate (`crypto`

Re: [PR] Update publish command [datafusion]

2025-06-11 Thread via GitHub
alamb merged PR #16377: URL: https://github.com/apache/datafusion/pull/16377

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2963478363 > I see this as a Stream implementation problem. I see the wisdom of this view. In my mind I think the DataFusion philosophy is "keep the barrier to entry as low as

[PR] Update publish command [datafusion]

2025-06-11 Thread via GitHub
xudong963 opened a new pull request, #16377: URL: https://github.com/apache/datafusion/pull/16377 ## Which issue does this PR close? - Closes #. ## Rationale for this change The command is missing some crates and the order of it is wrong. ## What ch

[PR] Add more context to error message for datafusion-cli config failure [datafusion]

2025-06-11 Thread via GitHub
alamb opened a new pull request, #16379: URL: https://github.com/apache/datafusion/pull/16379 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/16300 - Part of https://github.com/apache/datafusion/issues/16378 ## Rationale for this ch

Re: [PR] Update publish command [datafusion]

2025-06-11 Thread via GitHub
alamb commented on PR #16377: URL: https://github.com/apache/datafusion/pull/16377#issuecomment-2963613954 Thank you @xudong963 and @andygrove

[I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-11 Thread via GitHub
alamb opened a new issue, #16374: URL: https://github.com/apache/datafusion/issues/16374 ### Is your feature request related to a problem or challenge? One of the common criticisms of parquet based query systems is that they don't have some particular type of index (e.g. HyperLogLog a

[I] datafusion-sqllogictest 48.0.0 can't be published [datafusion]

2025-06-11 Thread via GitHub
xudong963 opened a new issue, #16375: URL: https://github.com/apache/datafusion/issues/16375 ``` cd datafusion/sqllogictest && cargo publish Updating crates.io index error: all dependencies must have a version specified when publishing. dependency `datafusion-substrait` does n

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-11 Thread via GitHub
pepijnve commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2963507764 @alamb our messages crossed paths in the ether. > I have somewhat lost track of what exactly you are proposing. Sorry, I usually do this around a white board. First

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-11 Thread via GitHub
alamb commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2963459115 > At the risk of making myself unpopular, I feel it's relevant to share my findings with you guys. Not at all --- this is great stuff -- thank you @pepijnve for continuing t

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-11 Thread via GitHub
pepijnve commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2963459881 @zhuqi-lucas it would be useful to have some additional voices in this discussion. I can share my opinion, but it's only one opinion. I feel like I'm just going to keep repeati

Re: [PR] Support `DISTINCT AS { STRUCT | VALUE }` for BigQuery [datafusion-sqlparser-rs]

2025-06-11 Thread via GitHub
iffyio merged PR #1880: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1880

Re: [PR] Allow `IF NOT EXISTS` after table name for Snowflake [datafusion-sqlparser-rs]

2025-06-11 Thread via GitHub
iffyio merged PR #1881: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1881

Re: [I] Request to update crates.io ownership [datafusion]

2025-06-11 Thread via GitHub
andygrove commented on issue #16323: URL: https://github.com/apache/datafusion/issues/16323#issuecomment-2963353039 The `datafusion-spark` crate will be new in this release, so we'll need to update permissions on that once it is created.

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-11 Thread via GitHub
alamb commented on PR #74: URL: https://github.com/apache/datafusion-site/pull/74#issuecomment-2963320742 Thank you @akurmustafa and @timsaucer. I hope to publish these posts in a few days if there are no more comments.

Re: [I] Add tpch csv support to bench.sh [datafusion]

2025-06-11 Thread via GitHub
zhuqi-lucas commented on issue #16370: URL: https://github.com/apache/datafusion/issues/16370#issuecomment-2963309637 Submitted a PR for review: https://github.com/apache/datafusion/pull/16373

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-11 Thread via GitHub
alamb commented on code in PR #74: URL: https://github.com/apache/datafusion-site/pull/74#discussion_r2140516503 ## content/blog/2025-06-15-optimizing-sql-dataframes-part-one.md: ## @@ -0,0 +1,249 @@ +--- +layout: post +title: Optimizing SQL (and DataFrames) in DataFusion, Part

[PR] feat: Support tpch and tpch10 csv format [datafusion]

2025-06-11 Thread via GitHub
zhuqi-lucas opened a new pull request, #16373: URL: https://github.com/apache/datafusion/pull/16373 ## Which issue does this PR close? - Closes [#16370](https://github.com/apache/datafusion/issues/16370) ## Rationale for this change 1. tpch data generate for csv forma
