Re: [I] Add AI tooling disclosure text to contributor guide and fields to PR templates [datafusion]

2025-10-18 Thread via GitHub
2010YOUY01 commented on issue #18095: URL: https://github.com/apache/datafusion/issues/18095#issuecomment-3419224247 > My two cents as a contributor: Let's remember that there's no guarantee that people will tell the truth on the internet. I don't think this will help much with the most pro

Re: [PR] refactor: remove core crate from datafusion-proto [datafusion]

2025-10-18 Thread via GitHub
Jefffrey merged PR #18123: URL: https://github.com/apache/datafusion/pull/18123 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] remove `datafusion` dependency from `datafusion-proto` [datafusion]

2025-10-18 Thread via GitHub
Jefffrey closed issue #17713: remove `datafusion` dependency from `datafusion-proto` URL: https://github.com/apache/datafusion/issues/17713

Re: [PR] [branch-50] Backport Fix bug in LimitPushPastWindows (#18029) [datafusion]

2025-10-18 Thread via GitHub
avantgardnerio merged PR #18107: URL: https://github.com/apache/datafusion/pull/18107

[I] Consider folding `CoalesceAsyncExecInput` physical optimizer rule into `CoalesceBatches` [datafusion]

2025-10-18 Thread via GitHub
Jefffrey opened a new issue, #18155: URL: https://github.com/apache/datafusion/issues/18155 ### Is your feature request related to a problem or challenge? These are quite similar, I wonder if we can embed the logic from `CoalesceAsyncExecInput` into existing rule `CoalesceBatches` to

Re: [PR] [branch-50] Backport Fix bug in LimitPushPastWindows (#18029) [datafusion]

2025-10-18 Thread via GitHub
akurmustafa commented on code in PR #18107: URL: https://github.com/apache/datafusion/pull/18107#discussion_r2442740494 ## datafusion/sqllogictest/test_files/window.slt: ## @@ -5966,8 +5966,8 @@ physical_plan 01)ProjectionExec: expr=[c1@2 as c1, c2@3 as c2, sum(test.c2) FILTER

Re: [PR] fix: `array_distinct` inner nullability causing type mismatch [datafusion]

2025-10-18 Thread via GitHub
dqkqd commented on PR #18104: URL: https://github.com/apache/datafusion/pull/18104#issuecomment-3419171390 I verified the new test added fail without code changes: ``` running 1 test test set_ops::tests::test_array_distinct_inner_nullability_result_type_match_return_

Re: [PR] fix: `array_distinct` inner nullability causing type mismatch [datafusion]

2025-10-18 Thread via GitHub
dqkqd commented on code in PR #18104: URL: https://github.com/apache/datafusion/pull/18104#discussion_r2442734737 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -6479,3 +6479,87 @@ async fn test_duplicate_state_fields_for_dfschema_construct() -> Result<()> { Ok(()) }

Re: [PR] fix: `array_distinct` inner nullability causing type mismatch [datafusion]

2025-10-18 Thread via GitHub
dqkqd commented on code in PR #18104: URL: https://github.com/apache/datafusion/pull/18104#discussion_r2442733990 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -6479,3 +6479,87 @@ async fn test_duplicate_state_fields_for_dfschema_construct() -> Result<()> { Ok(()) }

Re: [PR] feat: Add array concatenation support to concat function [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on code in PR #18137: URL: https://github.com/apache/datafusion/pull/18137#discussion_r2442730061 ## datafusion/functions/src/string/concat.rs: ## @@ -90,23 +191,124 @@ impl ScalarUDFImpl for ConcatFunc { fn return_type(&self, arg_types: &[DataType]) ->

Re: [PR] fix: `array_distinct` inner nullability causing type mismatch [datafusion]

2025-10-18 Thread via GitHub
dqkqd commented on code in PR #18104: URL: https://github.com/apache/datafusion/pull/18104#discussion_r2442732221 ## datafusion/functions-nested/src/set_ops.rs: ## @@ -563,3 +555,54 @@ fn general_array_distinct( array.nulls().cloned(), )?)) } + +#[cfg(test)] +mod

Re: [PR] fix: `array_distinct` inner nullability causing type mismatch [datafusion]

2025-10-18 Thread via GitHub
dqkqd commented on code in PR #18104: URL: https://github.com/apache/datafusion/pull/18104#discussion_r2442731100 ## datafusion/functions-nested/src/set_ops.rs: ## @@ -290,10 +290,14 @@ impl ScalarUDFImpl for ArrayDistinct { fn return_type(&self, arg_types: &[DataType]) -

[PR] fix: SQL & Dataframe links broken on root readme [datafusion]

2025-10-18 Thread via GitHub
shanaya-Gupta opened a new pull request, #18154: URL: https://github.com/apache/datafusion/pull/18154 ### Fix for Issue #18153 **Issue:** SQL & Dataframe links broken on root readme **Changes:** - Modified: `README.md` *Please review carefully before merging.*

[I] SQL & Dataframe links broken on root readme [datafusion]

2025-10-18 Thread via GitHub
Jefffrey opened a new issue, #18153: URL: https://github.com/apache/datafusion/issues/18153 https://github.com/apache/datafusion/blob/b98cad616ad9c69df9a425fc7473b799ffc258ee/README.md?plain=1#L67 https://github.com/user-attachments/assets/79e49d39-dbee-424f-b8eb-1d5ac7d43eee

Re: [PR] Implementing partition_statistics for EmptyExec (Issue #15873) [datafusion]

2025-10-18 Thread via GitHub
github-actions[bot] commented on PR #16941: URL: https://github.com/apache/datafusion/pull/16941#issuecomment-3419142379 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] We have now the CI ensure all doc strings remain formatted [datafusion]

2025-10-18 Thread via GitHub
github-actions[bot] commented on PR #16916: URL: https://github.com/apache/datafusion/pull/16916#issuecomment-3419142413 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Perf: Optimize vectorized append function [datafusion]

2025-10-18 Thread via GitHub
github-actions[bot] closed pull request #16876: Perf: Optimize vectorized append function URL: https://github.com/apache/datafusion/pull/16876

Re: [PR] feat: implement partition_statistics for HashJoinExec [datafusion]

2025-10-18 Thread via GitHub
github-actions[bot] commented on PR #16956: URL: https://github.com/apache/datafusion/pull/16956#issuecomment-3419142348 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix: `PushDownFilter` for `GROUP BY` on uppercase col names [datafusion]

2025-10-18 Thread via GitHub
github-actions[bot] closed pull request #16049: fix: `PushDownFilter` for `GROUP BY` on uppercase col names URL: https://github.com/apache/datafusion/pull/16049

Re: [PR] minor: improve format [datafusion]

2025-10-18 Thread via GitHub
github-actions[bot] closed pull request #16898: minor: improve format URL: https://github.com/apache/datafusion/pull/16898

Re: [PR] fix: `array_distinct` inner nullability causing type mismatch [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on code in PR #18104: URL: https://github.com/apache/datafusion/pull/18104#discussion_r2442715208 ## datafusion/functions-nested/src/set_ops.rs: ## @@ -290,10 +290,14 @@ impl ScalarUDFImpl for ArrayDistinct { fn return_type(&self, arg_types: &[DataType]

Re: [PR] Fix `DISTINCT ON` for tables with no columns (ReplaceDistinctWithAggregate: do not fail when on input without columns) [datafusion]

2025-10-18 Thread via GitHub
Jefffrey merged PR #18133: URL: https://github.com/apache/datafusion/pull/18133

Re: [I] `DISTINCT` fails without columns in the `replace_distinct_aggregate` rule [datafusion]

2025-10-18 Thread via GitHub
Jefffrey closed issue #18132: `DISTINCT` fails without columns in the `replace_distinct_aggregate` rule URL: https://github.com/apache/datafusion/issues/18132

Re: [PR] Add PostgreSQL-style named arguments support for scalar functions [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on code in PR #18019: URL: https://github.com/apache/datafusion/pull/18019#discussion_r2442705784 ## datafusion/expr-common/src/signature.rs: ## @@ -996,13 +1159,119 @@ impl Signature { }, ), volatility, +

Re: [PR] chore: fix wasm-pack installation link in wasmtest README [datafusion]

2025-10-18 Thread via GitHub
alamb merged PR #17704: URL: https://github.com/apache/datafusion/pull/17704

[PR] DRAFT - Adaptive `MinMaxBytesAccumulator` with Mode-Sensitive Dense/Sparse Processing and Comprehensive Criterion Benchmarks [datafusion]

2025-10-18 Thread via GitHub
kosiew opened a new pull request, #18006: URL: https://github.com/apache/datafusion/pull/18006 ## **Which issue does this PR close?** Closes #17897 **Summary:** `MinMaxBytesAccumulator::update_batch` previously allocated a `locations` buffer sized to `total_num_groups` for e

Re: [PR] chore: upgrade sqlparser [datafusion]

2025-10-18 Thread via GitHub
comphead merged PR #17925: URL: https://github.com/apache/datafusion/pull/17925

Re: [I] Improve performance of queries of the form `SELECT *, CASE ... END` [datafusion]

2025-10-18 Thread via GitHub
pepijnve commented on issue #18056: URL: https://github.com/apache/datafusion/issues/18056#issuecomment-3402148851 take

Re: [PR] refactor: remove unused `type_coercion/aggregate.rs` functions [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on code in PR #18091: URL: https://github.com/apache/datafusion/pull/18091#discussion_r2437951954 ## datafusion/expr-common/src/type_coercion/aggregates.rs: ## @@ -16,31 +16,11 @@ // under the License. use crate::signature::TypeSignature; -use arrow::datat

Re: [PR] fix: Use dynamic timezone in now() function for accurate timestamp [datafusion]

2025-10-18 Thread via GitHub
Omega359 commented on PR #18017: URL: https://github.com/apache/datafusion/pull/18017#issuecomment-3407277942 This is looking good. I'd like to see an addition to the upgrade guide as this currently will be either a slight change in behaviour (timezone of None previously vs now it'll be Som

Re: [I] Implement GroupsAccumulator for array_agg aggregation function [datafusion]

2025-10-18 Thread via GitHub
vegarsti commented on issue #10145: URL: https://github.com/apache/datafusion/issues/10145#issuecomment-3348649325 Indeed! https://github.com/apache/datafusion/issues/17446#issuecomment-3348641092 Sorry for the distraction!

Re: [PR] Introduce `expr_fields` to `AccumulatorArgs` to hold input argument fields [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on PR #18100: URL: https://github.com/apache/datafusion/pull/18100#issuecomment-3411530631 fyi @kosiew I tried implementing like this and it seems like no issues with regressions, thoughts on if this fix is simpler?

Re: [I] Release DataFusion `50.1.0` (minor) [datafusion]

2025-10-18 Thread via GitHub
alamb commented on issue #17594: URL: https://github.com/apache/datafusion/issues/17594#issuecomment-3353240833 > Should I use the same template for the `50.2.0` issue? Yes please

Re: [PR] docs: refine `AggregateUDFImpl::is_ordered_set_aggregate` documentation [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on PR #17805: URL: https://github.com/apache/datafusion/pull/17805#issuecomment-3419105018 I've updated the docstring per my latest findings; though I wonder if we are better off renaming `is_ordered_set_aggregate` to something like `supports_within_group_clause` instead

Re: [PR] feat: expose DataFrame.write_table [datafusion-python]

2025-10-18 Thread via GitHub
kosiew commented on code in PR #1264: URL: https://github.com/apache/datafusion-python/pull/1264#discussion_r2423428050 ## python/datafusion/dataframe.py: ## @@ -1206,3 +1265,48 @@ def fill_null(self, value: Any, subset: list[str] | None = None) -> DataFrame: - Fo

Re: [I] to_timestamp(double) gives different results depending on scalar/vectorized call context [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on issue #16678: URL: https://github.com/apache/datafusion/issues/16678#issuecomment-3414812439 Thanks for checking this @dqkqd Do you think you can check if we have an SLT test for this? If not could raise a PR with the test case so we can close this issue -- T

Re: [I] Add different configs for topk/join dynamic filter [datafusion]

2025-10-18 Thread via GitHub
xudong963 closed issue #18071: Add different configs for topk/join dynamic filter URL: https://github.com/apache/datafusion/issues/18071

[I] Add tracing to MemoryConsumer [datafusion]

2025-10-18 Thread via GitHub
andygrove opened a new issue, #17901: URL: https://github.com/apache/datafusion/issues/17901 ### Is your feature request related to a problem or challenge? To help debug OOM issues in Comet, I would like to add trace logging to `MemoryReservation` to record every call that changes the

Re: [PR] Case evaluation improvements [datafusion]

2025-10-18 Thread via GitHub
alamb commented on PR #17898: URL: https://github.com/apache/datafusion/pull/17898#issuecomment-3367176501 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.14.0-1016-gcp #17~

Re: [PR] [WIP] chore: Make COMET_EXPLAIN_TRANSFORMATIONS behavior consistent [datafusion-comet]

2025-10-18 Thread via GitHub
codecov-commenter commented on PR #2564: URL: https://github.com/apache/datafusion-comet/pull/2564#issuecomment-3397018561 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2564?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore(deps): bump taiki-e/install-action from 2.61.8 to 2.62.23 [datafusion-sandbox]

2025-10-18 Thread via GitHub
dependabot[bot] commented on PR #28: URL: https://github.com/apache/datafusion-sandbox/pull/28#issuecomment-3381093909 ### Labels The following labels could not be found: `auto-dependencies`. Please create it before Dependabot can add it to a pull request. Please fix the a

[PR] fix: Maintain `SUM` precision during two-phase aggregation [datafusion]

2025-10-18 Thread via GitHub
rkrishn7 opened a new pull request, #17815: URL: https://github.com/apache/datafusion/pull/17815 ## Which issue does this PR close? - Closes #17699 ## What changes are included in this PR? - Allows configuration of Sum aggregate UDF to maintain decimal precision

Re: [PR] #17801 Improve nullability reporting of case expressions [datafusion]

2025-10-18 Thread via GitHub
pepijnve commented on PR #17813: URL: https://github.com/apache/datafusion/pull/17813#issuecomment-3386914260 > Finding any way to make this easier to maintain / understand would be most appreciated I'll get there eventually. Latest commit tightens the code up further.

Re: [I] Include GLIBC version with provided jars [datafusion-comet]

2025-10-18 Thread via GitHub
martin-g commented on issue #2504: URL: https://github.com/apache/datafusion-comet/issues/2504#issuecomment-3356286044 Building on old OS should work. This is the solution explained at https://kobzol.github.io/rust/ci/2021/05/07/building-rust-binaries-in-ci-that-work-with-older-glibc.htm

Re: [I] Support Push down expression evaluation in `TableProviders` [datafusion]

2025-10-18 Thread via GitHub
adriangb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-3412932134 So basically we already have a projection pushdown physical optimizer rule: https://github.com/apache/datafusion/blob/337378ab81f6c7dab7da9000124c554d3b7ee568/datafusion/

Re: [PR] FileScanConfig: Preserve schema metadata across ser/de boundary [datafusion]

2025-10-18 Thread via GitHub
mach-kernel commented on PR #17966: URL: https://github.com/apache/datafusion/pull/17966#issuecomment-3382499011 > one other question for you @mach-kernel as you posted this in ballista group, do you need this to get to ballista 50? if so maybe we could back-port to datafusion 50.2 if @xudo

Re: [PR] fix: Partial AggregateMode will generate duplicate field names which will fail DFSchema construct [datafusion]

2025-10-18 Thread via GitHub
zhuqi-lucas commented on PR #17706: URL: https://github.com/apache/datafusion/pull/17706#issuecomment-3318285891 Thank you @alamb @xudong963 for review. I agree @alamb , i will fix this PR, and then do follow-up. > Thanks @zhuqi-lucas > > This looks similar to a PR from #

Re: [PR] feat: support spark udf format_string [datafusion]

2025-10-18 Thread via GitHub
alamb merged PR #17561: URL: https://github.com/apache/datafusion/pull/17561

Re: [PR] chore(deps): bump actions/stale from 10.0.0 to 10.1.0 [datafusion-sandbox]

2025-10-18 Thread via GitHub
dependabot[bot] commented on PR #25: URL: https://github.com/apache/datafusion-sandbox/pull/25#issuecomment-3371386727 ### Labels The following labels could not be found: `auto-dependencies`. Please create it before Dependabot can add it to a pull request. Please fix the a

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-10-18 Thread via GitHub
mbutrovich commented on PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#issuecomment-3353068211 Results attached from the benchmark I added to `CometReadBenchmark`, and a small chart with highlights to see what the overhead of encryption is for the various readers.

Re: [I] `SortMergeJoinExec` fails to allocate memory but should spill instead [datafusion-comet]

2025-10-18 Thread via GitHub
comphead commented on issue #2452: URL: https://github.com/apache/datafusion-comet/issues/2452#issuecomment-3331130555 Thanks @andygrove might be related to https://github.com/apache/datafusion/pull/11218 Some ongoing spilling work also https://github.com/apache/datafusion/issues/17

Re: [PR] docs: Split configuration guide into different sections (scan, exec, shuffle, etc) [datafusion-comet]

2025-10-18 Thread via GitHub
andygrove commented on PR #2568: URL: https://github.com/apache/datafusion-comet/pull/2568#issuecomment-3403725181 > Thanks for arranging this, @andygrove! I am trying to come up with a category for all of the "explain transformations" and "explain native" besides "exec" but I think it's g

Re: [PR] Update extended tests with new results [datafusion-testing]

2025-10-18 Thread via GitHub
alamb commented on code in PR #14: URL: https://github.com/apache/datafusion-testing/pull/14#discussion_r2434276737 ## data/sqlite/random/expr/slt_good_103.slt: ## @@ -50557,12 +50557,10 @@ SELECT - 1 * + - 75 AS col2 75 -# Postgresql - Postgres error: db error: ERROR:

Re: [PR] feat: optimize and unparse grouping [datafusion]

2025-10-18 Thread via GitHub
Slimsammylim commented on PR #16161: URL: https://github.com/apache/datafusion/pull/16161#issuecomment-3366066625 Hi, I ran my code using this branch and unfortunately it did not solve my issue (https://github.com/apache/datafusion/issues/16590).

Re: [PR] chore: Delete unused code [datafusion-comet]

2025-10-18 Thread via GitHub
mbutrovich merged PR #2565: URL: https://github.com/apache/datafusion-comet/pull/2565

Re: [PR] docs: Update HOWTOs for adding new functions [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on code in PR #18089: URL: https://github.com/apache/datafusion/pull/18089#discussion_r2434383753 ## docs/source/contributor-guide/howtos.md: ## @@ -152,15 +183,3 @@ valid installation of [protoc] (see [installation instructions] for details). [protoc]: h

Re: [PR] fix: Use dynamic timezone in now() function for accurate timestamp [datafusion]

2025-10-18 Thread via GitHub
Weijun-H commented on code in PR #18017: URL: https://github.com/apache/datafusion/pull/18017#discussion_r2431557975 ## datafusion/functions/src/datetime/now.rs: ## @@ -54,6 +57,15 @@ impl NowFunc { Self { signature: Signature::nullary(Volatility::Stable),

Re: [PR] fix: Fix regression with plan stability tests in CI [WIP] [datafusion-comet]

2025-10-18 Thread via GitHub
andygrove commented on PR #2492: URL: https://github.com/apache/datafusion-comet/pull/2492#issuecomment-3353547528 I'm seeing a difference in the explain plan (but not the simplified plan) for tpc-ds q9 with Spark 3.5 ``` expected:<...m#35, count#36, sum#[37, count#38] but was

Re: [I] date_part is calculating results incorrectly for intervals [datafusion]

2025-10-18 Thread via GitHub
Omega359 commented on issue #14817: URL: https://github.com/apache/datafusion/issues/14817#issuecomment-3378617568 This issue has been resolved with arrow 55.0.0 via https://github.com/apache/arrow-rs/pull/7189

Re: [PR] add msrvcheck [datafusion-ballista]

2025-10-18 Thread via GitHub
killzoner commented on PR #1328: URL: https://github.com/apache/datafusion-ballista/pull/1328#issuecomment-3409805028 > > Example of failing check with MSRV downgraded: > > [killzoner@6d7022f](https://github.com/killzoner/datafusion-ballista/commit/6d7022f9f196f605b0eb4e91c947ef903263ee4

Re: [PR] feat:support_integral_decimal_cast_native_impl [datafusion-comet]

2025-10-18 Thread via GitHub
andygrove commented on code in PR #2472: URL: https://github.com/apache/datafusion-comet/pull/2472#discussion_r2404680751 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -1464,6 +1465,104 @@ where cast_float_to_string!(from, _eval_mode, f32, Float32Array, OffsetSiz

Re: [PR] [WIP] Upgrade to arrow/parquet 57.0.0 [datafusion]

2025-10-18 Thread via GitHub
alamb commented on PR #17888: URL: https://github.com/apache/datafusion/pull/17888#issuecomment-3377712074 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_upgrade_arrow_57 Benchmark clickbench_extended.json ---

Re: [PR] chore(deps): bump taiki-e/install-action from 2.61.8 to 2.62.5 [datafusion-sandbox]

2025-10-18 Thread via GitHub
dependabot[bot] commented on PR #13: URL: https://github.com/apache/datafusion-sandbox/pull/13#issuecomment-3327962990 ### Labels The following labels could not be found: `auto-dependencies`. Please create it before Dependabot can add it to a pull request. Please fix the a

Re: [PR] Add support for schema-scoped table functions [datafusion]

2025-10-18 Thread via GitHub
Omega359 commented on PR #18022: URL: https://github.com/apache/datafusion/pull/18022#issuecomment-3411783809 FYI - the FunctionRegistry trait [already exists](https://github.com/apache/datafusion/blob/b1723e5c6a6700ba939b03319377830511719aa2/datafusion/expr/src/registry.rs#L29C11-L29C28)

Re: [PR] Short circuit complex case evaluation modes as soon as possible [datafusion]

2025-10-18 Thread via GitHub
pepijnve commented on code in PR #17898: URL: https://github.com/apache/datafusion/pull/17898#discussion_r2433189254 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -208,10 +208,19 @@ impl CaseExpr { let mut current_value = new_null_array(&return_type, batch.

Re: [PR] Support `JOIN`, `PIVOT` pipe operators [datafusion]

2025-10-18 Thread via GitHub
simonvandel commented on PR #17365: URL: https://github.com/apache/datafusion/pull/17365#issuecomment-3369258039 Merged up from main, ready for review. @Jefffrey has been so kind reviewing my previous pipe operator PRs, so I'm trying my luck again.

Re: [PR] feat: add fp16 support to Substrait [datafusion]

2025-10-18 Thread via GitHub
westonpace commented on PR #18086: URL: https://github.com/apache/datafusion/pull/18086#issuecomment-3408559574 I cannot add the api change label to this. Can someone do that for me?

Re: [PR] fix: ignore `DataType::Null` in possible types during csv type inference [datafusion]

2025-10-18 Thread via GitHub
dqkqd commented on PR #17796: URL: https://github.com/apache/datafusion/pull/17796#issuecomment-3346731891 I've just realized that returning `Utf8` for columns with only nulls (or empty files) causes schema mismatch when reading folders containing those files along with normal files. So

Re: [PR] fix: Deduplicate `collect_left_input` physical expression evaluation [datafusion]

2025-10-18 Thread via GitHub
github-actions[bot] commented on PR #16727: URL: https://github.com/apache/datafusion/pull/16727#issuecomment-3388029571 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] chore: rename Schema `print_schema_tree` to `tree_string` [datafusion]

2025-10-18 Thread via GitHub
comphead commented on PR #17919: URL: https://github.com/apache/datafusion/pull/17919#issuecomment-3368413176 The method was added recently and not yet in use by users

Re: [I] Dedicated machine / setup for running benchmarks [datafusion]

2025-10-18 Thread via GitHub
alamb commented on issue #18115: URL: https://github.com/apache/datafusion/issues/18115#issuecomment-3414461925 @rluvaton is there some link that describes the benchmarking setup for node.js?

Re: [PR] feat:support ansi mode rounding function [datafusion-comet]

2025-10-18 Thread via GitHub
andygrove commented on code in PR #2542: URL: https://github.com/apache/datafusion-comet/pull/2542#discussion_r2424676096 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -3017,6 +3017,37 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [D] datafusion mascot when? [datafusion]

2025-10-18 Thread via GitHub
GitHub user coracuity added a comment to the discussion: datafusion mascot when? Datafusion Fennec Fox? ![Fennec Fox](https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Fennec_Fox_Vulpes_zerda.jpg/1280px-Fennec_Fox_Vulpes_zerda.jpg) GitHub link: https://github.com/apache/datafusion/dis

Re: [PR] feat: implement GroupArrayAggAccumulator attempt 3 [datafusion]

2025-10-18 Thread via GitHub
alamb commented on code in PR #17915: URL: https://github.com/apache/datafusion/pull/17915#discussion_r2403910495 ## datafusion/functions-aggregate-common/src/aggregate/array_agg.rs: ## @@ -0,0 +1,224 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] fix: [branch-0.10][iceberg] Close reader instance in ReadConf [datafusion-comet]

2025-10-18 Thread via GitHub
codecov-commenter commented on PR #2535: URL: https://github.com/apache/datafusion-comet/pull/2535#issuecomment-3372364715 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2535?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: implement_comet_native_lpad_expr [datafusion-comet]

2025-10-18 Thread via GitHub
coderfender commented on code in PR #2102: URL: https://github.com/apache/datafusion-comet/pull/2102#discussion_r2395667394 ## native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs: ## @@ -28,17 +28,22 @@ use std::sync::Arc; const SPACE: &str = " "; /// S

Re: [PR] More decimal 32/64 support - type coercsion and misc gaps [datafusion]

2025-10-18 Thread via GitHub
AdamGS commented on code in PR #17808: URL: https://github.com/apache/datafusion/pull/17808#discussion_r2387477540 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -357,6 +357,14 @@ fn math_decimal_coercion( | (Decimal256(_, _), Decimal256(_, _)) => {

Re: [PR] fix(agg/corr): return NULL when variance is zero or samples < 2 [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on PR #17621: URL: https://github.com/apache/datafusion/pull/17621#issuecomment-3322102855 Thanks @killme2008

[PR] feat: Change default off-heap memory from from `greedy_unified` to `fair_unified` [datafusion-comet]

2025-10-18 Thread via GitHub
andygrove opened a new pull request, #2526: URL: https://github.com/apache/datafusion-comet/pull/2526 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/2452 ## Rationale for this change ## What changes are included

Re: [PR] Metadata handling announcement [datafusion-site]

2025-10-18 Thread via GitHub
2010YOUY01 commented on code in PR #73: URL: https://github.com/apache/datafusion-site/pull/73#discussion_r2367314150 ## content/blog/2025-09-21-custom-types-using-metadata.md: ## @@ -0,0 +1,296 @@ +--- +layout: post +title: Custom types in DataFusion using Metadata +date: 2025-

Re: [PR] [forward port] Change version to 50.1.0 and add changelog (#17748) [datafusion]

2025-10-18 Thread via GitHub
xudong963 commented on code in PR #17826: URL: https://github.com/apache/datafusion/pull/17826#discussion_r2396875568 ## Cargo.toml: ## Review Comment: Other places in the file also need to change ``` datafusion = { path = "datafusion/core", version = "50.0.0", defau

Re: [PR] Chore: Fix Scala code warnings - common module [datafusion-comet]

2025-10-18 Thread via GitHub
comphead commented on code in PR #2527: URL: https://github.com/apache/datafusion-comet/pull/2527#discussion_r2410864704 ## common/src/main/scala/org/apache/spark/sql/comet/util/Utils.scala: ## @@ -223,9 +223,9 @@ object Utils extends CometTypeShim { writer.close()

Re: [PR] Support Schema Field Metadata in User-Defined Aggregate Functions (UDAFs) [datafusion]

2025-10-18 Thread via GitHub
Jefffrey commented on PR #17085: URL: https://github.com/apache/datafusion/pull/17085#issuecomment-3379521826 > Not at all 😄 > > * `acc_args.schema.field(i)` — returns the raw Arrow `Field` from the (physical) input schema at position `i` (name, type, nullability, metadata exactl

Re: [I] High compile time of crates using sqlparser (codegen phase) - any way to reduce generated code? [datafusion-sqlparser-rs]

2025-10-18 Thread via GitHub
echou commented on issue #2066: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2066#issuecomment-3396632032 Yes, my project uses the visitor feature heavily. The crates not using sqlparser are quite fast to compile.

Re: [PR] Add trace of consumers to OOM error messages [datafusion]

2025-10-18 Thread via GitHub
wiedld commented on code in PR #17943: URL: https://github.com/apache/datafusion/pull/17943#discussion_r2413065243 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1301,7 +1303,9 @@ impl FileSink for ParquetSink { let props = parquet_props.clone();

[I] Improve fallback info [datafusion-comet]

2025-10-18 Thread via GitHub
wForget opened a new issue, #2449: URL: https://github.com/apache/datafusion-comet/issues/2449 ### What is the problem the feature request solves? I found some fallback info improvements while testing #2444: + Ignore unsupported info for `CometSparkRowToColumnar`: `CometSparkRo

Re: [PR] feat(cli): support external tables on multiple locations [datafusion]

2025-10-18 Thread via GitHub
fpetkovski commented on PR #17702: URL: https://github.com/apache/datafusion/pull/17702#issuecomment-3348817766 Thank you @alamb. I am aware of the CI issues and plan to address them. I was hoping to first get feedback on the overall direction.

Re: [PR] feat:add_additional_char_support_rpad [datafusion-comet]

2025-10-18 Thread via GitHub
comphead commented on code in PR #2436: URL: https://github.com/apache/datafusion-comet/pull/2436#discussion_r2376630658 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -408,13 +408,23 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

[I] ListingTable provider does not prune partitions when no filters are supplied [datafusion]

2025-10-18 Thread via GitHub
peasee opened a new issue, #17957: URL: https://github.com/apache/datafusion/issues/17957 ### Describe the bug When the `ListingTable` provider performs a scan, it does not prune any partitions when there are no filters supplied. If there are partitions present that do not matc

Re: [PR] feat: expose DataFrame.write_table [datafusion-python]

2025-10-18 Thread via GitHub
Copilot commented on code in PR #1264: URL: https://github.com/apache/datafusion-python/pull/1264#discussion_r2411364846 ## python/tests/test_dataframe.py: ## @@ -58,9 +60,7 @@ def ctx(): @pytest.fixture -def df(): -ctx = SessionContext() - +def df(ctx): Review Commen

Re: [PR] Freeze PyO3 wrappers & introduce interior mutability to avoid PyO3 borrow errors [datafusion-python]

2025-10-18 Thread via GitHub
timsaucer merged PR #1253: URL: https://github.com/apache/datafusion-python/pull/1253

Re: [PR] dev: Add typos check to the local `dev/rust_lint.sh` [datafusion]

2025-10-18 Thread via GitHub
2010YOUY01 commented on code in PR #17863: URL: https://github.com/apache/datafusion/pull/17863#discussion_r2395072894 ## dev/rust_lint.sh: ## @@ -19,6 +19,10 @@ # This script runs all the Rust lints locally the same way the # DataFusion CI does +# +# Note: The installed che

Re: [PR] Push Down Filter Subexpressions in Nested Loop Joins as Projections [datafusion]

2025-10-18 Thread via GitHub
alamb merged PR #17906: URL: https://github.com/apache/datafusion/pull/17906

Re: [PR] fix: optimizer `common_sub_expression_eliminate` fails in a window function [datafusion]

2025-10-18 Thread via GitHub
dqkqd commented on PR #17852: URL: https://github.com/apache/datafusion/pull/17852#issuecomment-3367808528 Thanks. I disabled the rule, and the test failed without the change: ``` Completed 355 test files in 16 seconds

[I] Improve performance of queries of the form `SELECT *, CASE ... END` [datafusion]

2025-10-18 Thread via GitHub
pepijnve opened a new issue, #18056: URL: https://github.com/apache/datafusion/issues/18056 ### Is your feature request related to a problem or challenge? When enriching a relation with a classification derived using complex `CASE` expressions performance can be quite slow. ###

Re: [PR] feat:support_integral_decimal_cast_native_impl [datafusion-comet]

2025-10-18 Thread via GitHub
andygrove commented on code in PR #2472: URL: https://github.com/apache/datafusion-comet/pull/2472#discussion_r2417199199 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -1232,7 +1244,6 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHel

Re: [PR] Eliminate Self Joins [datafusion]

2025-10-18 Thread via GitHub
github-actions[bot] closed pull request #16023: Eliminate Self Joins URL: https://github.com/apache/datafusion/pull/16023

Re: [I] `CatalogProvider` errors are badly mangled [datafusion-python]

2025-10-18 Thread via GitHub
mesejo commented on issue #1226: URL: https://github.com/apache/datafusion-python/issues/1226#issuecomment-3323898054 @colinmarc Sorry for the late reply. After thinking about it, I've concluded that the best place for this change to happen is the upstream repo. I have opened an issue for

Re: [PR] Clarify documentation that ScalarUDFImpl::simplify must not change the schema [datafusion]

2025-10-18 Thread via GitHub
alamb commented on PR #17981: URL: https://github.com/apache/datafusion/pull/17981#issuecomment-3387119655 > Also, do we need to add a similar note for udaf/udwf, or is it only for scalar functions? Good call, added

Re: [PR] feat: Support swap for `RightMark` Join [datafusion]

2025-10-18 Thread via GitHub
comphead commented on code in PR #17651: URL: https://github.com/apache/datafusion/pull/17651#discussion_r2389544570 ## datafusion/physical-plan/src/joins/sort_merge_join/tests.rs: ## @@ -1314,6 +1314,38 @@ async fn join_left_mark() -> Result<()> { Ok(()) } +#[tokio::tes
