[PR] Fix Partial Sort Get Slice Point Between Batches [datafusion]

2025-07-24 Thread via GitHub
berkaysynnada opened a new pull request, #16881: URL: https://github.com/apache/datafusion/pull/16881 ## Which issue does this PR close? - Closes #. ## Rationale for this change PartialSortExec had a missing functionality where it failed to detect slice p

[PR] chore(deps): bump rand from 0.9.1 to 0.9.2 [datafusion]

2025-07-24 Thread via GitHub
dependabot[bot] opened a new pull request, #16882: URL: https://github.com/apache/datafusion/pull/16882 Bumps [rand](https://github.com/rust-random/rand) from 0.9.1 to 0.9.2. Changelog Sourced from [rand's changelog](https://github.com/rust-random/rand/blob/master/CHANGELOG.md).

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2227706228 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -140,3 +210,19 @@ impl SpillManager { Ok(spawn_buffered(stream, self.batch_read_buffer_capa

Re: [PR] test: Fix flaky join tests [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 commented on code in PR #16860: URL: https://github.com/apache/datafusion/pull/16860#discussion_r2227716102 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4164,23 +4164,40 @@ AS VALUES (3, 3, true), (3, 3, false); -query B -SELECT * FROM t0 FULL JOIN

Re: [PR] test: Fix flaky join tests [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 merged PR #16860: URL: https://github.com/apache/datafusion/pull/16860 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [I] Optimize concatenation of complex data type, such as list, struct [datafusion]

2025-07-24 Thread via GitHub
zhuqi-lucas commented on issue #16838: URL: https://github.com/apache/datafusion/issues/16838#issuecomment-3112288717 > Thanks [@zhuqi-lucas](https://github.com/zhuqi-lucas) .Our scenario is `list(struct{})`, and the inner fields of struct are like: > > let schema = Arc::new(Schema::n

[I] panic when run `regx` benchmark [datafusion]

2025-07-24 Thread via GitHub
waynexia opened a new issue, #16879: URL: https://github.com/apache/datafusion/issues/16879 ### Describe the bug I'm trying to follow up on https://github.com/apache/datafusion/pull/13364, and encountered a panic with `cargo bench --bench regx`: ``` cargo bench --bench regx

Re: [PR] feat(spark): implement Spark datetime function last_day [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16828: URL: https://github.com/apache/datafusion/pull/16828#discussion_r2227624685 ## datafusion/sqllogictest/test_files/spark/datetime/last_day.slt: ## @@ -21,7 +21,80 @@ # For more information, please see: # https://github.com/apache/d

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on PR #16848: URL: https://github.com/apache/datafusion/pull/16848#issuecomment-3112363922 Hi @alamb, regarding [the suggestion](https://github.com/apache/datafusion/pull/16580#discussion_r2173716470), I’ve already incorporated those tests into the SLT file. Let me kn

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2227678617 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,134 @@ # specific language governing permissions and limitations # under

[PR] test: fix more flaky join tests [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 opened a new pull request, #16880: URL: https://github.com/apache/datafusion/pull/16880 ## Which issue does this PR close? - Closes #. ## Rationale for this change My local dev branch triggers a heisenbug: ``` cargo test --features backtra

[PR] Mutable Join Unwind [datafusion]

2025-07-24 Thread via GitHub
berkaysynnada opened a new pull request, #16883: URL: https://github.com/apache/datafusion/pull/16883 ## Which issue does this PR close? - Closes #. ## Rationale for this change When implementing `Stream`s with `SpawnedTask` objects, the `poll_next` requi

Re: [I] [Bug] Aggregate + TopK fails when asc = false [datafusion]

2025-07-24 Thread via GitHub
zhuqi-lucas commented on issue #16837: URL: https://github.com/apache/datafusion/issues/16837#issuecomment-3112629819 Updated here, if we remove UTC to None, it works well: ```rust use std::sync::Arc; use arrow::array::{Int32Array, RecordBatch, TimestampMillisecondArray}; use

[I] Question about string to utf8view when creating table [datafusion]

2025-07-24 Thread via GitHub
xudong963 opened a new issue, #16884: URL: https://github.com/apache/datafusion/issues/16884 ```sql > CREATE TABLE t1 AS VALUES ('2021', 3, 'A'), ('2022', 4, 'B'), ('2023', 5, 'C'); 0 row(s) fetched. Elapsed 0.014 seconds. > descri

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-24 Thread via GitHub
NGA-TRAN commented on code in PR #16858: URL: https://github.com/apache/datafusion/pull/16858#discussion_r2229483527 ## datafusion/core/src/physical_planner.rs: ## @@ -1358,6 +1358,9 @@ impl DefaultPhysicalPlanner { physical_name(expr), ))?]

[I] Add a way to get what takes memory [datafusion]

2025-07-24 Thread via GitHub
rluvaton opened a new issue, #16904: URL: https://github.com/apache/datafusion/issues/16904 ### Is your feature request related to a problem or challenge? Yes, debugging memory problems is hard: when running DF in production and the memory pool is not able to grow the memory it wil

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub
GitHub user alamb added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files Yes, please, I actually did some testing today, - https://github.com/apache/datafusion/issues/16899 - https://github.com/apache/datafusion/pull/16900 What I would

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16858: URL: https://github.com/apache/datafusion/pull/16858#discussion_r2229471200 ## datafusion/core/src/physical_planner.rs: ## @@ -1358,6 +1358,9 @@ impl DefaultPhysicalPlanner { physical_name(expr), ))?])),

Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

2025-07-24 Thread via GitHub
parthchandra commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3114897910 > managed to scan the map-type by setting `CometConf.COMET_NATIVE_SCAN_IMPL.key -> native_datafusion `. Added `map_sort` UDF with return type as `Map`. Right. `

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229547803 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-24 Thread via GitHub
adamreeve commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3114903866 > Thanks [@XiangpengHao](https://github.com/XiangpengHao) -- do you think we should disable the crypto feature by default? > > cc [@corwinjoy](https://github.com/corwinj

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub
blaginin commented on PR #16901: URL: https://github.com/apache/datafusion/pull/16901#issuecomment-3114808149 would love to help! feel free to ping me on Discord

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub
adriangb merged PR #16901: URL: https://github.com/apache/datafusion/pull/16901

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
Omega359 commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229497695 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [PR] dissallow pushdown of volatile PhysicalExprs [datafusion]

2025-07-24 Thread via GitHub
adriangb commented on code in PR #16861: URL: https://github.com/apache/datafusion/pull/16861#discussion_r2229503058 ## datafusion/physical-optimizer/src/filter_pushdown.rs: ## @@ -485,21 +497,32 @@ fn push_down_filters( // currently. `self_filters` are the predicates w

Re: [I] Add a way to get what takes memory [datafusion]

2025-07-24 Thread via GitHub
rluvaton commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3115074500 The problem with that is there is no breakdown on what the memory is actually spent on in each consumer

Re: [I] [Blog] Async Scalar User Defined Functions [datafusion]

2025-07-24 Thread via GitHub
Adez017 commented on issue #16525: URL: https://github.com/apache/datafusion/issues/16525#issuecomment-3115698018 take

Re: [PR] Fix integration tests not running [datafusion]

2025-07-24 Thread via GitHub
kosiew commented on code in PR #16835: URL: https://github.com/apache/datafusion/pull/16835#discussion_r2229960446 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -68,6 +68,8 @@ pub trait SchemaAdapterFactory: Debug + Send + Sync + 'static { ) -> Box { self.

Re: [PR] Benchmark: Add micro-benchmark for Nested Loop Join operator [datafusion]

2025-07-24 Thread via GitHub
ding-young commented on PR #16819: URL: https://github.com/apache/datafusion/pull/16819#issuecomment-3115960659 It might be helpful to add a brief description in `benchmarks/README.md`. Also, once this PR is merged, I'll follow up by adding nlj benchmarks to the memory profiling utility (in

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
shehabgamin commented on PR #16848: URL: https://github.com/apache/datafusion/pull/16848#issuecomment-3116003842 > Thank you @Standing-Man -- this looks good to me > > > > @shehabgamin does this look good to you (at a high level)? Will review when I'm home in the next fe

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub
xudong963 commented on code in PR #16885: URL: https://github.com/apache/datafusion/pull/16885#discussion_r2229970123 ## datafusion/spark/src/function/math/hex.rs: ## @@ -212,6 +215,16 @@ pub fn compute_hex( Ok(ColumnarValue::Array(Arc::new(hexed)))

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230257567 ## datafusion/spark/src/function/string/luhn_check.rs: ## @@ -0,0 +1,145 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230260434 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -35,3 +35,140 @@ ## PySpark 3.5.5 Result: {'luhn_check(8112189876)': True, 'typeof(

[PR] Create 2025-07-25-async-scaler-udf.md [datafusion-site]

2025-07-24 Thread via GitHub
Adez017 opened a new pull request, #96: URL: https://github.com/apache/datafusion-site/pull/96 Hi @alamb, just finished drafting the basic post with all the things you had mentioned in [#16525](https://github.com/apache/datafusion/issues/16525) . I need you to review for further updates tha

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-24 Thread via GitHub
XiangpengHao commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3114784223 > > Update 1: I have to disable the `encryption` feature in Parquet to make it work: https://github.com/apache/datafusion/blob/main/Cargo.toml#L162 > > Thanks [@Xiang

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16456: URL: https://github.com/apache/datafusion/pull/16456#issuecomment-3114784536 0.58.0 is released: https://github.com/apache/datafusion-sqlparser-rs/issues/1886#issuecomment-3114709826

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub
GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files > Yes, please, I actually did some testing today, > > * [Entire input is resorted when the data is partially sorted (not using > `PartialSortExec`) #16899

[PR] Fix create table by values with string, which doesn't respect `string_to_utf8view` config [datafusion]

2025-07-24 Thread via GitHub
xudong963 opened a new pull request, #16906: URL: https://github.com/apache/datafusion/pull/16906 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16884 ## Rationale for this change ## What changes are included in thi

Re: [I] Question about string to utf8view when creating table [datafusion]

2025-07-24 Thread via GitHub
xudong963 commented on issue #16884: URL: https://github.com/apache/datafusion/issues/16884#issuecomment-3116059618 I wanna narrow the implementation to the `create with values` first

Re: [I] Question about string to utf8view when creating table [datafusion]

2025-07-24 Thread via GitHub
xudong963 commented on issue #16884: URL: https://github.com/apache/datafusion/issues/16884#issuecomment-3116085852 A draft PR: https://github.com/apache/datafusion/pull/16906

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-07-24 Thread via GitHub
github-actions[bot] commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-3116116848 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-07-24 Thread via GitHub
github-actions[bot] commented on PR #16174: URL: https://github.com/apache/datafusion/pull/16174#issuecomment-3116116734 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3116117607 Hi @alamb, I just wanted to clarify: if a Spark function appears in the sqllogictest tests, are we expected to implement it in DataFusion?

Re: [PR] feat: support datetime_field as expr for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
chenkovsky commented on code in PR #1971: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1971#discussion_r2229741103 ## tests/sqlparser_bigquery.rs: ## @@ -2566,3 +2566,101 @@ fn test_struct_trailing_and_nested_bracket() { ) ); } + +#[test] +fn test_

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub
zhuqi-lucas commented on code in PR #16885: URL: https://github.com/apache/datafusion/pull/16885#discussion_r2230025399 ## datafusion/spark/src/function/math/hex.rs: ## @@ -212,6 +215,16 @@ pub fn compute_hex( Ok(ColumnarValue::Array(Arc::new(hexed)))

Re: [PR] Benchmark: Add micro-benchmark for Nested Loop Join operator [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 commented on PR #16819: URL: https://github.com/apache/datafusion/pull/16819#issuecomment-3116228640 > It might be helpful to add a brief description in `benchmarks/README.md`. Also, once this PR is merged, I'll follow up by adding nlj benchmarks to the memory profiling utility (

Re: [I] Release sqlparser-rs version `0.58.0` around 2025-07-18 (was 2024-08-15) [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
alamb commented on issue #1886: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1886#issuecomment-3114709826 Thanks to @viirya and @comphead the release has been approved! The release is available here: https://dist.apache.org/repos/dist/release/datafusion/datafusi

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub
GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files Both queries use `mode=Partial`. Addressing Question / Query 1) ``` +---+

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229625741 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229628338 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [I] Discussion: DataFusion Improvement Proposal (DIPs) Process? [datafusion]

2025-07-24 Thread via GitHub
phillipleblanc commented on issue #16886: URL: https://github.com/apache/datafusion/issues/16886#issuecomment-3115391379 I agree that format voting/approval doesn't make sense yet. Also having a structured way to propose "larger" changes that incorporates all relevant context for rev

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229840064 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,134 @@ # specific language governing permissions and limitations # under

Re: [PR] feat: support datetime_field as expr for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
chenkovsky closed pull request #1971: feat: support datetime_field as expr for bigquery URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1971

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229851563 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under

Re: [I] Combine utilities in `SpillManager` [datafusion]

2025-07-24 Thread via GitHub
ding-young commented on issue #16907: URL: https://github.com/apache/datafusion/issues/16907#issuecomment-3116294357 take

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-24 Thread via GitHub
kosiew commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3116309058 Closing this in favour of https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-24 Thread via GitHub
kosiew closed pull request #16681: Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic URL: https://github.com/apache/datafusion/pull/16681

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
shehabgamin commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230086388 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -35,3 +35,140 @@ ## PySpark 3.5.5 Result: {'luhn_check(8112189876)': True, 'typeof(l

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-24 Thread via GitHub
kosiew commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2230104338 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1261,42 @@ pub fn collect_subquery_cols( }) } +/// Generates implementation of `equals` and `hash_value` met

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-24 Thread via GitHub
kosiew commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2230100602 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1261,42 @@ pub fn collect_subquery_cols( }) } +/// Generates implementation of `equals` and `hash_value` met

Re: [PR] Report error when `SessionState::sql_to_expr_with_alias` does not consume all input [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16811: URL: https://github.com/apache/datafusion/pull/16811#issuecomment-3113431404 I am not sure why I forgot to merge this one. Thanks @pepijnve and sorry for the delay

Re: [PR] Report error when `SessionState::sql_to_expr_with_alias` does not consume all input [datafusion]

2025-07-24 Thread via GitHub
alamb merged PR #16811: URL: https://github.com/apache/datafusion/pull/16811

Re: [I] `SessionState::sql_to_expr` does not report unconsumed input [datafusion]

2025-07-24 Thread via GitHub
alamb closed issue #16810: `SessionState::sql_to_expr` does not report unconsumed input URL: https://github.com/apache/datafusion/issues/16810

Re: [I] Panic happens when adding a decimal256 to a float (SQLancer) [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16689: URL: https://github.com/apache/datafusion/issues/16689#issuecomment-3113441773 I think @kosiew has fixed this error upstream so it should be fixed when we upgrade to the next version of arrow-rs

Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3113398430 > One idea @notfilippo mentioned is that arrow-rs could offer some kind of API for tracking allocations. As it's arrow-rs the one who knows when a buffer is allocated, when it's r

Re: [PR] Fix `next_up` and `next_down` behavior for zero float values [datafusion]

2025-07-24 Thread via GitHub
berkaysynnada commented on PR #16745: URL: https://github.com/apache/datafusion/pull/16745#issuecomment-3113441575 @liamzwbao can you please cherrypick this commit https://github.com/apache/datafusion/compare/main...synnada-ai:datafusion-upstream:next-up-down I believe there would be n

Re: [PR] fix: skip predicates on struct unnest in PushDownFilter [datafusion]

2025-07-24 Thread via GitHub
adriangb commented on PR #16790: URL: https://github.com/apache/datafusion/pull/16790#issuecomment-3113476763 LGTM

Re: [PR] fix: skip predicates on struct unnest in PushDownFilter [datafusion]

2025-07-24 Thread via GitHub
adriangb commented on PR #16790: URL: https://github.com/apache/datafusion/pull/16790#issuecomment-3113479042 > btw, I've checked the behaviour on duckdb, and it looks clearer: there are no prefixes at all. Maybe we can do the same? Could you open an issue to remove the prefixes / ma

Re: [I] Missing data when inserting into MemTable [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16836: URL: https://github.com/apache/datafusion/issues/16836#issuecomment-3113408301 I agree this sounds like a bug. I am not sure what is going on. I suggest a self-contained reproducer is probably the most useful step to get more specific help

Re: [PR] Support multiple ordered `array_agg` aggregations [datafusion]

2025-07-24 Thread via GitHub
ozankabak commented on PR #16625: URL: https://github.com/apache/datafusion/pull/16625#issuecomment-3113392581 I didn't have time to dig deeper on this, so we can go ahead with the merge. We can unify `Beneficial` and `SoftRequirement` later in the future if we find a good way to do so. -

Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3113400266 I think @Dandandan and @zhuqi-lucas have also been thinking about / trying to improve memory efficiency and might have some perspectives to offer

Re: [PR] Fix `next_up` and `next_down` behavior for zero float values [datafusion]

2025-07-24 Thread via GitHub
ozankabak commented on PR #16745: URL: https://github.com/apache/datafusion/pull/16745#issuecomment-3113401715 Since we do not yet fully understand what transitioning to partial ordering will entail (and we may not even want to do it, at the end), I think the best path forward is to go back

Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3113393092 > Some imply copying/compacting just the necessary slice of data from the underlying buffer (ScalarValue::compact) so that it's actually owned by the consumer, but in certain case

Re: [PR] feat: enhance support for Decimal128 and Decimal256 [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16831: URL: https://github.com/apache/datafusion/pull/16831#discussion_r2228482416 ## datafusion/optimizer/src/simplify_expressions/utils.rs: ## @@ -168,10 +133,17 @@ pub fn is_one(s: &Expr) -> bool { Expr::Literal(ScalarValue::Float64(Som

Re: [I] `DataFusionError` leaks inner types to the user [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16805: URL: https://github.com/apache/datafusion/issues/16805#issuecomment-3113450971 Thanks for the idea @90degs2infty -- it sounds like a good idea to explore

Re: [PR] fix: skip predicates on struct unnest in PushDownFilter [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16790: URL: https://github.com/apache/datafusion/pull/16790#issuecomment-3113453261 Are we waiting on anything else to merge this PR?

Re: [I] RFC: What table provider features would be helpful in an example? [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16821: URL: https://github.com/apache/datafusion/issues/16821#issuecomment-3113448750 > I believe if I partitioned on field A and ordered by date then I could do the self-join manually with far more efficiency than a more generic self-join. @corasaurus-hex t

Re: [PR] test: fix more flaky join tests [datafusion]

2025-07-24 Thread via GitHub
alamb merged PR #16880: URL: https://github.com/apache/datafusion/pull/16880

Re: [I] panic when running `regx` benchmark [datafusion]

2025-07-24 Thread via GitHub
chenkovsky commented on issue #16879: URL: https://github.com/apache/datafusion/issues/16879#issuecomment-3113507800 take

[I] Can't parse jsonb extractions in an on conflict in Postgres [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
stevenliebregt opened a new issue, #1977: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1977 The parser does not support the following: ```sql INSERT INTO table_with_constraint_over_jsonb (a_number, a_jsonb, a_string) VALUES ($1, $2, $3) ON CONFLICT (a_number

Re: [PR] refactor `character_length` impl by unifying null handling logic [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16877: URL: https://github.com/apache/datafusion/pull/16877#discussion_r2228560541 ## datafusion/functions/src/unicode/character_length.rs: ## @@ -136,56 +136,31 @@ where // string is ASCII only is relatively cheap. // If strings are ASCI

Re: [PR] Perf: Optimize vectorized append function [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16876: URL: https://github.com/apache/datafusion/pull/16876#issuecomment-3113557388 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] Perf: Optimize vectorized append function [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16876: URL: https://github.com/apache/datafusion/pull/16876#issuecomment-3113564433 (I also queued up the microbenchmark)

[PR] fix: regex bench [datafusion]

2025-07-24 Thread via GitHub
chenkovsky opened a new pull request, #16890: URL: https://github.com/apache/datafusion/pull/16890 ## Which issue does this PR close? - Closes #16879. ## Rationale for this change missing argument. ## What changes are included in this PR? add missed argument

[PR] ScalarValue Default + Min + Max [datafusion]

2025-07-24 Thread via GitHub
berkaysynnada opened a new pull request, #16891: URL: https://github.com/apache/datafusion/pull/16891 ## Which issue does this PR close? - Closes #. ## Rationale for this change We use these utils in our fork and think they could be useful in here as well

Re: [PR] Perf: Optimize vectorized append function [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16876: URL: https://github.com/apache/datafusion/pull/16876#issuecomment-3113703878 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-24 Thread via GitHub
NGA-TRAN commented on PR #16858: URL: https://github.com/apache/datafusion/pull/16858#issuecomment-3113736222 @alamb : When you have a moment, could you take a quick look? Thanks!

Re: [PR] Chore: refactor Comparison out of QueryPlanSerde [datafusion-comet]

2025-07-24 Thread via GitHub
CuteChuanChuan commented on code in PR #2028: URL: https://github.com/apache/datafusion-comet/pull/2028#discussion_r2228749810 ## spark/src/main/scala/org/apache/comet/serde/comparisons.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Perf: Optimize vectorized append function [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16876: URL: https://github.com/apache/datafusion/pull/16876#issuecomment-3113703717 🤖: Benchmark completed Details ``` Comparing HEAD and optimize_vectorized_append Benchmark clickbench_extended.json ---

Re: [PR] Chore: refactor Comparison out of QueryPlanSerde [datafusion-comet]

2025-07-24 Thread via GitHub
CuteChuanChuan commented on code in PR #2028: URL: https://github.com/apache/datafusion-comet/pull/2028#discussion_r2228741221 ## spark/src/main/scala/org/apache/comet/serde/comparisons.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Perf: Optimize vectorized append function [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16876: URL: https://github.com/apache/datafusion/pull/16876#issuecomment-3113739347 🤖: Benchmark completed Details ``` Comparing HEAD and optimize_vectorized_append Benchmark clickbench_1.json

Re: [PR] Perf: Optimize vectorized append function [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16876: URL: https://github.com/apache/datafusion/pull/16876#issuecomment-3113739542 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

[PR] Add Fetch Property to OutputRequirementExec [datafusion]

2025-07-24 Thread via GitHub
berkaysynnada opened a new pull request, #16892: URL: https://github.com/apache/datafusion/pull/16892 ## Which issue does this PR close? - Closes #. ## Rationale for this change In some of our use cases, `OutputRequirementExec` requires fetch capability d

Re: [PR] Add Fetch Property to OutputRequirementExec [datafusion]

2025-07-24 Thread via GitHub
berkaysynnada commented on PR #16892: URL: https://github.com/apache/datafusion/pull/16892#issuecomment-3113756580 I'll add a test if I can reproduce a failing case

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16858: URL: https://github.com/apache/datafusion/pull/16858#discussion_r2228760094 ## datafusion/core/src/physical_planner.rs: ## @@ -1358,6 +1358,9 @@ impl DefaultPhysicalPlanner { physical_name(expr), ))?])),

Re: [I] Missing data when inserting into MemTable [datafusion]

2025-07-24 Thread via GitHub
zhuqi-lucas commented on issue #16836: URL: https://github.com/apache/datafusion/issues/16836#issuecomment-3113815429 I tried to reproduce this, but could not. If we can reproduce it, I can help debug it, thanks! ```rust // src/main.rs use std::sync::Arc; use arrow

[PR] minor: Improve equivalence handling of joins [datafusion]

2025-07-24 Thread via GitHub
berkaysynnada opened a new pull request, #16893: URL: https://github.com/apache/datafusion/pull/16893 ## Which issue does this PR close? - Closes #. ## Rationale for this change While building the equivalence properties of joins, we can identify more prec

[I] Remove `__unnest_placeholder` from result projection on queries with struct unnest. [datafusion]

2025-07-24 Thread via GitHub
akoshchiy opened a new issue, #16894: URL: https://github.com/apache/datafusion/issues/16894 ### Is your feature request related to a problem or challenge? As discussed in #16790, queries with struct unnest produce columns with a placeholder `__unnest_placeholder`, which looks a bit a

Re: [PR] Perf: Optimize vectorized append function [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16876: URL: https://github.com/apache/datafusion/pull/16876#issuecomment-3113856695 🤖: Benchmark completed Details ``` group main
