Re: [PR] Update Scalar_functions.md [datafusion]

2025-08-01 Thread via GitHub
Adez017 commented on PR #17018: URL: https://github.com/apache/datafusion/pull/17018#issuecomment-3146276212 Hi @alamb, I couldn't find any Rust file for this function, so I added the examples directly to the markdown. I need a little help: while running `./dev/update_function_docs.sh`

[PR] Update Scalar_functions.md [datafusion]

2025-08-01 Thread via GitHub
Adez017 opened a new pull request, #17018: URL: https://github.com/apache/datafusion/pull/17018 ## Which issue does this PR close? - Closes #17004 ## Rationale for this change While browsing through the documentation, I found the example for the scalar functi

Re: [PR] feat: support multiple value for pivot [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
iffyio commented on code in PR #1970: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1970#discussion_r2249139287 ## src/parser/mod.rs: ## @@ -13828,7 +13840,13 @@ impl<'a> Parser<'a> { self.expect_token(&Token::LParen)?; let aggregate_functions =

Re: [PR] MySQL: Support comma-separated `CREATE TABLE` options [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
iffyio commented on code in PR #1989: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1989#discussion_r2249137924 ## src/parser/mod.rs: ## @@ -7708,6 +7708,9 @@ impl<'a> Parser<'a> { while let Some(option) = self.parse_plain_option()? { optio

Re: [PR] Postgres: Support `INTERVAL` data type options [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
iffyio merged PR #1984: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1984 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] Perf: Port arrow-rs optimization for get_buffer_memory_size and add fast path for no buffer for gc string view [datafusion]

2025-08-01 Thread via GitHub
zhuqi-lucas commented on PR #17008: URL: https://github.com/apache/datafusion/pull/17008#issuecomment-3146236812 > Another potential idea is to simply switch to using the coalesce kernel directly (so we don't have to maintain a separate copy) > > * [Draft: Use upstream arrow `coalesce

Re: [PR] Test and fix for issue #16998: SortExec shares DynamicFilterPhysicalExpr across multiple executions [datafusion]

2025-08-01 Thread via GitHub
adriangb commented on PR #17016: URL: https://github.com/apache/datafusion/pull/17016#issuecomment-3146205523 As you can see the current behavior was intentional and necessary for this functionality. IMO we should add a `.cloned()` or `with_new_state()` method to ExecutionPlan that defaults
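
The discussion above turns on shared mutable state: if a plan node keeps its dynamic filter behind a shared pointer, re-executing the node starts from the previous run's bound. Below is a minimal, std-only Rust sketch of that failure mode. `SortNode` and `DynamicBound` are hypothetical stand-ins for `SortExec` and `DynamicFilterPhysicalExpr`, not DataFusion types; only the method name `with_new_state` comes from the comment, and its body here is an assumption rather than the proposed API.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a TopK-style dynamic filter: it remembers the
/// smallest value seen so far and could be used to prune later input.
#[derive(Default)]
struct DynamicBound {
    threshold: Option<i64>,
}

/// Hypothetical plan node holding its filter behind an `Arc`, so every
/// clone of the node observes (and mutates) the same state.
#[derive(Clone)]
struct SortNode {
    filter: Arc<Mutex<DynamicBound>>,
}

impl SortNode {
    fn new() -> Self {
        SortNode {
            filter: Arc::new(Mutex::new(DynamicBound::default())),
        }
    }

    /// Simulates one execution: tighten the shared bound with each input value.
    fn execute(&self, input: &[i64]) -> Option<i64> {
        let mut bound = self.filter.lock().unwrap();
        for &v in input {
            let current = bound.threshold;
            bound.threshold = Some(current.map_or(v, |t| t.min(v)));
        }
        bound.threshold
    }

    /// Returns an equivalent node with fresh, unshared filter state.
    /// The name follows the review suggestion; this body is an assumption.
    fn with_new_state(&self) -> Self {
        SortNode::new()
    }
}

fn main() {
    let plan = SortNode::new();
    // First execution: the bound tightens to 3.
    assert_eq!(plan.execute(&[5, 3, 9]), Some(3));

    // Cloning shares the Arc, so a second execution inherits the stale bound.
    let reused = plan.clone();
    assert_eq!(reused.execute(&[10, 7]), Some(3));

    // Fresh state: the bound reflects only this execution's input.
    let fresh = plan.with_new_state();
    assert_eq!(fresh.execute(&[10, 7]), Some(7));
}
```

The design point is that `Clone` on a struct containing an `Arc` copies the pointer, not the state, so a per-execution reset has to be requested explicitly.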

Re: [PR] docs: Fix random extra bullet for 'Analytical Functions' [datafusion]

2025-08-01 Thread via GitHub
xudong963 merged PR #17014: URL: https://github.com/apache/datafusion/pull/17014

[PR] Test: Add checks to sqllogictest temporary file creations [datafusion]

2025-08-01 Thread via GitHub
2010YOUY01 opened a new pull request, #17017: URL: https://github.com/apache/datafusion/pull/17017 ## Which issue does this PR close? - Closes #. ## Rationale for this change In `sqllogictest`s, the set-up code will create a scratch folder for each test f

Re: [PR] docs: Fix 'Analaysis' typo in query optimizer docs [datafusion]

2025-08-01 Thread via GitHub
xudong963 merged PR #17015: URL: https://github.com/apache/datafusion/pull/17015

Re: [PR] docs : Change notes for `IntegralDivide` [datafusion-comet]

2025-08-01 Thread via GitHub
coderfender commented on PR #2054: URL: https://github.com/apache/datafusion-comet/pull/2054#issuecomment-3146197875 Thank you for the review @andygrove, I made changes as per the suggestion.

Re: [PR] chore: Use Datafusion's Sha2 and remove Comet's implementation. [datafusion-comet]

2025-08-01 Thread via GitHub
codecov-commenter commented on PR #2063: URL: https://github.com/apache/datafusion-comet/pull/2063#issuecomment-3146169391 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2063?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] Test and fix for issue #16998: SortExec shares DynamicFilterPhysicalExpr across multiple executions [datafusion]

2025-08-01 Thread via GitHub
robertream opened a new pull request, #17016: URL: https://github.com/apache/datafusion/pull/17016 ## Summary Fixes #16998: SortExec shares DynamicFilterPhysicalExpr across multiple executions This PR addresses an issue where `SortExec` was sharing the same `DynamicFilterPhysi

Re: [PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-08-01 Thread via GitHub
2010YOUY01 commented on code in PR #16996: URL: https://github.com/apache/datafusion/pull/16996#discussion_r2249090965 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -660,529 +684,1048 @@ async fn collect_left_input( )) } -/// This enumeration represent

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-08-01 Thread via GitHub
github-actions[bot] closed pull request #14411: feat: Support On-Demand Repartition URL: https://github.com/apache/datafusion/pull/14411

Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-08-01 Thread via GitHub
rishvin commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3146120825 Opened PR https://github.com/apache/datafusion-comet/pull/2063 for using DF SHA2.

[PR] Use Datafusion's Sha2 and remove Comet's implementation. [datafusion-comet]

2025-08-01 Thread via GitHub
rishvin opened a new pull request, #2063: URL: https://github.com/apache/datafusion-comet/pull/2063 ## Which issue does this PR close? Closes #1820 ## Rationale for this change Now that Comet's dependency has been updated to DF-49, the SHA2 fix https://github

[PR] Fix 'Analaysis' typo in query optimizer docs [datafusion]

2025-08-01 Thread via GitHub
petern48 opened a new pull request, #17015: URL: https://github.com/apache/datafusion/pull/17015 ## Which issue does this PR close? Quick Typo fix

Re: [PR] chore: create copy of fs-hdfs [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove commented on code in PR #2062: URL: https://github.com/apache/datafusion-comet/pull/2062#discussion_r2249021313 ## native/fs-hdfs/Cargo.toml: ## @@ -0,0 +1,55 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. S

Re: [PR] chore: create copy of fs-hdfs [datafusion-comet]

2025-08-01 Thread via GitHub
parthchandra commented on PR #2062: URL: https://github.com/apache/datafusion-comet/pull/2062#issuecomment-3146020967 @comphead @Kontinuation @drexler-sky @jiayuasu, fyi cc @andygrove @mbutrovich

Re: [I] Standardize on import order in Rust code [datafusion-comet]

2025-08-01 Thread via GitHub
parthchandra commented on issue #2053: URL: https://github.com/apache/datafusion-comet/issues/2053#issuecomment-3146013043 As long as `cargo fmt` takes care of it for me, I don't have _any_ opinion on the matter.

[I] doc: Document supported `Map` methods in `expressions.md` [datafusion-comet]

2025-08-01 Thread via GitHub
comphead opened a new issue, #2061: URL: https://github.com/apache/datafusion-comet/issues/2061 ### What is the problem the feature request solves? Document supported `Map` methods in `expressions.md`. Followup on #2059 There is a Map epic https://github.com/apache/datafusion-co

Re: [PR] feat: support `map_entries` [datafusion-comet]

2025-08-01 Thread via GitHub
comphead commented on PR #2059: URL: https://github.com/apache/datafusion-comet/pull/2059#issuecomment-3145992129 > Do we need to add this to the list of supported expressions in the documentation? Thanks @andygrove I'll create a followup on the documentation, I realized none of alr

Re: [I] Support `map_entries` function [datafusion-comet]

2025-08-01 Thread via GitHub
comphead closed issue #1916: Support `map_entries` function URL: https://github.com/apache/datafusion-comet/issues/1916

Re: [PR] feat: support `map_entries` [datafusion-comet]

2025-08-01 Thread via GitHub
comphead merged PR #2059: URL: https://github.com/apache/datafusion-comet/pull/2059

Re: [I] Comet cannot execute some iceberg metadata table queries [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove commented on issue #2033: URL: https://github.com/apache/datafusion-comet/issues/2033#issuecomment-3145991639 We discussed this offline and decided that we should fall back to Spark for now for Iceberg metadata queries, rather than try to accelerate them.

Re: [I] Add support for Iceberg [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove closed issue #1028: Add support for Iceberg URL: https://github.com/apache/datafusion-comet/issues/1028

Re: [I] Add support for Iceberg [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove commented on issue #1028: URL: https://github.com/apache/datafusion-comet/issues/1028#issuecomment-3145979438 I created a new epic for completing the Iceberg integration work https://github.com/apache/datafusion-comet/issues/2060

[I] [EPIC] Iceberg compatibility fixes for Comet 0.10.0 release [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove opened a new issue, #2060: URL: https://github.com/apache/datafusion-comet/issues/2060 ### What is the problem the feature request solves? The integration between Comet and Iceberg is not fully working. This epic is to track the work that we are targeting for the Comet 0.10.

Re: [PR] docs : Change notes for `IntegralDivide` [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove commented on code in PR #2054: URL: https://github.com/apache/datafusion-comet/pull/2054#discussion_r2248981021 ## docs/source/user-guide/expressions.md: ## @@ -35,14 +35,14 @@ The following Spark expressions are currently available. Any known compatibility ## Bina

Re: [PR] feat: support `map_entries` [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove commented on PR #2059: URL: https://github.com/apache/datafusion-comet/pull/2059#issuecomment-3145951537 Do we need to add this to the list of supported expressions in the documentation?

[PR] docs: Fix random extra bullet for 'Analytical Functions' [datafusion]

2025-08-01 Thread via GitHub
petern48 opened a new pull request, #17014: URL: https://github.com/apache/datafusion/pull/17014 ## Which issue does this PR close? N/A. quick doc fix ## Rationale for this change Fixing this: https://github.com/user-attachments/assets/1ec43be1-fae5-4a90-ad

Re: [I] Crash when there are too many WHERE conditions in the SQL query [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
sunheyi6 closed issue #1988: Crash when there are too many WHERE conditions in the SQL query URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1988

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-08-01 Thread via GitHub
Omega359 commented on PR #16970: URL: https://github.com/apache/datafusion/pull/16970#issuecomment-3145870578 > Good: > > It allows function behavior to be dynamically adapted based on the execution context (e.g., user locale, timezone, session settings, environment specifics). Espec

Re: [PR] feat: Support Array Literal [datafusion-comet]

2025-08-01 Thread via GitHub
codecov-commenter commented on PR #2057: URL: https://github.com/apache/datafusion-comet/pull/2057#issuecomment-3145741036 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2057?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: support `map_entries` [datafusion-comet]

2025-08-01 Thread via GitHub
codecov-commenter commented on PR #2059: URL: https://github.com/apache/datafusion-comet/pull/2059#issuecomment-3145721990 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2059?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-08-01 Thread via GitHub
Dandandan commented on code in PR #16996: URL: https://github.com/apache/datafusion/pull/16996#discussion_r2248805092 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -660,529 +684,1048 @@ async fn collect_left_input( )) } -/// This enumeration represents

[PR] feat: support `map_entries` [datafusion-comet]

2025-08-01 Thread via GitHub
comphead opened a new pull request, #2059: URL: https://github.com/apache/datafusion-comet/pull/2059 ## Which issue does this PR close? Closes #1916. ## Rationale for this change ## What changes are included in this PR? ## How are these cha

[I] feat: Migrate to `PhysicalExprAdapterFactory` [datafusion-comet]

2025-08-01 Thread via GitHub
comphead opened a new issue, #2058: URL: https://github.com/apache/datafusion-comet/issues/2058 ### What is the problem the feature request solves? DataFusion is considering deprecating `SchemaAdapter`, which is used extensively in Comet. We need to investigate migrating to `Physi

[PR] MySQL: Support comma-separated `CREATE TABLE` options [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
mvzink opened a new pull request, #1989: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1989 In [MySQL], options for `CREATE TABLE` following the body can be optionally separated by commas. I'm not aware of any cases where this affects parsing (e.g. eliminating ambiguity or any

Re: [PR] Fix window_functions docs formatting [datafusion]

2025-08-01 Thread via GitHub
comphead merged PR #17005: URL: https://github.com/apache/datafusion/pull/17005

[PR] feat: Support Array Literal [datafusion-comet]

2025-08-01 Thread via GitHub
comphead opened a new pull request, #2057: URL: https://github.com/apache/datafusion-comet/pull/2057 ## Which issue does this PR close? Closes #1977. Replaces accidentally closed #1978 ## Rationale for this change ## What changes are included in this P

Re: [PR] Support `centroids` config for `approx_percentile_cont_with_weight` [datafusion]

2025-08-01 Thread via GitHub
jcsherin commented on code in PR #17003: URL: https://github.com/apache/datafusion/pull/17003#discussion_r2248726677 ## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ## @@ -360,9 +366,14 @@ impl ApproxPercentileAccumulator { } } -// public for

Re: [PR] feat(spark): implement Spark string function like/ilike [datafusion]

2025-08-01 Thread via GitHub
alamb commented on PR #16962: URL: https://github.com/apache/datafusion/pull/16962#issuecomment-3145543807 Thank you @chenkovsky and @andygrove -- I feel the Spark-compatible function library is really picking up steam FYI @shehabgamin

Re: [PR] feat(spark): implement Spark string function like/ilike [datafusion]

2025-08-01 Thread via GitHub
alamb merged PR #16962: URL: https://github.com/apache/datafusion/pull/16962

Re: [I] Add tests for yielding / cancelling in SpillManager [datafusion]

2025-08-01 Thread via GitHub
alamb closed issue #16482: Add tests for yielding / cancelling in SpillManager URL: https://github.com/apache/datafusion/issues/16482

Re: [PR] Add tests for yielding in `SpillManager::read_spill_as_stream` [datafusion]

2025-08-01 Thread via GitHub
alamb merged PR #16616: URL: https://github.com/apache/datafusion/pull/16616

Re: [PR] Add tests for yielding in `SpillManager::read_spill_as_stream` [datafusion]

2025-08-01 Thread via GitHub
alamb commented on PR #16616: URL: https://github.com/apache/datafusion/pull/16616#issuecomment-3145541443 Thanks again @ding-young and @comphead

Re: [PR] Fix window_functions docs formatting [datafusion]

2025-08-01 Thread via GitHub
alamb commented on PR #17005: URL: https://github.com/apache/datafusion/pull/17005#issuecomment-3145537057 I took the liberty of updating the source and rerunning the generation scripts -- I think it still fixes the problem and this way it won't be undone on the next PR Thank you @ma

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-08-01 Thread via GitHub
alamb commented on PR #16977: URL: https://github.com/apache/datafusion/pull/16977#issuecomment-3145534138 Thank you @tobixdev

Re: [I] Planning speed slowed down in DataFusion 49 for [datafusion]

2025-08-01 Thread via GitHub
alamb closed issue #16987: Planning speed slowed down in DataFusion 49 for URL: https://github.com/apache/datafusion/issues/16987

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-08-01 Thread via GitHub
alamb merged PR #16977: URL: https://github.com/apache/datafusion/pull/16977

Re: [PR] feat: support literal for ARRAY top level [datafusion-comet]

2025-08-01 Thread via GitHub
comphead commented on PR #1978: URL: https://github.com/apache/datafusion-comet/pull/1978#issuecomment-3145524452 oops

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-08-01 Thread via GitHub
BlakeOrth commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3145516316 I figured I'd put my money where my mouth was with regards to my comment here: https://github.com/apache/datafusion/pull/16971#discussion_r2248565529 specifically with regards to t

Re: [I] Query grouping by column with datatype List> is failing [datafusion]

2025-08-01 Thread via GitHub
nirnayroy commented on issue #17012: URL: https://github.com/apache/datafusion/issues/17012#issuecomment-3145516413 take

Re: [PR] feat: support literal for ARRAY top level [datafusion-comet]

2025-08-01 Thread via GitHub
comphead closed pull request #1978: feat: support literal for ARRAY top level URL: https://github.com/apache/datafusion-comet/pull/1978

Re: [PR] Fix window_functions docs formatting [datafusion]

2025-08-01 Thread via GitHub
alamb commented on code in PR #17005: URL: https://github.com/apache/datafusion/pull/17005#discussion_r2248654246 ## docs/source/user-guide/sql/window_functions.md: ## @@ -330,7 +330,7 @@ row_number() FROM employees; ``` -sql Review Comment: What is weird is that

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-08-01 Thread via GitHub
alamb commented on code in PR #16971: URL: https://github.com/apache/datafusion/pull/16971#discussion_r2248640819 ## datafusion/execution/src/cache/cache_unit.rs: ## @@ -157,9 +158,79 @@ impl CacheAccessor>> for DefaultListFilesCache { } } +/// Collected file embedded m

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
iffyio commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2248618331 ## src/parser/mod.rs: ## @@ -16464,7 +16505,28 @@ impl<'a> Parser<'a> { /// Parse [Statement::Return] fn parse_return(&mut self) -> Result {

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-01 Thread via GitHub
alamb commented on code in PR #16779: URL: https://github.com/apache/datafusion/pull/16779#discussion_r2248598467 ## datafusion-examples/examples/parquet_encrypted_with_kms.rs: ## @@ -0,0 +1,288 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-08-01 Thread via GitHub
alamb commented on PR #16977: URL: https://github.com/apache/datafusion/pull/16977#issuecomment-3145411771 🤖: Benchmark completed Details ``` group feature_improve-expr-hash-perf main -

Re: [I] [DISCUSSION] Conditional Utf8View support for downstream projects [datafusion]

2025-08-01 Thread via GitHub
alamb commented on issue #16903: URL: https://github.com/apache/datafusion/issues/16903#issuecomment-3145404574 In my opinion, given there are many fast and supported ways to convert between string types (e.g. the arrow `cast` kernels) there is very little value to supporting three differen

Re: [PR] Support `centroids` config for `approx_percentile_cont_with_weight` [datafusion]

2025-08-01 Thread via GitHub
jcsherin commented on code in PR #17003: URL: https://github.com/apache/datafusion/pull/17003#discussion_r2248567975 ## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ## @@ -360,9 +366,14 @@ impl ApproxPercentileAccumulator { } } -// public for

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-08-01 Thread via GitHub
BlakeOrth commented on code in PR #16971: URL: https://github.com/apache/datafusion/pull/16971#discussion_r2248565529 ## datafusion/execution/src/cache/cache_unit.rs: ## @@ -157,9 +158,79 @@ impl CacheAccessor>> for DefaultListFilesCache { } } +/// Collected file embedd

Re: [PR] Add missing Substrait to DataFusion function name mappings [datafusion]

2025-08-01 Thread via GitHub
lorenarosati commented on code in PR #16950: URL: https://github.com/apache/datafusion/pull/16950#discussion_r2248548378 ## datafusion/substrait/src/logical_plan/consumer/expr/scalar_function.rs: ## @@ -113,6 +122,15 @@ pub fn name_to_op(name: &str) -> Option { } } +pub

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-08-01 Thread via GitHub
alamb commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3145324514 I thought some more about this PR last night and I wanted to suggest another idea, which is once we have added the memory limit to the cache, *ALWAYS* have the built in parquet reader

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-08-01 Thread via GitHub
alamb commented on code in PR #16971: URL: https://github.com/apache/datafusion/pull/16971#discussion_r2248531228 ## datafusion/execution/src/cache/cache_unit.rs: ## @@ -157,9 +158,79 @@ impl CacheAccessor>> for DefaultListFilesCache { } } +/// Collected file embedded m

Re: [PR] Support `centroids` config for `approx_percentile_cont_with_weight` [datafusion]

2025-08-01 Thread via GitHub
liamzwbao commented on code in PR #17003: URL: https://github.com/apache/datafusion/pull/17003#discussion_r2248523303 ## datafusion/functions-aggregate/src/approx_percentile_cont.rs: ## @@ -360,9 +366,14 @@ impl ApproxPercentileAccumulator { } } -// public fo

Re: [PR] Fix placeholder spans [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
alamb commented on PR #1979: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1979#issuecomment-3145306256 > many thanks for your support @iffyio 🙇 I agree @iffyio is awesome 🙏

Re: [PR] Perf: Port arrow-rs optimization for get_buffer_memory_size and add fast path for no buffer for gc string view [datafusion]

2025-08-01 Thread via GitHub
alamb commented on PR #17008: URL: https://github.com/apache/datafusion/pull/17008#issuecomment-3145302066 Another potential idea is to simply switch to using the coalesce kernel directly (so we don't have to maintain a separate copy) - https://github.com/apache/datafusion/pull/16249

Re: [PR] Feature: Improve hash Expr performance [datafusion]

2025-08-01 Thread via GitHub
alamb commented on PR #16977: URL: https://github.com/apache/datafusion/pull/16977#issuecomment-3145291581 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

Re: [PR] Support `centroids` config for `approx_percentile_cont_with_weight` [datafusion]

2025-08-01 Thread via GitHub
jcsherin commented on code in PR #17003: URL: https://github.com/apache/datafusion/pull/17003#discussion_r2248501858 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -1832,7 +1832,7 @@ d 124 e 115 query TI -SELECT c1, approx_percentile_cont_with_weight(c2, 0.95) WI

Re: [PR] feat: Cache Parquet metadata in built in parquet reader [datafusion]

2025-08-01 Thread via GitHub
nuno-faria commented on code in PR #16971: URL: https://github.com/apache/datafusion/pull/16971#discussion_r2248488201 ## datafusion/execution/src/cache/cache_unit.rs: ## @@ -157,9 +158,79 @@ impl CacheAccessor>> for DefaultListFilesCache { } } +/// Collected file embed

Re: [PR] #16994 Ensure CooperativeExec#maintains_input_order returns a Vec of the correct size [datafusion]

2025-08-01 Thread via GitHub
alamb commented on code in PR #16995: URL: https://github.com/apache/datafusion/pull/16995#discussion_r2248489786 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -1045,6 +1046,37 @@ impl PlanProperties { } } +macro_rules! check_len { +($target:expr, $func_na
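
The invariant behind #16994 is that `maintains_input_order` returns one entry per child. Below is a rough, std-only Rust sketch of a length check, assuming a hypothetical, heavily simplified `PlanNode` trait rather than DataFusion's real `ExecutionPlan`. As far as we know, the default body mirrors DataFusion's own default (`vec![false; self.children().len()]`); `Wrapper`, `BrokenWrapper`, and `per_child_len_ok` are illustrative names, and the helper is not the definition of the truncated `check_len!` macro.

```rust
/// Hypothetical, heavily simplified stand-in for a physical plan node.
trait PlanNode {
    fn children(&self) -> Vec<&dyn PlanNode>;

    /// One flag per child. The default reports that no child's order is kept.
    fn maintains_input_order(&self) -> Vec<bool> {
        vec![false; self.children().len()]
    }
}

/// A leaf node: no children, so the default returns an empty Vec.
struct Scan;

impl PlanNode for Scan {
    fn children(&self) -> Vec<&dyn PlanNode> {
        vec![]
    }
}

/// A pass-through wrapper with exactly one child whose order it preserves.
struct Wrapper {
    input: Box<dyn PlanNode>,
}

impl PlanNode for Wrapper {
    fn children(&self) -> Vec<&dyn PlanNode> {
        vec![&*self.input]
    }
    fn maintains_input_order(&self) -> Vec<bool> {
        vec![true] // one entry for the one child
    }
}

/// The same wrapper, but the override forgets to report its child.
struct BrokenWrapper {
    input: Box<dyn PlanNode>,
}

impl PlanNode for BrokenWrapper {
    fn children(&self) -> Vec<&dyn PlanNode> {
        vec![&*self.input]
    }
    fn maintains_input_order(&self) -> Vec<bool> {
        vec![]
    }
}

/// Hypothetical check: the per-child Vec must have one entry per child.
fn per_child_len_ok(node: &dyn PlanNode) -> Result<(), String> {
    let expected = node.children().len();
    let actual = node.maintains_input_order().len();
    if expected == actual {
        Ok(())
    } else {
        Err(format!(
            "maintains_input_order returned {actual} entries for {expected} children"
        ))
    }
}

fn main() {
    let ok = Wrapper { input: Box::new(Scan) };
    let broken = BrokenWrapper { input: Box::new(Scan) };
    assert!(per_child_len_ok(&Scan).is_ok());
    assert!(per_child_len_ok(&ok).is_ok());
    assert!(per_child_len_ok(&broken).is_err());
}
```

A check like this catches the bug class generically: any override of a per-child method can be validated against `children().len()` without knowing what the node does.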

Re: [PR] Support `centroids` config for `approx_percentile_cont_with_weight` [datafusion]

2025-08-01 Thread via GitHub
liamzwbao commented on code in PR #17003: URL: https://github.com/apache/datafusion/pull/17003#discussion_r2248487090 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -1832,7 +1832,7 @@ d 124 e 115 query TI -SELECT c1, approx_percentile_cont_with_weight(c2, 0.95) W

Re: [PR] Fix column definition `COLLATE` parsing [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
mvzink commented on code in PR #1986: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1986#discussion_r2248453508 ## src/parser/mod.rs: ## @@ -1248,6 +1248,12 @@ impl<'a> Parser<'a> { debug!("parsing expr"); let mut expr = self.parse_prefix()?; +

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2248445838 ## src/dialect/mod.rs: ## @@ -1036,8 +1036,14 @@ pub trait Dialect: Debug + Any { /// Returns true if the specified keyword should be parsed as

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2248418391 ## src/parser/mod.rs: ## @@ -266,6 +266,22 @@ impl ParserOptions { self.unescape = unescape; self } + +/// Set if semicolo

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2248411515 ## src/parser/mod.rs: ## @@ -16464,7 +16505,28 @@ impl<'a> Parser<'a> { /// Parse [Statement::Return] fn parse_return(&mut self) -> Resul

Re: [I] try_ arithmetic functions return incorrect results [datafusion-comet]

2025-08-01 Thread via GitHub
coderfender commented on issue #2021: URL: https://github.com/apache/datafusion-comet/issues/2021#issuecomment-3145149316 Working on this

Re: [I] Support uneven partition inputs HashJoinExec in Partitioned mode [datafusion]

2025-08-01 Thread via GitHub
timsaucer commented on issue #16740: URL: https://github.com/apache/datafusion/issues/16740#issuecomment-3145103785 closing as improper use

Re: [I] Support uneven partition inputs HashJoinExec in Partitioned mode [datafusion]

2025-08-01 Thread via GitHub
timsaucer closed issue #16740: Support uneven partition inputs HashJoinExec in Partitioned mode URL: https://github.com/apache/datafusion/issues/16740

Re: [I] Performance regression in TPC-H benchmarks [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove commented on issue #2056: URL: https://github.com/apache/datafusion-comet/issues/2056#issuecomment-3145023569 I was using JDK 11 rather than 17, and Spark 3.5.6 instead of 3.5.3. After fixing these issues, the timing is now 300s.

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-08-01 Thread via GitHub
adriangb commented on code in PR #16445: URL: https://github.com/apache/datafusion/pull/16445#discussion_r2248281203 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -734,6 +734,366 @@ async fn test_topk_dynamic_filter_pushdown() { ); } +#[tokio:
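The idea named in this PR's title, bounds-based dynamic filtering for a hash join, can be illustrated in isolation: after the build side is collected, its join-key min/max becomes a range predicate that can discard probe-side rows before they reach the join. The sketch below is a simplified, hypothetical model on plain `i64` keys for an inner join; the names are illustrative and it does not reflect DataFusion's actual `HashJoinExec` code or APIs.

```rust
/// Hypothetical bounds summary of the build side's join keys.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    min: i64,
    max: i64,
}

/// Scan the build-side keys once and record their min and max.
/// Returns None when the build side is empty.
fn build_side_bounds(keys: &[i64]) -> Option<Bounds> {
    let min = *keys.iter().min()?;
    let max = *keys.iter().max()?;
    Some(Bounds { min, max })
}

/// Apply the derived predicate `min <= key AND key <= max` to probe-side keys.
/// For an inner join, an empty build side means no probe row can match.
fn prune_probe_side(probe: &[i64], bounds: Option<Bounds>) -> Vec<i64> {
    match bounds {
        None => Vec::new(),
        Some(b) => probe
            .iter()
            .copied()
            .filter(|k| (b.min..=b.max).contains(k))
            .collect(),
    }
}
```

In a real engine the predicate would be pushed into the probe-side scan (where it can also skip files or row groups using statistics) rather than applied to an in-memory slice; the slice version only shows the pruning logic.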

Re: [PR] chore(deps): bump serde_json from 1.0.141 to 1.0.142 [datafusion]

2025-08-01 Thread via GitHub
comphead merged PR #17006: URL: https://github.com/apache/datafusion/pull/17006

[PR] feat(spark): implement Spark bitwise function shiftleft/shiftright/shiftrightunsigned [datafusion]

2025-08-01 Thread via GitHub
chenkovsky opened a new pull request, #17013: URL: https://github.com/apache/datafusion/pull/17013 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? implement shiftleft/shiftright/shiftr
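For reference, the three Spark functions named here correspond to Java's `<<`, `>>` (arithmetic, sign-extending) and `>>>` (logical, zero-filling) operators, with the shift distance taken modulo the bit width. The following is a minimal sketch of those semantics for 32-bit integers, assuming Spark follows Java's operator behavior; the function names are hypothetical and this is not the code in the PR. A 64-bit (bigint) version would use the same calls on `i64`/`u64`, with the distance taken modulo 64.

```rust
/// Hypothetical sketch of Spark `shiftleft` on INT.
/// `wrapping_shl` masks the distance to its low 5 bits, like Java's `<<`.
fn shiftleft_i32(x: i32, n: i32) -> i32 {
    x.wrapping_shl(n as u32)
}

/// Hypothetical sketch of Spark `shiftright` on INT.
/// On a signed type `wrapping_shr` is an arithmetic shift, like Java's `>>`.
fn shiftright_i32(x: i32, n: i32) -> i32 {
    x.wrapping_shr(n as u32)
}

/// Hypothetical sketch of Spark `shiftrightunsigned` on INT.
/// Reinterpreting as u32 makes the shift zero-filling, like Java's `>>>`.
fn shiftrightunsigned_i32(x: i32, n: i32) -> i32 {
    (x as u32).wrapping_shr(n as u32) as i32
}
```

The difference between the last two shows up only for negative inputs: shifting -8 right by 1 gives -4 with `shiftright` but 2147483644 with `shiftrightunsigned`.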

Re: [I] Performance regression in TPC-H benchmarks [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove closed issue #2056: Performance regression in TPC-H benchmarks URL: https://github.com/apache/datafusion-comet/issues/2056

Re: [PR] feat: support multi value columns and aliases in unpivot [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
iffyio merged PR #1969: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1969

Re: [I] Performance regression in TPC-H benchmarks [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove commented on issue #2056: URL: https://github.com/apache/datafusion-comet/issues/2056#issuecomment-3144839924 Actually, I am now seeing the same regression with the Comet 0.9.0 release, so it appears that something changed with my local environment.

Re: [PR] chore: Refactor aggregate serde to be consistent with other expression serde [datafusion-comet]

2025-08-01 Thread via GitHub
mbutrovich merged PR #2055: URL: https://github.com/apache/datafusion-comet/pull/2055

[I] Query grouping by column with datatype List> is failing [datafusion]

2025-08-01 Thread via GitHub
LiaCastaneda opened a new issue, #17012: URL: https://github.com/apache/datafusion/issues/17012 ### Describe the bug Queries that group by columns of type List> fail with the following error: `Expected infallible creation of GenericListArray from ArrayDataRef failed: InvalidA

[I] Performance regression in TPC-H benchmarks [datafusion-comet]

2025-08-01 Thread via GitHub
andygrove opened a new issue, #2056: URL: https://github.com/apache/datafusion-comet/issues/2056 ### Describe the bug I am seeing a performance regression when comparing the main branch to the 0.9.0 release, and this seems to have been introduced very recently. I am investigating.

Re: [I] Upgrade to DataFusion 49.0.0 [datafusion-comet]

2025-08-01 Thread via GitHub
mbutrovich closed issue #1993: Upgrade to DataFusion 49.0.0 URL: https://github.com/apache/datafusion-comet/issues/1993

Re: [PR] chore: migrate to DF 49.0.0 [datafusion-comet]

2025-08-01 Thread via GitHub
mbutrovich merged PR #2040: URL: https://github.com/apache/datafusion-comet/pull/2040

[I] Crash when there are too many WHERE conditions in the SQL query [datafusion-sqlparser-rs]

2025-08-01 Thread via GitHub
sunheyi6 opened a new issue, #1988: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1988 Related to https://github.com/GreptimeTeam/greptimedb/issues/6628
