[PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.6 faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
zhuqi-lucas opened a new pull request, #16509: URL: https://github.com/apache/datafusion/pull/16509 ## Which issue does this PR close? - Closes [#16508](https://github.com/apache/datafusion/issues/16508) ## Rationale for this change Add fast path for CursorValues compare

Re: [PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.6 faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
zhuqi-lucas commented on PR #16509: URL: https://github.com/apache/datafusion/pull/16509#issuecomment-2995786891 Test result: ```rust ┏━━┳━━━┳┳━━━┓ ┃ Query┃ main ┃ fast_path_view ┃Change

Re: [I] Perf: Optimize CursorValues compare performance for StringViewArray [datafusion]

2025-06-23 Thread via GitHub
zhuqi-lucas commented on issue #16508: URL: https://github.com/apache/datafusion/issues/16508#issuecomment-2995792101 Submitted a PR; it shows a 1.4x speedup for sort-tpch Q11, whose data is mostly inlined bytes in this test. https://github.com/apache/datafusion/pull/16509#issuecomment-299

[PR] chore(deps): bump syn from 2.0.103 to 2.0.104 [datafusion]

2025-06-23 Thread via GitHub
dependabot[bot] opened a new pull request, #16507: URL: https://github.com/apache/datafusion/pull/16507 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.103 to 2.0.104. Release notes: sourced from [syn's releases](https://github.com/dtolnay/syn/releases). 2.0.104 Di

Re: [I] Release sqlparser-rs version `0.57.0` around 2024-06-15 [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
alamb commented on issue #1837: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1837#issuecomment-2995835197 Success! The release is available here: https://dist.apache.org/repos/dist/release/datafusion/datafusion-sqlparser-rs-0.57.0 I have also published it to cra

Re: [I] Release sqlparser-rs version `0.57.0` around 2024-06-15 [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
alamb closed issue #1837: Release sqlparser-rs version `0.57.0` around 2024-06-15 URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1837 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] Simplify predicates in `PushDownFilter` optimizer rule [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2160887410 ## datafusion/optimizer/src/simplify_predicates.rs: ## @@ -0,0 +1,194 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [I] Release sqlparser-rs version `0.58.0` around 2024-08-15 [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
alamb commented on issue #1886: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1886#issuecomment-2995854161 @Dimchikkk notes that 0.57.0 has a bug: - https://github.com/apache/datafusion-sqlparser-rs/pull/1899 We may need to create a 0.58.0 sooner

Re: [PR] Fix `limit` in subqueries [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
alamb commented on PR #1899: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1899#issuecomment-2995850581 > @alamb I'm aware that DataFusion isn’t on 0.56 yet. However, in order to upgrade it to 0.56, this fix would need to be backported. So I’m wondering: what’s the plan? Will

Re: [PR] Fix `limit` in subqueries [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
alamb commented on PR #1899: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1899#issuecomment-2995860448 I didn't hold 0.57.0 as the bug wasn't introduced in 0.57.0 -- instead it seems to have been introduced in 0.56.0 -- so in my mind it makes sense to just keep pushing forwa

Re: [PR] Simplify predicates in `PushDownFilter` optimizer rule [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2161280602 ## datafusion/optimizer/src/simplify_predicates.rs: ## @@ -0,0 +1,194 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] fix: document and fix macro hygiene for `config_field!` [datafusion]

2025-06-23 Thread via GitHub
crepererum merged PR #16473: URL: https://github.com/apache/datafusion/pull/16473

Re: [I] Excessive Arc-clone in HashJoinStream with StringView on build-side [datafusion]

2025-06-23 Thread via GitHub
ctsk commented on issue #16206: URL: https://github.com/apache/datafusion/issues/16206#issuecomment-2994380522 I think the issue on the take+concat pattern is only tangentially related to this issue. Ultimately, you would need a version of the take operation, that does not simply clone the

Re: [I] Perf Optimize CursorValues compare performance for StringViewArray [datafusion]

2025-06-23 Thread via GitHub
zhuqi-lucas commented on issue #16508: URL: https://github.com/apache/datafusion/issues/16508#issuecomment-2995685096 take

[I] Enhance string concat coercion to support castable types [datafusion]

2025-06-23 Thread via GitHub
osipovartem opened a new issue, #16510: URL: https://github.com/apache/datafusion/issues/16510 ### Is your feature request related to a problem or challenge? Related to #12709 Currently, in BinaryTypeCoercer, we have the following logic for the StringConcat operator: ```rust

Re: [PR] Simplify predicates in `PushDownFilter` optimizer rule [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2161141184 ## datafusion/expr/src/expr.rs: ## @@ -2069,6 +2069,11 @@ impl Expr { _ => None, } } + +/// Check if the Expr is literal Review C

Re: [I] Document DataFusion Threading / tokio runtimes (how to separate IO and CPU bound work) [datafusion]

2025-06-23 Thread via GitHub
alamb commented on issue #12393: URL: https://github.com/apache/datafusion/issues/12393#issuecomment-2995782285 😅 "finally"

Re: [PR] Example for using a separate threadpool for CPU bound work (try 3) [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16331: URL: https://github.com/apache/datafusion/pull/16331#issuecomment-2995781080 Thanks @adriangb and @Omega359 for the help with this one (and to @tustvold for / @ion-elgreco for the underlying feature) It's taken a while but we have made it

Re: [PR] Example for using a separate threadpool for CPU bound work (try 3) [datafusion]

2025-06-23 Thread via GitHub
alamb merged PR #16331: URL: https://github.com/apache/datafusion/pull/16331

Re: [I] Document DataFusion Threading / tokio runtimes (how to separate IO and CPU bound work) [datafusion]

2025-06-23 Thread via GitHub
alamb closed issue #12393: Document DataFusion Threading / tokio runtimes (how to separate IO and CPU bound work) URL: https://github.com/apache/datafusion/issues/12393

[I] Perf Optimize CursorValues compare performance for StringViewArray [datafusion]

2025-06-23 Thread via GitHub
zhuqi-lucas opened a new issue, #16508: URL: https://github.com/apache/datafusion/issues/16508 ### Is your feature request related to a problem or challenge? Similar to the following arrow-rs side change, we can optimize the CursorValues compare for StringViewArray type, especially fo
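
A minimal sketch of the kind of fast path this issue is about, assuming the Arrow string-view layout in which each 16-byte view stores a 4-byte little-endian length followed by up to 12 zero-padded inline bytes. The helper names `make_inline_view`, `inline_key`, and `compare_inline` are hypothetical and are not taken from the issue or the PR; the actual change may differ. The idea shown is that two fully inlined values can be ordered with one integer comparison instead of a byte-slice comparison.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: pack a short string (at most 12 bytes) into a
/// 16-byte view. Bytes 0..4 hold the length (little-endian u32) and
/// bytes 4..16 hold the data, padded with zeros.
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only short strings are stored inline");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Hypothetical helper: turn an inline view into an integer key whose
/// numeric order matches the lexicographic order of the string.
/// The 12 data bytes become the most significant bytes, and the length
/// becomes the least significant 4 bytes so it only breaks ties
/// (for example "a" versus "a\0", which pad to the same 12 bytes).
fn inline_key(view: u128) -> u128 {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let mut key = [0u8; 16];
    key[..12].copy_from_slice(&bytes[4..16]);
    key[12..].copy_from_slice(&len.to_be_bytes());
    u128::from_be_bytes(key)
}

/// Compare two views that are BOTH inline (length <= 12).
/// Longer strings would need the usual buffer lookup instead.
fn compare_inline(a: u128, b: u128) -> Ordering {
    inline_key(a).cmp(&inline_key(b))
}
```

For any two inputs of at most 12 bytes, `compare_inline(make_inline_view(x), make_inline_view(y))` should agree with `x.cmp(y)` on the raw byte slices.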

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-23 Thread via GitHub
kosiew commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-2994965537 Putting more words to how I understand pushdown and data adaptation: 1. Pushdown — “Which rows or pages should I read?” - Input: your original predicate (e.g. col("foo.b") > 5

[PR] Feat/sample [datafusion]

2025-06-23 Thread via GitHub
chenkovsky opened a new pull request, #16505: URL: https://github.com/apache/datafusion/pull/16505 ## Which issue does this PR close? ## Rationale for this change Currently table sample is not supported. ## What changes are included in this PR? support table sample

[PR] fix: extend recursive protection to prevent stack overflows in additional functions [datafusion]

2025-06-23 Thread via GitHub
ahmed-mez opened a new pull request, #16506: URL: https://github.com/apache/datafusion/pull/16506 ## Which issue does this PR close? Fixes stack overflows caused by deeply nested query plans during recursive function calls in various optimizer and expression evaluation paths. #
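
The mechanism this PR uses is not visible in the excerpt above. As a generic illustration of why deep nesting is a problem and one way to sidestep it, the sketch below walks a tree with an explicit heap-allocated work list instead of recursive calls, so the traversal depth is bounded by heap memory and not by the thread stack. This is a different, swapped-in technique; the `Expr` type and the functions `count_nodes` and `nested_not` are hypothetical and exist only for this example.

```rust
/// Hypothetical expression tree used only for this sketch.
enum Expr {
    Leaf,
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Count the nodes of a tree without recursion: pending nodes live in a
/// `Vec` on the heap, so nesting depth does not consume call-stack frames.
fn count_nodes(root: &Expr) -> usize {
    let mut pending: Vec<&Expr> = vec![root];
    let mut count = 0;
    while let Some(node) = pending.pop() {
        count += 1;
        match node {
            Expr::Leaf => {}
            Expr::Not(inner) => pending.push(&**inner),
            Expr::And(left, right) => {
                pending.push(&**left);
                pending.push(&**right);
            }
        }
    }
    count
}

/// Build a chain of `depth` nested `Not` nodes around a single leaf.
fn nested_not(depth: usize) -> Expr {
    let mut expr = Expr::Leaf;
    for _ in 0..depth {
        expr = Expr::Not(Box::new(expr));
    }
    expr
}
```

Note that the default `Drop` for such a boxed tree is itself recursive, so a real implementation handling extreme depths would also need to account for destruction.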

[PR] Add more doc for physical filter pushdown [datafusion]

2025-06-23 Thread via GitHub
xudong963 opened a new pull request, #16504: URL: https://github.com/apache/datafusion/pull/16504 ## Which issue does this PR close? - Closes #. ## Rationale for this change Related to the issue: https://github.com/apache/datafusion/issues/16188#issuecomm

Re: [PR] build(deps): bump prost from 0.13.5 to 0.14.0 [datafusion-python]

2025-06-23 Thread via GitHub
dependabot[bot] closed pull request #1153: build(deps): bump prost from 0.13.5 to 0.14.0 URL: https://github.com/apache/datafusion-python/pull/1153

Re: [PR] Fix `limit` in subqueries [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
Dimchikkk commented on PR #1899: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1899#issuecomment-2994258750 @alamb I'm aware that DataFusion isn’t on 0.56 yet. However, in order to upgrade it to 0.56, this fix would need to be backported. So I’m wondering: what’s the plan? Wi

Re: [PR] Simplify predicates in `PushDownFilter` optimizer rule [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2161366914 ## datafusion/optimizer/src/simplify_predicates.rs: ## @@ -0,0 +1,194 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: support table sample [datafusion]

2025-06-23 Thread via GitHub
chenkovsky commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2996411604 > It would be better to add more details about the PR, such as: sample levels: block level or row level sample ways: fixed row counts or percent? @xudong963 updated

Re: [PR] use 'lit' as the field name for literal values [datafusion]

2025-06-23 Thread via GitHub
adriangb merged PR #16498: URL: https://github.com/apache/datafusion/pull/16498

Re: [PR] fix: `PushDownFilter` for `GROUP BY` on uppercase col names [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on code in PR #16049: URL: https://github.com/apache/datafusion/pull/16049#discussion_r2161450922 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -4123,4 +4127,34 @@ mod tests { " ) } + +/// Create a test table scan with uppe

Re: [PR] fix: make `with_new_state` a trait method for `ExecutionPlan` [datafusion]

2025-06-23 Thread via GitHub
geoffreyclaude commented on code in PR #16469: URL: https://github.com/apache/datafusion/pull/16469#discussion_r2161376429 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -580,6 +581,24 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { // cooperate

Re: [PR] Simplify predicates in `PushDownFilter` optimizer rule [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2161387780 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -778,6 +779,16 @@ impl OptimizerRule for PushDownFilter { return Ok(Transformed::no(plan));

Re: [PR] Chore: implement predicate exprs as ScalarUDFImpl [datafusion-comet]

2025-06-23 Thread via GitHub
mbutrovich commented on PR #1864: URL: https://github.com/apache/datafusion-comet/pull/1864#issuecomment-2996336512 > Ci is failing because in (`iceberg_compat`)`initRecordBatchReader` we call `planner.createExpr` for predicates that are pushed down and the expressions are no longer there.

[I] FixedSizeBinary support in min/max accumulators [datafusion]

2025-06-23 Thread via GitHub
alexwilcoxson-rel opened a new issue, #16513: URL: https://github.com/apache/datafusion/issues/16513 ### Is your feature request related to a problem or challenge? In our system we have certain FixedSizeBinary columns that are now starting to get used in joins. Therefore we are hittin

[PR] build: Fix build [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove opened a new pull request, #1924: URL: https://github.com/apache/datafusion-comet/pull/1924 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

[PR] feat: collect once during display() in jupyter notebooks [datafusion-python]

2025-06-23 Thread via GitHub
timsaucer opened a new pull request, #1167: URL: https://github.com/apache/datafusion-python/pull/1167 # Which issue does this PR close? None # Rationale for this change By design in a Jupyter notebook `display()` calls both `__repr__` and `_repr_html_`. This currently

[PR] Add microbenchmark for spilling with compression [datafusion]

2025-06-23 Thread via GitHub
ding-young opened a new pull request, #16512: URL: https://github.com/apache/datafusion/pull/16512 ## Which issue does this PR close? - Related to #16367 ## Rationale for this change ## What changes are included in this PR? This pr adds some microbench

Re: [PR] Simplify predicates in `PushDownFilter` optimizer rule [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on code in PR #16362: URL: https://github.com/apache/datafusion/pull/16362#discussion_r2161147718 ## datafusion/optimizer/src/simplify_predicates.rs: ## @@ -0,0 +1,194 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] Add microbenchmark for spilling with compression [datafusion]

2025-06-23 Thread via GitHub
ding-young commented on PR #16512: URL: https://github.com/apache/datafusion/pull/16512#issuecomment-2996151512 To run bench, `cargo bench --bench spill_io` ### Q2 - spill_compression/q2/uncompressed time: [51.207 ms 51.521 ms 51.841 ms] [q2 | Uncompressed]

Re: [PR] fix: parse snowflake fetch clause [datafusion-sqlparser-rs]

2025-06-23 Thread via GitHub
Vedin commented on code in PR #1894: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1894#discussion_r2161623234 ## src/parser/mod.rs: ## @@ -15018,6 +15018,9 @@ impl<'a> Parser<'a> { /// Parse a FETCH clause pub fn parse_fetch(&mut self) -> Result { +

Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

2025-06-23 Thread via GitHub
kosiew commented on PR #16461: URL: https://github.com/apache/datafusion/pull/16461#issuecomment-2994947013 @adriangb Thanks for the ping on this. > Would it be possible to implement the nested struct imputation work you're doing with this approach? Do you mean reusing the

[I] EPIC: use cp_solver framework to develop a more sophisticated predicate simplification [datafusion]

2025-06-23 Thread via GitHub
xudong963 opened a new issue, #16511: URL: https://github.com/apache/datafusion/issues/16511 ### Is your feature request related to a problem or challenge? The predicates in filter can be variable, it's possible to use the cp_solver to simplify the predicates in filter, then reduce th

Re: [PR] fix: SortMergeJoin for timestamp keys [datafusion-comet]

2025-06-23 Thread via GitHub
SKY-ALIN commented on code in PR #1901: URL: https://github.com/apache/datafusion-comet/pull/1901#discussion_r2160644398 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2168,7 +2168,8 @@ object QueryPlanSerde extends Logging with CometExprShim { *

Re: [PR] Support Null aware anti join by HashJoin [datafusion]

2025-06-23 Thread via GitHub
viirya closed pull request #10584: Support Null aware anti join by HashJoin URL: https://github.com/apache/datafusion/pull/10584

Re: [PR] build: Fix conflict between #1910 and #1912 [datafusion-comet]

2025-06-23 Thread via GitHub
codecov-commenter commented on PR #1924: URL: https://github.com/apache/datafusion-comet/pull/1924#issuecomment-2996847214 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1924?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
alamb commented on code in PR #16509: URL: https://github.com/apache/datafusion/pull/16509#discussion_r2161855265 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -293,14 +293,19 @@ impl CursorValues for StringViewArray { self.views().len() } +#[inline(

[I] Support standard syntax for filtered aggregations [datafusion]

2025-06-23 Thread via GitHub
findepi opened a new issue, #16516: URL: https://github.com/apache/datafusion/issues/16516 ### Is your feature request related to a problem or challenge? ``` $ cargo run --bin datafusion-cli Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s Running `

Re: [PR] Restore topk filtering tests [datafusion]

2025-06-23 Thread via GitHub
AdamGS commented on PR #16501: URL: https://github.com/apache/datafusion/pull/16501#issuecomment-2996918815 Would love to give a hand with that, I have some thoughts I can try and put into a preliminary PR. It also seems like Datafusion is going to have more of this shared state that's s

Re: [PR] Restore topk filtering tests [datafusion]

2025-06-23 Thread via GitHub
adriangb commented on PR #16501: URL: https://github.com/apache/datafusion/pull/16501#issuecomment-2996942053 Thank you @AdamGS! It would be super helpful if we could first determine if the test is being overly sensitive to non-determinism or if the issue is actually reflecting incorrect qu

Re: [PR] Metadata handling announcement [datafusion-site]

2025-06-23 Thread via GitHub
paleolimbot commented on PR #73: URL: https://github.com/apache/datafusion-site/pull/73#issuecomment-2996949961 Example one! ```python from uuid import UUID import datafusion import pyarrow as pa @datafusion.udf([pa.string()], pa.uuid(), "stable") def uuid_fr

Re: [I] Extend `DESCRIBE` statement to output the schema [datafusion]

2025-06-23 Thread via GitHub
comphead commented on issue #16429: URL: https://github.com/apache/datafusion/issues/16429#issuecomment-2997068765 We also need to document the examples the way it is intended to do in https://github.com/apache/datafusion/issues/16518

Re: [PR] fix: Add continue after append_null when casting float to decimal [datafusion-comet]

2025-06-23 Thread via GitHub
parthchandra commented on PR #1914: URL: https://github.com/apache/datafusion-comet/pull/1914#issuecomment-2997064670 Thanks @leung-ming, for the test on the native side. I was really thinking of a unit test in `org.apache.comet.CometCastSuite`

Re: [PR] fix: Add continue after append_null when casting float to decimal [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on PR #1914: URL: https://github.com/apache/datafusion-comet/pull/1914#issuecomment-2997096994 The CI test failure is unrelated to changes in this PR and is now fixed in main branch

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on code in PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#discussion_r2161998053 ## native/spark-expr/src/conversion_funcs/schubfach.rs: ## @@ -0,0 +1,1517 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] fix: Add overflow check for SumDecimalGroupsAccumulator::evaluate [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on PR #1922: URL: https://github.com/apache/datafusion-comet/pull/1922#issuecomment-2997097572 The CI test failure is unrelated to changes in this PR and is now fixed in main branch

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-06-23 Thread via GitHub
goldmedal commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2997099582 hi @alamb I have fixed the conflicts. If no more comments, I think we can merge it.

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#issuecomment-2997098005 The CI test failure is unrelated to changes in this PR and is now fixed in main branch

Re: [PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on code in PR #1920: URL: https://github.com/apache/datafusion-comet/pull/1920#discussion_r2162206037 ## common/src/main/java/org/apache/comet/parquet/ColumnReader.java: ## @@ -126,6 +126,13 @@ public void setPageReader(PageReader pageReader) throws IOExcept

Re: [PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16509: URL: https://github.com/apache/datafusion/pull/16509#issuecomment-2997267614 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubun

Re: [PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16509: URL: https://github.com/apache/datafusion/pull/16509#issuecomment-2997270770 🤖: Benchmark completed Details ``` Comparing HEAD and fast_path_view Benchmark sort_tpch.json ┏

Re: [PR] chore: Introduce `exprHandlers` map in QueryPlanSerde [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on PR #1903: URL: https://github.com/apache/datafusion-comet/pull/1903#issuecomment-2997187412 > I assume this is not the end of it and we would be enhancing this as we go? Yes, this is just a first step. > Some initial thoughts on that - Add one or more a

Re: [PR] Perf: Optimize CursorValues compare performance for StringViewArray (1.4X faster for sort-tpch Q11) [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16509: URL: https://github.com/apache/datafusion/pull/16509#issuecomment-2997267521 🤖: Benchmark completed Details ``` Comparing HEAD and fast_path_view Benchmark clickbench_extended.json ┏━━

Re: [PR] chore: Improve reporting of fallback reasons for CollectLimit [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on code in PR #1694: URL: https://github.com/apache/datafusion-comet/pull/1694#discussion_r2162109484 ## spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala: ## @@ -196,18 +198,34 @@ case class CometExecRule(session: SparkSession) extends Rule[Sp

Re: [PR] Add DESC alias for DESCRIBE command. [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16514: URL: https://github.com/apache/datafusion/pull/16514#issuecomment-2997307702 > Thanks @lucqui I love it I just realized we do not have `DESCRIBE` documented, I'm adding a new issue for it The power of community!

Re: [PR] fix: Add continue after append_null when casting float to decimal [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove commented on code in PR #1914: URL: https://github.com/apache/datafusion-comet/pull/1914#discussion_r2162047684 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -1298,6 +1298,7 @@ where }); } else {

Re: [PR] fix: `PushDownFilter` for `GROUP BY` on uppercase col names [datafusion]

2025-06-23 Thread via GitHub
aditanase commented on PR #16049: URL: https://github.com/apache/datafusion/pull/16049#issuecomment-2997142470 > I also suggest adding sqllogictest based on the sql in PR summary I'd be happy to, can you please point me at a sample patch or a good suite to add to? Last time I tried th

Re: [PR] chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) [datafusion-comet]

2025-06-23 Thread via GitHub
kazuyukitanimura commented on code in PR #1910: URL: https://github.com/apache/datafusion-comet/pull/1910#discussion_r2162132677 ## dev/diffs/3.5.6.diff: ## @@ -1938,7 +1938,17 @@ index 8e88049f51e..d3c0737d52e 100644 import testImplicits._ // keep() should take effe

[PR] doc: Document DESCRIBE comman in ddl.md [datafusion]

2025-06-23 Thread via GitHub
krikera opened a new pull request, #16524: URL: https://github.com/apache/datafusion/pull/16524 Add documentation for DESCRIBE and DESC commands with syntax, examples, and output format explanation. Fixes #16518 ## Which issue does this PR close? - Closes #16518.

[PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-23 Thread via GitHub
alamb opened a new pull request, #16523: URL: https://github.com/apache/datafusion/pull/16523 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16522 ## Rationale for this change Following @berkaysynnada 's suggestion in http

Re: [I] [EPIC] More Async User Defined Function work [datafusion]

2025-06-23 Thread via GitHub
alamb commented on issue #16520: URL: https://github.com/apache/datafusion/issues/16520#issuecomment-2997682321 @goldmedal is there any other todo items you can think of for async udfs?

Re: [PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-23 Thread via GitHub
alamb commented on code in PR #16523: URL: https://github.com/apache/datafusion/pull/16523#discussion_r2162390807 ## datafusion/expr/src/async_udf.rs: ## @@ -35,34 +35,7 @@ use std::sync::Arc; /// /// The name is chosen to mirror ScalarUDFImpl #[async_trait] -pub trait AsyncS

Re: [PR] Simplify AsyncScalarUdfImpl so it extends ScalarUdfImpl [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16523: URL: https://github.com/apache/datafusion/pull/16523#issuecomment-2997741800 I feel like there may be some more duplication we can remove as part of the PhysicalExpr layer too

[I] Update `AsyncScalarUDFImpl` API to match `ScalarUDFImpl `API [datafusion]

2025-06-23 Thread via GitHub
alamb opened a new issue, #16522: URL: https://github.com/apache/datafusion/issues/16522 ### Is your feature request related to a problem or challenge? * https://github.com/apache/datafusion/pull/14837 introduces `AsyncScalarUDFImpl` to run async functions 🥳 🦜 🚀 However, the

Re: [PR] [datafusion-spark] Implement `factorical` function [datafusion]

2025-06-23 Thread via GitHub
alamb commented on PR #16125: URL: https://github.com/apache/datafusion/pull/16125#issuecomment-2997749229 gogogogogogo Thanks again @shehabgamin and @tlm365

Re: [PR] [datafusion-spark] Implement `factorical` function [datafusion]

2025-06-23 Thread via GitHub
alamb merged PR #16125: URL: https://github.com/apache/datafusion/pull/16125

Re: [I] Docker build kube/Dockerfile failed with ### COMPILER BUG DETECTED ### [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove closed issue #1917: Docker build kube/Dockerfile failed with ### COMPILER BUG DETECTED ### URL: https://github.com/apache/datafusion-comet/issues/1917

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-23 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-2997787662 > I was not planning on changing it substantially anymore. I was thinking of maybe rereading the text with a fresh pair of eyes and editing a sentence here or there, but that's it. Need

[PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-23 Thread via GitHub
parthchandra opened a new pull request, #1925: URL: https://github.com/apache/datafusion-comet/pull/1925 #1817 introduced S3A configuration for the `native_datafusion` reader. This PR does the same for `native_iceberg_compat` ## How are these changes tested? Existing unit t

Re: [PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-23 Thread via GitHub
parthchandra commented on PR #1925: URL: https://github.com/apache/datafusion-comet/pull/1925#issuecomment-2998299473 @Kontinuation please review if you can. (This PR is draft because I haven't been able to test it with S3 yet. The unit test passes, though).

Re: [I] Make `datafusion` read parquet folders if non parquet files exists [datafusion]

2025-06-23 Thread via GitHub
comphead commented on issue #16460: URL: https://github.com/apache/datafusion/issues/16460#issuecomment-2998350932 @hendrikmakait sorry I took the liberty to wrap my PR up, please feel free to review

Re: [PR] feat: collect once during display() in jupyter notebooks [datafusion-python]

2025-06-23 Thread via GitHub
timsaucer commented on PR #1167: URL: https://github.com/apache/datafusion-python/pull/1167#issuecomment-2998390832 > I don't think this is a reasonable workaround because there are many Jupyter-protocol frontends that do not support displaying HTML output. This means that repr would be br

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
drexler-sky commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2162749959 ## spark/src/main/scala/org/apache/comet/serde/arrays.scala: ## @@ -171,9 +184,9 @@ object CometArrayMax extends CometExpressionSerde { binding: Boo

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-06-23 Thread via GitHub
kazuyukitanimura merged PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715

Re: [PR] feat: collect once during display() in jupyter notebooks [datafusion-python]

2025-06-23 Thread via GitHub
kylebarron commented on PR #1167: URL: https://github.com/apache/datafusion-python/pull/1167#issuecomment-2998437499 As [mentioned in a comment on SO](https://stackoverflow.com/questions/15411967/how-can-i-check-if-code-is-executed-in-the-ipython-notebook/24937408#comment81917993_39662359),

Re: [PR] `TableProvider` to skip files in the folder which non relevant to selected reader [datafusion]

2025-06-23 Thread via GitHub
comphead commented on code in PR #16487: URL: https://github.com/apache/datafusion/pull/16487#discussion_r2162736134 ## datafusion/core/src/datasource/listing_table_factory.rs: ## @@ -125,6 +125,13 @@ impl TableProviderFactory for ListingTableFactory { // specifical

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
drexler-sky commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2162673921 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -232,24 +232,42 @@ class CometArrayExpressionSuite extends CometTestBase w

[I] Add support for Spark SQL `explode` expression [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove opened a new issue, #1927: URL: https://github.com/apache/datafusion-comet/issues/1927 ### What is the problem the feature request solves? Add support for `explode`: https://spark.apache.org/docs/latest/api/sql/index.html#explode > explode(expr) - Separates the

[I] Add support for `size` expression [datafusion-comet]

2025-06-23 Thread via GitHub
andygrove opened a new issue, #1926: URL: https://github.com/apache/datafusion-comet/issues/1926 ### What is the problem the feature request solves? Add support for Spark SQL `size` expression: https://spark.apache.org/docs/latest/api/sql/index.html#size From the document

Re: [PR] feat: Implement ToPrettyString [datafusion-comet]

2025-06-23 Thread via GitHub
comphead commented on code in PR #1921: URL: https://github.com/apache/datafusion-comet/pull/1921#discussion_r2162731604 ## native/core/src/execution/planner.rs: ## @@ -746,6 +746,22 @@ impl PhysicalPlanner { let child = self.create_expr(expr.child.as_ref().unwr

Re: [PR] feat: Support hadoop s3a config in native_iceberg_compat [datafusion-comet]

2025-06-23 Thread via GitHub
codecov-commenter commented on PR #1925: URL: https://github.com/apache/datafusion-comet/pull/1925#issuecomment-2998365766 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1925?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: move udf registration to better place [datafusion-comet]

2025-06-23 Thread via GitHub
comphead merged PR #1899: URL: https://github.com/apache/datafusion-comet/pull/1899

Re: [PR] feat: supports array_distinct [datafusion-comet]

2025-06-23 Thread via GitHub
comphead commented on code in PR #1923: URL: https://github.com/apache/datafusion-comet/pull/1923#discussion_r2162734270 ## spark/src/main/scala/org/apache/comet/serde/arrays.scala: ## @@ -171,9 +184,9 @@ object CometArrayMax extends CometExpressionSerde { binding: Boolea

Re: [PR] feat: Finalize support for `RightMark` join + `Mark` join swap [datafusion]

2025-06-23 Thread via GitHub
jonathanc-n commented on PR #16488: URL: https://github.com/apache/datafusion/pull/16488#issuecomment-2998466640 If you have the time, are you able to take a look? Should be a straightforward review, thanks! @comphead @Dandandan

Re: [PR] Add more doc for physical filter pushdown [datafusion]

2025-06-23 Thread via GitHub
xudong963 commented on PR #16504: URL: https://github.com/apache/datafusion/pull/16504#issuecomment-2998476200 Thank you all, let's go!

Re: [PR] Add more doc for physical filter pushdown [datafusion]

2025-06-23 Thread via GitHub
xudong963 merged PR #16504: URL: https://github.com/apache/datafusion/pull/16504

Re: [PR] Add Cloud-Native Performance Monitoring System with GitHub Integration [datafusion]

2025-06-23 Thread via GitHub
github-actions[bot] commented on PR #15624: URL: https://github.com/apache/datafusion/pull/15624#issuecomment-2998537216 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix!: incorrect coercion when comparing with string literals [datafusion]

2025-06-23 Thread via GitHub
github-actions[bot] closed pull request #15482: fix!: incorrect coercion when comparing with string literals URL: https://github.com/apache/datafusion/pull/15482

Re: [PR] refactor!: consistent null handling in coercible signatures [datafusion]

2025-06-23 Thread via GitHub
github-actions[bot] closed pull request #15404: refactor!: consistent null handling in coercible signatures URL: https://github.com/apache/datafusion/pull/15404
