Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-04-19 Thread via GitHub
kevinjqliu commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2816875067 > In order to try and make progress on this, I decided to go with having a single function that builds all tables for a single scale factor similar to how DuckDB does it. My

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
appletreeisyellow commented on code in PR #15764: URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051594000 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool { // For example, casts from string to

[I] Unnecessary casting in stats & filter evaluation [datafusion]

2025-04-19 Thread via GitHub
adriangb opened a new issue, #15780: URL: https://github.com/apache/datafusion/issues/15780 ### Describe the bug Consider the following test: ```sql COPY ( SELECT arrow_cast(a, 'Int16') AS a FROM ( VALUES (1), (2), (3) ) AS t(a) ) TO 'test_files/scratch/parquet

[PR] move reassign_predicate_columns onto PhysicalExpr [datafusion]

2025-04-19 Thread via GitHub
adriangb opened a new pull request, #15779: URL: https://github.com/apache/datafusion/pull/15779 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] fix: clickbench type err [datafusion]

2025-04-19 Thread via GitHub
Weijun-H commented on code in PR #15773: URL: https://github.com/apache/datafusion/pull/15773#discussion_r2051622969 ## benchmarks/queries/clickbench/README.md: ## @@ -155,7 +155,7 @@ WHERE THEN split_part(split_part("URL", 'resolution=', 2), '&', 1)::INT ELSE

Re: [I] Add Support for Dynamic SQL Macros for Flexible Column Selection [datafusion]

2025-04-19 Thread via GitHub
kumarlokesh commented on issue #14512: URL: https://github.com/apache/datafusion/issues/14512#issuecomment-2816854617 take

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on PR #15764: URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816927115 @etseidl I just pushed a change that does what I think you suggested and only denies a cast between a string and a non string type but otherwise assumes that the general casting rul

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816927715 Marking as ready for review despite not having any numbers to substantiate performance improvement (because we need #15769) given that algorithmically and from experience in the pre

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2051611694 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -148,7 +149,7 @@ mod tests { let mut source = ParquetSource::default();

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
etseidl commented on PR #15764: URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816928597 Thanks @adriangb. I'm testing now...

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2051614507 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -5006,7 +5006,7 @@ SELECT column5, avg(column1) FROM d GROUP BY column5; query I?? SELECT column5,

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
etseidl commented on PR #15764: URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816941931 Yep, fixes my issue. Thanks again!

Re: [PR] Implement Parquet filter pushdown via new filter pushdown APIs [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2816943989 > Thanks @adriangb -- I am about to be offline for a week so I will review this when I return Enjoy your vacation! I think you'll like this diff: https://github.com/use

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on PR #15764: URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816945357 Given approvals and relatively simple change, could a committer merge this please?

Re: [PR] fix: describe Parquet schema with coerce_int96 [datafusion]

2025-04-19 Thread via GitHub
chenkovsky closed pull request #15750: fix: describe Parquet schema with coerce_int96 URL: https://github.com/apache/datafusion/pull/15750

Re: [PR] Show current SQL recursion limit in RecursionLimitExceeded error message [datafusion]

2025-04-19 Thread via GitHub
xudong963 commented on PR #15644: URL: https://github.com/apache/datafusion/pull/15644#issuecomment-2816727289 > thanks @kumarlokesh The code now looks much more aligned. > > perhaps we can factor out > > ``` > self.parser >

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
appletreeisyellow commented on PR #15764: URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816729397 @adriangb Yes, I'm happy to review. I'll have some time to review it this afternoon or evening

Re: [PR] Use `interleave` in hash repartitioning [datafusion]

2025-04-19 Thread via GitHub
xudong963 commented on code in PR #15768: URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051494486 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -233,11 +233,11 @@ impl BatchPartitioner { /// /// The time spent repartitioning, not includi

Re: [I] OOM when nested join + limit [datafusion]

2025-04-19 Thread via GitHub
xudong963 commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816732058 > What I suggest is that someone updates our documentation with the current state of joins in DataFusion (namely what operators are implemented and what types of joins they ar

[PR] Perform type coercion for corr aggregate function during physical planning [datafusion]

2025-04-19 Thread via GitHub
kumarlokesh opened a new pull request, #15776: URL: https://github.com/apache/datafusion/pull/15776 ## Which issue does this PR close? - Closes #13721. ## Rationale for this change ## What changes are included in this PR? - Created a new utility

Re: [I] Move code in `user_defined_plan.rs` to the `extending-operators` doc [datafusion]

2025-04-19 Thread via GitHub
Adez017 commented on issue #15774: URL: https://github.com/apache/datafusion/issues/15774#issuecomment-2816779407 Hi @xudong963, I want to work on this, but could you clarify what exactly we need to do?

[I] [Feature] Add SQL Example for Aggregate functions [datafusion]

2025-04-19 Thread via GitHub
Adez017 opened a new issue, #15777: URL: https://github.com/apache/datafusion/issues/15777 Some of the aggregate functions in the docs already have examples, but most do not. Examples should be added for the rest.

Re: [I] [Feature] Add SQL Example for Aggregate functions [datafusion]

2025-04-19 Thread via GitHub
Adez017 commented on issue #15777: URL: https://github.com/apache/datafusion/issues/15777#issuecomment-2816783180 I want to deal with it personally

Re: [I] [Feature] Add SQL Example for Aggregate functions [datafusion]

2025-04-19 Thread via GitHub
Adez017 commented on issue #15777: URL: https://github.com/apache/datafusion/issues/15777#issuecomment-2816783217 take

Re: [I] Unnecessary casting in stats & filter evaluation [datafusion]

2025-04-19 Thread via GitHub
james-ryans commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2816972256 I would love to work on this task

Re: [PR] feat: ORDER BY ALL [datafusion]

2025-04-19 Thread via GitHub
PokIsemaine commented on code in PR #15772: URL: https://github.com/apache/datafusion/pull/15772#discussion_r2051632953 ## datafusion/expr/src/expr.rs: ## @@ -701,6 +701,24 @@ impl TryCast { } } +/// OrderBy Expressions +pub enum OrderByExprs { +OrderByExprVec(Vec),

Re: [I] Unnecessary casting in stats & filter evaluation [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2816990712 I can confirm this is currently being done at the LogicalPlan level. I'd say the first step is to understand how it happens there and then if something similar exists for Physi

Re: [PR] Support `Accumulator` for avg duration [datafusion]

2025-04-19 Thread via GitHub
alamb merged PR #15468: URL: https://github.com/apache/datafusion/pull/15468

Re: [I] Add a "Gentle Introduction to Arrow / Record Batches" [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #11336: URL: https://github.com/apache/datafusion/issues/11336#issuecomment-2816668707 Thank you @Adez017 -- I would personally suggest starting by porting some of the contents of https://jorgecarleitao.github.io/arrow2/main/guide/ into DataFusion's docs In

Re: [PR] Improve documentation for format `OPTIONS` clause [datafusion]

2025-04-19 Thread via GitHub
alamb commented on code in PR #15708: URL: https://github.com/apache/datafusion/pull/15708#discussion_r2051458845 ## docs/source/user-guide/sql/format_options.md: ## @@ -0,0 +1,209 @@ + + +# Format Options + +DataFusion supports customizing how data is read from or written to di

Re: [PR] Improve `simplify_expressions` rule [datafusion]

2025-04-19 Thread via GitHub
alamb merged PR #15735: URL: https://github.com/apache/datafusion/pull/15735

Re: [PR] Improve `simplify_expressions` rule [datafusion]

2025-04-19 Thread via GitHub
alamb commented on PR #15735: URL: https://github.com/apache/datafusion/pull/15735#issuecomment-2816669049 Thanks @xudong963

Re: [PR] enable `supports_filter_during_aggregation` for Generic dialect [datafusion-sqlparser-rs]

2025-04-19 Thread via GitHub
alamb merged PR #1815: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1815

Re: [I] OOM when nested join + limit [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816665497 TLDR while this is a straightforward bug report, I think fixing it is not something we are going to make a patch for -- it will require a more serious implementation effort fo

Re: [PR] re-implement filter pushdown for parquet [datafusion]

2025-04-19 Thread via GitHub
alamb commented on PR #15769: URL: https://github.com/apache/datafusion/pull/15769#issuecomment-2816667139 Thanks @adriangb -- I am about to be offline for a week so I will review this when I return

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816667473 Funny enough I just opened https://github.com/apache/datafusion/pull/15764 without having seen this issue! It sounds like there may be some complexity with floats... hone

Re: [PR] Improve documentation for format `OPTIONS` clause [datafusion]

2025-04-19 Thread via GitHub
alamb merged PR #15708: URL: https://github.com/apache/datafusion/pull/15708

[PR] feat: ORDER BY ALL [datafusion]

2025-04-19 Thread via GitHub
PokIsemaine opened a new pull request, #15772: URL: https://github.com/apache/datafusion/pull/15772 ## Which issue does this PR close? - None ## Rationale for this change - https://github.com/apache/datafusion/issues/14514 ## What changes are includ

Re: [PR] fix: describe Parquet schema with coerce_int96 [datafusion]

2025-04-19 Thread via GitHub
comphead merged PR #15750: URL: https://github.com/apache/datafusion/pull/15750

Re: [I] When `datafusion.execution.parquet.coerce_int96` is set, timestamp type is still reported as Timestamp(nanoseconds) [datafusion]

2025-04-19 Thread via GitHub
comphead closed issue #15721: When `datafusion.execution.parquet.coerce_int96` is set, timestamp type is still reported as Timestamp(nanoseconds) URL: https://github.com/apache/datafusion/issues/15721

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on code in PR #15764: URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051557860 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool { // For example, casts from string to numbers

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-19 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816688267 > @Dandandan I believe with this setup we should be able to achieve with a couple LOC in `insert_batch`: > > ```rust > // Apply the filter to the batch before processing

Re: [I] OOM when nested join + limit [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816753297 I can try and help this effort in a few weeks, but I need to finish up the filter pushdown work in parquet and topk work first (I have too many things outstanding at the moment an

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
alamb commented on code in PR #15764: URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051508492 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool { // For example, casts from string to numbers is

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
alamb commented on code in PR #15764: URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051508562 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1544,7 +1572,10 @@ fn build_predicate_expression( Ok(builder) => builder, // allow partial

Re: [PR] Improve push down limit (logical optimizer rule) [datafusion]

2025-04-19 Thread via GitHub
xudong963 commented on code in PR #15744: URL: https://github.com/apache/datafusion/pull/15744#discussion_r2051509268 ## datafusion/optimizer/src/push_down_limit.rs: ## @@ -137,6 +142,9 @@ impl OptimizerRule for PushDownLimit { } } else {

[PR] fix: clickbench type err [datafusion]

2025-04-19 Thread via GitHub
chenkovsky opened a new pull request, #15773: URL: https://github.com/apache/datafusion/pull/15773 ## Which issue does this PR close? - Closes #15753. ## Rationale for this change column types of UTMSource and UTMCampaign in clickbench_partitioned are binary, but in

Re: [I] Incorrect field indices for right‑side columns in Substrait ProjectRel after [datafusion]

2025-04-19 Thread via GitHub
chenkovsky commented on issue #15765: URL: https://github.com/apache/datafusion/issues/15765#issuecomment-2816756469 take

Re: [PR] Add `statistics_by_partition API` to ExecutionPlan [datafusion]

2025-04-19 Thread via GitHub
xudong963 commented on PR #15503: URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2816757906 I may start a new branch based on the branch to experiment with @berkaysynnada's suggestion to see if there are some challenges next week. /cc @alamb @suremarc @wiedld (Hope we ca

Re: [I] Support coercsion from `FixedSizeBinary` to `BinaryView` [datafusion]

2025-04-19 Thread via GitHub
chenkovsky commented on issue #15755: URL: https://github.com/apache/datafusion/issues/15755#issuecomment-2816757278 take

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
alamb commented on code in PR #15764: URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051510677 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool { // For example, casts from string to numbers is

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816758476 - While reviewing https://github.com/apache/datafusion/pull/15764 it wasn't clear to me why we are checking casting / types at all in the pruning code. I think that might

Re: [PR] Add DataFusion 47.0.0 Upgrade Guide [datafusion]

2025-04-19 Thread via GitHub
alamb commented on PR #15749: URL: https://github.com/apache/datafusion/pull/15749#issuecomment-2816763464 > Thank you @alamb > > After merging this one, if I have a chance on the weekend, I'll add something other to the guide. Thank you very much @xudong963

[I] Tracking: speed up the logical optimizer [datafusion]

2025-04-19 Thread via GitHub
xudong963 opened a new issue, #15775: URL: https://github.com/apache/datafusion/issues/15775 I plan to speed up the logical optimizer from two aspects: 1. Try to reduce the optimization rounds by making each rule generate the best plan as much as possible 2. Optimize the single rule's

Re: [I] [Epic] A collection of dynamic filtering related items [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15512: URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2816765297 > Currently, q23 takes approximately 6 seconds to execute. I have confirmed that DataFusion does not have the aforementioned optimizations and still scans a very large number of r

Re: [I] Blog post about user defined window functions [datafusion]

2025-04-19 Thread via GitHub
alamb closed issue #6781: Blog post about user defined window functions URL: https://github.com/apache/datafusion/issues/6781

Re: [PR] User defined window functions blog post [datafusion-site]

2025-04-19 Thread via GitHub
alamb merged PR #66: URL: https://github.com/apache/datafusion-site/pull/66

Re: [PR] User defined window functions blog post [datafusion-site]

2025-04-19 Thread via GitHub
alamb commented on PR #66: URL: https://github.com/apache/datafusion-site/pull/66#issuecomment-2816641318 The blog is now live: https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions/

Re: [PR] User defined window functions blog post [datafusion-site]

2025-04-19 Thread via GitHub
alamb commented on PR #66: URL: https://github.com/apache/datafusion-site/pull/66#issuecomment-2816641235 Thanks again @Adez017 @Dandandan @getChan and @comphead

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2051461834 ## datafusion/physical-expr/src/expressions/mod.rs: ## @@ -22,7 +22,7 @@ mod binary; mod case; mod cast; mod column; -mod dynamic_filters; +pub mod dynamic_fil

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-19 Thread via GitHub
andygrove merged PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816673274 > Pausing this until #15769 is done I was able to unblock by wiring up to TestDataSource

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-19 Thread via GitHub
andygrove merged PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816680541 @Dandandan I believe with this setup we should be able to achieve with a couple LOC in `insert_batch`: ```rust // Apply the filter to the batch before processing let fil

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
etseidl commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816792398 I'll follow up in #15764

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
etseidl commented on code in PR #15764: URL: https://github.com/apache/datafusion/pull/15764#discussion_r2051538025 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool { // For example, casts from string to numbers i

[PR] Added SQL Example for `Aggregate Functions` [datafusion]

2025-04-19 Thread via GitHub
Adez017 opened a new pull request, #15778: URL: https://github.com/apache/datafusion/pull/15778 ## Which issue does this PR close? - Closes #15777 ## Rationale for this change Added examples for all the aggregate functions provided in the docs under `window functions`

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-19 Thread via GitHub
etseidl commented on PR #15764: URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2816794323 Thanks @adriangb! I think if we clean up the pruning allowed types this can close #15742. We can then tackle the special handling for floats later. They're already wrong for n

[I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-04-19 Thread via GitHub
alamb opened a new issue, #15771: URL: https://github.com/apache/datafusion/issues/15771 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2816650357 I filed the following ticket for the next release: - https://github.com/apache/datafusion/issues/15771

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-19 Thread via GitHub
alamb commented on code in PR #15646: URL: https://github.com/apache/datafusion/pull/15646#discussion_r2051449542 ## datafusion/common/src/dfschema.rs: ## @@ -969,16 +969,28 @@ impl Display for DFSchema { /// widely used in the DataFusion codebase. pub trait ExprSchema: std::f

Re: [I] `Cargo bench --bench sql_planner` is failing [datafusion]

2025-04-19 Thread via GitHub
alamb closed issue #15762: `Cargo bench --bench sql_planner` is failing URL: https://github.com/apache/datafusion/issues/15762

Re: [I] `Cargo bench --bench sql_planner` is failing [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15762: URL: https://github.com/apache/datafusion/issues/15762#issuecomment-2816657782 I believe this is a duplicate of - https://github.com/apache/datafusion/issues/15753

Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-19 Thread via GitHub
xudong963 commented on PR #15766: URL: https://github.com/apache/datafusion/pull/15766#issuecomment-2816770079 There is a picture that I drew before, maybe we can convert it into doc ![image](https://github.com/user-attachments/assets/e6d75e1a-4128-4cdf-98db-0d1aebb5e192)

[PR] chore: add read/write roundtrip tests [datafusion-ballista]

2025-04-19 Thread via GitHub
milenkovicm opened a new pull request, #1249: URL: https://github.com/apache/datafusion-ballista/pull/1249 # Which issue does this PR close? Closes None. # Rationale for this change Adding few more tests to cover writer/read round trip # What changes are included

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-19 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2051416091 ## src/parser/mod.rs: ## @@ -4055,6 +4070,44 @@ impl<'a> Parser<'a> { ) } +/// Look backwards in the token stream and expect that th

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816659212 > It would be nice if Datafusion always used statistics for floating point columns if they are available. One potential fix is to add more cases to verify_support_type_for_prune (

Re: [I] Support more types when pruning Parquet data [datafusion]

2025-04-19 Thread via GitHub
alamb commented on issue #15742: URL: https://github.com/apache/datafusion/issues/15742#issuecomment-2816659245 FYI @adriangb

Re: [PR] feat: ORDER BY ALL [datafusion]

2025-04-19 Thread via GitHub
xudong963 commented on code in PR #15772: URL: https://github.com/apache/datafusion/pull/15772#discussion_r2051496431 ## datafusion/expr/src/expr.rs: ## @@ -701,6 +701,24 @@ impl TryCast { } } +/// OrderBy Expressions +pub enum OrderByExprs { +OrderByExprVec(Vec), +

Re: [PR] Use `interleave` in hash repartitioning [datafusion]

2025-04-19 Thread via GitHub
Dandandan commented on code in PR #15768: URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051497967 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -233,11 +233,11 @@ impl BatchPartitioner { /// /// The time spent repartitioning, not includi

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-19 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2051499380 ## datafusion/core/tests/physical_optimizer/push_down_filter/mod.rs: ## @@ -0,0 +1,401 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [PR] Added SQL Example for `Aggregate Functions` [datafusion]

2025-04-19 Thread via GitHub
Adez017 commented on PR #15778: URL: https://github.com/apache/datafusion/pull/15778#issuecomment-2816830201 @alamb I need your help with the same issue that I faced in the previous PR. I tried running ``` /dev/update_function_docs.sh ``` but I think there is something left o