Re: [PR] Support some of pipe operators [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
iffyio commented on PR #1759: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1759#issuecomment-2837681286 @simonvandel could you take a look at the CI issues when you get the time? -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] Add all missing table options to be handled in any order [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
iffyio commented on code in PR #1747: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1747#discussion_r2065632669 ## src/parser/mod.rs: ## @@ -7081,18 +7029,243 @@ impl<'a> Parser<'a> { if let Token::Word(word) = self.peek_token().token {

Re: [I] [REGRESSION] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
ozankabak commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2837673359 I remember that we discussed this before and the conclusion was that this is actually valid behavior. Quoting from Wikipedia: > ORDER BY is the only way to sort the rows in

Re: [PR] Enable repartitioning on MemTable. [datafusion]

2025-04-28 Thread via GitHub
2010YOUY01 commented on code in PR #15409: URL: https://github.com/apache/datafusion/pull/15409#discussion_r2065594544 ## datafusion/datasource/src/memory.rs: ## @@ -723,6 +761,222 @@ impl MemorySourceConfig { pub fn original_schema(&self) -> SchemaRef { Arc::clone

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-04-28 Thread via GitHub
xudong963 commented on code in PR #15865: URL: https://github.com/apache/datafusion/pull/15865#discussion_r2065622070 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1129,7 +1130,17 @@ impl ListingTable { let (file_group, inexact_stats) = get_fil

Re: [PR] Enable repartitioning on MemTable. [datafusion]

2025-04-28 Thread via GitHub
2010YOUY01 commented on code in PR #15409: URL: https://github.com/apache/datafusion/pull/15409#discussion_r2065566166 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -3471,3 +3477,102 @@ fn optimize_away_unnecessary_repartition2() -> Result<()> {

[I] Add schema mapper factory to `ListingOptions` [datafusion]

2025-04-28 Thread via GitHub
xudong963 opened a new issue, #15889: URL: https://github.com/apache/datafusion/issues/15889 I think using the default schema mapper makes sense for now / in this PR, but in general I think it would make sense to allow the user to provide their own schema mapping rules here (

Re: [PR] Respect ignore_nulls in array_agg [datafusion]

2025-04-28 Thread via GitHub
joroKr21 commented on code in PR #15544: URL: https://github.com/apache/datafusion/pull/15544#discussion_r2065610845 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -288,10 +300,23 @@ impl Accumulator for ArrayAggAccumulator { return internal_err!("expects

Re: [PR] Snowflake: Add support for `CONNECT_BY_ROOT` [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
iffyio merged PR #1780: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1780

Re: [I] Question: why is the Visitor trait limited to statements, relations & expressions? [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
freshtonic commented on issue #934: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/934#issuecomment-2837635646 @ramnes anything I can do to help you make [your Visitor changes](https://github.com/formalco/datafusion-sqlparser-rs/commit/3ab0faf6247287c9c31abd048ad8ed93f105fe01

Re: [PR] feat: support min/max for struct [datafusion]

2025-04-28 Thread via GitHub
chenkovsky commented on PR #15667: URL: https://github.com/apache/datafusion/pull/15667#issuecomment-2837583800 @alamb could you please review it again? I see min/max for dict is supported now. struct is similar.

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-28 Thread via GitHub
shehabgamin commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2065542062 ## datafusion/spark/src/function/math/expm1.rs: ## @@ -0,0 +1,168 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-28 Thread via GitHub
shehabgamin commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2065539727 ## datafusion/spark/src/function/string/ascii.rs: ## @@ -0,0 +1,210 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-28 Thread via GitHub
shehabgamin commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2065539353 ## datafusion/sqllogictest/src/engines/conversion.rs: ## @@ -77,7 +77,21 @@ pub(crate) fn f64_to_str(value: f64) -> String { } else if value == f64::NEG_I

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-28 Thread via GitHub
shehabgamin commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2065538751 ## datafusion/spark/src/lib.rs: ## @@ -0,0 +1,154 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-28 Thread via GitHub
shehabgamin commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2065538632 ## Cargo.lock: ## @@ -2558,6 +2558,27 @@ dependencies = [ "tokio", ] +[[package]] +name = "datafusion-spark" +version = "47.0.0" +dependencies = [ + "arro

[I] feat: Set/cancel with job tag for CometBroadcastExchangeExec [datafusion-comet]

2025-04-28 Thread via GitHub
wForget opened a new issue, #1692: URL: https://github.com/apache/datafusion-comet/issues/1692 ### What is the problem the feature request solves? Similar to https://github.com/apache/incubator-gluten/pull/4882, we also need to set/cancel with job tag for `CometBroadcastExchangeExec`

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-04-28 Thread via GitHub
xudong963 commented on PR #15865: URL: https://github.com/apache/datafusion/pull/15865#issuecomment-2837504538 > Can you please add a test that verifies the statistics of a ListingTable that was created with two parquet files of different schemas? I think you could write a SLT level test wi

[I] feat: Make max broadcast table size configurable [datafusion-comet]

2025-04-28 Thread via GitHub
wForget opened a new issue, #1691: URL: https://github.com/apache/datafusion-comet/issues/1691 ### What is the problem the feature request solves? Make max broadcast table size configurable, see: https://github.com/apache/spark/pull/50327 ### Describe the potential solution

Re: [PR] Add `statistics_by_partition API` to ExecutionPlan [datafusion]

2025-04-28 Thread via GitHub
xudong963 closed pull request #15503: Add `statistics_by_partition API` to ExecutionPlan URL: https://github.com/apache/datafusion/pull/15503

Re: [PR] Implement min max for dictionary types [datafusion]

2025-04-28 Thread via GitHub
xudong963 merged PR #15827: URL: https://github.com/apache/datafusion/pull/15827

Re: [PR] Added SQL Example for `Aggregate Functions` [datafusion]

2025-04-28 Thread via GitHub
xudong963 commented on PR #15778: URL: https://github.com/apache/datafusion/pull/15778#issuecomment-2837476884 > This PR appears to have no changes https://private-user-images.githubusercontent.com/490673/438510565-a8b12e9d-739c-4b18-9df2-69d64572667e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVC

Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-28 Thread via GitHub
xudong963 commented on PR #15766: URL: https://github.com/apache/datafusion/pull/15766#issuecomment-2837463051 Thank you @alamb @adriangb

Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-28 Thread via GitHub
xudong963 merged PR #15766: URL: https://github.com/apache/datafusion/pull/15766

Re: [PR] chore: fix build errors [datafusion-comet]

2025-04-28 Thread via GitHub
codecov-commenter commented on PR #1690: URL: https://github.com/apache/datafusion-comet/pull/1690#issuecomment-2837440640 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1690?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: fix build errors [datafusion-comet]

2025-04-28 Thread via GitHub
comphead commented on code in PR #1690: URL: https://github.com/apache/datafusion-comet/pull/1690#discussion_r2065374830 ## spark/src/main/scala/org/apache/comet/serde/arrays.scala: ## @@ -189,7 +190,7 @@ object CometArrayRepeat extends CometExpressionSerde with IncompatExpr {

[PR] chore: fix build errors [datafusion-comet]

2025-04-28 Thread via GitHub
comphead opened a new pull request, #1690: URL: https://github.com/apache/datafusion-comet/pull/1690 ## Which issue does this PR close? Fix build errors Closes #. ## Rationale for this change ## What changes are included in this PR? ## How ar

Re: [I] ILike with no wildcards is mistakenly optimized to string equality [datafusion]

2025-04-28 Thread via GitHub
comphead closed issue #15835: ILike with no wildcards is mistakenly optimized to string equality URL: https://github.com/apache/datafusion/issues/15835
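
A minimal sketch of why the rewrite named in this issue title is unsafe, assuming only the standard SQL meaning of `ILIKE` (a case-insensitive `LIKE`). The helper names `ilike_no_wildcards` and `string_equality` are hypothetical and are not taken from DataFusion; `to_lowercase` is used here as a rough stand-in for case folding.

```rust
/// Simplified stand-in for SQL `ILIKE` when the pattern contains no
/// wildcards (`%` or `_`): the match reduces to a case-insensitive
/// comparison. Hypothetical helper, not DataFusion's implementation.
fn ilike_no_wildcards(value: &str, pattern: &str) -> bool {
    value.to_lowercase() == pattern.to_lowercase()
}

/// What the mistaken rewrite amounts to: plain, case-sensitive equality.
fn string_equality(value: &str, pattern: &str) -> bool {
    value == pattern
}

fn main() {
    let (value, pattern) = ("DataFusion", "datafusion");
    // The two predicates disagree whenever the inputs differ only in case.
    println!("ilike:    {}", ilike_no_wildcards(value, pattern));
    println!("equality: {}", string_equality(value, pattern));
}
```

The two predicates agree only when value and pattern already share the same case, which is why replacing one with the other can change query results.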

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
comphead merged PR #15836: URL: https://github.com/apache/datafusion/pull/15836

Re: [I] Concat error while testing "array_repeat" [datafusion-comet]

2025-04-28 Thread via GitHub
comphead closed issue #1347: Concat error while testing "array_repeat" URL: https://github.com/apache/datafusion-comet/issues/1347

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-28 Thread via GitHub
comphead merged PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-28 Thread via GitHub
comphead commented on PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680#issuecomment-2837304316 Thanks @parthchandra for the review

Re: [PR] chore: Rename `scalarExprToProto` to `scalarFunctionExprToProto` [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove merged PR #1688: URL: https://github.com/apache/datafusion-comet/pull/1688

Re: [PR] fix: typo for `instr` in fuzz testing [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove merged PR #1686: URL: https://github.com/apache/datafusion-comet/pull/1686

Re: [PR] feat: simplify count distinct logical plan [datafusion]

2025-04-28 Thread via GitHub
jayzhan211 commented on PR #15867: URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2837279738 I found we actually didn't have a distinct count group accumulator https://github.com/apache/datafusion/pull/15888 I tried one and the performance is much better now, but

[PR] Draft: Count distinct opt [datafusion]

2025-04-28 Thread via GitHub
jayzhan211 opened a new pull request, #15888: URL: https://github.com/apache/datafusion/pull/15888 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [PR] Set HashJoin seed [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15783: URL: https://github.com/apache/datafusion/pull/15783#issuecomment-2837274656 > │ QQuery 4 │ 887.43ms │ 740.13ms │ +1.20x faster │ Given these queries don't have joins I am not sure that is reproducible 😬
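
A small sketch of the arithmetic behind the quoted benchmark row, assuming the first timing is the baseline and the second is the branch under test (the snippet does not say which column is which). The function name `speedup` is hypothetical.

```rust
/// Ratio of baseline time to candidate time; values above 1.0 mean the
/// candidate ran faster. Hypothetical helper for illustration only.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // Timings copied from the quoted row (887.43ms and 740.13ms).
    let ratio = speedup(887.43, 740.13);
    println!("{ratio:.2}x faster");
}
```

A ratio this close to 1.0 on a query without joins is consistent with the comment's suspicion that the difference is run-to-run noise rather than an effect of the change.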

Re: [PR] feat: simplify count distinct logical plan [datafusion]

2025-04-28 Thread via GitHub
chenkovsky commented on PR #15867: URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2837256602 @jayzhan211 I think the root cause of the poor performance is that the original plan can count in parallel, but the current plan is actually blocked on the final aggregation. I think I'm trying to

Re: [I] Migrate optimizer tests to `insta` [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #15396: URL: https://github.com/apache/datafusion/issues/15396#issuecomment-2837227469 Thanks -- done @qstommyshu

[I] Migrate optimizer tests to `insta` [datafusion]

2025-04-28 Thread via GitHub
blaginin opened a new issue, #15396: URL: https://github.com/apache/datafusion/issues/15396 In https://github.com/apache/datafusion/issues/15178, we're switching hard-coded constants in tests to `insta`. This issue targets updating **optimizer tests** (`datafusion/optimizer`).

Re: [PR] Migrate Optimizer tests to insta, part2 [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15884: URL: https://github.com/apache/datafusion/pull/15884#issuecomment-2837218022 🚀 FYI @blaginin

Re: [PR] Migrate Optimizer tests to insta, part2 [datafusion]

2025-04-28 Thread via GitHub
alamb merged PR #15884: URL: https://github.com/apache/datafusion/pull/15884

Re: [PR] Set HashJoin seed [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15783: URL: https://github.com/apache/datafusion/pull/15783#issuecomment-2837217773 🤖: Benchmark completed Details ``` Comparing HEAD and fix_hash-join-seed Benchmark clickbench_extended.json

Re: [PR] Added SQL Example for `Aggregate Functions` [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15778: URL: https://github.com/apache/datafusion/pull/15778#issuecomment-2837211036 This PR appears to have no changes https://github.com/user-attachments/assets/a8b12e9d-739c-4b18-9df2-69d64572667e I may be missing something 🤔

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2837210261 > duckdb's CALL dbgen(sf = 1); creates tables in the current schema and then populates those tables with data using its own format. The other thing we can do is to just make

Re: [PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15766: URL: https://github.com/apache/datafusion/pull/15766#issuecomment-2837199583 > This is a helpful improvement 😄 > > I agree with @xudong963 that diagrams would be super helpful! We should put them in one place (e.g. in `DataSourceExec` or `DataSource`) and

[I] Add diagrams for relationship between `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-28 Thread via GitHub
alamb opened a new issue, #15887: URL: https://github.com/apache/datafusion/issues/15887 This is a helpful improvement 😄 I agree with @xudong963 that diagrams would be super helpful! We should put them in one place (e.g. in `DataSourceExec` or `DataSource`) and link

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15841: URL: https://github.com/apache/datafusion/pull/15841#issuecomment-2837191380 This PR seems to have some CI failures. Please mark it as ready for review when it is ready for another look.

Re: [PR] refactor: replace `unwrap_or` with `unwrap_or_else` for improved lazy… [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15841: URL: https://github.com/apache/datafusion/pull/15841#issuecomment-2837191694 Thank you for this PR @NevroHelios
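
A minimal sketch of the difference the PR title refers to, using only the standard library: `Option::unwrap_or` evaluates its argument before the call, while `Option::unwrap_or_else` runs its closure only when the value is `None`. The names `expensive_default` and `count_default_calls` are hypothetical and are not taken from the PR.

```rust
use std::cell::Cell;

/// Hypothetical "expensive" default that records how often it is computed.
fn expensive_default(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    String::from("default")
}

/// Returns how many times the default was computed by each method when
/// the `Option` already holds a value.
fn count_default_calls() -> (u32, u32) {
    let present: Option<String> = Some(String::from("value"));

    // Eager: the argument is evaluated even though `present` is `Some`.
    let eager_calls = Cell::new(0);
    let _ = present.clone().unwrap_or(expensive_default(&eager_calls));

    // Lazy: the closure is never invoked because `present` is `Some`.
    let lazy_calls = Cell::new(0);
    let _ = present.unwrap_or_else(|| expensive_default(&lazy_calls));

    (eager_calls.get(), lazy_calls.get())
}

fn main() {
    let (eager, lazy) = count_default_calls();
    println!("unwrap_or computed the default {eager} time(s)");
    println!("unwrap_or_else computed the default {lazy} time(s)");
}
```

The swap only matters when building the default has a cost (an allocation, a clone, a function call); for a cheap constant the two forms behave the same.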

Re: [PR] Add `statistics_by_partition API` to ExecutionPlan [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15503: URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2837190474 I believe this is superseded by https://github.com/apache/datafusion/pull/15852 so marking as a draft

Re: [PR] Respect ignore_nulls in array_agg [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15544: URL: https://github.com/apache/datafusion/pull/15544#discussion_r2065191063 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -288,10 +300,23 @@ impl Accumulator for ArrayAggAccumulator { return internal_err!("expects si

Re: [PR] Feat: introduce `ExecutionPlan::partition_statistics` API [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2065185660 ## datafusion/physical-plan/src/coalesce_batches.rs: ## @@ -196,7 +196,14 @@ impl ExecutionPlan for CoalesceBatchesExec { } fn statistics(&self) -> Resul

Re: [PR] Feat: introduce partition statistics API [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2065176521 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -941,49 +994,15 @@ impl ExecutionPlan for AggregateExec { } fn statistics(&self) -> Result { -

Re: [PR] Set HashJoin seed [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15783: URL: https://github.com/apache/datafusion/pull/15783#discussion_r2065171783 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -86,6 +86,10 @@ use datafusion_physical_expr_common::physical_expr::fmt_sql; use futures::{ready, Stream,

Re: [PR] Set HashJoin seed [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15783: URL: https://github.com/apache/datafusion/pull/15783#issuecomment-2837160708 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15836: URL: https://github.com/apache/datafusion/pull/15836#discussion_r2065170350 ## datafusion/sqllogictest/test_files/strings.slt: ## @@ -115,6 +115,12 @@ p1 p1e1 p1m1e1 +query T rowsort Review Comment: In case anyone is curious, without

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-04-28 Thread via GitHub
kosiew commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2837154356 Closing, no update from @TheBuilderJR

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-04-28 Thread via GitHub
kosiew closed pull request #15295: Enhance Schema adapter to accommodate evolving struct URL: https://github.com/apache/datafusion/pull/15295

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15865: URL: https://github.com/apache/datafusion/pull/15865#discussion_r2065156402 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1129,7 +1130,17 @@ impl ListingTable { let (file_group, inexact_stats) = get_files_w

Re: [PR] chore: Remove fallback reason "because the children were not native" [datafusion-comet]

2025-04-28 Thread via GitHub
parthchandra commented on PR #1672: URL: https://github.com/apache/datafusion-comet/pull/1672#issuecomment-2837134447 Merged. Thanks @andygrove, @comphead

Re: [PR] chore: Remove fallback reason "because the children were not native" [datafusion-comet]

2025-04-28 Thread via GitHub
parthchandra merged PR #1672: URL: https://github.com/apache/datafusion-comet/pull/1672

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2065131934 ## datafusion/sqllogictest/test_files/spark/README.md: ## @@ -0,0 +1,57 @@ + + +# Spark Test Files + +This directory contains test files for the `spark` test suite.

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
srh commented on PR #15836: URL: https://github.com/apache/datafusion/pull/15836#issuecomment-2837073697 > Thanks @srh for providing the test case please add the query to one of select.slt files to preserve the regression I have added a test case (not the one with an inner select, bec

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
srh commented on code in PR #15836: URL: https://github.com/apache/datafusion/pull/15836#discussion_r2065080834 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1606,8 +1606,9 @@ impl TreeNodeRewriter for Simplifier<'_, S> {

Re: [PR] chore: Remove fallback reason "because the children were not native" [datafusion-comet]

2025-04-28 Thread via GitHub
comphead commented on PR #1672: URL: https://github.com/apache/datafusion-comet/pull/1672#issuecomment-2837020381 are we okay to merge it?

Re: [I] Sorting is not maintained after using a window function [datafusion]

2025-04-28 Thread via GitHub
akurmustafa commented on issue #15833: URL: https://github.com/apache/datafusion/issues/15833#issuecomment-2837020145 The sort function in `DataFusion` accepts a vector. This way, you can pass the desired lexicographical ordering, such as (`.sort(vec![ident("userPrimaryKey").sort(true, true
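
A standard-library sketch of what a lexicographical ordering over several sort keys means: rows are compared on the first key, and later keys only break ties. This is not DataFusion code. The `Row` type and the `event_time` field are hypothetical; `user_primary_key` mirrors the column named in the comment, and the ascending, nulls-first order is an assumption about what `sort(true, true)` requests there.

```rust
/// Hypothetical row type for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    user_primary_key: Option<i64>, // `None` plays the role of SQL NULL
    event_time: i64,               // hypothetical second sort key
}

/// Sorts by `user_primary_key` ascending with nulls first, then by
/// `event_time` ascending to break ties.
fn sort_lexicographically(rows: &mut [Row]) {
    rows.sort_by(|a, b| {
        a.user_primary_key
            // `Option`'s ordering puts `None` before any `Some`, which
            // gives nulls-first behaviour for an ascending sort.
            .cmp(&b.user_primary_key)
            // The second key is consulted only when the first is equal.
            .then(a.event_time.cmp(&b.event_time))
    });
}

fn main() {
    let mut rows = vec![
        Row { user_primary_key: Some(2), event_time: 1 },
        Row { user_primary_key: Some(1), event_time: 5 },
        Row { user_primary_key: None, event_time: 3 },
        Row { user_primary_key: Some(1), event_time: 2 },
    ];
    sort_lexicographically(&mut rows);
    for row in &rows {
        println!("{row:?}");
    }
}
```

Passing every key in one sort call matters because sorting by one key and then separately by another does not, in general, preserve the first ordering.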

Re: [I] [REGRESSION] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
comphead commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2836992611 > I'm getting the results I expect if I revert the changes from that commit ^ in this file: datafusion/sql/src/relation/mod.rs (ie: remove the call to `optimize_subquery_sort`)

[PR] chore: update dev/release/rat_exclude_files.txt [datafusion-comet]

2025-04-28 Thread via GitHub
hsiang-c opened a new pull request, #1689: URL: https://github.com/apache/datafusion-comet/pull/1689 ## Which issue does this PR close? Closes #. https://github.com/apache/datafusion-comet/issues/1678 ## Rationale for this change Update Rat's exclude files

Re: [I] Migrate optimizer tests to `insta` [datafusion]

2025-04-28 Thread via GitHub
qstommyshu commented on issue #15396: URL: https://github.com/apache/datafusion/issues/15396#issuecomment-2836976596 Hi @alamb and @blaginin, do you mind reopening this issue to indicate that it is not done yet?

Re: [PR] Migrate Optimizer tests to insta, part2 [datafusion]

2025-04-28 Thread via GitHub
qstommyshu commented on PR #15884: URL: https://github.com/apache/datafusion/pull/15884#issuecomment-2836975145 Hi @alamb @blaginin, this PR is ready for review; I’ll tackle the remaining migrations in subsequent PRs to keep each set of changes manageable.

Re: [PR] feat: make execution_graph.stages() public [datafusion-ballista]

2025-04-28 Thread via GitHub
andygrove commented on code in PR #1256: URL: https://github.com/apache/datafusion-ballista/pull/1256#discussion_r2064981773 ## ballista/scheduler/src/state/execution_graph.rs: ## @@ -218,7 +218,7 @@ impl ExecutionGraph { new_tid } -pub(crate) fn stages(&sel

Re: [PR] Do not add redundant subquery ordering into plan [datafusion]

2025-04-28 Thread via GitHub
maxburke commented on PR #12003: URL: https://github.com/apache/datafusion/pull/12003#issuecomment-2836969736 I think this change causes this bug: https://github.com/apache/datafusion/issues/15886

Re: [I] [REGRESSION] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
maxburke commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2836968414 I bisected this to commit 02eab80cd62e02fcb68dee8b99d63aaac680a66c

Re: [I] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
comphead commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2836896989 Simpler test case ORDER BY in outer query ``` > explain select x.* from (select 1 a union all select null) x order by a nulls last; +---+--

[I] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
maxburke opened a new issue, #15886: URL: https://github.com/apache/datafusion/issues/15886 ### Describe the bug Referenced discussion: https://the-asf.slack.com/archives/C01QUFS30TD/p1745875862723149 Given this table: ``` > create table d1 (ul_node_id string); ```

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-28 Thread via GitHub
comphead commented on code in PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680#discussion_r2064904374 ## native/spark-expr/src/array_funcs/array_repeat.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15836: URL: https://github.com/apache/datafusion/pull/15836#discussion_r2064905078 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1606,8 +1606,9 @@ impl TreeNodeRewriter for Simplifier<'_, S> {

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15836: URL: https://github.com/apache/datafusion/pull/15836#issuecomment-2836883716 Thank you for this PR @srh

Re: [PR] Upgrade-guide: Downgrade "FileScanConfig –> FileScanConfigBuilder" headline [datafusion]

2025-04-28 Thread via GitHub
alamb merged PR #15883: URL: https://github.com/apache/datafusion/pull/15883

Re: [PR] test: add fuzz test for doing aggregation with larger than memory groups and sorting with limited memory [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15727: URL: https://github.com/apache/datafusion/pull/15727#issuecomment-2836864703 What I suggest we should do with this PR is 1. `#[ignore]` the tests that are failing 2. leave a comment with a link to the PR / ticket to fix them 3. Merge this PR

Re: [PR] fix: describe Parquet schema with coerce_int96 [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15750: URL: https://github.com/apache/datafusion/pull/15750#issuecomment-2836855789 Nice

Re: [PR] fix: clickbench type err [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15773: URL: https://github.com/apache/datafusion/pull/15773#issuecomment-2836854794 Thank you so much @chenkovsky and @xudong963

Re: [PR] Improve `ListingTable` / `ListingTableOptions` docs [datafusion]

2025-04-28 Thread via GitHub
alamb merged PR #15767: URL: https://github.com/apache/datafusion/pull/15767

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-28 Thread via GitHub
parthchandra commented on code in PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680#discussion_r2064867088 ## native/spark-expr/src/array_funcs/array_repeat.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
comphead commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2836840621 I'll try to start with https://github.com/apache/datafusion/issues/14510

Re: [I] Sorting is not maintained after using a window function [datafusion]

2025-04-28 Thread via GitHub
daphnenhuch-at commented on issue #15833: URL: https://github.com/apache/datafusion/issues/15833#issuecomment-2836826844 > Thanks for the report [@daphnenhuch-at](https://github.com/daphnenhuch-at) > > If you want the output sorted in a particular way I think you need to explicitly ad

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2836821370 > I think [#14595](https://github.com/apache/datafusion/pull/14595) is in a decent shape and could be merged :) Though it cannot unnest all queries and might produce wrong result

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2836819606 - I am also trying to help organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2836818904 - I am trying to organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [I] Implement nested join optimization [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #3843: URL: https://github.com/apache/datafusion/issues/3843#issuecomment-2836816875 - I am trying to organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [I] [EPIC] More Subquery support [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #5483: URL: https://github.com/apache/datafusion/issues/5483#issuecomment-2836816535 - I am trying to organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-28 Thread via GitHub
GitHub user daphnenhuch-at added a comment to the discussion: How does 'sort' interact with record batches? That doesn't fix this problem unfortunately. When I swap the order I still get the record batch starting with 8192 first GitHub link: https://github.com/apache/datafusion/discussions/1

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2836816251 - I am trying to organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

[I] [DISCUSSION] JOIN "task force" / project team [datafusion]

2025-04-28 Thread via GitHub
alamb opened a new issue, #15885: URL: https://github.com/apache/datafusion/issues/15885 # What I see (what problem we are trying to solve) DataFusion's current join implementations are fairly basic. They are functional enough to run TPCH and TPC-DS, but lack other features such as large

Re: [I] Fix rat check errors during release process [datafusion-comet]

2025-04-28 Thread via GitHub
hsiang-c commented on issue #1678: URL: https://github.com/apache/datafusion-comet/issues/1678#issuecomment-2836800737 take

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
skyzh commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2836799420 I think #14595 is in a decent shape and could be merged :) Though it cannot unnest all queries and might produce wrong result for some lateral joins, that would be a good starting

Re: [I] Out of memory when sorting [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #5108: URL: https://github.com/apache/datafusion/issues/5108#issuecomment-2836619319 I believe this has been fixed by much of the recent work with the external sorting from @2010YOUY01 and other collaborators

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-28 Thread via GitHub
GitHub user alamb added a comment to the discussion: How does 'sort' interact with record batches? I think you need to add the sort after the window function - Related ticket for anyone following along: https://github.com/apache/datafusion/issues/15833 Something like ```rust ctx .tab

Re: [D] San Francisco DataFusion Meetup scheduled for 9/25 [datafusion]

2025-04-28 Thread via GitHub
GitHub user emgeee closed a discussion: San Francisco DataFusion Meetup scheduled for 9/25 Lu.ma link: https://lu.ma/05lyf19h Interested in speaking? Submit a talk here https://forms.gle/WoFWTM3LRfUgBkBy7 GitHub link: https://github.com/apache/datafusion/discussions/11972

Re: [D] outlier, time compare or frequency analysis operators in datafusion? [datafusion]

2025-04-28 Thread via GitHub
GitHub user alamb added a comment to the discussion: outlier, time compare or frequency analysis operators in datafusion? I don't think we have any plans to do this that I know of. I think implementing them using the DataFusion extension APIs is likely the best bet. GitHub link: https://githu
