Re: [I] Restructure core codepaths to prevent stack overflows [datafusion]

2025-07-16 Thread via GitHub
ahmed-mez commented on issue #16788: URL: https://github.com/apache/datafusion/issues/16788#issuecomment-3077723754 From the work done in https://github.com/apache/datafusion/pull/16506, this is **not** substrait specific. The stack overflows happen during the planning / optimization phase

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
alamb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2210136013 ## docs/source/library-user-guide/upgrading.md: ## @@ -120,6 +120,17 @@ SET datafusion.execution.spill_compression = 'zstd'; For more details about this configura

[I] Allow comparison between booleans and integers [datafusion]

2025-07-16 Thread via GitHub
osipovartem opened a new issue, #16797: URL: https://github.com/apache/datafusion/issues/16797 ### Is your feature request related to a problem or challenge? ```sql select true::boolean = 0 ``` Returns `cannot infer common argument type for comparison operation Boolean = Int6

Re: [I] Optimize the join operators [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078275180 > [@zhuqi-lucas](https://github.com/zhuqi-lucas) - these benchmarks use Parquet files, see the querybench repo for the code: https://github.com/MrPowers/querybench. I think

Re: [PR] DataFusion 48.0.0 blog post [datafusion-site]

2025-07-16 Thread via GitHub
alamb commented on PR #84: URL: https://github.com/apache/datafusion-site/pull/84#issuecomment-3078420635 It is now published: https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/ -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-16 Thread via GitHub
UBarney commented on PR #16443: URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3078462264 > > select t1.value from range(100) t1 join range(819200) t2 on t1.value + t2.value < t1.value * t2.value; > > I'm happy to include this benchmark in the bench suite this week,

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3078463110 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

[PR] Allow comparison between boolean and int values [datafusion]

2025-07-16 Thread via GitHub
osipovartem opened a new pull request, #16798: URL: https://github.com/apache/datafusion/pull/16798 ## Which issue does this PR close? Closes #16797 ## Rationale for this change This PR enables comparison between boolean and integer types, which is already supported by [ar

Re: [PR] Fix: Preserve sorting for the COPY TO plan [datafusion]

2025-07-16 Thread via GitHub
alamb closed pull request #16785: Fix: Preserve sorting for the COPY TO plan URL: https://github.com/apache/datafusion/pull/16785

Re: [PR] Fix: Preserve sorting for the COPY TO plan [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #16785: URL: https://github.com/apache/datafusion/pull/16785#issuecomment-3078438440 close/reopen to retrigger CI

Re: [PR] fix tests in page_pruning when filter pushdown is enabled by default [datafusion]

2025-07-16 Thread via GitHub
alamb merged PR #16794: URL: https://github.com/apache/datafusion/pull/16794

Re: [PR] fix tests in page_pruning when filter pushdown is enabled by default [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #16794: URL: https://github.com/apache/datafusion/pull/16794#issuecomment-3078443389 Thank you @XiangpengHao and @xudong963

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3078458739 > Strange, i can't reproduce this in my local: I will rerun

[I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-07-16 Thread via GitHub
alamb opened a new issue, #16799: URL: https://github.com/apache/datafusion/issues/16799 ### Is your feature request related to a problem or challenge? Tracking ticket for next release, also a place to track desired inclusions Previous release will be https://crates.io/crates/d

Re: [D] DISCUSSION: DataFusion Meetup in Boston, USA - Nov 12, 2025 [datafusion]

2025-07-16 Thread via GitHub
GitHub user NGA-TRAN edited a discussion: DISCUSSION: DataFusion Meetup in Boston, USA - Nov 12, 2025 With the upcoming New York meetup on the horizon, the DataDog Boston team is excited to plan a local DataFusion-themed gathering this fall! **Date:** Wednesday, November 12 📍 Location: Data

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA - Sep 15, 2025 [datafusion]

2025-07-16 Thread via GitHub
GitHub user leoDYL edited a discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA - Sep 15, 2025 We are organizing an NYC meetup to celebrate the upcoming release 50. Currently planning on Sept 15th, 2025. We will organize it in the same location as #11213 Registration link: https://

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-16 Thread via GitHub
GitHub user alamb added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files 👋 Given your description, I am surprised that this query is using a HashAggregateStream -- the hash aggregate needs to buffer the entire dataset in RAM / spill i

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3078620431 I looked into this failure running clickbench: ``` │ QQuery 27│ 2328.28 ms │ FAIL │ incomparable │ ``` I ran the [`q27.sql`](https://github.com/apache

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3078726061 🤖: Benchmark completed Details ``` group main reduce_expr_size -

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2210616673 ## datafusion/datasource-parquet/src/row_filter.rs: ## @@ -140,6 +143,8 @@ impl ArrowPredicate for DatafusionArrowPredicate { } fn evaluate(&mut self,

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-16 Thread via GitHub
iffyio commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2210549842 ## src/parser/mod.rs: ## @@ -7724,6 +7737,27 @@ impl<'a> Parser<'a> { return option; } +self.with_state( +Co

Re: [I] count_all() aggregations cannot be aliased [datafusion]

2025-07-16 Thread via GitHub
Loaki07 commented on issue #16795: URL: https://github.com/apache/datafusion/issues/16795#issuecomment-3078924010 take

Re: [I] Optimize the join operators [datafusion]

2025-07-16 Thread via GitHub
UBarney commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078931663 > Thanks [@nuno-faria](https://github.com/nuno-faria) that's a great insight (for TPC-H / very nested joins we probably should implement a smarter join order algorithm). >

[I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-16 Thread via GitHub
adriangb opened a new issue, #16800: URL: https://github.com/apache/datafusion/issues/16800 As discussed in https://github.com/apache/datafusion/pull/16791 the long term plan in my mind (and that I would like to discuss with the community) is to replace `SchemaAdapter` with `PhysicalExprAda

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on PR #16791: URL: https://github.com/apache/datafusion/pull/16791#issuecomment-3078933391 I opened https://github.com/apache/datafusion/issues/16800 to track the big picture

Re: [I] q9 [datafusion-comet]

2025-07-16 Thread via GitHub
comphead closed issue #2005: q9 URL: https://github.com/apache/datafusion-comet/issues/2005

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-16 Thread via GitHub
comphead commented on code in PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2210658711 ## native/hdfs/src/object_store/hdfs.rs: ## @@ -88,19 +88,18 @@ impl HadoopFileSystem { fn read_range(range: &Range, file: &HdfsFile) -> Result {

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-07-16 Thread via GitHub
alamb commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3078987206 > Marked the reduce Expr size task here: > > [#16771](https://github.com/apache/datafusion/pull/16771) Added

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3079000603 My guess is that some of the new slowdown / less predictability is due to many more `Box`es (and thus allocations) -- I suggest we reconsider Boxing frequently used structures (like Co
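
The trade-off raised in this comment (a smaller enum versus extra heap allocations) can be sketched in plain Rust. This is only an illustrative sketch, not DataFusion's actual `Expr` definition: `BigPayload`, `InlineExpr`, and `BoxedExpr` are hypothetical names invented for the example.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large variant payload; not a DataFusion type.
#[allow(dead_code)]
struct BigPayload {
    name: String,
    qualifier: Option<String>,
    extra: [u64; 8],
}

// The large payload is stored inline, so every value of this enum,
// including a plain `Literal`, is at least as big as `BigPayload`.
#[allow(dead_code)]
enum InlineExpr {
    Literal(i64),
    Column(BigPayload),
}

// The large payload sits behind a pointer: the enum itself shrinks,
// but constructing a `Column` now requires one heap allocation.
#[allow(dead_code)]
enum BoxedExpr {
    Literal(i64),
    Column(Box<BigPayload>),
}

fn inline_size() -> usize {
    size_of::<InlineExpr>()
}

fn boxed_size() -> usize {
    size_of::<BoxedExpr>()
}

fn main() {
    println!(
        "inline: {} bytes, boxed: {} bytes",
        inline_size(),
        boxed_size()
    );
}
```

Boxing only the rarely constructed variants keeps the size reduction while avoiding an allocation on the hot path, which appears to be the direction the comment suggests reconsidering.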

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3079128237 > My guess is that some of the new slowdown / less predictability is due to many more `Box`es (and thus allocations) -- I suggest we reconsider Boxing frequently used structures

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-16 Thread via GitHub
mbutrovich commented on code in PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2210768385 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2765,6 +2765,26 @@ class CometExpressionSuite extends CometTestBase with Adapti

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3078771877 > After opening the DF50.0.0 release issue, you can add it to the list Thank you @xudong963 , added it in https://github.com/apache/datafusion/issues/16799#issuecomment-307

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3078770965 Marked the reduce Expr size task here: https://github.com/apache/datafusion/pull/16771

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3078761333 > 🤖: Benchmark completed > > Details > > ``` > group main reduce_expr_size > -

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-16 Thread via GitHub
mbutrovich commented on PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#issuecomment-3079038284 > In those scenarios we do have reproducibility and I believe a native implementation should also have this property. Thank you for the great explanation! This makes se

Re: [PR] 48.0.1 [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #16755: URL: https://github.com/apache/datafusion/pull/16755#issuecomment-3079019727 What is the purpose of this PR?

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-16 Thread via GitHub
Kontinuation commented on code in PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2210716251 ## native/hdfs/src/object_store/hdfs.rs: ## @@ -88,19 +88,18 @@ impl HadoopFileSystem { fn read_range(range: &Range, file: &HdfsFile) -> Result {

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-16 Thread via GitHub
Kontinuation commented on code in PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2210729874 ## native/hdfs/src/object_store/hdfs.rs: ## @@ -141,13 +140,15 @@ impl ObjectStore for HadoopFileSystem { let file_status = file.get_file_sta

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-16 Thread via GitHub
Kontinuation commented on PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#issuecomment-3079205816 > Sorry @Kontinuation if I check your references https://github.com/datafusion-contrib/fs-hdfs/blob/8c03c5ef0942b75abc79ed673931355fa9552131/c_src/libhdfs/hdfs.c#L1564C15-L1

Re: [PR] fix: skip predicates on struct unnest in PushDownFilter [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on code in PR #16790: URL: https://github.com/apache/datafusion/pull/16790#discussion_r2210812065 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -128,12 +128,31 @@ physical_plan 06)--ProjectionExec: expr=[column1@0 as column1, colum

Re: [I] Integration tests are not being run [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on issue #16801: URL: https://github.com/apache/datafusion/issues/16801#issuecomment-3079240904 @kosiew could you take a look at this?

[I] Integration tests are not being run [datafusion]

2025-07-16 Thread via GitHub
adriangb opened a new issue, #16801: URL: https://github.com/apache/datafusion/issues/16801 ### Describe the bug There's something wrong with [datafusion/core/tests/integration_tests/schema_adapter_integration_tests.rs](https://github.com/apache/datafusion/blob/main/datafusion/core/te

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-16 Thread via GitHub
mbutrovich commented on code in PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2210833524 ## native/spark-expr/src/nondetermenistic_funcs/randn.rs: ## @@ -0,0 +1,265 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-16 Thread via GitHub
mbutrovich commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3079257969 Comet makes increasing use of `SchemaAdapter`, but nothing you describe here sounds like a dealbreaker for Comet at first glance. I think we'd be able to make the necessary c

Re: [I] [Discussion]: show more info for `OutputRequirementExec` display [datafusion]

2025-07-16 Thread via GitHub
crepererum closed issue #16725: [Discussion]: show more info for `OutputRequirementExec` display URL: https://github.com/apache/datafusion/issues/16725

Re: [PR] fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display [datafusion]

2025-07-16 Thread via GitHub
crepererum merged PR #16726: URL: https://github.com/apache/datafusion/pull/16726

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-16 Thread via GitHub
akupchinskiy commented on code in PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2209889591 ## native/spark-expr/src/nondetermenistic_funcs/randn.rs: ## @@ -0,0 +1,265 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3077919308 > 👍 Just some nit for your review. Thank you @kosiew for review and suggestion, addressed in latest PR.

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on code in PR #16771: URL: https://github.com/apache/datafusion/pull/16771#discussion_r2209914828 ## datafusion/expr/src/expr_rewriter/mod.rs: ## @@ -250,7 +257,9 @@ fn coerce_exprs_for_schema( let new_type = dst_schema.field(idx).data_type();

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-16 Thread via GitHub
akupchinskiy commented on PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#issuecomment-3077747073 > Thanks @akupchinskiy! I have a few questions. The big one for me is: does the seed state per partition match Spark's behavior, in particular the life cycle? If the seed g

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-16 Thread via GitHub
akupchinskiy commented on code in PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2209806943 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -2765,6 +2765,26 @@ class CometExpressionSuite extends CometTestBase with Adap

Re: [I] Unnest struct expression can't be aliased [datafusion]

2025-07-16 Thread via GitHub
xudong963 commented on issue #12794: URL: https://github.com/apache/datafusion/issues/12794#issuecomment-3077979105 I also met the issue, and supporting alias for unnest struct makes sense to me

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
xudong963 commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3077988985 After opening the DF50.0.0 release issue, you can add it to the list

Re: [I] Optimize the join operators [datafusion]

2025-07-16 Thread via GitHub
mrpowers-wb commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078052376 @zhuqi-lucas - these benchmarks use Parquet files, see the querybench repo for the code: https://github.com/MrPowers/querybench. I think Parquet is a lot better for these b

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2210213790 ## datafusion/datasource-parquet/src/row_filter.rs: ## @@ -106,6 +106,8 @@ pub(crate) struct DatafusionArrowPredicate { rows_matched: metrics::Count, //

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2210208739 ## docs/source/library-user-guide/upgrading.md: ## @@ -120,6 +120,17 @@ SET datafusion.execution.spill_compression = 'zstd'; For more details about this config

Re: [PR] DataFusion 48.0.0 blog post [datafusion-site]

2025-07-16 Thread via GitHub
alamb commented on PR #84: URL: https://github.com/apache/datafusion-site/pull/84#issuecomment-3078340366 Thanks again @Omega359

Re: [PR] DataFusion 48.0.0 blog post [datafusion-site]

2025-07-16 Thread via GitHub
alamb commented on code in PR #84: URL: https://github.com/apache/datafusion-site/pull/84#discussion_r2210222605 ## content/blog/2025-07-16-datafusion-48.0.0.md: ## @@ -0,0 +1,209 @@ +--- +layout: post +title: Apache DataFusion 48.0.0 Released +date: 2025-07-16 +author: PMC +cat

Re: [PR] DataFusion 48.0.0 blog post [datafusion-site]

2025-07-16 Thread via GitHub
alamb merged PR #84: URL: https://github.com/apache/datafusion-site/pull/84

Re: [I] Blog post for the DataFusion 48 release [datafusion]

2025-07-16 Thread via GitHub
alamb closed issue #16757: Blog post for the DataFusion 48 release URL: https://github.com/apache/datafusion/issues/16757

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on PR #16791: URL: https://github.com/apache/datafusion/pull/16791#issuecomment-3078342870 > In particular I am not sure about the intended behavior when they are both present If you have an expression adapter you map the expression and the expression is now evalua

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-07-16 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-3078354263 Marking as draft as we still plan more work. Thanks @zhuqi-lucas

Re: [PR] Update CI rules [datafusion-python]

2025-07-16 Thread via GitHub
timsaucer merged PR #1188: URL: https://github.com/apache/datafusion-python/pull/1188

Re: [PR] feat: change Expr Alias ,OuterReferenceColumn, Column to Box type for reducing expr struct size [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3079295059 > > My guess is that some of the new slowdown / less predictability is due to many more `Box`es (and thus allocations) -- I suggest we reconsider Boxing frequently used structure

Re: [PR] fix: skip predicates on struct unnest in PushDownFilter [datafusion]

2025-07-16 Thread via GitHub
akoshchiy commented on code in PR #16790: URL: https://github.com/apache/datafusion/pull/16790#discussion_r2210860376 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -128,12 +128,31 @@ physical_plan 06)--ProjectionExec: expr=[column1@0 as column1, colu

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3079371233 Thank you for chiming in! > We'll be able to start making the migration with the DF 49.0 release? Yes that's the plan. I'm trying to figure out the way to make the

Re: [PR] fix: hdfs read into buffer fully [datafusion-comet]

2025-07-16 Thread via GitHub
parthchandra commented on code in PR #2031: URL: https://github.com/apache/datafusion-comet/pull/2031#discussion_r2210968794 ## native/hdfs/src/object_store/hdfs.rs: ## @@ -88,19 +88,18 @@ impl HadoopFileSystem { fn read_range(range: &Range, file: &HdfsFile) -> Result {

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-16 Thread via GitHub
GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files Addressing Question 1. The query plan for the original query: ```sql CREATE EXTERNAL TABLE example ( col_1 VARCHAR(50) NOT NULL, col_2 BIGINT NOT

Re: [I] Support `max_by` in Aggregation function [datafusion]

2025-07-16 Thread via GitHub
findepi commented on issue #12252: URL: https://github.com/apache/datafusion/issues/12252#issuecomment-3077344494 I'd love to see `min_by` & `max_by` within DataFusion repository.

Re: [PR] Implement equals for stateful functions [datafusion]

2025-07-16 Thread via GitHub
findepi commented on code in PR #16781: URL: https://github.com/apache/datafusion/pull/16781#discussion_r2209548158 ## datafusion/core/tests/user_defined/user_defined_scalar_functions.rs: ## @@ -217,6 +217,34 @@ impl ScalarUDFImpl for Simple0ArgsScalarUDF { fn invoke_with_a

Re: [PR] Implement equals for stateful functions [datafusion]

2025-07-16 Thread via GitHub
findepi commented on PR #16781: URL: https://github.com/apache/datafusion/pull/16781#issuecomment-3077368850 > I think it could be reduced in size somewhat by adding `aliases` to the default implementations. I'd rather see stuff like aliases and documentation not be part of ScalardUD

Re: [I] Restructure core codepaths to prevent stack overflows [datafusion]

2025-07-16 Thread via GitHub
gabotechs commented on issue #16788: URL: https://github.com/apache/datafusion/issues/16788#issuecomment-3077407611 This seems like a recurring problem, especially when submitting Substrait plans to DataFusion. There was this other instance that was fixed by @fmonjalet some months ago https:/

Re: [I] [substrait] [sqllogictest] Cannot convert to Substrait [datafusion]

2025-07-16 Thread via GitHub
gabotechs commented on issue #16281: URL: https://github.com/apache/datafusion/issues/16281#issuecomment-3077327843 🤔 I'm not very familiar with that, the issue looks relatively old and things might have changed since then. If you manage to make it work adding support for `OuterRefer

Re: [PR] Add reproducing test cases for stackoverflows [datafusion]

2025-07-16 Thread via GitHub
gabotechs commented on code in PR #16787: URL: https://github.com/apache/datafusion/pull/16787#discussion_r2209559180 ## datafusion/substrait/tests/cases/deeply_nested_plan.rs: ## @@ -0,0 +1,104 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] Fix: Preserve sorting for the COPY TO plan [datafusion]

2025-07-16 Thread via GitHub
bert-beyondloops commented on PR #16785: URL: https://github.com/apache/datafusion/pull/16785#issuecomment-3077507961 Thanks @alamb for the quick review 🙏

[PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-16 Thread via GitHub
aharpervc opened a new pull request, #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949 This PR is a followup ([ref](https://github.com/apache/datafusion-sqlparser-rs/pull/1937#issuecomment-3070806780)) to recent work on parsing without requiring semicolon statement del

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211279282 ## datafusion/core/tests/parquet/schema_adapter.rs: ## @@ -0,0 +1,92 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211280296 ## datafusion/datasource-parquet/src/source.rs: ## @@ -468,10 +468,50 @@ impl FileSource for ParquetSource { let projection = base_config .f

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3080077103 Here's an example from Comet: https://github.com/vaibhawvipul/datafusion-comet/blob/main/native/core/src/parquet/schema_adapter.rs. As you can see it's _a lot_ of code with [a

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211616561 ## datafusion/core/tests/parquet/schema_adapter.rs: ## @@ -0,0 +1,92 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
alamb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211267704 ## docs/source/library-user-guide/upgrading.md: ## @@ -120,6 +120,17 @@ SET datafusion.execution.spill_compression = 'zstd'; For more details about this configura

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-16 Thread via GitHub
GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files The above results were performed with the following setup: * `datafusion-cli -m 8G -d 50G --top-memory-consumers 25` * The default `datafusion.execution.par

[PR] fix: clean up iceberg integration APIs [datafusion-comet]

2025-07-16 Thread via GitHub
huaxingao opened a new pull request, #2032: URL: https://github.com/apache/datafusion-comet/pull/2032 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Expected: decimal(7,2), Found: DOUBLE when running TPC-DS benchmarks on Spark 3.5 [datafusion-comet]

2025-07-16 Thread via GitHub
parthchandra closed issue #2029: Expected: decimal(7,2), Found: DOUBLE when running TPC-DS benchmarks on Spark 3.5 URL: https://github.com/apache/datafusion-comet/issues/2029

Re: [I] Expected: decimal(7,2), Found: DOUBLE when running TPC-DS benchmarks on Spark 3.5 [datafusion-comet]

2025-07-16 Thread via GitHub
parthchandra commented on issue #2029: URL: https://github.com/apache/datafusion-comet/issues/2029#issuecomment-3080492782 > One surprising thing to note is that Gluten and Blaze were working fine with the data containing the flag That is, in fact, quite surprising. Double is not a good

Re: [PR] feat: randn expression support [datafusion-comet]

2025-07-16 Thread via GitHub
akupchinskiy commented on code in PR #2010: URL: https://github.com/apache/datafusion-comet/pull/2010#discussion_r2211570646 ## native/spark-expr/src/nondetermenistic_funcs/randn.rs: ## @@ -0,0 +1,265 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[PR] Snowflake: CREATE USER [datafusion-sqlparser-rs]

2025-07-16 Thread via GitHub
yoavcloud opened a new pull request, #1950: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1950 Added support for the `CREATE USER` statement in Snowflake. Enhanced the KeyValueOptions struct with: 1. A custom delimiter 2. Optional parentheses 3. Optional keywords that

Re: [PR] fix: clean up iceberg integration APIs [datafusion-comet]

2025-07-16 Thread via GitHub
codecov-commenter commented on PR #2032: URL: https://github.com/apache/datafusion-comet/pull/2032#issuecomment-3080720676 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2032?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: skip predicates on struct unnest in PushDownFilter [datafusion]

2025-07-16 Thread via GitHub
akoshchiy commented on code in PR #16790: URL: https://github.com/apache/datafusion/pull/16790#discussion_r2210860376 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -128,12 +128,31 @@ physical_plan 06)--ProjectionExec: expr=[column1@0 as column1, colu

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-16 Thread via GitHub
parthchandra commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3081890593 I feel it may be a fair amount of work in Comet to move from `SchemaAdapter` to `PhysicalExprAdapter` but from the pseudocode example it appears tractable. I think we'll be

Re: [I] Optimize the join operators [datafusion]

2025-07-16 Thread via GitHub
UBarney commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082082344 > [@UBarney](https://github.com/UBarney) - here are the 1e7 join results on my M3 Macbook with 16GB of RAM: @MrPowers I am using the **1e8** dataset. ``` target

Re: [I] [EPIC] TPC-H performance improvements [datafusion-comet]

2025-07-16 Thread via GitHub
comphead commented on issue #2004: URL: https://github.com/apache/datafusion-comet/issues/2004#issuecomment-3082214873 DF has similar work for q1-q4 H2O benchmarks https://github.com/apache/datafusion/issues/16710

Re: [I] Optimize the join operators [datafusion]

2025-07-16 Thread via GitHub
zhuqi-lucas commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3082269096 > > [@UBarney](https://github.com/UBarney) - here are the 1e7 join results on my M3 Macbook with 16GB of RAM: > > [@MrPowers](https://github.com/MrPowers) I am using t

Re: [PR] Allow comparison between boolean and int values [datafusion]

2025-07-16 Thread via GitHub
2010YOUY01 commented on PR #16798: URL: https://github.com/apache/datafusion/pull/16798#issuecomment-3082274998 what about using explicit casting in applications? For example: ```sh > select not(arrow_cast(1, 'Boolean')); +--+ | NOT arrow_ca

[PR] Add example of custom file schema casting rules [datafusion]

2025-07-16 Thread via GitHub
adriangb opened a new pull request, #16803: URL: https://github.com/apache/datafusion/pull/16803 https://github.com/apache/datafusion/issues/16800#issuecomment-3080175396

Re: [I] Plan to replace `SchemaAdapter` with `PhysicalExprAdapter` [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on issue #16800: URL: https://github.com/apache/datafusion/issues/16800#issuecomment-3082294797 @parthchandra @mbutrovich please take a look at https://github.com/apache/datafusion/pull/16803. As per the comments in the example it looks like Comet already has a custom

[PR] chore: add tests for out of bounds for NullArray [datafusion]

2025-07-16 Thread via GitHub
comphead opened a new pull request, #16802: URL: https://github.com/apache/datafusion/pull/16802 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16187. ## Rationale for this change Add tests proving the issue is fixed after

Re: [PR] Restore custom SchemaAdapter functionality for Parquet [datafusion]

2025-07-16 Thread via GitHub
adriangb commented on code in PR #16791: URL: https://github.com/apache/datafusion/pull/16791#discussion_r2211108359 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -1095,4 +1124,167 @@ mod test { assert_eq!(num_batches, 0); assert_eq!(num_rows, 0);
