Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-11 Thread via GitHub
UBarney commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2202340283 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -828,13 +833,127 @@ impl NestedLoopJoinStream { handle_state!(self.process_pr

[PR] MSSQL: Add support for EXEC output and default keywords [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
yoavcloud opened a new pull request, #1940: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1940 Added support for parsing the OUTPUT and DEFAULT keywords in MSSQL when calling a stored procedure. -- This is an automated message from the Apache Git Service. To respond to the m

Re: [PR] Perf: Optimize performance of ByteViewGroupValueBuilder on batches with inlined views [datafusion]

2025-07-11 Thread via GitHub
zhuqi-lucas commented on PR #16751: URL: https://github.com/apache/datafusion/pull/16751#issuecomment-3064694991 > > Thank you @Rachelint , it seems no performance improvement for clickbench from my local Mac benchmark, i need to investigate further. > > Yes... I made a simple try onl

[PR] Fix broken link in development.md [datafusion-comet]

2025-07-11 Thread via GitHub
petern48 opened a new pull request, #2024: URL: https://github.com/apache/datafusion-comet/pull/2024 ## Which issue does this PR close? Closes #2023 ## Rationale for this change Broken link. ## What changes are included in this PR? - Swapped the `[

[I] Docs: Link not rendering in development.md [datafusion-comet]

2025-07-11 Thread via GitHub
petern48 opened a new issue, #2023: URL: https://github.com/apache/datafusion-comet/issues/2023 https://github.com/user-attachments/assets/95ca66d1-0e92-4029-aff2-7ba0a1bf06a8 https://datafusion.apache.org/comet/contributor-guide/development.html -- This is an automated message fr

Re: [PR] Perf: Optimize performance of ByteViewGroupValueBuilder on batches with inlined views [datafusion]

2025-07-11 Thread via GitHub
Rachelint commented on PR #16751: URL: https://github.com/apache/datafusion/pull/16751#issuecomment-3064694448 > Thank you @Rachelint , it seems no performance improvement for clickbench from my local Mac benchmark, i need to investigate further. Yes... I did a simple try only in appe

Re: [PR] Perf: Optimize performance of ByteViewGroupValueBuilder on batches with inlined views [datafusion]

2025-07-11 Thread via GitHub
zhuqi-lucas commented on PR #16751: URL: https://github.com/apache/datafusion/pull/16751#issuecomment-3064692416 Thank you @Rachelint, I see no performance improvement for clickbench in my Mac benchmark; I need to investigate further.

Re: [PR] Perf: Optimize performance of ByteViewGroupValueBuilder on batches with inlined views [datafusion]

2025-07-11 Thread via GitHub
Rachelint commented on PR #16751: URL: https://github.com/apache/datafusion/pull/16751#issuecomment-3064683673 Thanks @zhuqi-lucas! How are the benchmark results?

Re: [PR] feat: Add JNI-based Hadoop FileSystem support for S3 and other Hadoop-compatible stores [datafusion-comet]

2025-07-11 Thread via GitHub
Kontinuation commented on PR #1992: URL: https://github.com/apache/datafusion-comet/pull/1992#issuecomment-306422 There are some problems with the approach of using fs-hdfs (libhdfs). ### Problems 1. Linking against libjvm.so As we discovered before, using fs-hd

Re: [PR] fix : cast_operands_to_double_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-11 Thread via GitHub
coderfender commented on PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#issuecomment-3064547421 The issue now seems to be one of correctness. Spark (Scala/JVM) wraps the division result around to Long.MIN_VALUE, while Rust/DataFusion returns Long.MAX_VALUE+1, which is
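
As a minimal illustration of the overflow case discussed in this comment (a sketch, not code from the PR): dividing the smallest 64-bit integer by -1 has no representable result. Rust's standard library offers both a wrapping division, which yields the same value as the JVM's `Long.MIN_VALUE` result, and a checked variant that reports the overflow. The helper names `jvm_style_div` and `checked_style_div` are hypothetical and do not come from Comet or DataFusion.

```rust
/// Hypothetical helper: divide two 64-bit integers, wrapping on overflow.
/// `i64::MIN / -1` wraps around to `i64::MIN`, as `long` division does on the JVM.
/// Note: like plain `/`, `wrapping_div` still panics when `b == 0`.
fn jvm_style_div(a: i64, b: i64) -> i64 {
    a.wrapping_div(b)
}

/// Hypothetical helper: divide two 64-bit integers, returning `None`
/// on overflow or on division by zero instead of wrapping or panicking.
fn checked_style_div(a: i64, b: i64) -> Option<i64> {
    a.checked_div(b)
}
```

Calling `jvm_style_div(i64::MIN, -1)` gives `i64::MIN`, while `checked_style_div(i64::MIN, -1)` gives `None`, which a caller could map to an error or a NULL depending on the evaluation mode it wants to emulate.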

[PR] 48.0.1 [datafusion]

2025-07-11 Thread via GitHub
matthewmturner opened a new pull request, #16755: URL: https://github.com/apache/datafusion/pull/16755 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] chore: Make `GroupValues` and APIs on `PhysicalGroupBy` aggregation APIs public [datafusion]

2025-07-11 Thread via GitHub
haohuaijin commented on code in PR #16733: URL: https://github.com/apache/datafusion/pull/16733#discussion_r2196439508 ## datafusion/physical-plan/src/aggregates/group_values/mod.rs: ## @@ -121,13 +121,15 @@ pub(crate) trait GroupValues: Send { /// will be chosen. ///

Re: [I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-11 Thread via GitHub
timsaucer commented on issue #1181: URL: https://github.com/apache/datafusion-python/issues/1181#issuecomment-3064351821 I agree with your assessment. I am starting to think your original suggestion was the correct one. I'm sorry I took a detour in the above approach. I think

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove commented on code in PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#discussion_r2202060728 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -0,0 +1,293 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove commented on code in PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#discussion_r2202060380 ## spark/src/main/scala/org/apache/comet/serde/arithmetic.scala: ## @@ -0,0 +1,293 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

[PR] ensure MemTable has at least one partition [datafusion]

2025-07-11 Thread via GitHub
waynexia opened a new pull request, #16754: URL: https://github.com/apache/datafusion/pull/16754 ## Which issue does this PR close? - Related to https://github.com/datafusion-contrib/datafusion-postgres/pull/108. ## Rationale for this change When creating

Re: [I] [EPIC] A collection of items to improve developer / CI speed [datafusion]

2025-07-11 Thread via GitHub
blaginin commented on issue #13813: URL: https://github.com/apache/datafusion/issues/13813#issuecomment-3064077568 Added cache to CI runners to get some speedup: https://github.com/apache/datafusion/pull/16709 Also pinged the infra team if we can get larger runners: https://issues.ap

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-11 Thread via GitHub
rishvin commented on PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#issuecomment-3064076095 > @rishvin, This PR is still a draft, but could you review the changes to the `remainder` code to ensure I didn't miss anything from your changes? Thanks @andygrove for th

Re: [I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-11 Thread via GitHub
colinmarc commented on issue #1181: URL: https://github.com/apache/datafusion-python/issues/1181#issuecomment-3064059046 I explored the solution space a bit today, and I don't think this problem is really solvable with the APIs as they currently exist. Just to be clear about what is

Re: [PR] docs: Remove legacy comment in docs [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove merged PR #2022: URL: https://github.com/apache/datafusion-comet/pull/2022

Re: [PR] feat(datafusion-proto): allow TableSource to be serialized [datafusion]

2025-07-11 Thread via GitHub
colinmarc commented on PR #16750: URL: https://github.com/apache/datafusion/pull/16750#issuecomment-3063952865 I added a test! Let me know if that seems like enough.

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
Dandandan commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063915741 Also, referencing the direct indexing / perfect hash join here. I think that should be relatively simple to implement. https://github.com/duckdb/duckdb/pull/1959 #816

Re: [PR] Per file filter evaluation [datafusion]

2025-07-11 Thread via GitHub
adriangb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2201901299 ## datafusion-examples/examples/default_column_values.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
jonathanc-n commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063872693 Another thing we can do is hash it once and use parts of the hash at a time during `RepartitionExec` and building the hashtable. This is made even better with having to do a
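
One possible reading of the "hash once, use parts of the hash" idea in this comment, as a hedged sketch only: compute a single 64-bit hash per key, take the partition from the upper bits and the hash-table bucket from the lower bits, so the two choices do not reuse the same bits. This uses only the Rust standard library; `hash_once`, `partition_of`, and `bucket_of` are hypothetical names and are not DataFusion APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash a key exactly once.
fn hash_once<T: Hash>(key: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: pick a partition from the upper 32 bits of the hash.
fn partition_of(hash: u64, num_partitions: u64) -> u64 {
    (hash >> 32) % num_partitions
}

/// Hypothetical helper: pick a hash-table bucket from the lower 32 bits.
fn bucket_of(hash: u64, num_buckets: u64) -> u64 {
    (hash & 0xFFFF_FFFF) % num_buckets
}
```

A caller would compute `let h = hash_once(&key);` once, route the row with `partition_of(h, n)`, and later reuse the same `h` with `bucket_of(h, m)` when building the table, rather than rehashing the key.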

Re: [PR] Extend binary coercion rules to support Decimal arithmetic operations with integer(signed and unsigned) types [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16668: URL: https://github.com/apache/datafusion/pull/16668#issuecomment-3063821753 I merged up from main to rerun the CI

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3063812572 Thank you @XiangpengHao FYI @NGA-TRAN and @LiaCastaneda

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
alamb commented on code in PR #16744: URL: https://github.com/apache/datafusion/pull/16744#discussion_r2201840244 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1739,7 +1739,7 @@ async fn roundtrip_physical_plan_node() { } // Failing due to https://github

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
alamb commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2201829283 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<()>

Re: [PR] Per file filter evaluation [datafusion]

2025-07-11 Thread via GitHub
alamb commented on code in PR #15057: URL: https://github.com/apache/datafusion/pull/15057#discussion_r2201812153 ## datafusion-examples/examples/default_column_values.rs: ## @@ -0,0 +1,366 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Avoid explicit cast during execution in `corr` aggregate function [datafusion]

2025-07-11 Thread via GitHub
alamb closed issue #13721: Avoid explicit cast during execution in `corr` aggregate function URL: https://github.com/apache/datafusion/issues/13721

Re: [PR] Perform type coercion for corr aggregate function [datafusion]

2025-07-11 Thread via GitHub
alamb merged PR #15776: URL: https://github.com/apache/datafusion/pull/15776

Re: [PR] Perform type coercion for corr aggregate function [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #15776: URL: https://github.com/apache/datafusion/pull/15776#issuecomment-3063767365 Thanks again @kumarlokesh

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-11 Thread via GitHub
corwinjoy commented on code in PR #16738: URL: https://github.com/apache/datafusion/pull/16738#discussion_r2201805504 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1723,28 +1708,47 @@ async fn output_single_parquet_file_parallelized( let (serialize_tx, seriali

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
Dandandan commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063754335 > > is it using a hash table or open addressing (df doesn't have the latter) > > [@XiangpengHao](https://github.com/XiangpengHao) has mentioned several times that we thi

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
alamb commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063749417 > is it using a hash table or open addressing (df doesn't have the latter) @XiangpengHao has mentioned several times that we think DuckDB uses radix trees (which work l

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3063739024 > * Doing (some of the) merge algorithm itself in parallel - I am not sure what would the best way forward here, but it seems it could give the largest gains, as merging is currently d

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-11 Thread via GitHub
corwinjoy commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3063738684 Looks good to me, with the exception of multi-row group writing being missing. When you go to rebase to the latest datafusion the diff should get a lot simpler since they have upgr

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16732: URL: https://github.com/apache/datafusion/pull/16732#issuecomment-3063730981 Thanks again @adriangb

Re: [PR] Refactor filter pushdown APIs to enable joins to pass through filters [datafusion]

2025-07-11 Thread via GitHub
alamb merged PR #16732: URL: https://github.com/apache/datafusion/pull/16732

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-07-11 Thread via GitHub
corwinjoy commented on code in PR #16738: URL: https://github.com/apache/datafusion/pull/16738#discussion_r2201788873 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -1571,12 +1564,14 @@ fn spawn_parquet_parallel_serialization_task( let max_row_group_rows = w

Re: [D] DISCUSSION: DataFusion Meetup in Boston, USA [datafusion]

2025-07-11 Thread via GitHub
GitHub user alamb added a comment to the discussion: DISCUSSION: DataFusion Meetup in Boston, USA Sounds good -- when you get a chance perhaps you can create a luma event (or some other signup of your choosing) so we can start advertising / starting speakers Here is an example (this fall NYC

Re: [I] Optimized spill file format [datafusion]

2025-07-11 Thread via GitHub
alamb commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-3063721375 > Yes, I think so. Of course, there's still room to seek further performance optimizations, but for now: Indeed, we can always make the code better :)

Re: [I] Optimized spill file format [datafusion]

2025-07-11 Thread via GitHub
alamb commented on issue #14078: URL: https://github.com/apache/datafusion/issues/14078#issuecomment-3063722047 Thanks again @ding-young

Re: [I] Optimized spill file format [datafusion]

2025-07-11 Thread via GitHub
alamb closed issue #14078: Optimized spill file format URL: https://github.com/apache/datafusion/issues/14078

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3063612959 I didn't have permissions, but I could make a request which I did: https://github.com/user-attachments/assets/3884c68a-4af5-4e94-94f1-59f9f5d102dc

Re: [PR] Remove parquet_filter and parquet `sort` benchmarks [datafusion]

2025-07-11 Thread via GitHub
alamb merged PR #16730: URL: https://github.com/apache/datafusion/pull/16730

Re: [PR] Remove parquet_filter and parquet `sort` benchmarks [datafusion]

2025-07-11 Thread via GitHub
alamb commented on PR #16730: URL: https://github.com/apache/datafusion/pull/16730#issuecomment-3063605329 Thank you for the review @xudong963

Re: [PR] Add support for `+` char in Snowflake stage names [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
alamb commented on PR #1935: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1935#issuecomment-3063600101 A code machine!

[PR] docs: Remove legacy comment in docs [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new pull request, #2022: URL: https://github.com/apache/datafusion-comet/pull/2022 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/2016 ## Rationale for this change Follow on from https://gith

Re: [D] How does Comet compare to Gluten? Are there any plans to integrate with Gluten? [datafusion-comet]

2025-07-11 Thread via GitHub
GitHub user andygrove added a comment to the discussion: How does Comet compare to Gluten? Are there any plans to integrate with Gluten? We now have a guide comparing Comet and Gluten as part of our documentation: https://datafusion.apache.org/comet/user-guide/gluten_comparison.html GitHub l

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove commented on PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#issuecomment-3063567453 @rishvin, This PR is still a draft, but could you review the changes to the `remainder` code to ensure I didn't miss anything from your changes?

[PR] WIP: Update `object_store` 0.12.3 [datafusion]

2025-07-11 Thread via GitHub
alamb opened a new pull request, #16753: URL: https://github.com/apache/datafusion/pull/16753 ## Which issue does this PR close? - Related to https://github.com/apache/arrow-rs-object-store/issues/428 - Closes #. ## Rationale for this change Keep up with dependencies

Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-07-11 Thread via GitHub
rishvin commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3063537883 Should be able to open Comet's PR after https://github.com/apache/datafusion-comet/issues/1993 is closed.

Re: [PR] minor: Refactor to move some shuffle-related logic from `QueryPlanSerde` to `CometExecRule` [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich merged PR #2015: URL: https://github.com/apache/datafusion-comet/pull/2015

Re: [I] Improve documentation publishing to avoid maintaining separate template files [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich closed issue #2016: Improve documentation publishing to avoid maintaining separate template files URL: https://github.com/apache/datafusion-comet/issues/2016

Re: [PR] chore: Improve process for generating dynamic content into documentation [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich merged PR #2017: URL: https://github.com/apache/datafusion-comet/pull/2017

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
ryanschneider commented on PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#issuecomment-3063425629 @iffyio I went with the new ParserState::ColumnDefinition idea mentioned here: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2200796876 I

Re: [PR] docs: Add guide showing comparison between Comet and Gluten [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove merged PR #2012: URL: https://github.com/apache/datafusion-comet/pull/2012

Re: [PR] docs: Add guide showing comparison between Comet and Gluten [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove commented on PR #2012: URL: https://github.com/apache/datafusion-comet/pull/2012#issuecomment-3063409220 Thanks for the review @kazuyukitanimura. I will go ahead and merge this as a starting point for this content. I am sure we will add more to it soon.

[PR] add filter to handle backtrace [datafusion]

2025-07-11 Thread via GitHub
geetanshjuneja opened a new pull request, #16752: URL: https://github.com/apache/datafusion/pull/16752 ## Which issue does this PR close? - Closes #16146. ## Rationale for this change To run datafusion-cli tests with backtrace=1 ## What changes are incl

Re: [I] Add ANSI support for Remainder [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich closed issue #532: Add ANSI support for Remainder URL: https://github.com/apache/datafusion-comet/issues/532

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich merged PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich commented on code in PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#discussion_r2201477659 ## native/spark-expr/src/comet_scalar_funcs.rs: ## @@ -53,13 +54,23 @@ macro_rules! make_comet_scalar_udf { ); Ok(Arc::new(ScalarUDF::new_

[I] try_ arithmetic functions return incorrect results [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new issue, #2021: URL: https://github.com/apache/datafusion-comet/issues/2021 ### Describe the bug As part of exploring writing unit tests for serde code in https://github.com/apache/datafusion-comet/issues/2020, I discovered that we currently have incorrect behavi

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-11 Thread via GitHub
rishvin commented on code in PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#discussion_r2201439622 ## native/spark-expr/src/comet_scalar_funcs.rs: ## @@ -53,13 +54,23 @@ macro_rules! make_comet_scalar_udf { ); Ok(Arc::new(ScalarUDF::new_fro

[I] Replace configure_me with maintained alternative [datafusion-ballista]

2025-07-11 Thread via GitHub
milenkovicm opened a new issue, #1281: URL: https://github.com/apache/datafusion-ballista/issues/1281 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** `configure_me` dependency does not look maintained, it is a blocker to updat

Re: [I] Optimize the join operators [datafusion]

2025-07-11 Thread via GitHub
Dandandan commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3063067577 Besides profiling, I would suggest researching how other engines run the join and extracting some high-level learnings from that: * is it using a hash t

Re: [PR] Fix invalid intervals in `satisfy_greater` [datafusion]

2025-07-11 Thread via GitHub
ozankabak commented on PR #16745: URL: https://github.com/apache/datafusion/pull/16745#issuecomment-3062949373 Thanks for taking a look at this. A cursory look suggests when a strict inequality is being propagated, if the next value of other side's lower bound is greater than the uppe

[PR] Snowflake create database [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
osipovartem opened a new pull request, #1939: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1939 https://docs.snowflake.com/en/sql-reference/sql/create-database Added support for ```sql CREATE [ OR REPLACE ] [ TRANSIENT ] DATABASE [ IF NOT EXISTS ] [ CLONE

[I] Add support for Snowflake CREATE DATABASE [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
osipovartem opened a new issue, #1938: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1938 https://docs.snowflake.com/en/sql-reference/sql/create-database

[I] Implement unit tests for serde logic [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new issue, #2020: URL: https://github.com/apache/datafusion-comet/issues/2020 ### What is the problem the feature request solves? We currently rely on end-to-end integration tests to ensure that expressions are serialized correctly. This has generally been ok, but w

Re: [PR] Benchmark for char expression [datafusion]

2025-07-11 Thread via GitHub
ajita-asthana commented on PR #16743: URL: https://github.com/apache/datafusion/pull/16743#issuecomment-3062843109 Thanks @comphead, I will fix the Linux build failure.

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
XiangpengHao commented on PR #16742: URL: https://github.com/apache/datafusion/pull/16742#issuecomment-3062840649 A little more investigation shows that some of the non-determinism is introduced by hashset, so we probably also want to change how we compare plans.

Re: [PR] Benchmark for char expression [datafusion]

2025-07-11 Thread via GitHub
comphead commented on PR #16743: URL: https://github.com/apache/datafusion/pull/16743#issuecomment-3062760181 I changed the PR header to `related` instead of `closed`, as #16009 expects both the benchmark and the optimization implementation, which I believe is another PR of yours: https://github.com/apache/da

Re: [I] Optimize performance of `ByteViewGroupValueBuilder` on batches with inlined views [datafusion]

2025-07-11 Thread via GitHub
zhuqi-lucas commented on issue #16330: URL: https://github.com/apache/datafusion/issues/16330#issuecomment-3062751769 @Dandandan @Rachelint I submitted a PR to experiment with this and see whether performance improves or regresses.

[PR] Perf: Optimize performance of ByteViewGroupValueBuilder on batches with inlined views [datafusion]

2025-07-11 Thread via GitHub
zhuqi-lucas opened a new pull request, #16751: URL: https://github.com/apache/datafusion/pull/16751 ## Which issue does this PR close? Optimize the following cases and add more fast paths: do_append_val_inner do_equal_to_inner This is wasteful if there is no data buffe

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
XiangpengHao commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062721623 > @XiangpengHao: If you believe the round-trip bug reproduced in `test_round_trip_tpch_queries` from PR #16742 is distinct, we can file a separate issue and tackle it independen

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
XiangpengHao commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062727458 > > @adriangb can you take a look if this is the right way to fix it? > > I took an initial look and... I'm a bit stumped. I don't fully understand where this is running o

Re: [I] Add support for StringDecode in Spark 4.0.0 [datafusion-comet]

2025-07-11 Thread via GitHub
peter-toth commented on issue #1942: URL: https://github.com/apache/datafusion-comet/issues/1942#issuecomment-3062725854 take

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
XiangpengHao commented on PR #16742: URL: https://github.com/apache/datafusion/pull/16742#issuecomment-3062716781 I checked #16744 with this test and can confirm that most tests still fail. A closer look shows that it's mostly due to the field "human_display", the deserialized on

Re: [PR] chore: Introduce ANSI support for remainder operation [datafusion-comet]

2025-07-11 Thread via GitHub
mbutrovich commented on code in PR #1971: URL: https://github.com/apache/datafusion-comet/pull/1971#discussion_r2200966491 ## native/spark-expr/src/comet_scalar_funcs.rs: ## @@ -53,13 +54,23 @@ macro_rules! make_comet_scalar_udf { ); Ok(Arc::new(ScalarUDF::new_

[PR] feat(datafusion-proto): allow TableSource to be serialized [datafusion]

2025-07-11 Thread via GitHub
colinmarc opened a new pull request, #16750: URL: https://github.com/apache/datafusion/pull/16750 Currently, only instances of `TableProvider` are considered by `LogicalExtensionCodec`, and are automatically wrapped in a `DefaultTableSource` when deserializing. That doesn't work with custom

[I] Serializing custom `TableSource` implementations fails [datafusion]

2025-07-11 Thread via GitHub
colinmarc opened a new issue, #16749: URL: https://github.com/apache/datafusion/issues/16749 ### Describe the bug `LogicalExtensionCodec` allows providing a custom serialization strategy for a `TableProvider`, but the calling code always expects a `DefaultTableSource` to unwrap:

Re: [PR] minor: Refactor arithmetic serde into separate classes [datafusion-comet]

2025-07-11 Thread via GitHub
codecov-commenter commented on PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018#issuecomment-3062533657 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2018?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062503974 @XiangpengHao: If you believe the round-trip bug reproduced in `test_round_trip_tpch_queries` from PR #16742 is distinct, we can file a separate issue and tackle it independently. @

[I] [EPIC] Refactor all expression serde logic out of `QueryPlanSerde` [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new issue, #2019: URL: https://github.com/apache/datafusion-comet/issues/2019 ### What is the problem the feature request solves? The `QueryPlanSerde.exprToProtoInternal` method contains logic for serializing Spark expressions to protocol buffer format and also cont

[PR] minor: Refactor arithmetic serde into separate classes [datafusion-comet]

2025-07-11 Thread via GitHub
andygrove opened a new pull request, #2018: URL: https://github.com/apache/datafusion-comet/pull/2018 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] DuckDB, Postgres, SQLite: NOT NULL and NOTNULL expressions [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
ryanschneider commented on code in PR #1927: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1927#discussion_r2200796876 ## src/dialect/mod.rs: ## @@ -1076,6 +1088,15 @@ pub trait Dialect: Debug + Any { fn supports_comma_separated_drop_column_list(&self) -> boo

Re: [I] Bug: `make_date(year, month, day)` reports error if one of the fields is NULL [datafusion]

2025-07-11 Thread via GitHub
Omega359 commented on issue #16746: URL: https://github.com/apache/datafusion/issues/16746#issuecomment-3062326009 Interesting. I would expect a DB to error for that, unlike something like Spark, which I would expect to be lenient (if not in ANSI/safe mode). I believe it should be a fairly e

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2200734851 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<(

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on code in PR #16742: URL: https://github.com/apache/datafusion/pull/16742#discussion_r2198593774 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -1780,3 +1780,111 @@ async fn test_tpch_part_in_list_query_with_real_parquet_data() -> Result<(

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
adriangb commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062283423 > @adriangb can you take a look if this is the right way to fix it? I took an initial look and... I'm a bit stumped. I don't fully understand where this is running or how. Wha

Re: [PR] Support optional semicolon between statements [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
iffyio merged PR #1937: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1937

Re: [PR] fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display [datafusion]

2025-07-11 Thread via GitHub
Loaki07 commented on PR #16726: URL: https://github.com/apache/datafusion/pull/16726#issuecomment-3062153659 Looks like the CI is unable to run `sudo apt-get install -y protobuf-compiler`

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on PR #16744: URL: https://github.com/apache/datafusion/pull/16744#issuecomment-3062022168 Thanks @XiangpengHao for the fix. Could you also run the tests [in this PR](https://github.com/apache/datafusion/pull/16742)? The deserialization bug only happens to 1 tpc-h queries

Re: [PR] Add serialization/deserialization and round-trip tests for all tpc-h queries [datafusion]

2025-07-11 Thread via GitHub
NGA-TRAN commented on PR #16742: URL: https://github.com/apache/datafusion/pull/16742#issuecomment-3062011181 > Just out of curiosity, do you know if the issue is that a specific node can't serialize? Some info and fix: - https://github.com/apache/datafusion/issues/16665#issuecomm

Re: [PR] Added unquoted identifiers unicode support for mySql, postgreSqp, als… [datafusion-sqlparser-rs]

2025-07-11 Thread via GitHub
iffyio commented on code in PR #1933: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1933#discussion_r2200438892 ## tests/sqlparser_common.rs: ## @@ -15895,3 +15895,11 @@ fn parse_create_procedure_with_parameter_modes() { _ => unreachable!(), } } + +

Re: [PR] DataFusion 47.0.0 blog post [datafusion-site]

2025-07-11 Thread via GitHub
alamb commented on PR #83: URL: https://github.com/apache/datafusion-site/pull/83#issuecomment-3061841186 The blog is live! https://datafusion.apache.org/blog/2025/07/11/datafusion-47.0.0/
