Re: [PR] Replace π-related bound constants with next_up/next_down [datafusion]

2025-07-21 Thread via GitHub
findepi commented on PR #16823: URL: https://github.com/apache/datafusion/pull/16823#issuecomment-3101335642 Current MSRV is 1.85.1. We need to wait until MSRV=1.86. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the
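
Context for the π item: `f64::next_up` and `f64::next_down` were stabilized in Rust 1.86, which is why this change waits on the MSRV bump. The sketch below assumes the existing constants bracket π by one representable step on each side. The PR text is not quoted here, so that is an assumption. It reproduces the interval with hypothetical bit-level helpers that compile on older toolchains. Once MSRV reaches 1.86, the helpers reduce to `PI.next_down()` and `PI.next_up()`.

```rust
use std::f64::consts::PI;

/// Next representable f64 above `x` (hypothetical helper).
/// This shortcut is only valid for finite, positive `x`; `f64::next_up`
/// (stable since Rust 1.86) also handles negatives, zero, NaN and infinities.
fn next_up_positive(x: f64) -> f64 {
    assert!(x.is_finite() && x > 0.0, "sketch only handles finite positive values");
    // For positive floats the raw bit pattern grows with the value,
    // so the next bit pattern is the next representable number.
    f64::from_bits(x.to_bits() + 1)
}

/// Next representable f64 below `x`, under the same restriction.
fn next_down_positive(x: f64) -> f64 {
    assert!(x.is_finite() && x > 0.0, "sketch only handles finite positive values");
    f64::from_bits(x.to_bits() - 1)
}

/// An interval guaranteed to contain the exact value of π:
/// `PI` is the f64 nearest to π, and π itself is not representable.
fn pi_interval() -> (f64, f64) {
    (next_down_positive(PI), next_up_positive(PI))
}
```

Deriving the bounds from `PI` avoids hand-copied literals that can silently be off by one step.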

Re: [PR] feat: enhance support for Decimal128 and Decimal256 [datafusion]

2025-07-21 Thread via GitHub
findepi commented on code in PR #16831: URL: https://github.com/apache/datafusion/pull/16831#discussion_r2221398992 ## datafusion/common/src/scalar/mod.rs: ## @@ -1790,6 +1808,27 @@ impl ScalarValue { (Self::Float64(Some(l)), Self::Float64(Some(r))) => {

Re: [PR] fix: clean up [iceberg] integration APIs [datafusion-comet]

2025-07-21 Thread via GitHub
huaxingao closed pull request #2032: fix: clean up [iceberg] integration APIs URL: https://github.com/apache/datafusion-comet/pull/2032

Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

2025-07-21 Thread via GitHub
rishvin commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3101303544 Some Updates: I have a simple test to start with, which will produce `_groupingmapsort`. ``` val data = Seq( | Map("a" -> 1, "b" -> 2), | Map("a"

Re: [PR] Simplify try cast expr evaluation [datafusion]

2025-07-21 Thread via GitHub
findepi merged PR #16834: URL: https://github.com/apache/datafusion/pull/16834

Re: [PR] fix: clean up [iceberg] integration APIs [datafusion-comet]

2025-07-21 Thread via GitHub
huaxingao commented on code in PR #2032: URL: https://github.com/apache/datafusion-comet/pull/2032#discussion_r2221390974 ## common/src/main/java/org/apache/comet/parquet/BatchReader.java: ## @@ -377,12 +374,15 @@ public void init() throws URISyntaxException, IOException {

Re: [PR] Fix integration tests not running [datafusion]

2025-07-21 Thread via GitHub
findepi commented on code in PR #16835: URL: https://github.com/apache/datafusion/pull/16835#discussion_r2221391075 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -68,6 +68,8 @@ pub trait SchemaAdapterFactory: Debug + Send + Sync + 'static { ) -> Box { self

Re: [I] CI: Check broken links in src doc comments [datafusion]

2025-07-21 Thread via GitHub
Adez017 commented on issue #16840: URL: https://github.com/apache/datafusion/issues/16840#issuecomment-3101266129 hi @2010YOUY01 i would try to validate the concerned issue and find out a way

Re: [I] Improve performance on ClickBench [datafusion-comet]

2025-07-21 Thread via GitHub
Iskander14yo commented on issue #2035: URL: https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3101264542 > I did not see this message when I reproduced the error. You can run `./benchmark.sh` (to make it faster, remove all queries from `queries.sql` except the one t

Re: [I] CI: Check broken links in src doc comments [datafusion]

2025-07-21 Thread via GitHub
Adez017 commented on issue #16840: URL: https://github.com/apache/datafusion/issues/16840#issuecomment-3101263663 take

Re: [PR] Update extending-operators.md [datafusion]

2025-07-21 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-3101253592 > Thanks @Adez017 > > Thank you for your patience. > > I think there are a few outstanding issues we should resolve prior to merging this: > > 1. Remove the (n

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-21 Thread via GitHub
shehabgamin commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3101058153 > > Fyi, the [branch-49](https://github.com/apache/datafusion/tree/branch-49) has come out, we can do tests with it. > > [@alamb](https://github.com/alamb) When do you th

Re: [PR] feat(spark): implement Spark datetime function last_day [datafusion]

2025-07-21 Thread via GitHub
2010YOUY01 commented on code in PR #16828: URL: https://github.com/apache/datafusion/pull/16828#discussion_r2221174152 ## datafusion/sqllogictest/test_files/spark/datetime/last_day.slt: ## @@ -21,7 +21,80 @@ # For more information, please see: # https://github.com/apache/dat

Re: [PR] Adds script to detect breaking API changes/ semver [datafusion]

2025-07-21 Thread via GitHub
lucqui commented on code in PR #16541: URL: https://github.com/apache/datafusion/pull/16541#discussion_r2221148945 ## .github/workflows/dev.yml: ## @@ -35,6 +80,9 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 +with: Review Comment:

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2221120019 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2221118843 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l
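
A multi-level merge bounds how many sorted runs are merged at once. Instead of opening every spill file in one k-way merge, it merges runs in groups of at most `fan_in` and then merges those outputs again until one run remains. The sketch below shows only that idea. In-memory `Vec`s stand in for spill files, and `fan_in` is a hypothetical parameter. It is not the implementation in PR #15700, whose memory accounting (for example `get_size.rs`) is not reproduced here.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Classic k-way merge of already-sorted runs using a min-heap of
/// (next value, run index) pairs.
fn merge_runs(runs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut cursors: Vec<std::vec::IntoIter<i64>> =
        runs.into_iter().map(|r| r.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    for (idx, cursor) in cursors.iter_mut().enumerate() {
        if let Some(v) = cursor.next() {
            heap.push(Reverse((v, idx)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, idx))) = heap.pop() {
        out.push(v);
        // Refill from the run that just produced the smallest value.
        if let Some(next) = cursors[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}

/// Merges at most `fan_in` runs per step, level by level, so each merge
/// only holds `fan_in` cursors open at a time.
fn multi_level_merge(mut runs: Vec<Vec<i64>>, fan_in: usize) -> Vec<i64> {
    assert!(fan_in >= 2, "fan_in must be at least 2 for the run count to shrink");
    while runs.len() > 1 {
        let mut next_level = Vec::new();
        let mut remaining = runs.into_iter().peekable();
        while remaining.peek().is_some() {
            let group: Vec<Vec<i64>> = remaining.by_ref().take(fan_in).collect();
            next_level.push(merge_runs(group));
        }
        runs = next_level;
    }
    runs.pop().unwrap_or_default()
}
```

Smaller `fan_in` values trade extra merge passes for fewer buffers held at once.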

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-21 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2220847542 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under

[PR] feat(spark): Implement Spark luhn_check function [datafusion]

2025-07-21 Thread via GitHub
Standing-Man opened a new pull request, #16848: URL: https://github.com/apache/datafusion/pull/16848 ## Which issue does this PR close? - Part of #15914. - Closes #16612 and the draft #16580. ## Rationale for this change WIP: adding the [suggestion](ht
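
`luhn_check` validates a digit string with the Luhn checksum. Starting from the rightmost digit, every second digit is doubled, and 9 is subtracted when the doubled value exceeds 9. The string passes when the total is divisible by 10. A minimal standalone sketch follows. Treating empty or non-digit input as `false` is an assumption here and should be checked against Spark's behaviour.

```rust
/// Luhn checksum over a string of ASCII digits.
/// Assumption: empty or non-digit input returns false (not verified against Spark).
fn luhn_check(input: &str) -> bool {
    if input.is_empty() || !input.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    let mut sum = 0u64;
    // Walk right to left; every second digit (odd index from the right) is doubled.
    for (i, b) in input.bytes().rev().enumerate() {
        let mut d = u64::from(b - b'0');
        if i % 2 == 1 {
            d *= 2;
            if d > 9 {
                d -= 9;
            }
        }
        sum += d;
    }
    sum % 10 == 0
}
```

For example, `"79927398713"` passes and changing its last digit to `0` makes it fail.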

Re: [I] QUALIFY clause [datafusion]

2025-07-21 Thread via GitHub
haohuaijin commented on issue #15485: URL: https://github.com/apache/datafusion/issues/15485#issuecomment-3100388769 Hi @alan910127, are you still working on this? If not, I would like to take over.

Re: [PR] feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)` [datafusion]

2025-07-21 Thread via GitHub
haohuaijin commented on PR #16762: URL: https://github.com/apache/datafusion/pull/16762#issuecomment-3100311649 Thanks for your reviews @alamb

Re: [PR] fix: clean up [iceberg] integration APIs [datafusion-comet]

2025-07-21 Thread via GitHub
parthchandra commented on code in PR #2032: URL: https://github.com/apache/datafusion-comet/pull/2032#discussion_r2220720094 ## common/src/main/java/org/apache/comet/parquet/BatchReader.java: ## @@ -377,12 +374,15 @@ public void init() throws URISyntaxException, IOException {

Re: [PR] fix: clean up [iceberg] integration APIs [datafusion-comet]

2025-07-21 Thread via GitHub
parthchandra commented on PR #2032: URL: https://github.com/apache/datafusion-comet/pull/2032#issuecomment-3100213965 > Iceberg CI failed. We need to change the corresponding iceberg side to make Iceberg CI pass. The required changes are in this draft [PR](https://github.com/apache/iceberg

Re: [PR] fix: clean up [iceberg] integration APIs [datafusion-comet]

2025-07-21 Thread via GitHub
huaxingao commented on PR #2032: URL: https://github.com/apache/datafusion-comet/pull/2032#issuecomment-3100157773 Iceberg CI failed. We need to change the corresponding iceberg side to make Iceberg CI pass. The required changes are in this draft [PR](https://github.com/apache/iceberg/pull

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-21 Thread via GitHub
comphead commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3099957481 Created PR to DF 49 branch https://github.com/apache/datafusion/pull/16847

[PR] chore: use `equals_datatype` for `BinaryExpr` (#16813) [datafusion]

2025-07-21 Thread via GitHub
comphead opened a new pull request, #16847: URL: https://github.com/apache/datafusion/pull/16847 * chore: use `equals_datatype` instead of direct type comparison for `BinaryExpr` * chore: use `equals_datatype` instead of direct type comparison for `BinaryExpr` (cherry picked f

Re: [PR] Blog: Fix page overflow [datafusion-site]

2025-07-21 Thread via GitHub
timsaucer commented on PR #92: URL: https://github.com/apache/datafusion-site/pull/92#issuecomment-3099930315 Sorry I’m not available all week. I’ll try to catch up on things next Monday.

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-21 Thread via GitHub
comphead commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3099903377 Appreciate if https://github.com/apache/datafusion/pull/16813 can be included

Re: [PR] fix: clean up [iceberg] integration APIs [datafusion-comet]

2025-07-21 Thread via GitHub
huaxingao commented on code in PR #2032: URL: https://github.com/apache/datafusion-comet/pull/2032#discussion_r2220566792 ## common/src/main/java/org/apache/comet/parquet/BatchReader.java: ## @@ -183,9 +183,7 @@ public BatchReader( this.taskContext = TaskContext$.MODULE$.ge

Re: [PR] docs: Fix broken links [datafusion]

2025-07-21 Thread via GitHub
comphead merged PR #16839: URL: https://github.com/apache/datafusion/pull/16839

Re: [I] Improve performance on ClickBench [datafusion-comet]

2025-07-21 Thread via GitHub
parthchandra commented on issue #2035: URL: https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3099856269 > It seems that Comet falls back to Spark execution (`Comet native execution is disabled due to: unsupported Spark partitioning: ArrayBuffer(PageViews#463L DESC NULLS

[I] Comet fails to run clickbench query [datafusion-comet]

2025-07-21 Thread via GitHub
parthchandra opened a new issue, #2038: URL: https://github.com/apache/datafusion-comet/issues/2038 ### Describe the bug One of the queries in the clickbench suite fails with ``` org.apache.comet.CometNativeException: InternalError: Native cast invoked for unsupported cast from

Re: [PR] DataFusion `49.0.0` release post [datafusion-site]

2025-07-21 Thread via GitHub
alamb commented on PR #91: URL: https://github.com/apache/datafusion-site/pull/91#issuecomment-3099799819 > Thanks for your help with this @alamb, sorry I haven't done more. I've been deep trying to figure out a performance issue I've been having and haven't had the spare time to put into t

Re: [PR] fix: clean up iceberg integration APIs [datafusion-comet]

2025-07-21 Thread via GitHub
parthchandra commented on code in PR #2032: URL: https://github.com/apache/datafusion-comet/pull/2032#discussion_r2220495577 ## common/src/main/java/org/apache/comet/parquet/BatchReader.java: ## @@ -183,9 +183,7 @@ public BatchReader( this.taskContext = TaskContext$.MODULE$

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220450277 ## src/parser/mod.rs: ## @@ -485,10 +505,10 @@ impl<'a> Parser<'a> { match self.peek_token().token { Token::EOF => brea

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220449729 ## src/parser/mod.rs: ## @@ -4541,6 +4561,18 @@ impl<'a> Parser<'a> { return Ok(vec![]); } +if end_token == Token::Se

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220448930 ## src/dialect/mssql.rs: ## @@ -123,6 +123,10 @@ impl Dialect for MsSqlDialect { true } +fn supports_statements_without_semicolon

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220436802 ## src/dialect/mod.rs: ## @@ -1136,6 +1142,11 @@ pub trait Dialect: Debug + Any { fn supports_notnull_operator(&self) -> bool { false

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220434233 ## src/test_utils.rs: ## @@ -186,6 +187,37 @@ impl TestedDialects { statements } +/// The same as [`statements_parse_to`] but it

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220430113 ## src/parser/mod.rs: ## @@ -16464,7 +16505,28 @@ impl<'a> Parser<'a> { /// Parse [Statement::Return] fn parse_return(&mut self) -> Resul

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220368348 ## src/parser/mod.rs: ## @@ -4541,6 +4561,18 @@ impl<'a> Parser<'a> { return Ok(vec![]); } +if end_token == Token::Se

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220366992 ## src/parser/mod.rs: ## @@ -266,6 +266,22 @@ impl ParserOptions { self.unescape = unescape; self } + +/// Set if semicolo

Re: [I] Improve performance on ClickBench [datafusion-comet]

2025-07-21 Thread via GitHub
Iskander14yo commented on issue #2035: URL: https://github.com/apache/datafusion-comet/issues/2035#issuecomment-3099044466 Thanks for the feedback! **On the failing query:** Appreciate the reminder, I had forgotten that Comet can use different readers. To avoid extra tuning, I’ll

Re: [PR] DataFusion `49.0.0` release post [datafusion-site]

2025-07-21 Thread via GitHub
Omega359 commented on PR #91: URL: https://github.com/apache/datafusion-site/pull/91#issuecomment-3099013704 Thanks for your help with this @alamb, sorry I haven't done more. I've been deep trying to figure out a performance issue I've been having and haven't had the spare time to put into

Re: [PR] DataFusion `49.0.0` release post [datafusion-site]

2025-07-21 Thread via GitHub
alamb commented on PR #91: URL: https://github.com/apache/datafusion-site/pull/91#issuecomment-3098990614 😅 -- ok I think I filled out the major content parts of this post. It needs: 1. More honing / review 2. Review / rerun the performance chart numbers

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220351888 ## tests/sqlparser_common.rs: ## @@ -272,20 +275,39 @@ fn parse_insert_default_values() { "INSERT INTO test_table DEFAULT VALUES (some_colum

[I] Support for `IN $placeholder` syntax [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
ryanschneider opened a new issue, #1962: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1962 Some DBs support using placeholders for `IN` clauses in prepared statements. For example in DuckDB: ``` $ duckdb DuckDB v1.3.0 (Ossivalis) 71c5c07cdd Enter ".help" for

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220337141 ## src/test_utils.rs: ## @@ -186,6 +187,37 @@ impl TestedDialects { statements } +/// The same as [`statements_parse_to`] but it

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
aharpervc commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2220333518 ## src/parser/mod.rs: ## @@ -485,10 +505,10 @@ impl<'a> Parser<'a> { match self.peek_token().token { Token::EOF => brea

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-07-21 Thread via GitHub
alamb commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-3098624432 I made a follow on PR to update the docs a bit: - https://github.com/apache/datafusion/pull/16846 This is so exciting

[PR] Improve async_udf example and docs [datafusion]

2025-07-21 Thread via GitHub
alamb opened a new pull request, #16846: URL: https://github.com/apache/datafusion/pull/16846 ## Which issue does this PR close? - A follow on to https://github.com/apache/datafusion/pull/14837 from @goldmedal ## Rationale for this change I was working on writin

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-21 Thread via GitHub
findepi commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2220229050 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1260,24 @@ pub fn collect_subquery_cols( }) } +#[macro_export] +macro_rules! udf_equals_hash { Review Comm

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-21 Thread via GitHub
GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files > If you could make a reproducer with synthetic data and file a ticket I would > be happy to look into this further I created a public Gist which you can
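
Pre-sorted input matters for memory because duplicates become adjacent. A streaming pass then only needs to remember the previous key, whereas a hash-based DISTINCT must keep every key it has seen. The plain-Rust sketch below illustrates that property on bare keys. It is not a DataFusion API, and this thread does not establish whether DataFusion's planner exploits the file ordering in the reporter's case.

```rust
/// Yields each distinct value once from an iterator already sorted by that value.
/// Only the most recently emitted value is kept in memory.
fn dedup_sorted<I, T>(sorted: I) -> impl Iterator<Item = T>
where
    I: IntoIterator<Item = T>,
    T: PartialEq + Clone,
{
    let mut last: Option<T> = None;
    sorted.into_iter().filter(move |item| {
        if last.as_ref() == Some(item) {
            false // same key as the previous row: drop it
        } else {
            last = Some(item.clone());
            true
        }
    })
}
```

The sketch only gives correct results when the input really is sorted, or at least grouped, by the key.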

Re: [I] [Proposal] Support User-Defined Types (UDT) [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #7923: URL: https://github.com/apache/datafusion/issues/7923#issuecomment-3098064055 Let's continue the discussion on https://github.com/apache/datafusion/issues/12644 so closing this issue

Re: [I] [Proposal] Support User-Defined Types (UDT) [datafusion]

2025-07-21 Thread via GitHub
alamb closed issue #7923: [Proposal] Support User-Defined Types (UDT) URL: https://github.com/apache/datafusion/issues/7923

Re: [PR] Blog: Fix page overflow [datafusion-site]

2025-07-21 Thread via GitHub
alamb commented on PR #92: URL: https://github.com/apache/datafusion-site/pull/92#issuecomment-3097994077 Thank you for the review @kevinjqliu

Re: [PR] Add note to upgrade guide about MSRV update [datafusion]

2025-07-21 Thread via GitHub
comphead commented on code in PR #16845: URL: https://github.com/apache/datafusion/pull/16845#discussion_r2219982510 ## docs/source/library-user-guide/upgrading.md: ## @@ -24,6 +24,14 @@ **Note:** DataFusion `49.0.0` has not been released yet. The information provided in this

Re: [I] [datafusion-spark] Implement Spark `string` function `luhn_check` [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #16612: URL: https://github.com/apache/datafusion/issues/16612#issuecomment-3097961177 > Hi [@alamb](https://github.com/alamb) I saw there's a draft PR attached to this issue from two weeks ago, and it was just labeled as good first issue. I’d love to work on this!

Re: [I] [Bug] Aggregate + TopK fails when asc = false [datafusion]

2025-07-21 Thread via GitHub
avantgardnerio commented on issue #16837: URL: https://github.com/apache/datafusion/issues/16837#issuecomment-3097945249 Sorry about that @niebayes I've talked to my employer and we can schedule it for our next sprint.

Re: [PR] Expand parse without semicolons [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
alamb commented on code in PR #1949: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1949#discussion_r2219942253 ## src/parser/mod.rs: ## @@ -266,6 +266,22 @@ impl ParserOptions { self.unescape = unescape; self } + +/// Set if semicolon st

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-21 Thread via GitHub
alamb commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2219930888 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1260,24 @@ pub fn collect_subquery_cols( }) } +#[macro_export] +macro_rules! udf_equals_hash { Review Commen

Re: [I] ScalarUDFImpl::equals default implementation is error-prone [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #16677: URL: https://github.com/apache/datafusion/issues/16677#issuecomment-3097906043 I think this idea sounds great -- as long as it is documented (which I think we can given your description) it will be great

Re: [PR] feat(spark): implement Spark datetime function last_day [datafusion]

2025-07-21 Thread via GitHub
alamb commented on PR #16828: URL: https://github.com/apache/datafusion/pull/16828#issuecomment-3097902331 > Hi @alamb, I’ve added the `last_day` function. However, running `cargo test --test sqllogictests -- spark` produces some errors. I’m looking into it, but please let me know if you ha
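
Spark's `last_day(date)` returns the last day of the month that the date falls in. The sketch below shows the underlying calendar rule on hypothetical `(year, month, day)` triples, using the proleptic Gregorian leap-year rule. A DataFusion implementation would operate on Arrow date arrays instead. The `Option` return for invalid triples is only a convenience of this sketch.

```rust
/// Proleptic Gregorian leap-year rule.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in `month` (1-12) of `year`, or None for an invalid month.
fn days_in_month(year: i32, month: u32) -> Option<u32> {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if is_leap_year(year) { 29 } else { 28 }),
        _ => None,
    }
}

/// Hypothetical helper: last day of the month containing the given date.
fn last_day(year: i32, month: u32, day: u32) -> Option<(i32, u32, u32)> {
    let last = days_in_month(year, month)?;
    // Reject dates that do not exist, e.g. 2023-02-30.
    if day == 0 || day > last {
        return None;
    }
    Some((year, month, last))
}
```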

Re: [I] [Bug] Aggregate + TopK fails when asc = false [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #16837: URL: https://github.com/apache/datafusion/issues/16837#issuecomment-3097900047 Thanks @niebayes -- I agree this sounds like a real bug I don't think the PrimitiveHeap code has been changed for quite a while. It might be an excellent opportunity to rev
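
Background for the TopK bug: a bounded heap keeps the best `k` candidates by evicting the weakest one. The heap's direction must flip with the sort order. A min-heap retains the largest values for `DESC`, and a max-heap retains the smallest values for `ASC`. The sketch below illustrates that invariant with `std::collections::BinaryHeap`. It is not DataFusion's `PrimitiveHeap`, and it does not claim to identify the cause of the reported failure.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns the `k` largest values (descending) or `k` smallest (ascending),
/// in the order a `ORDER BY ... LIMIT k` query would emit them.
fn top_k(values: &[i64], k: usize, descending: bool) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    let mut out = if descending {
        // Min-heap: the root is the smallest kept value, evicted first.
        let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::with_capacity(k + 1);
        for &v in values {
            heap.push(Reverse(v));
            if heap.len() > k {
                heap.pop();
            }
        }
        heap.into_iter().map(|Reverse(v)| v).collect::<Vec<_>>()
    } else {
        // Max-heap: the root is the largest kept value, evicted first.
        let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k + 1);
        for &v in values {
            heap.push(v);
            if heap.len() > k {
                heap.pop();
            }
        }
        heap.into_vec()
    };
    out.sort_unstable();
    if descending {
        out.reverse();
    }
    out
}
```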

Re: [I] Code clean for new datafusion-cli streaming printing logic [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #14886: URL: https://github.com/apache/datafusion/issues/14886#issuecomment-3097894189 Closed -- thanks @nssalian and @zhuqi-lucas

Re: [I] Code clean for new datafusion-cli streaming printing logic [datafusion]

2025-07-21 Thread via GitHub
alamb closed issue #14886: Code clean for new datafusion-cli streaming printing logic URL: https://github.com/apache/datafusion/issues/14886

Re: [I] Physical plan pushdown for volatile predicates [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #16545: URL: https://github.com/apache/datafusion/issues/16545#issuecomment-3097890993 > I expect the physical plan optimiser doesn't perform pushdown of volatile predicates. I am not sure -- does this result in wrong results? It does make sense in gen

Re: [I] Move code in `user_defined_plan.rs` to the `extending-operators` doc [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #15774: URL: https://github.com/apache/datafusion/issues/15774#issuecomment-3097884303 Note there is a PR here - https://github.com/apache/datafusion/pull/15832 I think this ticket is largely a documentation exercise in writing docs and will be tough for

Re: [PR] Bump the MSRV to `1.85.1` due to transitive dependencies (`aws-sdk`) [datafusion]

2025-07-21 Thread via GitHub
alamb commented on PR #16728: URL: https://github.com/apache/datafusion/pull/16728#issuecomment-3097875684 I hit this during the delta-rs upgrade, so I made a PR proposing a note in the upgrade guide: - https://github.com/apache/datafusion/pull/16845

[PR] Add note to upgrade guide about MSRV update [datafusion]

2025-07-21 Thread via GitHub
alamb opened a new pull request, #16845: URL: https://github.com/apache/datafusion/pull/16845 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/16235 ## Rationale for this change - While working on the upgrade to delt

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3097813946 > Fyi, the [branch-49](https://github.com/apache/datafusion/tree/branch-49) has come out, we can do tests with it. > > [@alamb](https://github.com/alamb) When do you think w

Re: [PR] 48.0.1 [datafusion]

2025-07-21 Thread via GitHub
alamb commented on PR #16755: URL: https://github.com/apache/datafusion/pull/16755#issuecomment-3097775352 Marking as a draft as i don't think this is waiting on review and I am trying to go through the review backlog

Re: [I] Regression: `DataFrameWriteOptions::with_single_file_output` produces a directory [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #13323: URL: https://github.com/apache/datafusion/issues/13323#issuecomment-3097701608 > [@alamb](https://github.com/alamb) , is this issue still valid? I am not sure @nssalian -- could you try reproducing the description and see if it still happens?

[PR] fix(build-wasm): put `arrow-ipc/zstd` dep under `compression` feature… [datafusion]

2025-07-21 Thread via GitHub
chrisvander opened a new pull request, #16844: URL: https://github.com/apache/datafusion/pull/16844 … flag ## Which issue does this PR close? - Closes #16843. ## Rationale for this change `zstd` requires Clang and native feature sets, and for `wasm

Re: [I] Dependency conflict with rquest due to async-compression and xz2 linking to lzma [datafusion]

2025-07-21 Thread via GitHub
alamb commented on issue #15342: URL: https://github.com/apache/datafusion/issues/15342#issuecomment-3097672323 Sounds like we should move to the non abandoned crate

Re: [I] Building project takes a *long* time (esp compilation time for `datafusion` core crate) [datafusion]

2025-07-21 Thread via GitHub
comphead commented on issue #13814: URL: https://github.com/apache/datafusion/issues/13814#issuecomment-3097559543 For `cargo check` it might be useful to start with `RUSTFLAGS="-Ztime-passes" cargo +nightly check -Zunstable-options` to see which step takes most of the time

Re: [PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-21 Thread via GitHub
codecov-commenter commented on PR #2037: URL: https://github.com/apache/datafusion-comet/pull/2037#issuecomment-3097399284 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2037?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca
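
Spark documents that `monotonically_increasing_id` places the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. The resulting IDs are unique and increasing but not consecutive. A sketch of that layout follows. The function name and argument types are illustrative and are not taken from the Comet PR.

```rust
/// Number of low bits reserved for the record number within a partition.
const ROW_BITS: u32 = 33;

/// Illustrative encoding: partition ID in the upper bits, row index in the lower 33 bits.
fn monotonically_increasing_id(partition_id: i32, row_in_partition: i64) -> i64 {
    assert!(partition_id >= 0, "partition IDs are non-negative");
    assert!(
        (0..(1i64 << ROW_BITS)).contains(&row_in_partition),
        "row index must fit in 33 bits"
    );
    // The shifted partition ID has its low 33 bits clear, so adding the row
    // index cannot carry into the partition bits.
    (i64::from(partition_id) << ROW_BITS) + row_in_partition
}
```

Every ID from partition 1 is larger than any ID from partition 0, which is why the sequence increases while leaving large gaps.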

Re: [PR] Add benchmark for ByteViewGroupValueBuilder [datafusion]

2025-07-21 Thread via GitHub
alamb merged PR #16826: URL: https://github.com/apache/datafusion/pull/16826

Re: [PR] feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)` [datafusion]

2025-07-21 Thread via GitHub
alamb commented on code in PR #16762: URL: https://github.com/apache/datafusion/pull/16762#discussion_r2219635481 ## datafusion/physical-expr/src/utils/guarantee.rs: ## @@ -824,13 +894,87 @@ mod test { ); } +#[test] +fn test_disjunction_and_conjunction_mu
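
The idea behind the `(a=1 AND b=1) OR (a=2 AND b=3)` case is that a column pinned to a literal in every disjunct must take one of those literals. Here, `a IN (1, 2)` and `b IN (1, 3)` hold for every row that passes the filter. A column missing from any disjunct yields no guarantee. The sketch below uses a hypothetical, simplified expression representation: lists of `column = literal` pairs, with each column appearing at most once per conjunct. It does not use DataFusion's `Expr` or `LiteralGuarantee` types.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical representation of `col1 = lit1 AND col2 = lit2 AND ...`.
type Conjunction = Vec<(&'static str, i64)>;

/// For an OR of conjunctions, returns `column -> allowed literals` for every
/// column that is constrained in *every* disjunct.
fn in_list_guarantees(disjuncts: &[Conjunction]) -> BTreeMap<&'static str, BTreeSet<i64>> {
    let mut candidates: Option<BTreeMap<&'static str, BTreeSet<i64>>> = None;
    for conj in disjuncts {
        let cols: BTreeMap<&'static str, i64> = conj.iter().copied().collect();
        candidates = Some(match candidates {
            // First disjunct: every pinned column is a candidate.
            None => cols.iter().map(|(c, v)| (*c, BTreeSet::from([*v]))).collect(),
            Some(mut acc) => {
                // Drop columns this disjunct does not constrain.
                acc.retain(|c, _| cols.contains_key(c));
                // Union in this disjunct's literal for the surviving columns.
                for (c, set) in acc.iter_mut() {
                    set.insert(cols[c]);
                }
                acc
            }
        });
    }
    candidates.unwrap_or_default()
}
```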

Re: [PR] docs: Fix broken links [datafusion]

2025-07-21 Thread via GitHub
jcsherin commented on code in PR #16839: URL: https://github.com/apache/datafusion/pull/16839#discussion_r2219633911 ## datafusion/datasource/src/mod.rs: ## @@ -102,9 +102,9 @@ pub struct PartitionedFile { /// You may use [`wrap_partition_value_in_dict`] to wrap them if you

Re: [PR] docs: Fix broken links [datafusion]

2025-07-21 Thread via GitHub
jcsherin commented on PR #16839: URL: https://github.com/apache/datafusion/pull/16839#issuecomment-3097357784 I've confirmed the URLs you fixed were returning 404. I used the [lychee cli link checker](https://lychee.cli.rs/) which flagged them with a `404 Not Found` error. Unlike `ca

[I] Clang requirement when building for WebAssembly, `cc-rs` through `zstd`, fails [datafusion]

2025-07-21 Thread via GitHub
chrisvander opened a new issue, #16843: URL: https://github.com/apache/datafusion/issues/16843 ### Describe the bug [This commit](https://github.com/apache/datafusion/commit/3c4e39ac0cf83bd8ead45722a5873bac731b53f1) introduces a non-optional dependency on `zstd`, which in turn relies

Re: [PR] fix: Refactor arithmetic serde and fix correctness issues with EvalMode::TRY [datafusion-comet]

2025-07-21 Thread via GitHub
mbutrovich merged PR #2018: URL: https://github.com/apache/datafusion-comet/pull/2018

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219474642 ## datafusion/physical-plan/src/spill/get_size.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] SGA-11419 Added snowflake ability for if not exists after create view… [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
etgarperets commented on code in PR #1961: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1961#discussion_r2219392329 ## tests/sqlparser_common.rs: ## @@ -16183,3 +16190,21 @@ fn test_identifier_unicode_start() { ]); let _ = dialects.verified_stmt(sql);

Re: [I] ScalarUDFImpl::equals default implementation is error-prone [datafusion]

2025-07-21 Thread via GitHub
findepi commented on issue #16677: URL: https://github.com/apache/datafusion/issues/16677#issuecomment-3097012088 https://github.com/apache/datafusion/pull/16842 is a POC how we can convert hand-written equals, hash_value implementations into `PartialEq` and `Hash` traits, which would be go

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-21 Thread via GitHub
findepi commented on PR #16842: URL: https://github.com/apache/datafusion/pull/16842#issuecomment-3097009055 This is POC only for now. @alamb @kosiew @timsaucer PTAL and let me know if you agree with the direction. Note the PR goals: - reduce complexity of existing code, by

[PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-21 Thread via GitHub
findepi opened a new pull request, #16842: URL: https://github.com/apache/datafusion/pull/16842 Reduce boilerplate in cases where implementation of `{ScalarUDFImpl,AggregateUDFImpl,WindowUDFImpl}::{equals,hash_code}` can be derived using standard `PartialEq` and `Hash` traits. This i

Re: [PR] Snowflake: Support IDENTIFIER for GRANT ROLE [datafusion-sqlparser-rs]

2025-07-21 Thread via GitHub
yoavcloud commented on code in PR #1957: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1957#discussion_r2219249158 ## src/ast/mod.rs: ## @@ -6623,7 +6623,7 @@ pub enum Action { Replicate, ResolveAll, Role { -role: Ident, +role: Objec

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-21 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3096759238 https://github.com/user-attachments/assets/f5c33e12-a175-495b-b547-e7a5be94e855 And the various [benchmarks](https://github.com/apache/datafusion/pull/16711#issuecomment-30

Re: [I] Building project takes a *long* time (esp compilation time for `datafusion` core crate) [datafusion]

2025-07-21 Thread via GitHub
findepi commented on issue #13814: URL: https://github.com/apache/datafusion/issues/13814#issuecomment-3096736712 Sometimes different feature flags are result of different crates depending on different features. In such case, having workspace-level definition of features is a solution, as @

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219148029 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -125,6 +133,156 @@ impl SpillManager { self.spill_record_batch_and_finish(&batches, request

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219057789 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219143884 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219141893 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,345 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

[PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-21 Thread via GitHub
akupchinskiy opened a new pull request, #2037: URL: https://github.com/apache/datafusion-comet/pull/2037 ## Which issue does this PR close? Closes #. ## Rationale for this change Added support for spark_partition_id and monotonically_increasing_id expressions

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-21 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2219098075 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -125,6 +133,156 @@ impl SpillManager { self.spill_record_batch_and_finish(&batches, request

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-21 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3096469193 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_pushdown Benchmark clickbench_pushdown.json

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-21 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3096409536 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
