[PR] Allow to customize the default null ordering [datafusion]

2025-07-29 Thread via GitHub
goldmedal opened a new pull request, #16963: URL: https://github.com/apache/datafusion/pull/16963 ## Which issue does this PR close? - Closes #. ## Rationale for this change The default null ordering differs considerably between databases. In Postgres, the null va

Re: [PR] Support for colon preceeded placeholders [datafusion-sqlparser-rs]

2025-07-29 Thread via GitHub
iffyio commented on code in PR #1979: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1979#discussion_r2239586088 ## src/tokenizer.rs: ## @@ -1756,6 +1761,30 @@ impl<'a> Tokenizer<'a> { } } +/// Tokenizes an identifier followed immediately after

Re: [I] Add dictionary support to unhex and test dictionary and scalar cases [datafusion-comet]

2025-07-29 Thread via GitHub
kination commented on issue #477: URL: https://github.com/apache/datafusion-comet/issues/477#issuecomment-3131401938 @kazuyukitanimura sorry for the question. Does this mean `unhex` cannot decode correctly if the value is a dictionary? For example... ``` SELECT decode(unhex(''), 'UTF-8'); ```

[PR] chore(deps): bump ctor from 0.4.2 to 0.4.3 [datafusion]

2025-07-29 Thread via GitHub
dependabot[bot] opened a new pull request, #16961: URL: https://github.com/apache/datafusion/pull/16961 Bumps [ctor](https://github.com/mmastrac/rust-ctor) from 0.4.2 to 0.4.3. Commits See full diff in compare view (https://github.com/mmastrac/rust-ctor/commits)

[PR] feat(spark): implement Spark string function like/ilike [datafusion]

2025-07-29 Thread via GitHub
chenkovsky opened a new pull request, #16962: URL: https://github.com/apache/datafusion/pull/16962 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? support spark like/ilike udf ## Are these

[PR] chore(deps): bump rand from 0.9.1 to 0.9.2 [datafusion]

2025-07-29 Thread via GitHub
dependabot[bot] opened a new pull request, #16960: URL: https://github.com/apache/datafusion/pull/16960 Bumps [rand](https://github.com/rust-random/rand) from 0.9.1 to 0.9.2. Changelog Sourced from rand's changelog (https://github.com/rust-random/rand/blob/master/CHANGELOG.md).

Re: [PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-29 Thread via GitHub
akupchinskiy commented on code in PR #2037: URL: https://github.com/apache/datafusion-comet/pull/2037#discussion_r2239434120 ## native/spark-expr/src/nondetermenistic_funcs/monotonically_increasing_id.rs: ## @@ -0,0 +1,164 @@ +// Licensed to the Apache Software Foundation (ASF)

Re: [PR] Refactor `DataFusionError::ObjectStore` to Avoid Leaking `object_store::Error` in Public API [datafusion]

2025-07-29 Thread via GitHub
crepererum commented on PR #16947: URL: https://github.com/apache/datafusion/pull/16947#issuecomment-3131816841 I think the question here is what the purpose of this shadowing/hiding is: I think it's OK if `object_store` is a purely internal library that the API user never interacts with. H

Re: [PR] Refactor `DataFusionError::ObjectStore` to Avoid Leaking `object_store::Error` in Public API [datafusion]

2025-07-29 Thread via GitHub
crepererum commented on code in PR #16947: URL: https://github.com/apache/datafusion/pull/16947#discussion_r2239333092 ## datafusion/common/src/error.rs: ## @@ -47,6 +47,61 @@ pub type SharedResult<T> = result::Result<T, Arc<DataFusionError>>; /// Error type for generic operations that could result in D

Re: [PR] Add ODBC escape syntax support for time expressions [datafusion-sqlparser-rs]

2025-07-29 Thread via GitHub
iffyio merged PR #1953: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1953 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] Add support for `SHOW CHARSET` [datafusion-sqlparser-rs]

2025-07-29 Thread via GitHub
iffyio merged PR #1974: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1974

Re: [I] Add support for partition column type [DATETIME] for ICEBERG table sink [datafusion-comet]

2025-07-29 Thread via GitHub
mixermt commented on issue #2045: URL: https://github.com/apache/datafusion-comet/issues/2045#issuecomment-3131272982 Sorry wrong project 😸

Re: [PR] Implement Spark `url` function `parse_url` [datafusion]

2025-07-29 Thread via GitHub
Standing-Man commented on code in PR #16937: URL: https://github.com/apache/datafusion/pull/16937#discussion_r2239117928 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -0,0 +1,525 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

[I] Add support for partition column type [DATETIME] for ICEBERG table sink [datafusion-comet]

2025-07-29 Thread via GitHub
mixermt opened a new issue, #2045: URL: https://github.com/apache/datafusion-comet/issues/2045 ### What is the problem the feature request solves? Right now there is no option to insert into Iceberg tables ``` starrocks> CREATE TABLE iceberg_catalog.test_schema.test_table (

Re: [PR] Implement Spark `url` function `parse_url` [datafusion]

2025-07-29 Thread via GitHub
Standing-Man commented on code in PR #16937: URL: https://github.com/apache/datafusion/pull/16937#discussion_r2239135975 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -0,0 +1,525 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat: Support distinct window for sum [datafusion]

2025-07-29 Thread via GitHub
crepererum merged PR #16943: URL: https://github.com/apache/datafusion/pull/16943

Re: [I] Attach `Diagnostic` to "duplicate table name" error [datafusion]

2025-07-29 Thread via GitHub
vegarsti commented on issue #14436: URL: https://github.com/apache/datafusion/issues/14436#issuecomment-3131087119 `DFParser::parse_sql(query)` returns an `Ok` for both examples in the OP 🤔 Did something change in the meantime or am I missing something? My test is ``` #[test] f

Re: [I] [Epic]: Google Summer of Code 2025 Correlated Subquery Support [datafusion]

2025-07-29 Thread via GitHub
Standing-Man commented on issue #16059: URL: https://github.com/apache/datafusion/issues/16059#issuecomment-3131131180 Hi @alamb and @irenjj, I would like to understand how to implement the Correlated Subquery feature in DataFusion. Is there a timeline organized by feature milestones and co

Re: [I] Regression: `DataFrameWriteOptions::with_single_file_output` produces a directory [datafusion]

2025-07-29 Thread via GitHub
hknlof commented on issue #13323: URL: https://github.com/apache/datafusion/issues/13323#issuecomment-3131126145 This is still happening with DataFusion 49. Using a `.parquet` suffix in the output string produces the behavior expected in this issue. ```rust use datafusion::{ dataframe:

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-29 Thread via GitHub
zhuqi-lucas commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3131466594 Thank you @alamb, I investigated and made smaller changes in the latest PR. Could you please trigger the benchmark again so we can see the result now? Thanks! And the size

Re: [PR] feat(spark): implement Spark math function rint [datafusion]

2025-07-29 Thread via GitHub
shehabgamin commented on code in PR #16924: URL: https://github.com/apache/datafusion/pull/16924#discussion_r2239266847 ## datafusion/spark/src/function/math/rint.rs: ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lice

Re: [PR] Blog on Extending SQL to create own SQL Dialects [datafusion-site]

2025-07-29 Thread via GitHub
alamb commented on PR #97: URL: https://github.com/apache/datafusion-site/pull/97#issuecomment-3131704408 I am starting to check this one out

Re: [PR] Pin github actions to commit sha [datafusion]

2025-07-29 Thread via GitHub
gopidesupavan commented on PR #16964: URL: https://github.com/apache/datafusion/pull/16964#issuecomment-3132380118 https://octopin.readthedocs.io/en/latest/ very helpful library to pin actions

Re: [I] Make the max temp directory size (for spills) configurable through configuration API [datafusion]

2025-07-29 Thread via GitHub
alamb closed issue #16922: Make the max temp directory size (for spills) configurable through configuration API URL: https://github.com/apache/datafusion/issues/16922

Re: [I] Make the temporary directory (for spills) configurable through configuration API [datafusion]

2025-07-29 Thread via GitHub
alamb closed issue #16921: Make the temporary directory (for spills) configurable through configuration API URL: https://github.com/apache/datafusion/issues/16921

Re: [PR] Make `AsyncScalarUDFImpl::invoke_async_with_args` consistent with `ScalarUDFImpl::invoke_with_args` [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16902: URL: https://github.com/apache/datafusion/pull/16902#issuecomment-3133497704 Thank you @geetanshjuneja 🙏 This is technically an API change, so it should have an entry in the upgrade guide, I think -- I'll push a commit to this PR to do so

Re: [PR] docs: Remove references to DataFusion for Ray sub project [datafusion]

2025-07-29 Thread via GitHub
alamb merged PR #16966: URL: https://github.com/apache/datafusion/pull/16966

Re: [PR] docs: Remove references to DataFusion for Ray sub project [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16966: URL: https://github.com/apache/datafusion/pull/16966#issuecomment-3133491660 Thanks @andygrove and @mbutrovich

Re: [PR] Enable physical filter pushdown for hash joins [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16954: URL: https://github.com/apache/datafusion/pull/16954#discussion_r2240550577 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -944,6 +949,53 @@ impl ExecutionPlan for HashJoinExec { try_embed_projection(projection, self)

Re: [PR] Chore: refactor Comparison out of QueryPlanSerde [datafusion-comet]

2025-07-29 Thread via GitHub
mbutrovich commented on code in PR #2028: URL: https://github.com/apache/datafusion-comet/pull/2028#discussion_r2240559841 ## spark/src/main/scala/org/apache/comet/serde/comparisons.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Add `temp_directory` and `max_temp_directory_size` runtime config variables [datafusion]

2025-07-29 Thread via GitHub
alamb merged PR #16934: URL: https://github.com/apache/datafusion/pull/16934

Re: [PR] Make `AsyncScalarUDFImpl::invoke_async_with_args` consistent with `ScalarUDFImpl::invoke_with_args` [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16902: URL: https://github.com/apache/datafusion/pull/16902#discussion_r2240582471 ## docs/source/library-user-guide/upgrading.md: ## @@ -24,6 +24,48 @@ **Note:** DataFusion `50.0.0` has not been released yet. The information provided in this sec

Re: [PR] disallow pushdown of volatile functions [datafusion]

2025-07-29 Thread via GitHub
alamb merged PR #16861: URL: https://github.com/apache/datafusion/pull/16861

Re: [PR] disallow pushdown of volatile functions [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16861: URL: https://github.com/apache/datafusion/pull/16861#issuecomment-3133521466 🚀

Re: [I] Physical plan pushdown for volatile predicates [datafusion]

2025-07-29 Thread via GitHub
alamb closed issue #16545: Physical plan pushdown for volatile predicates URL: https://github.com/apache/datafusion/issues/16545

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3133322533 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

[PR] Implemented Upper/lower for REE [datafusion]

2025-07-29 Thread via GitHub
rich-t-kid-datadog opened a new pull request, #16969: URL: https://github.com/apache/datafusion/pull/16969 ## Which issue does this PR close? Work towards closing [Ree Epic](https://github.com/apache/arrow-rs/issues/3520) ## Rationale for this change ## What

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3133413747 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_update_arrow_56.0.0 Benchmark clickbench_extended.json

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3133413853 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3133435713 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_update_arrow_56.0.0 Benchmark clickbench_pushdown.json

Re: [I] Separable Python and Rust components [datafusion-python]

2025-07-29 Thread via GitHub
timsaucer commented on issue #1193: URL: https://github.com/apache/datafusion-python/issues/1193#issuecomment-3133448537 I think this might be a duplicate of https://github.com/apache/datafusion-python/issues/853 which may need to be reopened. I suspect your immediate problem is res

Re: [I] Filtering and counting afterwards causes overflow panic in interval arithmetics [datafusion]

2025-07-29 Thread via GitHub
90degs2infty commented on issue #16736: URL: https://github.com/apache/datafusion/issues/16736#issuecomment-3133111979 Thank you very much for looking into this, everyone! 🙏

Re: [I] unnest should preserve the input's equivalence properties for uninvolved columns [datafusion]

2025-07-29 Thread via GitHub
vegarsti commented on issue #15231: URL: https://github.com/apache/datafusion/issues/15231#issuecomment-3133124876 Can I try to take this?

Re: [PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-29 Thread via GitHub
andygrove commented on code in PR #2037: URL: https://github.com/apache/datafusion-comet/pull/2037#discussion_r2240291837 ## native/spark-expr/src/nondetermenistic_funcs/monotonically_increasing_id.rs: ## @@ -0,0 +1,165 @@ +// Licensed to the Apache Software Foundation (ASF) und

[PR] Add 'regexp_extract' function [datafusion]

2025-07-29 Thread via GitHub
galibey opened a new pull request, #16967: URL: https://github.com/apache/datafusion/pull/16967 Added an implementation of the `regexp_extract` function, similar to https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.regexp_extract.html

Re: [PR] Add 'regexp_extract' function [datafusion]

2025-07-29 Thread via GitHub
galibey closed pull request #16967: Add 'regexp_extract' function URL: https://github.com/apache/datafusion/pull/16967

[PR] remove warning from every file open [datafusion]

2025-07-29 Thread via GitHub
adriangb opened a new pull request, #16968: URL: https://github.com/apache/datafusion/pull/16968 this is too noisy and not helpful yet, we don't have a fully implemented alternative

Re: [PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-29 Thread via GitHub
akupchinskiy commented on code in PR #2037: URL: https://github.com/apache/datafusion-comet/pull/2037#discussion_r2240365504 ## native/spark-expr/src/nondetermenistic_funcs/monotonically_increasing_id.rs: ## @@ -0,0 +1,165 @@ +// Licensed to the Apache Software Foundation (ASF)

Re: [I] Consider split expr.proto into multiple files [datafusion-comet]

2025-07-29 Thread via GitHub
viirya commented on issue #214: URL: https://github.com/apache/datafusion-comet/issues/214#issuecomment-3133233984 > [WARNING] Files with unapproved licenses: You don't have the license headers in the new proto files.

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3133237990 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] DRAFT: Update arrow/parquet to 56.0.0 [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16690: URL: https://github.com/apache/datafusion/pull/16690#issuecomment-3133238622 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-29 Thread via GitHub
akupchinskiy commented on code in PR #2037: URL: https://github.com/apache/datafusion-comet/pull/2037#discussion_r2239491712 ## native/spark-expr/src/nondetermenistic_funcs/monotonically_increasing_id.rs: ## @@ -0,0 +1,164 @@ +// Licensed to the Apache Software Foundation (ASF)

[I] Incorrect results of aggregation with grouping sets with single target partition [datafusion]

2025-07-29 Thread via GitHub
mpurins-coralogix opened a new issue, #16965: URL: https://github.com/apache/datafusion/issues/16965 ### Describe the bug When `datafusion.execution.target_partitions` is set to 1, the following query gives incorrect results -- `select id from (select 'id' as id union all select 'id' a

Re: [PR] Metadata handling announcement [datafusion-site]

2025-07-29 Thread via GitHub
timsaucer commented on PR #73: URL: https://github.com/apache/datafusion-site/pull/73#issuecomment-3132515212 @paleolimbot I've pushed a [repository here](https://github.com/timsaucer/datafusion_extension_type_examples) that demonstrates using scalar UDFs for working with UUIDs. Can you tak

Re: [PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-29 Thread via GitHub
akupchinskiy commented on code in PR #2037: URL: https://github.com/apache/datafusion-comet/pull/2037#discussion_r2239441463 ## native/spark-expr/src/nondetermenistic_funcs/monotonically_increasing_id.rs: ## @@ -0,0 +1,164 @@ +// Licensed to the Apache Software Foundation (ASF)

Re: [PR] feat: support alter schema for bigquery [datafusion-sqlparser-rs]

2025-07-29 Thread via GitHub
iffyio commented on code in PR #1980: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1980#discussion_r2239425948 ## src/ast/mod.rs: ## @@ -3381,6 +3381,17 @@ pub enum Statement { iceberg: bool, }, /// ```sql +/// ALTER SCHEMA +/// ``` +

Re: [PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-29 Thread via GitHub
akupchinskiy commented on code in PR #2037: URL: https://github.com/apache/datafusion-comet/pull/2037#discussion_r2239436301 ## native/core/src/execution/planner.rs: ## @@ -798,6 +799,13 @@ impl PhysicalPlanner { let seed = expr.seed.wrapping_add(self.partition.

Re: [PR] feat(spark): implement Spark math function rint [datafusion]

2025-07-29 Thread via GitHub
chenkovsky commented on code in PR #16924: URL: https://github.com/apache/datafusion/pull/16924#discussion_r2239370817 ## datafusion/spark/src/function/math/rint.rs: ## @@ -0,0 +1,162 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Allow to customize the default null ordering [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16963: URL: https://github.com/apache/datafusion/pull/16963#discussion_r2240591332 ## datafusion/sql/src/planner.rs: ## @@ -147,10 +159,60 @@ impl From<&SqlParserOptions> for ParserOptions { enable_options_value_normalization: options

[PR] Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-07-29 Thread via GitHub
Omega359 opened a new pull request, #16970: URL: https://github.com/apache/datafusion/pull/16970 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/13519 This is a continuation of @alamb's PR https://github.com/apache/datafusion/pull/16661

Re: [PR] Implement Spark `url` function `parse_url` [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16937: URL: https://github.com/apache/datafusion/pull/16937#discussion_r2240602283 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -0,0 +1,525 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] Pin github actions to commit sha [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16964: URL: https://github.com/apache/datafusion/pull/16964#discussion_r2240605914 ## .github/workflows/dev.yml: ## @@ -27,15 +27,15 @@ jobs: runs-on: ubuntu-latest name: Check License Header steps: - - uses: actions/checkout@v4

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3133571617 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

Re: [PR] Implementing partition_statistics for EmptyExec (Issue #15873) [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16941: URL: https://github.com/apache/datafusion/pull/16941#discussion_r2240765144 ## datafusion/physical-plan/src/empty.rs: ## @@ -165,23 +169,28 @@ impl ExecutionPlan for EmptyExec { ); } } +// Build

Re: [PR] We have now the CI ensure all doc strings remain formatted [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16916: URL: https://github.com/apache/datafusion/pull/16916#discussion_r2240760143 ## .github/workflows/ci.yml: ## @@ -0,0 +1,18 @@ +name: CI + +on: + push: +branches: [ main ] + pull_request: + +jobs: + fmt: +runs-on: ubuntu-latest +

Re: [I] Rewrite expression in FilterExec instead of the data [datafusion]

2025-07-29 Thread via GitHub
alamb commented on issue #16957: URL: https://github.com/apache/datafusion/issues/16957#issuecomment-3133787665 I think closing it is a good idea unless we can come up with an example 🤔

Re: [PR] feat: Cache Parquet metadata [datafusion]

2025-07-29 Thread via GitHub
nuno-faria commented on code in PR #16971: URL: https://github.com/apache/datafusion/pull/16971#discussion_r2240645876 ## datafusion/execution/src/cache/cache_manager.rs: ## @@ -59,6 +75,13 @@ impl CacheManager { if let Some(lc) = &config.list_files_cache {

Re: [I] Rewrite expression in FilterExec instead of the data [datafusion]

2025-07-29 Thread via GitHub
alamb commented on issue #16957: URL: https://github.com/apache/datafusion/issues/16957#issuecomment-3133672279 What might be worth doing is writing some tests before we write some code. I think the hard bit about writing tests for this particular feature is that this optimization is a

Re: [PR] fix: `TrivialValueAccumulators` to ignore null states for `ignore nulls` [datafusion]

2025-07-29 Thread via GitHub
comphead commented on PR #16918: URL: https://github.com/apache/datafusion/pull/16918#issuecomment-3133671551 I'm still on that; my Comet test is likely showing some race conditions. It sounds like, depending on incoming data order, DF can give an incorrect result, but I am still trying to create a stabl

Re: [I] Rewrite expression in FilterExec instead of the data [datafusion]

2025-07-29 Thread via GitHub
adriangb closed issue #16957: Rewrite expression in FilterExec instead of the data URL: https://github.com/apache/datafusion/issues/16957

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-29 Thread via GitHub
coderfender commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2241291653 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -113,6 +113,16 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] Implement Spark `url` function `parse_url` [datafusion]

2025-07-29 Thread via GitHub
Standing-Man commented on code in PR #16937: URL: https://github.com/apache/datafusion/pull/16937#discussion_r2241296292 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -0,0 +1,514 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Implement Spark `url` function `parse_url` [datafusion]

2025-07-29 Thread via GitHub
Standing-Man commented on code in PR #16937: URL: https://github.com/apache/datafusion/pull/16937#discussion_r2241300096 ## datafusion/spark/src/function/url/parse_url.rs: ## @@ -0,0 +1,514 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] optimize `initcap` function by avoiding memory allocation [datafusion]

2025-07-29 Thread via GitHub
waynexia commented on PR #16878: URL: https://github.com/apache/datafusion/pull/16878#issuecomment-3134103490 Thanks for those links! Yes they are a bit sophisticated to implement...

Re: [PR] feat: Cache Parquet metadata [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3134095332 🤖: Benchmark completed Details ``` Comparing HEAD and cache_parquet_metadata Benchmark clickbench_extended.json ---

Re: [D] DISCUSSION: DataFusion Meetup in Boston, USA - Nov 12, 2025 [datafusion]

2025-07-29 Thread via GitHub
GitHub user edmondop added a comment to the discussion: DISCUSSION: DataFusion Meetup in Boston, USA - Nov 12, 2025 50.0.0 should be released by then, I propose @alamb gives a talk about the history of DataFusion that includes the most important milestones GitHub link: https://github.com/a

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3133792914 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~

Re: [PR] feat: Cache Parquet metadata [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3133790023 Thank you @nuno-faria - I plan to review this tomorrow

Re: [PR] optimize `initcap` function by avoiding memory allocation [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16878: URL: https://github.com/apache/datafusion/pull/16878#issuecomment-3133806226 > But I'm wondering if we can wrap something like ViewItemBuilder that holds the similar logic as [here](https://docs.rs/arrow-array/55.2.0/src/arrow_array/builder/generic_bytes_view_b

Re: [I] Regression: `DataFrameWriteOptions::with_single_file_output` produces a directory [datafusion]

2025-07-29 Thread via GitHub
alamb commented on issue #13323: URL: https://github.com/apache/datafusion/issues/13323#issuecomment-3133808835 Thanks for checking @hknlof

Re: [PR] fix error result in execute&pre_selection [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16930: URL: https://github.com/apache/datafusion/pull/16930#discussion_r2240750126 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -375,7 +375,19 @@ impl PhysicalExpr for BinaryExpr { // as it takes into account cases

Re: [I] Incorrect results of aggregation with grouping sets with single target partition [datafusion]

2025-07-29 Thread via GitHub
chenkovsky commented on issue #16965: URL: https://github.com/apache/datafusion/issues/16965#issuecomment-3134584082 take

Re: [PR] fix: Fix `EquivalenceClass` calculation for Union queries [datafusion]

2025-07-29 Thread via GitHub
chenkovsky commented on PR #16185: URL: https://github.com/apache/datafusion/pull/16185#issuecomment-3134581064 > Thanks @chenkovsky -- I will try and find time to review this PR in more detail tomorrow @alamb could you please help review this PR.

Re: [PR] Emptyexec partitionstats [datafusion]

2025-07-29 Thread via GitHub
vim89 closed pull request #16974: Emptyexec partitionstats URL: https://github.com/apache/datafusion/pull/16974

[PR] Emptyexec partitionstats [datafusion]

2025-07-29 Thread via GitHub
vim89 opened a new pull request, #16974: URL: https://github.com/apache/datafusion/pull/16974 Added known statistics values

Re: [PR] Blog post on async user defined functions [datafusion-site]

2025-07-29 Thread via GitHub
Adez017 commented on PR #96: URL: https://github.com/apache/datafusion-site/pull/96#issuecomment-3134958052 CC: @alamb

Re: [PR] Emptyexec partitionstats [datafusion]

2025-07-29 Thread via GitHub
vim89 commented on PR #16974: URL: https://github.com/apache/datafusion/pull/16974#issuecomment-3134971334 Duplicate PR

Re: [I] Tracking fs-hdfs issues [datafusion-comet]

2025-07-29 Thread via GitHub
jiayuasu commented on issue #2034: URL: https://github.com/apache/datafusion-comet/issues/2034#issuecomment-3135041300 @parthchandra I noticed that the PRs on `fs-hdfs` do not get reviewed by the maintainers. Do you know someone who has write access to that repo so we can merge PRs?

Re: [PR] Support for colon preceeded placeholders [datafusion-sqlparser-rs]

2025-07-29 Thread via GitHub
xitep commented on code in PR #1979: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1979#discussion_r2241698474 ## src/dialect/mod.rs: ## @@ -841,6 +841,12 @@ pub trait Dialect: Debug + Any { false } +/// Returns true if this dialect allow colon

Re: [PR] feat: Cache Parquet metadata [datafusion]

2025-07-29 Thread via GitHub
Dandandan commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3135079538 > 🤖: Benchmark completed > > Details I think this doesn't show anything as it's not enabled by default? Should we enable it?

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3133626834 EPIC!

Re: [I] [Epic]: Google Summer of Code 2025 Correlated Subquery Support [datafusion]

2025-07-29 Thread via GitHub
alamb commented on issue #16059: URL: https://github.com/apache/datafusion/issues/16059#issuecomment-3133659852 Hi @Standing-Man -- that is a great question. @irenjj perhaps this would be a good exercise for the next weekly update -- make an "Epic" style description of the steps needed and

Re: [PR] Enable Projection Pushdown Optimization for Recursive CTEs [datafusion]

2025-07-29 Thread via GitHub
alamb commented on code in PR #16696: URL: https://github.com/apache/datafusion/pull/16696#discussion_r2240645344 ## datafusion/core/tests/sql/recursive_cte.rs: ## @@ -0,0 +1,130 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3133737059 🤖: Benchmark completed Details ``` group main reduce_expr_size -

Re: [I] Ensure GroupByHash does not error when trying to spill (calling try_resize where error is not acceptible) [datafusion]

2025-07-29 Thread via GitHub
alamb commented on issue #14851: URL: https://github.com/apache/datafusion/issues/14851#issuecomment-3133654528 I agree that the problem described on this ticket should probably be handled with the cascaded merge solution. However, without a reproducer it is not easy to verify. Maybe w

Re: [PR] feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16771: URL: https://github.com/apache/datafusion/pull/16771#issuecomment-3133972910 🤖: Benchmark completed Details ``` group main reduce_expr_size -

Re: [PR] feat: Cache Parquet metadata [datafusion]

2025-07-29 Thread via GitHub
alamb commented on PR #16971: URL: https://github.com/apache/datafusion/pull/16971#issuecomment-3133973079 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [I] Separable Python and Rust components [datafusion-python]

2025-07-29 Thread via GitHub
awhyte commented on issue #1193: URL: https://github.com/apache/datafusion-python/issues/1193#issuecomment-3134235415 Thanks very much for the quick response, Tim. Sadly passing `col("a").expr` as `my_lib_function(col("a").expr)` also fails with the same message. Perhaps the build co

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-29 Thread via GitHub
andygrove commented on code in PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#discussion_r2241134819 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -113,6 +113,16 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS
