Re: [PR] feat: Improve fallback mechanism for ANSI mode [datafusion-comet]

2025-08-27 Thread via GitHub
andygrove commented on code in PR #2211: URL: https://github.com/apache/datafusion-comet/pull/2211#discussion_r2304050293 ## dev/diffs/3.4.3.diff: ## @@ -894,6 +894,19 @@ index 525d97e4998..8a3e7457618 100644 AccumulatorSuite.verifyPeakExecutionMemorySet(sparkContext, "ext

Re: [I] `DataFrame.cache()` does not work in distributed environments [datafusion]

2025-08-27 Thread via GitHub
alamb commented on issue #17297: URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3228359549 In terms of designing an API, I think it would help me to understand what behavior you want in Ballista when a user runs `DataFrame::cache` (aka how will you use this API)?

[I] Add type parameter to `CometAggregateExpressionSerde` [datafusion-comet]

2025-08-27 Thread via GitHub
andygrove opened a new issue, #2245: URL: https://github.com/apache/datafusion-comet/issues/2245 ### What is the problem the feature request solves? I would like to add a type parameter to `CometAggregateExpressionSerde` to make it consistent with `CometExpressionSerde` and `CometOpe

[PR] chore: Fix `array_interest` test [datafusion-comet]

2025-08-27 Thread via GitHub
comphead opened a new pull request, #2246: URL: https://github.com/apache/datafusion-comet/pull/2246 ## Which issue does this PR close? Closes #2174 . ## Rationale for this change ## What changes are included in this PR? ## How are these cha

Re: [I] Core tests fail when run alone [datafusion]

2025-08-27 Thread via GitHub
alamb closed issue #17141: Core tests fail when run alone URL: https://github.com/apache/datafusion/issues/17141 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] Add `cfg(feature = "avro")` attribute to Avro example in SQL API docs [datafusion]

2025-08-27 Thread via GitHub
alamb merged PR #17142: URL: https://github.com/apache/datafusion/pull/17142

[I] Integrate OpenDAL to support more file services [datafusion-comet]

2025-08-27 Thread via GitHub
wForget opened a new issue, #2243: URL: https://github.com/apache/datafusion-comet/issues/2243 ### What is the problem the feature request solves? I noticed that [fs-hdfs](https://github.com/datafusion-contrib/fs-hdfs) repo we had imported was not very active and we had to maintain a

[PR] feat: Support hdfs with OpenDAL [datafusion-comet]

2025-08-27 Thread via GitHub
wForget opened a new pull request, #2244: URL: https://github.com/apache/datafusion-comet/pull/2244 ## Which issue does this PR close? Closes #2243. ## Rationale for this change I also noticed the [Apache OpenDAL](https://github.com/apache/opendal) project, which

Re: [PR] Expose and generalize cast_column to enable struct → struct casting in more contexts [datafusion]

2025-08-27 Thread via GitHub
kosiew commented on code in PR #17281: URL: https://github.com/apache/datafusion/pull/17281#discussion_r2303202853 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -210,9 +210,22 @@ impl ColumnarValue { ) -> Result { let cast_options = cast_options.cloned().

Re: [PR] feat: add configurable cache mode (local_cache) with LogicalPlan::Cache (#17297) [datafusion]

2025-08-27 Thread via GitHub
MrGranday commented on PR #17314: URL: https://github.com/apache/datafusion/pull/17314#issuecomment-3227235253 Updates in this PR: - Added `DataFrame.cache()` method (eager materialization into MemTable) - Reverted `physical_planner.rs` back to its default (no unnecessary changes)

Re: [PR] fix EquivalenceProperties calculation in DataSourceExec [datafusion]

2025-08-27 Thread via GitHub
xudong963 commented on code in PR #17323: URL: https://github.com/apache/datafusion/pull/17323#discussion_r2303233301 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -595,7 +595,7 @@ fn test_no_pushdown_through_aggregates() { Ok: - F

Re: [PR] feat: add configurable cache mode (local_cache) with LogicalPlan::Cache (#17297) [datafusion]

2025-08-27 Thread via GitHub
milenkovicm commented on PR #17314: URL: https://github.com/apache/datafusion/pull/17314#issuecomment-3227261199 Sorry @MrGranday, I see no changes in the cache logic, just a comment update.

Re: [PR] feat: Support hdfs with OpenDAL [datafusion-comet]

2025-08-27 Thread via GitHub
codecov-commenter commented on PR #2244: URL: https://github.com/apache/datafusion-comet/pull/2244#issuecomment-3227310091 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2244?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: fix typos [datafusion]

2025-08-27 Thread via GitHub
waynexia commented on PR #17135: URL: https://github.com/apache/datafusion/pull/17135#issuecomment-3229948067 Cross-referencing the follow-up PR here: https://github.com/apache/datafusion/pull/17339

Re: [PR] fix: Remove unreachable code in `CometScanRule` [datafusion-comet]

2025-08-27 Thread via GitHub
codecov-commenter commented on PR #2252: URL: https://github.com/apache/datafusion-comet/pull/2252#issuecomment-3229923899 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2252?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] `cast StructType to StringType` test fails if default scan is `auto` [datafusion-comet]

2025-08-27 Thread via GitHub
comphead commented on issue #2175: URL: https://github.com/apache/datafusion-comet/issues/2175#issuecomment-3230221526 related to https://github.com/apache/datafusion-comet/pull/2246#issuecomment-3228649992 @andygrove @mbutrovich it fails on ``` checkSparkA

Re: [PR] Unnest Correlated Subquery [datafusion]

2025-08-27 Thread via GitHub
irenjj commented on PR #17110: URL: https://github.com/apache/datafusion/pull/17110#issuecomment-3230232039 > duckdb has a notion of `PropagatesNullValues`, i wonder if we have some thing similar > > ``` > impl ExprSchemable for Expr { > fn nullable(&self, input_schema: &dyn

[PR] chore: fix struct to string test for `native_iceberg_compat` [datafusion-comet]

2025-08-27 Thread via GitHub
comphead opened a new pull request, #2253: URL: https://github.com/apache/datafusion-comet/pull/2253 ## Which issue does this PR close? Closes #2175 . ## Rationale for this change The test uses UINTs, which are known to behave differently from Spark and explained

[PR] chore: Introduce `strict-warning` profile for Scala [datafusion-comet]

2025-08-27 Thread via GitHub
comphead opened a new pull request, #2254: URL: https://github.com/apache/datafusion-comet/pull/2254 ## Which issue does this PR close? Follow up on https://github.com/apache/datafusion-comet/pull/2252#pullrequestreview-3162500690. ## Rationale for this change Cr

Re: [PR] fix: Remove unreachable code in `CometScanRule` [datafusion-comet]

2025-08-27 Thread via GitHub
comphead commented on PR #2252: URL: https://github.com/apache/datafusion-comet/pull/2252#issuecomment-3231000650 Filed https://github.com/apache/datafusion-comet/pull/2254

[I] chore: Fix Scala warns [datafusion-comet]

2025-08-27 Thread via GitHub
comphead opened a new issue, #2255: URL: https://github.com/apache/datafusion-comet/issues/2255 ### What is the problem the feature request solves? Build the project with `strict-warnings` profile enabled and need to fix scala code hygiene issues ``` make release PROFILES="

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-08-27 Thread via GitHub
github-actions[bot] closed pull request #16386: Improved experience when remote object store URL does not end in / URL: https://github.com/apache/datafusion/pull/16386

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-08-27 Thread via GitHub
github-actions[bot] closed pull request #15958: [datafusion-spark] Implement ceil&floor function for spark URL: https://github.com/apache/datafusion/pull/15958

Re: [PR] [PoC] Add API for tracking distinct buffers in `MemoryPool` by reference count [datafusion]

2025-08-27 Thread via GitHub
github-actions[bot] commented on PR #16359: URL: https://github.com/apache/datafusion/pull/16359#issuecomment-3231186634 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Tracking: date_time related features [datafusion]

2025-08-27 Thread via GitHub
xudong963 commented on issue #14661: URL: https://github.com/apache/datafusion/issues/14661#issuecomment-3231252003 done

Re: [I] Tracking: date_time related features [datafusion]

2025-08-27 Thread via GitHub
xudong963 closed issue #14661: Tracking: date_time related features URL: https://github.com/apache/datafusion/issues/14661

Re: [PR] [Draft] Add Run End Encoding casts to Datafusion [datafusion]

2025-08-27 Thread via GitHub
github-actions[bot] closed pull request #16446: [Draft] Add Run End Encoding casts to Datafusion URL: https://github.com/apache/datafusion/pull/16446

Re: [PR] chore: Refactor `Cast` serde to avoid code duplication [datafusion-comet]

2025-08-27 Thread via GitHub
comphead commented on code in PR #2242: URL: https://github.com/apache/datafusion-comet/pull/2242#discussion_r2305572328 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2386,7 +2307,7 @@ case class Compatible(notes: Option[String] = None) extends Sup

Re: [PR] chore: Refactor `Cast` serde to avoid code duplication [datafusion-comet]

2025-08-27 Thread via GitHub
comphead commented on code in PR #2242: URL: https://github.com/apache/datafusion-comet/pull/2242#discussion_r2305580436 ## spark/src/main/scala/org/apache/comet/expressions/CometCast.scala: ## @@ -136,7 +187,7 @@ object CometCast { // https://github.com/apache/datafusi

Re: [PR] fix: Fall back to `native_comet` when object store not supported by `native_iceberg_compat` [datafusion-comet]

2025-08-27 Thread via GitHub
codecov-commenter commented on PR #2251: URL: https://github.com/apache/datafusion-comet/pull/2251#issuecomment-3229916135 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2251?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore(deps): bump actions/setup-java from 4 to 5 [datafusion-comet]

2025-08-27 Thread via GitHub
comphead merged PR #2225: URL: https://github.com/apache/datafusion-comet/pull/2225

Re: [PR] Add `cfg(feature = "avro")` attribute to Avro example in SQL API docs [datafusion]

2025-08-27 Thread via GitHub
kosiew commented on PR #17142: URL: https://github.com/apache/datafusion/pull/17142#issuecomment-3231040648 Thanks @alamb for the review

Re: [PR] chore: Introduce `strict-warning` profile for Scala [datafusion-comet]

2025-08-27 Thread via GitHub
codecov-commenter commented on PR #2254: URL: https://github.com/apache/datafusion-comet/pull/2254#issuecomment-3231140426 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2254?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] Potential bug(?): Inconsistent usage of column() / col() and literal() / lit() [datafusion-python]

2025-08-27 Thread via GitHub
HeWhoHeWho opened a new issue, #1214: URL: https://github.com/apache/datafusion-python/issues/1214 **Describe the bug** I couldn't locate the exact documentation that lays out the usage of column() / col() and literal() / lit(). Here's what I encountered, I will start describing the prob

[I] Memory should not blow up after Arrow IPC write-read round trip during spilling [datafusion]

2025-08-27 Thread via GitHub
ding-young opened a new issue, #17340: URL: https://github.com/apache/datafusion/issues/17340 ### Describe the bug This issue was observed in https://github.com/apache/datafusion/pull/17029, where the memory size of a RecordBatch after reading from spill (via Arrow IPC) is significa

Re: [I] Memory should not blow up after Arrow IPC write-read round trip during spilling [datafusion]

2025-08-27 Thread via GitHub
ding-young commented on issue #17340: URL: https://github.com/apache/datafusion/issues/17340#issuecomment-3231574231 take

Re: [PR] chore(deps): bump actions/setup-java from 4 to 5 [datafusion-comet]

2025-08-27 Thread via GitHub
comphead commented on PR #2225: URL: https://github.com/apache/datafusion-comet/pull/2225#issuecomment-3230128446 @dependabot rebase

Re: [PR] chore: Refactor `Cast` serde to avoid code duplication [datafusion-comet]

2025-08-27 Thread via GitHub
comphead commented on code in PR #2242: URL: https://github.com/apache/datafusion-comet/pull/2242#discussion_r2305570632 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1875,8 +1875,8 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] Update Scalar_functions.md [datafusion]

2025-08-27 Thread via GitHub
Adez017 commented on code in PR #17018: URL: https://github.com/apache/datafusion/pull/17018#discussion_r2305845680 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -527,6 +901,17 @@ tanh(numeric_expression) - **numeric_expression**: Numeric expression to operate on.

[PR] feat: Make Parquet EncryptionFactory async [datafusion]

2025-08-27 Thread via GitHub
adamreeve opened a new pull request, #17342: URL: https://github.com/apache/datafusion/pull/17342 ## Which issue does this PR close? - Closes #17341 ## Rationale for this change To allow users to implement async network access within the encryption factory without block

[I] Make EncryptionFactory trait async [datafusion]

2025-08-27 Thread via GitHub
adamreeve opened a new issue, #17341: URL: https://github.com/apache/datafusion/issues/17341 So that users can implement async access to a KMS, this trait should be async. This trait isn't included in a release yet so this won't be a breaking change. See https://github.com/apa

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-27 Thread via GitHub
adamreeve commented on PR #16779: URL: https://github.com/apache/datafusion/pull/16779#issuecomment-3231844498 I've made a PR to change this to async: #17342

[I] bug: cast StructType to String fails for duplicated col [datafusion-comet]

2025-08-27 Thread via GitHub
comphead opened a new issue, #2256: URL: https://github.com/apache/datafusion-comet/issues/2256 ### Describe the bug The `cast StructType to StringType` fails on duplicated columns with auto and native_scan ``` checkSparkAnswerAndOperator( "SELEC

Re: [PR] add a ci job for typo checking [datafusion]

2025-08-27 Thread via GitHub
Jefffrey commented on code in PR #17339: URL: https://github.com/apache/datafusion/pull/17339#discussion_r2306212070 ## datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs: ## @@ -1095,7 +1095,7 @@ async fn test_hashjoin_dynamic_filter_pushdown_partitioned() {

Re: [PR] feat(spark): implement Spark bitwise function shiftleft/shiftright/shiftrightunsighed [datafusion]

2025-08-27 Thread via GitHub
Jefffrey commented on code in PR #17013: URL: https://github.com/apache/datafusion/pull/17013#discussion_r2306265446 ## datafusion/spark/src/function/bitwise/mod.rs: ## @@ -34,8 +38,29 @@ pub mod expr_fn { "Returns the number of bits set in the binary representation of

Re: [PR] Fix incorrect memory accounting for sliced `StringViewArray` [datafusion]

2025-08-27 Thread via GitHub
ctsk commented on code in PR #17315: URL: https://github.com/apache/datafusion/pull/17315#discussion_r2306281608 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -194,7 +195,84 @@ impl GetSlicedSize for RecordBatch { for array in self.columns() {

Re: [PR] chore(deps): bump clap from 4.5.45 to 4.5.46 [datafusion]

2025-08-27 Thread via GitHub
Jefffrey merged PR #17338: URL: https://github.com/apache/datafusion/pull/17338

Re: [PR] add a ci job for typo checking [datafusion]

2025-08-27 Thread via GitHub
Jefffrey merged PR #17339: URL: https://github.com/apache/datafusion/pull/17339

Re: [PR] optimizer: Rewrite `IS NOT DISTINCT FROM` joins as Hash Joins [datafusion]

2025-08-27 Thread via GitHub
jonathanc-n commented on code in PR #17319: URL: https://github.com/apache/datafusion/pull/17319#discussion_r2306274977 ## datafusion/optimizer/src/extract_equijoin_predicate.rs: ## @@ -112,22 +151,82 @@ impl OptimizerRule for ExtractEquijoinPredicate { } } +/// Splits a

Re: [PR] Unnest Correlated Subquery [datafusion]

2025-08-27 Thread via GitHub
duongcongtoai commented on PR #17110: URL: https://github.com/apache/datafusion/pull/17110#issuecomment-3232046173 OK, then maybe I'll add a new method to the `ExprSchemable` trait with a default impl to avoid a breaking change.

Re: [PR] Fix incorrect memory accounting for sliced `StringViewArray` [datafusion]

2025-08-27 Thread via GitHub
ding-young commented on code in PR #17315: URL: https://github.com/apache/datafusion/pull/17315#discussion_r2306363399 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -194,7 +195,84 @@ impl GetSlicedSize for RecordBatch { for array in self.columns() {

Re: [PR] Impl spark bit not function [datafusion]

2025-08-27 Thread via GitHub
Jefffrey commented on code in PR #17155: URL: https://github.com/apache/datafusion/pull/17155#discussion_r2306234233 ## datafusion/spark/src/function/bitwise/mod.rs: ## @@ -34,8 +36,9 @@ pub mod expr_fn { "Returns the number of bits set in the binary representation of t

Re: [PR] Impl spark bit not function [datafusion]

2025-08-27 Thread via GitHub
Jefffrey commented on code in PR #17155: URL: https://github.com/apache/datafusion/pull/17155#discussion_r2306246262 ## datafusion/spark/src/function/bitwise/bit_not.rs: ## @@ -0,0 +1,229 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] chore(deps): bump korandoru/hawkeye from 6.1.1 to 6.2.0 [datafusion]

2025-08-27 Thread via GitHub
Jefffrey merged PR #17321: URL: https://github.com/apache/datafusion/pull/17321

[PR] add a ci job for typo checking [datafusion]

2025-08-27 Thread via GitHub
waynexia opened a new pull request, #17339: URL: https://github.com/apache/datafusion/pull/17339 ## Which issue does this PR close? - follow up of https://github.com/apache/datafusion/pull/17135#pullrequestreview-3155345954 ## Rationale for this change Ad

Re: [PR] Implement `partition_statistics` API for `InterleaveExec` [datafusion]

2025-08-27 Thread via GitHub
liamzwbao commented on PR #17051: URL: https://github.com/apache/datafusion/pull/17051#issuecomment-3229943534 Hi @xudong963, this PR is ready for review

Re: [PR] Update Scalar_functions.md [datafusion]

2025-08-27 Thread via GitHub
comphead commented on code in PR #17018: URL: https://github.com/apache/datafusion/pull/17018#discussion_r2305718642 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -527,6 +901,17 @@ tanh(numeric_expression) - **numeric_expression**: Numeric expression to operate on.

Re: [PR] chore: fix struct to string test for `native_iceberg_compat` [datafusion-comet]

2025-08-27 Thread via GitHub
codecov-commenter commented on PR #2253: URL: https://github.com/apache/datafusion-comet/pull/2253#issuecomment-3230426802 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2253?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: split `expr.proto` file [datafusion-comet]

2025-08-27 Thread via GitHub
kination commented on PR #2046: URL: https://github.com/apache/datafusion-comet/pull/2046#issuecomment-3231284758 Sorry, I couldn't reproduce the PR build failure locally (macOS). Could somebody give me some tips on how to check this?

Re: [PR] chore(deps): bump actions/checkout from 4 to 5 [datafusion-comet]

2025-08-27 Thread via GitHub
comphead merged PR #2229: URL: https://github.com/apache/datafusion-comet/pull/2229

Re: [I] [iceberg] TestSparkDataWrite show mismatch results [datafusion-comet]

2025-08-27 Thread via GitHub
hsiang-c closed issue #2118: [iceberg] TestSparkDataWrite show mismatch results URL: https://github.com/apache/datafusion-comet/issues/2118

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-27 Thread via GitHub
adamreeve commented on PR #16779: URL: https://github.com/apache/datafusion/pull/16779#issuecomment-3231493585 One thing we might want to change before releasing this is making the `EncryptionFactory` methods async. I guess for the Comet use case, calling into Java will need to be synchrono

[PR] fix: Remove unreachable code in `CometScanRule` [datafusion-comet]

2025-08-27 Thread via GitHub
andygrove opened a new pull request, #2252: URL: https://github.com/apache/datafusion-comet/pull/2252 ## Which issue does this PR close? N/A ## Rationale for this change Fix this compiler warning: ``` CometScanRule.scala:327: unreachable code due

Re: [PR] chore(deps): bump url from 2.5.6 to 2.5.7 [datafusion]

2025-08-27 Thread via GitHub
comphead merged PR #17324: URL: https://github.com/apache/datafusion/pull/17324

Re: [PR] chore: Fix `array_interest` test [datafusion-comet]

2025-08-27 Thread via GitHub
comphead commented on PR #2246: URL: https://github.com/apache/datafusion-comet/pull/2246#issuecomment-3228649992 The cause of the test failure is related to the Parquet data file reader. The test data is created manually by the Parquet writer, not with the DataFrame API, and the output schema is ```

Re: [PR] feat: Reset data buf of NativeBatchDecoderIterator on close [datafusion-comet]

2025-08-27 Thread via GitHub
andygrove commented on code in PR #2235: URL: https://github.com/apache/datafusion-comet/pull/2235#discussion_r2304287564 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/NativeBatchDecoderIterator.scala: ## @@ -182,14 +182,22 @@ case class NativeBatchDecoder

Re: [PR] feat(spark): implement Spark conditional function if [datafusion]

2025-08-27 Thread via GitHub
chenkovsky commented on code in PR #16946: URL: https://github.com/apache/datafusion/pull/16946#discussion_r2304241929 ## datafusion/spark/src/function/conditional/if.rs: ## @@ -0,0 +1,101 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] chore(deps): bump regex from 1.11.1 to 1.11.2 [datafusion]

2025-08-27 Thread via GitHub
comphead merged PR #17325: URL: https://github.com/apache/datafusion/pull/17325

Re: [PR] chore(deps): bump actions/checkout from 4 to 5 [datafusion-comet]

2025-08-27 Thread via GitHub
comphead commented on PR #2229: URL: https://github.com/apache/datafusion-comet/pull/2229#issuecomment-3228687161 @dependabot rebase

Re: [PR] feat: Reset data buf of NativeBatchDecoderIterator on close [datafusion-comet]

2025-08-27 Thread via GitHub
andygrove commented on code in PR #2235: URL: https://github.com/apache/datafusion-comet/pull/2235#discussion_r2304332108 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/NativeBatchDecoderIterator.scala: ## @@ -182,14 +182,22 @@ case class NativeBatchDecoder

Re: [PR] fix: separate type checking for CometExchange and CometColumnarExchange [datafusion-comet]

2025-08-27 Thread via GitHub
mbutrovich commented on PR #2241: URL: https://github.com/apache/datafusion-comet/pull/2241#issuecomment-3228969461 Will merge after this latest CI run, then will take a look at #2247.

Re: [PR] feat: Reset data buf of NativeBatchDecoderIterator on close [datafusion-comet]

2025-08-27 Thread via GitHub
mbutrovich commented on PR #2235: URL: https://github.com/apache/datafusion-comet/pull/2235#issuecomment-3228971143 Will merge once CI goes green, and then will take a look at #2247.

[PR] fix: `UnresolvedShuffleExec` should support `with_new_children` [datafusion-ballista]

2025-08-27 Thread via GitHub
milenkovicm opened a new pull request, #1300: URL: https://github.com/apache/datafusion-ballista/pull/1300 # Which issue does this PR close? Closes #. # Rationale for this change At the moment `UnresolvedShuffleExec` does not support `with_new_children` which is wrong,

Re: [PR] fix: Fix potential resource leak in native shuffle block reader [datafusion-comet]

2025-08-27 Thread via GitHub
codecov-commenter commented on PR #2247: URL: https://github.com/apache/datafusion-comet/pull/2247#issuecomment-3228908659 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2247?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Fix `array_interest` test [datafusion-comet]

2025-08-27 Thread via GitHub
codecov-commenter commented on PR #2246: URL: https://github.com/apache/datafusion-comet/pull/2246#issuecomment-3228609191 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2246?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add `TableProvider::scan_with_args` to support pushdown sorting [datafusion]

2025-08-27 Thread via GitHub
alamb commented on code in PR #17273: URL: https://github.com/apache/datafusion/pull/17273#discussion_r2304261369 ## datafusion/catalog/src/table.rs: ## @@ -299,6 +328,75 @@ pub trait TableProvider: Debug + Sync + Send { } } +#[derive(Debug, Clone, Default)] +pub struct

Re: [PR] feat(spark): implement Spark `width_bucket` function [datafusion]

2025-08-27 Thread via GitHub
comphead commented on code in PR #17331: URL: https://github.com/apache/datafusion/pull/17331#discussion_r2304277268 ## datafusion/spark/src/function/math/width_bucket.rs: ## @@ -0,0 +1,506 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] GC string views on hash join build side [datafusion]

2025-08-27 Thread via GitHub
ctsk closed pull request #16463: GC string views on hash join build side URL: https://github.com/apache/datafusion/pull/16463

[I] External sort failing with non-spillable operators as input (RepartitionExec) [datafusion]

2025-08-27 Thread via GitHub
16pierre opened a new issue, #17334: URL: https://github.com/apache/datafusion/issues/17334 ### Describe the bug I'm trying to sort some input (union of filtered Parquet files) with fixed parallelism, in order to do so I'm manually fiddling with a round-robin `RepartitionExec` operat

[PR] fix: Fix potential resource leak in native shuffle block reader [datafusion-comet]

2025-08-27 Thread via GitHub
andygrove opened a new pull request, #2247: URL: https://github.com/apache/datafusion-comet/pull/2247 ## Which issue does this PR close? N/A ## Rationale for this change ## What changes are included in this PR? NativeBatchDecoderIterator

Re: [PR] feat: add configurable cache mode (local_cache) with LogicalPlan::Cache (#17297) [datafusion]

2025-08-27 Thread via GitHub
MrGranday commented on PR #17314: URL: https://github.com/apache/datafusion/pull/17314#issuecomment-3228797126 @milenkovicm can you review it again? I have edited the `cache fn` and added the `CacheNode`.

Re: [PR] fix: Fix potential resource leak in native shuffle block reader [datafusion-comet]

2025-08-27 Thread via GitHub
andygrove commented on PR #2247: URL: https://github.com/apache/datafusion-comet/pull/2247#issuecomment-3228827455 @wForget fyi

Re: [PR] Update Scalar_functions.md [datafusion]

2025-08-27 Thread via GitHub
comphead commented on PR #17018: URL: https://github.com/apache/datafusion/pull/17018#issuecomment-3228905382 @Adez017 I'll take a look today

Re: [PR] feat(spark): implement Spark `width_bucket` function [datafusion]

2025-08-27 Thread via GitHub
davidlghellin commented on code in PR #17331: URL: https://github.com/apache/datafusion/pull/17331#discussion_r2304571969 ## datafusion/sqllogictest/test_files/spark/math/width_bucket.slt: ## @@ -21,42 +21,58 @@ # For more information, please see: # https://github.com/apache

[I] `auto` scan mode should not select `native_iceberg_compat` when there is an unsupported credential provider [datafusion-comet]

2025-08-27 Thread via GitHub
andygrove opened a new issue, #2248: URL: https://github.com/apache/datafusion-comet/issues/2248 ### Describe the bug `auto` scan mode should not select `native_iceberg_compat` when there is an unsupported credential provider ### Steps to reproduce _No response_ #

Re: [I] `DataFrame.cache()` does not work in distributed environments [datafusion]

2025-08-27 Thread via GitHub
milenkovicm commented on issue #17297: URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3228944531 Ballista will create cache partitions locally on some of the executors. Handling the physical part of execution is specific to the implementor, such as Ballista in this case

Re: [PR] feat(spark): implement Spark conditional function if [datafusion]

2025-08-27 Thread via GitHub
chenkovsky commented on code in PR #16946: URL: https://github.com/apache/datafusion/pull/16946#discussion_r2304241929 ## datafusion/spark/src/function/conditional/if.rs: ## @@ -0,0 +1,101 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Improve performance of TPCH q4, q7, q9 at TPCH SF 100 [datafusion]

2025-08-27 Thread via GitHub
ctsk commented on issue #17259: URL: https://github.com/apache/datafusion/issues/17259#issuecomment-3228609342 Looks like you have a copy/paste error in your benchmark code for datafusion @MrPowers https://github.com/MrPowers/querybench/blob/03b615d51f3f3b700516738b543c6b6fdec0665c/

Re: [PR] chore: split hash join to smaller modules [datafusion]

2025-08-27 Thread via GitHub
alamb commented on PR #17300: URL: https://github.com/apache/datafusion/pull/17300#issuecomment-3228613429 EPIC!

Re: [PR] Add `TableProvider::scan_with_args` to support pushdown sorting [datafusion]

2025-08-27 Thread via GitHub
adriangb commented on PR #17273: URL: https://github.com/apache/datafusion/pull/17273#issuecomment-3228672310 > There are multiple things going on in this PR: > > 1. Update TableProvider API > 2. Pushdown sorts > 3. rework how filter pushdown works during physical planning >

Re: [PR] feat(spark): implement Spark `width_bucket` function [datafusion]

2025-08-27 Thread via GitHub
davidlghellin commented on PR #17331: URL: https://github.com/apache/datafusion/pull/17331#issuecomment-3228862362 Thanks @comphead , I'm checking this because I don't understand the errors I'm getting on my machine. I'm not sure if it's something related to my setup or if I missed somethin

Re: [PR] Update Scalar_functions.md [datafusion]

2025-08-27 Thread via GitHub
Adez017 commented on PR #17018: URL: https://github.com/apache/datafusion/pull/17018#issuecomment-3228870673 Any Updates @alamb

Re: [PR] feat(spark): implement Spark `width_bucket` function [datafusion]

2025-08-27 Thread via GitHub
davidlghellin commented on code in PR #17331: URL: https://github.com/apache/datafusion/pull/17331#discussion_r2304559249 ## datafusion/sqllogictest/test_files/spark/math/width_bucket.slt: ## @@ -21,42 +21,58 @@ # For more information, please see: # https://github.com/apache

Re: [I] Reset data buf of NativeBatchDecoderIterator on close [datafusion-comet]

2025-08-27 Thread via GitHub
mbutrovich closed issue #2234: Reset data buf of NativeBatchDecoderIterator on close URL: https://github.com/apache/datafusion-comet/issues/2234

Re: [I] `array_intersect` test fails when default scan is `auto` [datafusion-comet]

2025-08-27 Thread via GitHub
mbutrovich closed issue #2174: `array_intersect` test fails when default scan is `auto` URL: https://github.com/apache/datafusion-comet/issues/2174

[PR] fix: Disable CollectLeft join as it is broken in ballista [datafusion-ballista]

2025-08-27 Thread via GitHub
milenkovicm opened a new pull request, #1301: URL: https://github.com/apache/datafusion-ballista/pull/1301 https://github.com/apache/datafusion-ballista/issues/1055 # Which issue does this PR close? Closes #. Relates to #1055 # Rationale for this change as desc

Re: [PR] chore: Fix `array_intersect` test [datafusion-comet]

2025-08-27 Thread via GitHub
mbutrovich merged PR #2246: URL: https://github.com/apache/datafusion-comet/pull/2246

Re: [PR] fix EquivalenceProperties calculation in DataSourceExec [datafusion]

2025-08-27 Thread via GitHub
adriangb commented on PR #17323: URL: https://github.com/apache/datafusion/pull/17323#issuecomment-3227815845 FWIW we've put this into production and haven't had issues stemming from this change
