[PR] fix: synchronize partition bounds reporting in HashJoin [datafusion]

2025-09-05 Thread via GitHub
rkrishn7 opened a new pull request, #17452: URL: https://github.com/apache/datafusion/pull/17452 ## Which issue does this PR close? - Closes #17451 ## What changes are included in this PR? Adds a simple synchronization mechanism (`BoundsWaiter`) for tasks to wait

Re: [I] Ensure dynamic filter expr is built before fetching probe batch in HashJoin [datafusion]

2025-09-05 Thread via GitHub
rkrishn7 commented on issue #17451: URL: https://github.com/apache/datafusion/issues/17451#issuecomment-3261319580 cc @adriangb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[I] Ensure dynamic filter expr is built before fetching probe batch in HashJoin [datafusion]

2025-09-05 Thread via GitHub
rkrishn7 opened a new issue, #17451: URL: https://github.com/apache/datafusion/issues/17451 ### Describe the bug Presently, partition bounds are reported after the build side is collected during a hash join. However, there is no synchronization to wait until all bounds have been repo
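The race described in this issue can be pictured with a minimal sketch. It uses plain threads and `std::sync::Barrier`; the function `run_partitions` and the shared `reported` list are hypothetical names for illustration and are not the `BoundsWaiter` from PR #17452 or any DataFusion API. Each partition reports its bounds and then waits until every other partition has reported before it moves on to the probe phase.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical helper: one thread per partition. Each thread reports its
/// (min, max) bounds, waits for all other partitions, and then returns how
/// many bounds it can see at the moment it would start probing.
fn run_partitions(bounds: Vec<(i64, i64)>) -> Vec<usize> {
    let n = bounds.len();
    let reported = Arc::new(Mutex::new(Vec::new()));
    let barrier = Arc::new(Barrier::new(n));

    let handles: Vec<_> = bounds
        .into_iter()
        .map(|b| {
            let reported = Arc::clone(&reported);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Build side finished: report this partition's bounds.
                reported.lock().unwrap().push(b);
                // Block until every partition has reported.
                barrier.wait();
                // Probe side may start: all bounds are visible now.
                let seen = reported.lock().unwrap().len();
                seen
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Without the `barrier.wait()` call, a fast partition could begin probing while seeing only a subset of the reported bounds, which is the situation the issue describes.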

Re: [I] Enable `View` types [datafusion-ballista]

2025-09-05 Thread via GitHub
milenkovicm commented on issue #1294: URL: https://github.com/apache/datafusion-ballista/issues/1294#issuecomment-3261004725 This issue might be a corner case hard to test

Re: [PR] Implement `partition_statistics` API for `InterleaveExec` [datafusion]

2025-09-05 Thread via GitHub
xudong963 commented on PR #17051: URL: https://github.com/apache/datafusion/pull/17051#issuecomment-3256913306 @liamzwbao Sorry, I missed the PR, will review later.

Re: [PR] better preserve statistics when applying limits [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on code in PR #17381: URL: https://github.com/apache/datafusion/pull/17381#discussion_r2326468844 ## datafusion/common/src/stats.rs: ## @@ -391,62 +391,85 @@ impl Statistics { /// parameter to compute global statistics in a multi-partition setting. p

Re: [I] Slow aggregrate query, Polars is 4 times faster for equal query [datafusion]

2025-09-05 Thread via GitHub
alamb commented on issue #17446: URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3259715207 Thank you @valkum I tried the reproducer locally and I do also see the difference reported. I rewrote the query into the equivalent SQL as that was easier to profile for me

Re: [PR] better preserve statistics when applying limits [datafusion]

2025-09-05 Thread via GitHub
xudong963 commented on code in PR #17381: URL: https://github.com/apache/datafusion/pull/17381#discussion_r2326464415 ## datafusion/common/src/stats.rs: ## @@ -391,62 +391,85 @@ impl Statistics { /// parameter to compute global statistics in a multi-partition setting.

Re: [PR] fix bounds accumulator reset in HashJoinExec dynamic filter pushdown [datafusion]

2025-09-05 Thread via GitHub
rkrishn7 commented on code in PR #17371: URL: https://github.com/apache/datafusion/pull/17371#discussion_r2323972667 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -837,7 +842,6 @@ impl ExecutionPlan for HashJoinExec { )?, // Keep the dy

Re: [PR] fix: Expose hash to FFI udf/udaf/udwf to fix their Eq [datafusion]

2025-09-05 Thread via GitHub
findepi commented on PR #17350: URL: https://github.com/apache/datafusion/pull/17350#issuecomment-3257605231 You can think of this the following way. If I replace hash function with a function that always returns the same thing (e.g. `42`), is my code maybe slower, but still _correct_? If y
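The thought experiment in this comment can be tried directly. The sketch below uses a hypothetical `Key` type (not a DataFusion type) whose `Hash` implementation always feeds the constant `42`. Lookups in a standard `HashMap` stay correct because equality is still decided by `Eq`; only performance degrades, since every key collides.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct Key(u32);

impl Hash for Key {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Deliberately constant: every key hashes to the same value.
        42u64.hash(state);
    }
}

/// Inserts a few keys and returns (lookup of Key(1), lookup of Key(3), map size).
fn lookup_with_constant_hash() -> (Option<&'static str>, Option<&'static str>, usize) {
    let mut map = HashMap::new();
    map.insert(Key(1), "one");
    map.insert(Key(2), "two");
    // Same key as the first insert, so the value is replaced.
    map.insert(Key(1), "uno");
    (
        map.get(&Key(1)).copied(),
        map.get(&Key(3)).copied(),
        map.len(),
    )
}
```

The reverse situation, where two equal values produce different hashes, is the one that breaks correctness, because the map may never compare the two values at all.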

Re: [PR] feat: Support `FILTER` clause in aggregate window functions [datafusion]

2025-09-05 Thread via GitHub
Jefffrey commented on PR #17378: URL: https://github.com/apache/datafusion/pull/17378#issuecomment-3256506016 > > Would be nice to have a test via DataFrame API if possible. Also for the proto, I think we can raise an issue so we can have it tracked in GitHub. > > Should we also update so

Re: [I] Schema error in a query with `RIGHT ANTI JOIN` (SQLancer) [datafusion]

2025-09-05 Thread via GitHub
kumarUjjawal commented on issue #17390: URL: https://github.com/apache/datafusion/issues/17390#issuecomment-3257605678 @2010YOUY01 isn't the logical schema for `Right Anti` intentionally “right-only”? So any references to the left-side table (t0) are out of scope once you’ve applied a `RightAnt

Re: [I] Sanity Check fails on UnionExec + constants [datafusion]

2025-09-05 Thread via GitHub
crepererum closed issue #17372: Sanity Check fails on UnionExec + constants URL: https://github.com/apache/datafusion/issues/17372

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-05 Thread via GitHub
xanderbailey commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2325802938 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it

Re: [I] Improved experience when remote object store URL does not end in `/` [datafusion]

2025-09-05 Thread via GitHub
xiedeyantu commented on issue #16302: URL: https://github.com/apache/datafusion/issues/16302#issuecomment-3260145687 I have merged the code you helped me modify, and the CI has passed. Thank you very much! @alamb

Re: [PR] feat: Support distributed plan in `EXPLAIN` command [datafusion-ballista]

2025-09-05 Thread via GitHub
danielhumanmod commented on PR #1309: URL: https://github.com/apache/datafusion-ballista/pull/1309#issuecomment-3260134193 > looks like `Cargo.lock` needs to be updated and pushed Sorry, forgot to do so, already committed, thanks!

[PR] feat: Support more data part expressions [datafusion-comet]

2025-09-05 Thread via GitHub
wForget opened a new pull request, #2316: URL: https://github.com/apache/datafusion-comet/pull/2316 ## Which issue does this PR close? Closes #2315. ## Rationale for this change Support more data part expressions ## What changes are included in this PR?

Re: [I] Enable `View` types [datafusion-ballista]

2025-09-05 Thread via GitHub
Huy1Ng commented on issue #1294: URL: https://github.com/apache/datafusion-ballista/issues/1294#issuecomment-3260119176 I tried to replicate the issue in #1182 with `schema_force_view_types=true` but there is no error raised. @andygrove can you share the configuration? Mine is this: --

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-05 Thread via GitHub
comphead commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3260105559 > I will do it later, thanks! I can back up if needed, let me know

Re: [PR] Refactor TableProvider::scan into TableProvider::scan_with_args [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on code in PR #17336: URL: https://github.com/apache/datafusion/pull/17336#discussion_r2324613129 ## datafusion/catalog/src/table.rs: ## @@ -171,6 +171,41 @@ pub trait TableProvider: Debug + Sync + Send { limit: Option, ) -> Result>; +/// C

Re: [PR] fix: TakeOrderedAndProjectExec is not reporting all fallback reasons [datafusion-comet]

2025-09-05 Thread via GitHub
codecov-commenter commented on PR #2323: URL: https://github.com/apache/datafusion-comet/pull/2323#issuecomment-3260097597 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2323?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-05 Thread via GitHub
shehabgamin commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3260092702 > > The DataFusion 50 upgrade may be challenging for many people. The update in Sail currently involves more than 100 file changes, and I am still not finished. [lakehq/sail

Re: [PR] chore: Refactor serde for named expressions `alias` and `attributeReference` [datafusion-comet]

2025-09-05 Thread via GitHub
andygrove commented on PR #2290: URL: https://github.com/apache/datafusion-comet/pull/2290#issuecomment-3260085485 Thanks for the review @kazuyukitanimura

Re: [I] Introduce a way to represent constrained statistics / bounds on values in Statistics [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on issue #8078: URL: https://github.com/apache/datafusion/issues/8078#issuecomment-3258698264 > > I don't think the arrow proposal handles the subtelty about intervals, known facts, etc but we should at least be aware of them (thanks [@edmondop](https://github.com/edmondo

Re: [PR] chore: Refactor serde for named expressions `alias` and `attributeReference` [datafusion-comet]

2025-09-05 Thread via GitHub
andygrove merged PR #2290: URL: https://github.com/apache/datafusion-comet/pull/2290

Re: [PR] feat: make sql an optional feature [datafusion]

2025-09-05 Thread via GitHub
mbutrovich commented on PR #17332: URL: https://github.com/apache/datafusion/pull/17332#issuecomment-3259502526 > I did a quick check and with the latest push + comet I no longer have any `sql` dependency in the cargo lock file! I _did_ have to make a change in comet, which was to set `data

[PR] fix: TakeOrderedAndProjectExec is not reporting all fallback reasons [datafusion-comet]

2025-09-05 Thread via GitHub
kazuyukitanimura opened a new pull request, #2323: URL: https://github.com/apache/datafusion-comet/pull/2323 ## Which issue does this PR close? Closes #2311 ## Rationale for this change Adding missing reasons for falling back ## What changes are included in this P

[PR] feat: Simplify CASE WHEN true THEN expr to expr [datafusion]

2025-09-05 Thread via GitHub
EeshanBembi opened a new pull request, #17450: URL: https://github.com/apache/datafusion/pull/17450 ## Summary Adds an optimization rule to simplify CASE expressions where the first condition is always true (literal `true`), reducing them to just the THEN expression. This fixe
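The rewrite described in this summary can be sketched on a toy expression tree. The `Expr` enum and `simplify_case` function below are hypothetical stand-ins written for illustration; they are not DataFusion's `Expr` or its simplifier, and they only cover the case the summary names (first `WHEN` condition is the literal `true`).

```rust
/// Toy expression type, assumed for this sketch only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(bool),
    Column(String),
    Case {
        when_then: Vec<(Expr, Expr)>,
        else_expr: Option<Box<Expr>>,
    },
}

/// If the first WHEN condition is the literal `true`, that branch is always
/// taken, so the whole CASE collapses to its THEN expression.
fn simplify_case(expr: Expr) -> Expr {
    match expr {
        Expr::Case { mut when_then, else_expr } => {
            if matches!(when_then.first(), Some((Expr::Literal(true), _))) {
                // Take the THEN expression of the first branch.
                when_then.remove(0).1
            } else {
                // Nothing to simplify: rebuild the CASE unchanged.
                Expr::Case { when_then, else_expr }
            }
        }
        other => other,
    }
}
```

Only the first branch is inspected: a literal `true` in a later branch does not allow the same collapse, because an earlier condition may still match first.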

Re: [PR] Re-enable page index for encrypted Parquet [datafusion]

2025-09-05 Thread via GitHub
alamb merged PR #17426: URL: https://github.com/apache/datafusion/pull/17426

Re: [PR] fix: Expose hash to FFI udf/udaf/udwf to fix their Eq [datafusion]

2025-09-05 Thread via GitHub
crystalxyz commented on PR #17350: URL: https://github.com/apache/datafusion/pull/17350#issuecomment-3260015902 Sorry I might be missing some context here. I understand that hash values can only be used to avoid more expensive comparisons if they are unequal. But if hash values are equal, i

Re: [I] Add Binary/LargeBinary/BinaryView/FixedSizeBinary to join_fuzz [datafusion]

2025-09-05 Thread via GitHub
jonathanc-n commented on issue #17447: URL: https://github.com/apache/datafusion/issues/17447#issuecomment-3259966628 take

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17337: URL: https://github.com/apache/datafusion/pull/17337#issuecomment-3259823889 🤔 this seems to have caused a massive slowdown in the sql planner benchmark somehow: ```shell Benchmarking physical_sorted_union_order_by_300: Warming up for 3. s Warni

[I] bug: Binary op between map and array failed [datafusion-comet]

2025-09-05 Thread via GitHub
comphead opened a new issue, #2321: URL: https://github.com/apache/datafusion-comet/issues/2321 ### Describe the bug ``` Caused by: org.apache.comet.CometNativeException: Cannot evaluate binary expression because of type mismatch: left List(Field { name: "item", data_type: Utf8, n

Re: [PR] Add PhysicalExpr::is_volatile_node to upgrade guide [datafusion]

2025-09-05 Thread via GitHub
adriangb merged PR #17443: URL: https://github.com/apache/datafusion/pull/17443

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on PR #17337: URL: https://github.com/apache/datafusion/pull/17337#issuecomment-3259956317 > I also wonder if we should wait for the DataFusion 50 release before merging this I think we should wait until after v50

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-09-05 Thread via GitHub
alamb commented on code in PR #17364: URL: https://github.com/apache/datafusion/pull/17364#discussion_r2325838608 ## datafusion/datasource/src/url.rs: ## @@ -242,33 +242,64 @@ impl ListingTableUrl { ) -> Result>> { let exec_options = &ctx.config_options().execution

Re: [I] bug: Binary op between map and array failed [datafusion-comet]

2025-09-05 Thread via GitHub
comphead commented on issue #2321: URL: https://github.com/apache/datafusion-comet/issues/2321#issuecomment-3259944687 The issue is a discrepancy in how nullability is calculated. Comet always hardcodes the nullability flag for a ScalarFunction such as map_keys (nullable == true),

Re: [PR] fix: lazy evaluation for coalesce [datafusion]

2025-09-05 Thread via GitHub
alamb commented on code in PR #17357: URL: https://github.com/apache/datafusion/pull/17357#discussion_r2325943637 ## datafusion/sqllogictest/test_files/select.slt: ## @@ -1656,10 +1656,10 @@ query TT explain select coalesce(1, y/x), coalesce(2, y/x) from t; logical_plan

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3259724178 Thanks @BlakeOrth -- I am back and catching up on reviews. I will review this one shortly

Re: [PR] chore: [1941-Part2]: Introduce map_to_list scalar function [datafusion-comet]

2025-09-05 Thread via GitHub
codecov-commenter commented on PR #2312: URL: https://github.com/apache/datafusion-comet/pull/2312#issuecomment-3259420300 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2312?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on PR #17337: URL: https://github.com/apache/datafusion/pull/17337#issuecomment-3259832326 I'm sure it's just a dumb mistake on my end. Let me do a round of looking at your comments and investigating, thank you for your patience 🙏🏻

Re: [I] bug: Binary op between map and array failed [datafusion-comet]

2025-09-05 Thread via GitHub
comphead commented on issue #2321: URL: https://github.com/apache/datafusion-comet/issues/2321#issuecomment-3259510295 `map_keys` in Comet returns a nullability flag that differs from Spark's

Re: [PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17431: URL: https://github.com/apache/datafusion/pull/17431#issuecomment-3259426528 > > I'm surprised binary data isn't part of the join fuzz testing, this could be put in a follow up issue > > Sounds good. Where is the fuzz testing, as I was looking for a place

Re: [I] Slow aggregrate query with `array_agg`, Polars is 4 times faster for equal query [datafusion]

2025-09-05 Thread via GitHub
valkum commented on issue #17446: URL: https://github.com/apache/datafusion/issues/17446#issuecomment-3259922290 Thanks a lot for your investigation. I am OOO until Wednesday. I have to check if I have enough time to work on an updated attempt. But I certainly want to give it a try.

Re: [PR] docs: Add manual redirects from old pages that no longer exist [datafusion-comet]

2025-09-05 Thread via GitHub
andygrove merged PR #2317: URL: https://github.com/apache/datafusion-comet/pull/2317

Re: [PR] Push down sorts into `TableScan` logical plan node [datafusion]

2025-09-05 Thread via GitHub
alamb commented on code in PR #17337: URL: https://github.com/apache/datafusion/pull/17337#discussion_r2325658057 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2525,6 +2525,8 @@ pub struct TableScan { pub filters: Vec, /// Optional number of rows to read pub

Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-05 Thread via GitHub
alamb commented on PR #103: URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3258826499 Update: I took a fairly extensive editing pass and made it up through this section: ``` ## Implementation for Hash Join Operator ``` I need to do some other things

[I] Unsupported expressions found in spark sql unit test [datafusion-comet]

2025-09-05 Thread via GitHub
wForget opened a new issue, #2314: URL: https://github.com/apache/datafusion-comet/issues/2314 ### What is the problem the feature request solves? Some unsupported expressions found in spark sql unit test: - [ ] `Abs`: Duplicate of https://github.com/apache/datafusion-comet/iss

Re: [PR] Add PhysicalExpr::is_volatile [datafusion]

2025-09-05 Thread via GitHub
findepi commented on code in PR #17351: URL: https://github.com/apache/datafusion/pull/17351#discussion_r2324504677 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -377,6 +377,19 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug + DynEq + DynHash {

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on code in PR #17337: URL: https://github.com/apache/datafusion/pull/17337#discussion_r2326104726 ## datafusion/optimizer/src/push_down_sort.rs: ## @@ -0,0 +1,580 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] fix: prevent UnionExec panic with empty inputs [datafusion]

2025-09-05 Thread via GitHub
alamb commented on code in PR #17449: URL: https://github.com/apache/datafusion/pull/17449#discussion_r2326058904 ## datafusion/physical-plan/src/union.rs: ## @@ -101,19 +101,23 @@ pub struct UnionExec { impl UnionExec { /// Create a new UnionExec -pub fn new(inputs:

Re: [PR] feat: Simplify CASE WHEN true THEN expr to expr [datafusion]

2025-09-05 Thread via GitHub
alamb commented on code in PR #17450: URL: https://github.com/apache/datafusion/pull/17450#discussion_r2326050206 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1399,6 +1399,18 @@ impl TreeNodeRewriter for Simplifier<'_, S> { // Rules f

Re: [I] Enable the `ListFilesCache` to be available for partitioned tables [datafusion]

2025-09-05 Thread via GitHub
alamb commented on issue #17211: URL: https://github.com/apache/datafusion/issues/17211#issuecomment-3259722731 Interestingly enough I was reviewing https://github.com/apache/datafusion/pull/17364 from @xiedeyantu and it seems like there is already code to look for the listing files cache

Re: [I] Simplify `CASE WHEN true ...` [datafusion]

2025-09-05 Thread via GitHub
EeshanBembi commented on issue #17448: URL: https://github.com/apache/datafusion/issues/17448#issuecomment-3259671018 take

Re: [I] minor: `UnionExec` inputs validation [datafusion]

2025-09-05 Thread via GitHub
EeshanBembi commented on issue #17052: URL: https://github.com/apache/datafusion/issues/17052#issuecomment-3259667625 take

[PR] fix: prevent UnionExec panic with empty inputs [datafusion]

2025-09-05 Thread via GitHub
EeshanBembi opened a new pull request, #17449: URL: https://github.com/apache/datafusion/pull/17449 ## Summary This PR fixes a panic in `UnionExec` when constructed with empty inputs, replacing the crash with proper error handling and descriptive error messages. **Fixes:** #170
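The general pattern this summary describes, replacing a panic on empty inputs with a descriptive error, can be sketched as a fallible constructor. The `Union` struct, `try_new`, and the error text below are hypothetical and chosen for illustration; they do not reproduce the `UnionExec` signature or the error type used in the PR.

```rust
/// Toy stand-in for an operator that combines several inputs.
#[derive(Debug)]
struct Union {
    inputs: Vec<String>,
}

impl Union {
    /// Validates the inputs up front instead of indexing into an empty
    /// vector later, which would panic.
    fn try_new(inputs: Vec<String>) -> Result<Self, String> {
        if inputs.is_empty() {
            return Err("Union requires at least one input".to_string());
        }
        Ok(Self { inputs })
    }

    /// Safe to index: `try_new` guarantees at least one input.
    fn first(&self) -> &str {
        &self.inputs[0]
    }
}
```

Callers then handle the empty case through the `Result` rather than hitting an out-of-bounds panic deep inside planning or execution.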

Re: [I] `DataFrame.cache()` does not work in distributed environments [datafusion]

2025-09-05 Thread via GitHub
milenkovicm commented on issue #17297: URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3258739674 looking at spark, cache is represented as `InMemoryRelation` : ```python from pyspark.sql import SparkSession spark : SparkSession = SparkSession.builder.getOrCr

Re: [PR] chore: Split expression serde hash map into separate categories & update documentation [datafusion-comet]

2025-09-05 Thread via GitHub
codecov-commenter commented on PR #2322: URL: https://github.com/apache/datafusion-comet/pull/2322#issuecomment-3259640247 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2322?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: lazy evaluation for coalesce [datafusion]

2025-09-05 Thread via GitHub
mbutrovich commented on PR #17357: URL: https://github.com/apache/datafusion/pull/17357#issuecomment-3259624716 > Thanks @chenkovsky and @nuno-faria -- I think this PR is quite good and probably can be merged. My only potential concern is that we may mess up comet. Let's see if we get any m

Re: [PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-05 Thread via GitHub
stuartcarnie commented on PR #17431: URL: https://github.com/apache/datafusion/pull/17431#issuecomment-3259603623 > I added some additional sqllogictests as @jonathanc-n suggested and verified they fail without the code changes in this PR That's great, thanks @alamb!

[I] Simplify `CASE WHEN true ...` [datafusion]

2025-09-05 Thread via GitHub
alamb opened a new issue, #17448: URL: https://github.com/apache/datafusion/issues/17448 ### Is your feature request related to a problem or challenge? While reviewing https://github.com/apache/datafusion/pull/17357 I noticed that expressions like ```sql CASE WHEN true THEN

Re: [I] DictionaryKeyOverflowError on DataFrame.write_parquet [datafusion]

2025-09-05 Thread via GitHub
valkum commented on issue #17445: URL: https://github.com/apache/datafusion/issues/17445#issuecomment-3259550049 If you stick to the Utf8 datatype for `market`, the automatic dict encoding done by the parquet writer works. So assume some operation is not properly merging the dict keys. The

Re: [I] Improved experience when remote object store URL does not end in `/` [datafusion]

2025-09-05 Thread via GitHub
alamb commented on issue #16302: URL: https://github.com/apache/datafusion/issues/16302#issuecomment-3259575086 New PRs are up: - https://github.com/apache/datafusion/pull/17364 - https://github.com/xiedeyantu/datafusion/pull/2

Re: [I] Documentation issues [datafusion-comet]

2025-09-05 Thread via GitHub
andygrove closed issue #2319: Documentation issues URL: https://github.com/apache/datafusion-comet/issues/2319

[PR] chore: Split expression serde hash map into separate categories & update documentation [datafusion-comet]

2025-09-05 Thread via GitHub
andygrove opened a new pull request, #2322: URL: https://github.com/apache/datafusion-comet/pull/2322 ## Which issue does this PR close? N/A ## Rationale for this change This makes it much easier to compare code and documentation. https://github.com

Re: [PR] feat: make sql an optional feature [datafusion]

2025-09-05 Thread via GitHub
timsaucer commented on PR #17332: URL: https://github.com/apache/datafusion/pull/17332#issuecomment-3259537242 I had to rebase, so you might want to pull again

Re: [PR] feat: make sql an optional feature [datafusion]

2025-09-05 Thread via GitHub
timsaucer commented on PR #17332: URL: https://github.com/apache/datafusion/pull/17332#issuecomment-3259495898 > I was playing with this PR this morning to see if Comet could use it, and it looks like `datafusion-sql` is brought in with the `nested_expressions` feature (which Comet uses). I

Re: [I] bug: Binary op between map and array failed [datafusion-comet]

2025-09-05 Thread via GitHub
comphead commented on issue #2321: URL: https://github.com/apache/datafusion-comet/issues/2321#issuecomment-3259489120 The issue is in the nullability flag; the binary comparison in DF is done by ``` if left_data_type.is_nested() { if !left_data_type.equals_data

Re: [I] Improve some confusing fallback reasons [datafusion-comet]

2025-09-05 Thread via GitHub
mbutrovich closed issue #2300: Improve some confusing fallback reasons URL: https://github.com/apache/datafusion-comet/issues/2300

Re: [PR] feat: Improve some confusing fallback reasons [datafusion-comet]

2025-09-05 Thread via GitHub
mbutrovich merged PR #2301: URL: https://github.com/apache/datafusion-comet/pull/2301

Re: [PR] feat: Add nested Array literal support [datafusion-comet]

2025-09-05 Thread via GitHub
comphead commented on PR #2181: URL: https://github.com/apache/datafusion-comet/pull/2181#issuecomment-3259483674 Test failed because of https://github.com/apache/datafusion-comet/issues/2321

Re: [PR] docs: Fix broken links and other Sphinx warnings [datafusion-comet]

2025-09-05 Thread via GitHub
andygrove merged PR #2320: URL: https://github.com/apache/datafusion-comet/pull/2320

Re: [I] Friendlier SQL in `SELECT` and `WHERE` [datafusion]

2025-09-05 Thread via GitHub
rkrishn7 commented on issue #17361: URL: https://github.com/apache/datafusion/issues/17361#issuecomment-3259414542 @alamb What are your thoughts on this? If it makes sense to you, happy to drive the draft PR up I have home (should just need some tests).

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17364: URL: https://github.com/apache/datafusion/pull/17364#issuecomment-3259429044 Sorry @xiedeyantu -- I have been on vacation. I am looking at this PR now

[I] Add Binary/LargeBinary/BinaryView/FixedSizeBinary to join_fuzz [datafusion]

2025-09-05 Thread via GitHub
alamb opened a new issue, #17447: URL: https://github.com/apache/datafusion/issues/17447 Thank you @stuartcarnie this looks good to me. I'm surprised binary data isn't part of the join fuzz testing, this could be put in a follow up issue _Originally posted by @jonathanc-n in https://g

Re: [PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-05 Thread via GitHub
alamb commented on code in PR #17431: URL: https://github.com/apache/datafusion/pull/17431#discussion_r2325818051 ## datafusion/physical-plan/src/joins/sort_merge_join/exec.rs: ## @@ -1923,6 +1974,100 @@ mod tests { Ok(()) } +#[tokio::test] +async fn join

Re: [I] Remove workaround disabling Parquet page index reading for encrypted files [datafusion]

2025-09-05 Thread via GitHub
alamb closed issue #17352: Remove workaround disabling Parquet page index reading for encrypted files URL: https://github.com/apache/datafusion/issues/17352

Re: [PR] fix: Support aggregate expressions in `QUALIFY` [datafusion]

2025-09-05 Thread via GitHub
alamb commented on code in PR #17313: URL: https://github.com/apache/datafusion/pull/17313#discussion_r2325782336 ## datafusion/sql/src/select.rs: ## @@ -944,15 +963,41 @@ impl SqlToRel<'_, S> { check_columns_satisfy_exprs( &column_exprs_post_aggr,

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17337: URL: https://github.com/apache/datafusion/pull/17337#issuecomment-3259288800 Here is a PR that avoids some clones, which might improve performance - https://github.com/pydantic/datafusion/pull/39

[I] Slow aggregrate query, Polars is 4 times faster for equal query [datafusion]

2025-09-05 Thread via GitHub
valkum opened a new issue, #17446: URL: https://github.com/apache/datafusion/issues/17446 ### Describe the bug I encountered a slow query, which I already mentioned in Discord but got no answer. I went ahead and created a repro case with data I can share that showcases this slow quer

Re: [PR] fix: output_ordering converted to Vec> [datafusion]

2025-09-05 Thread via GitHub
destrex271 commented on PR #17439: URL: https://github.com/apache/datafusion/pull/17439#issuecomment-3259375391 @alamb , can you please trigger the test workflows again? Just updated the branch

Re: [PR] fix: Implement AggregateUDFImpl::reverse_expr for StringAgg [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17165: URL: https://github.com/apache/datafusion/pull/17165#issuecomment-3259369563 I added a test using `string_agg` with both orderings in https://github.com/apache/datafusion/pull/17165/commits/84d58ac9380b925b7883ae915a7bcfec70674130 and it looks good to me

Re: [PR] fix: output_ordering converted to Vec> [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17439: URL: https://github.com/apache/datafusion/pull/17439#issuecomment-3259346255 @crepererum can you please review this PR (as you filed the request for https://github.com/apache/datafusion/issues/17354)

Re: [PR] Improve `PartialEq`, `Eq` speed for `LexOrdering`, make `PartialEq` and `PartialOrd` consistent [datafusion]

2025-09-05 Thread via GitHub
alamb commented on code in PR #17442: URL: https://github.com/apache/datafusion/pull/17442#discussion_r2325732445 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -414,27 +427,28 @@ impl LexOrdering { self.exprs.truncate(len); true } +} -//

[I] DictionaryKeyOverflowError on DataFrame.write_parquet [datafusion]

2025-09-05 Thread via GitHub
valkum opened a new issue, #17445: URL: https://github.com/apache/datafusion/issues/17445 ### Describe the bug I tried to come up with a repro case for a slow query (compared to polars) and encountered this bug. As the error is originating in https://github.com/apache/arrow-rs, I ass

Re: [PR] feat: make sql an optional feature [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17332: URL: https://github.com/apache/datafusion/pull/17332#issuecomment-3259351259 > I was playing with this PR this morning to see if Comet could use it, and it looks like `datafusion-sql` is brought in with the `nested_expressions` feature (which Comet uses). I won

Re: [PR] chore: Collect fallback reasons for spark sql tests [datafusion-comet]

2025-09-05 Thread via GitHub
mbutrovich merged PR #2313: URL: https://github.com/apache/datafusion-comet/pull/2313

Re: [I] Fix regressions in `CometToPrettyStringSuite` [datafusion-comet]

2025-09-05 Thread via GitHub
hsiang-c commented on issue #2307: URL: https://github.com/apache/datafusion-comet/issues/2307#issuecomment-3259234523 Take. I took a look at the test and I think `c13: binary` column might be the culprit. This might be related to https://github.com/apache/datafusion-comet/issues/37

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-05 Thread via GitHub
alamb commented on PR #17337: URL: https://github.com/apache/datafusion/pull/17337#issuecomment-3259231839 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.14.0-1014-gcp #15~

Re: [I] Add push down sort to the source (table provider) [datafusion]

2025-09-05 Thread via GitHub
alamb commented on issue #10433: URL: https://github.com/apache/datafusion/issues/10433#issuecomment-3259221766 @adriangb is in the process of implementing "preferred sort orders" in - https://github.com/apache/datafusion/issues/17348

Re: [PR] feat: allow passing a slice to and expression with the [] indexing [datafusion-python]

2025-09-05 Thread via GitHub
timsaucer merged PR #1215: URL: https://github.com/apache/datafusion-python/pull/1215

Re: [I] Consolidate statistics aggregation [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on issue #8229: URL: https://github.com/apache/datafusion/issues/8229#issuecomment-3259053140 Bump on seeing if we should close this?

Re: [PR] Refactor TableProvider::scan into TableProvider::scan_with_args [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on code in PR #17336: URL: https://github.com/apache/datafusion/pull/17336#discussion_r2325523600 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1166,6 +1166,22 @@ impl TableProvider for ListingTable { filters: &[Expr], limit: Opt

[PR] docs: Add manual redirects from old pages that no longer exist [datafusion-comet]

2025-09-05 Thread via GitHub
andygrove opened a new pull request, #2317: URL: https://github.com/apache/datafusion-comet/pull/2317 ## Which issue does this PR close? N/A ## Rationale for this change Now that the user guide moved from `/user-guide` to `/user-guide/latest`, users will

Re: [PR] Refactor HashJoinExec to progressively accumulate dynamic filter bounds instead of computing them after data is accumulated [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on PR #17444: URL: https://github.com/apache/datafusion/pull/17444#issuecomment-3258798713 cc @nuno-faria

Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-05 Thread via GitHub
adriangb commented on PR #103: URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3258925193 I just gave this a read through and think it's looking great! I'd like to add a benchmark showing join performance numbers (@nuno-faria I think you had something already, would you

[PR] Improve Hash and Ord speed for dyn LogicalType [datafusion]

2025-09-05 Thread via GitHub
findepi opened a new pull request, #17437: URL: https://github.com/apache/datafusion/pull/17437 `LogicalType::signature` is logical type's unique identifier. It is the method contract and is leveraged in `Eq` impl for `dyn LogicalType`. Leverage this in `Hash` and `Ord` too.

Re: [PR] Add PhysicalExpr::is_volatile [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on code in PR #17351: URL: https://github.com/apache/datafusion/pull/17351#discussion_r2324658328 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -560,3 +573,26 @@ pub fn is_dynamic_physical_expr(expr: &Arc) -> bool { // If the generation i

Re: [PR] better preserve statistics when applying limits [datafusion]

2025-09-05 Thread via GitHub
adriangb commented on code in PR #17381: URL: https://github.com/apache/datafusion/pull/17381#discussion_r2324805601 ## datafusion/common/src/stats.rs: ## @@ -391,62 +391,85 @@ impl Statistics { /// parameter to compute global statistics in a multi-partition setting. p

Re: [PR] fix: updated `output_ordering` in file_scan_config.rs to use Vec> instead of just Vec [datafusion]

2025-09-05 Thread via GitHub
destrex271 closed pull request #17423: fix: updated `output_ordering` in file_scan_config.rs to use Vec> instead of just Vec URL: https://github.com/apache/datafusion/pull/17423
