Re: [PR] GC string views on hash join build side [datafusion]

2025-06-20 Thread via GitHub
2010YOUY01 commented on PR #16463: URL: https://github.com/apache/datafusion/pull/16463#issuecomment-2990142949 Thank you! This is great. I have some minor suggestions: 1. The issue explained the motivation of the change clearly; I recommend adding the same rationale to the code commen

Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-20 Thread via GitHub
rishvin commented on issue #1820: URL: https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2990142584 > [@andygrove](https://github.com/andygrove) can I backport [SHA2-fix](https://github.com/apache/datafusion/pull/16350) to branch-48 of datafusion ? I tried updating with d

Re: [PR] Ignore `sort_query_fuzzer_runner` [datafusion]

2025-06-20 Thread via GitHub
2010YOUY01 merged PR #16462: URL: https://github.com/apache/datafusion/pull/16462 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] Add compression option to SpillManager [datafusion]

2025-06-20 Thread via GitHub
xudong963 commented on PR #16268: URL: https://github.com/apache/datafusion/pull/16268#issuecomment-2990161090 > 'here is the API changes that might break your system during upgrades', and this PR is like a new feature you might want to try in the new release -- do we have a separate place

Re: [I] Add `output_bytes` metrics to Explain Analyze [datafusion]

2025-06-20 Thread via GitHub
hendrikmakait commented on issue #16244: URL: https://github.com/apache/datafusion/issues/16244#issuecomment-2990158610 > Now for datafusion, maybe the best thing to do is simply sum(get_record_batch_memory_size()), this should work for common cases, and open a GitHub issue to explain the

Re: [PR] GC string views on hash join build side [datafusion]

2025-06-20 Thread via GitHub
ctsk commented on PR #16463: URL: https://github.com/apache/datafusion/pull/16463#issuecomment-2990163724 I will do both of these things later today. I am concerned about the performance impact for smaller-scale tasks. I suspect many users of datafusion are not doing such large joins

Re: [PR] Ignore `sort_query_fuzzer_runner` [datafusion]

2025-06-20 Thread via GitHub
2010YOUY01 commented on PR #16462: URL: https://github.com/apache/datafusion/pull/16462#issuecomment-2990191348 > Isn't this fixed by #16465? Ah yes, and thank you for the pointer. #16465 suppressed the specific failing test case, so now we should be able to let the fuzzer run

Re: [PR] Make clickbench query IDs 1-based [datafusion]

2025-06-20 Thread via GitHub
pepijnve commented on PR #16455: URL: https://github.com/apache/datafusion/pull/16455#issuecomment-2990187740 @AdamGS would a PR that splits queries.sql into a file per query be acceptable? There's precedent for that in some of the other benchmarks, and that seems to be a reasonable comprom

Re: [PR] Fix parsing error when having fields after nested struct in BigQuery [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
git-hulk commented on PR #1897: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1897#issuecomment-2990949535 @iffyio Could you please take a look at this fix? The root cause is that there may be more field definitions after the trailing bracket, but it is currently always regarded a

[PR] Fix parsing error when having fields after nested struct in BigQuery [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
git-hulk opened a new pull request, #1897: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1897 Before applying this patch, the following SQL cannot be parsed: ```SQL CREATE TABLE my_table ( f0 STRING, f1 STRUCT>, f2 STRING, ) ``` But it's a v

[PR] Consolidate DataFrame Docs: Merge HTML Rendering Section as Subpage [datafusion-python]

2025-06-20 Thread via GitHub
kosiew opened a new pull request, #1161: URL: https://github.com/apache/datafusion-python/pull/1161 ## Which issue does this PR close? - Closes #1158 ## Rationale for this change Currently, the documentation includes redundant entries for "DataFrame" and "API", which cau

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
pepijnve commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2991004224 @alamb @Dandandan Another update. The realization that Tokio is not P/E-core aware out of the box and that you have very little to no explicit control over thread affinity on macOS

Re: [PR] Fix duplicate field name error in Join::try_new_with_project_input during physical planning [datafusion]

2025-06-20 Thread via GitHub
LiaCastaneda commented on code in PR #16454: URL: https://github.com/apache/datafusion/pull/16454#discussion_r2158698230 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1667,6 +1667,34 @@ pub fn build_join_schema( dfschema.with_functional_dependencies(func_dependenci

Re: [I] Make `datafusion-cli` read parquet folders [datafusion]

2025-06-20 Thread via GitHub
hendrikmakait commented on issue #16460: URL: https://github.com/apache/datafusion/issues/16460#issuecomment-2991018757 That's what it looks like, I couldn't reproduce the problem with a directory that only contains `.parquet` files.

Re: [I] Add `output_bytes` metrics to Explain Analyze [datafusion]

2025-06-20 Thread via GitHub
2010YOUY01 commented on issue #16244: URL: https://github.com/apache/datafusion/issues/16244#issuecomment-2990112035 > > A quick reminder for someone who is willing to implement it: It's possible that multiple `Array`s share the same underlying buffer -- those `Array`s can be within the sam
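A minimal Rust sketch of the double-counting caveat above, using only the standard library. `ModelArray`, `naive_bytes`, and `dedup_bytes` are hypothetical names, not DataFusion or arrow-rs APIs: each modelled array just holds `Arc`-shared byte buffers, and deduplicating by allocation address counts a shared buffer only once. It does not handle arrays that reference different slices of the same buffer.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array: only the buffers it references.
struct ModelArray {
    buffers: Vec<Arc<Vec<u8>>>,
}

/// Naive sum: a buffer shared by several arrays is counted once per reference.
fn naive_bytes(arrays: &[ModelArray]) -> usize {
    arrays
        .iter()
        .flat_map(|a| a.buffers.iter())
        .map(|b| b.len())
        .sum()
}

/// Deduplicated sum: each distinct allocation (identified by its address)
/// contributes its size exactly once.
fn dedup_bytes(arrays: &[ModelArray]) -> usize {
    let mut seen: HashSet<*const Vec<u8>> = HashSet::new();
    let mut total = 0;
    for buf in arrays.iter().flat_map(|a| a.buffers.iter()) {
        // `insert` returns false if this allocation was already counted.
        if seen.insert(Arc::as_ptr(buf)) {
            total += buf.len();
        }
    }
    total
}
```

With one 100-byte buffer shared by two arrays plus a separate 10-byte buffer, the naive sum reports 210 bytes while the deduplicated sum reports 110.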

Re: [I] Support compression in spill files [datafusion]

2025-06-20 Thread via GitHub
2010YOUY01 closed issue #16130: Support compression in spill files URL: https://github.com/apache/datafusion/issues/16130

Re: [PR] Add compression option to SpillManager [datafusion]

2025-06-20 Thread via GitHub
2010YOUY01 merged PR #16268: URL: https://github.com/apache/datafusion/pull/16268

Re: [PR] Make clickbench query IDs 1-based [datafusion]

2025-06-20 Thread via GitHub
pepijnve commented on PR #16455: URL: https://github.com/apache/datafusion/pull/16455#issuecomment-2990302693 Ok, thanks for your opinion. Little developer quality of life improvements like this are worth it imo. I'll make a PR that contains the necessary code changes and a little bash scri
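A rough sketch of splitting a multi-query file into one file per query with 1-based IDs, assuming `queries.sql` holds one query per line (so that query N matches line N, as requested elsewhere in the thread). The `split_queries` helper and the `qN.sql` naming are hypothetical and not necessarily what the follow-up PR does.

```rust
/// Hypothetical helper: turn a file with one query per line into
/// (file_name, query) pairs. IDs are 1-based and follow the line number,
/// so blank lines are skipped without shifting the IDs of later queries.
fn split_queries(contents: &str) -> Vec<(String, String)> {
    contents
        .lines()
        .enumerate()
        .filter(|(_, line)| !line.trim().is_empty())
        .map(|(i, line)| (format!("q{}.sql", i + 1), line.trim().to_string()))
        .collect()
}
```

Keeping the ID tied to the line number means a query's file name always matches where it appears in the original file.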

[PR] fix: document and fix macro hygiene for `config_field!` [datafusion]

2025-06-20 Thread via GitHub
crepererum opened a new pull request, #16473: URL: https://github.com/apache/datafusion/pull/16473 ## Which issue does this PR close? \- ## Rationale for this change The macro -- while being public and actually helpful -- is currently broken & undocumented. ## What change

[PR] Clickhouse map settings support [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
solontsev opened a new pull request, #1895: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1895 https://clickhouse.com/docs/operations/settings/settings#additional_table_filters The current implementation only parses simple settings in `key = value` format.

Re: [PR] Clickhouse map settings support [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
solontsev closed pull request #1895: Clickhouse map settings support URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1895

[I] Docker build kube/Dockerfile failed with ### COMPILER BUG DETECTED ### [datafusion-comet]

2025-06-20 Thread via GitHub
zhangx opened a new issue, #1917: URL: https://github.com/apache/datafusion-comet/issues/1917 ### Describe the bug I got this error when running `docker build -t apache/datafusion-comet -f kube/Dockerfile .` to build a Docker image. ```shell #11 16.43 Compiling untrusted v0.9.

Re: [PR] Make clickbench query IDs 1-based [datafusion]

2025-06-20 Thread via GitHub
pepijnve commented on PR #16455: URL: https://github.com/apache/datafusion/pull/16455#issuecomment-2991261702 Take 2 in #16476

Re: [I] Result from `array_has` depends on runtime array type [datafusion]

2025-06-20 Thread via GitHub
pepijnve commented on issue #16459: URL: https://github.com/apache/datafusion/issues/16459#issuecomment-2991278170 Stepping through this with the debugger, it seems like a string view vector containing a `null` array is treated as a vector of length `0` (or at least `is_empty() == true`) wh

Re: [PR] Ignore `sort_query_fuzzer_runner` [datafusion]

2025-06-20 Thread via GitHub
ozankabak commented on PR #16462: URL: https://github.com/apache/datafusion/pull/16462#issuecomment-2990175453 Isn't this fixed by #16465?

[PR] Revert "Ignore `sort_query_fuzzer_runner` (#16462)" [datafusion]

2025-06-20 Thread via GitHub
2010YOUY01 opened a new pull request, #16470: URL: https://github.com/apache/datafusion/pull/16470 This reverts commit aa1e6dac7c5dfdfb2c0f52282638886bff194a5d. ## Which issue does this PR close? - Closes #. ## Rationale for this change See https://gith

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-06-20 Thread via GitHub
xudong963 commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2990200877 @kosiew's suggestions look good to me. I have another question: **FilterPushdownPhase::Pre** IMO, `FilterPushdownPhase::Pre` runs before most other physical opti

[PR] feat: derive `Debug` and `Clone` for `ScalarFunctionArgs` [datafusion]

2025-06-20 Thread via GitHub
crepererum opened a new pull request, #16471: URL: https://github.com/apache/datafusion/pull/16471 ## Which issue does this PR close? \- ## Rationale for this change Better DX. ## What changes are included in this PR? Auto-derives. ## Are these changes tested?

[PR] feat: make `with_work_table` a trait method for `ExecutionPlan` [datafusion]

2025-06-20 Thread via GitHub
geoffreyclaude opened a new pull request, #16469: URL: https://github.com/apache/datafusion/pull/16469 ## Which issue does this PR close? This PR addresses the extensibility limitation that causes recursive queries to fail when used with the `datafusion-tracing` crate, as reported in

[I] array_has function returns null for an empty list ([]) instead of false [datafusion]

2025-06-20 Thread via GitHub
bert-beyondloops opened a new issue, #16474: URL: https://github.com/apache/datafusion/issues/16474 ### Describe the bug When using the array_has function with an empty list, the result is null instead of false. Currently: array_has([], 1) => **null**. array_has(null, 1) => null.
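A simplified model of the behaviour the issue asks for, assuming a NULL list should still yield NULL while an empty list yields `false`. `None` stands in for SQL NULL; NULL elements inside the list and a NULL needle are not modelled, and this is not DataFusion's actual `array_has` implementation.

```rust
/// Simplified model: `None` plays the role of SQL NULL.
/// NULL list -> NULL; empty list -> false; otherwise a membership test.
fn array_has(haystack: Option<&[i64]>, needle: i64) -> Option<bool> {
    // `contains` on an empty slice is simply false, never NULL.
    haystack.map(|values| values.contains(&needle))
}
```

Under this model, `array_has([], 1)` evaluates to `false` while `array_has(null, 1)` stays `null`.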

[I] Unify write_parquet signatures [datafusion-python]

2025-06-20 Thread via GitHub
timsaucer opened a new issue, #1162: URL: https://github.com/apache/datafusion-python/issues/1162 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** This is a follow on to https://github.com/apache/datafusion-python/pull/1123

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-20 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-2991145431 > One other thing I'm curious about. This write-up discusses the change in terms of enabling long-running tasks to be cancelled, but would making CPU-intensive exec blocks more cooperat

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-20 Thread via GitHub
alamb commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-2991169813 @pepijnve I pushed "Acknowledgement" and "About DataFusion" sections.

Re: [PR] GC string views on hash join build side [datafusion]

2025-06-20 Thread via GitHub
Dandandan commented on PR #16463: URL: https://github.com/apache/datafusion/pull/16463#issuecomment-2990166497 We have quite a few implementations of GC-ing arrays. I am wondering whether, in this case, performance for smaller tables could be improved by the heuristic used here: https://
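For context on what "GC-ing" a string view array means, a simplified Rust model: views point into shared data buffers, and GC copies only the referenced bytes into one compact buffer so the large originals can be dropped. `View` and `gc` are hypothetical names; real Arrow string views also inline short strings, and this sketch includes no heuristic for deciding when GC is worthwhile.

```rust
/// Simplified string view: which buffer, where in it, and how many bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    buffer: usize,
    offset: usize,
    len: usize,
}

/// Copy only the bytes referenced by `views` into one compact buffer and
/// return views rewritten to point into it (buffer index 0).
fn gc(views: &[View], buffers: &[Vec<u8>]) -> (Vec<View>, Vec<u8>) {
    // Size the new buffer exactly once up front.
    let needed: usize = views.iter().map(|v| v.len).sum();
    let mut compact = Vec::with_capacity(needed);
    let new_views: Vec<View> = views
        .iter()
        .map(|v| {
            let start = compact.len();
            compact.extend_from_slice(&buffers[v.buffer][v.offset..v.offset + v.len]);
            View { buffer: 0, offset: start, len: v.len }
        })
        .collect();
    (new_views, compact)
}
```

The copy costs time proportional to the retained bytes, which is why the cost/benefit can differ for small build sides.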

[PR] Optimize allocation rate for `int64` array in `hex` function [datafusion]

2025-06-20 Thread via GitHub
Fly-Style opened a new pull request, #16483: URL: https://github.com/apache/datafusion/pull/16483 ## Which issue does this PR close? Comment from #15947 regarding potentially optimized memory usage https://github.com/apache/datafusion/pull/15947#discussion_r2073948962 ## Rati
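One generic way to lower the allocation rate when hex-encoding an `int64` column is to write every value into a single reused buffer plus an offsets vector, rather than allocating a `String` per row. This is a hypothetical sketch, not necessarily what the PR does; it assumes uppercase output and prints negative values as their 64-bit two's-complement pattern.

```rust
use std::fmt::Write;

/// Hex-encode all values into one `String` and record end offsets,
/// similar in spirit to how Arrow string arrays lay out their data.
fn hex_int64(values: &[i64]) -> (String, Vec<usize>) {
    // At most 16 hex digits per i64, so reserve once up front.
    let mut data = String::with_capacity(values.len() * 16);
    let mut offsets = Vec::with_capacity(values.len() + 1);
    offsets.push(0);
    for &v in values {
        // Casting to u64 prints negatives as two's complement.
        // Writing into a String cannot fail, so unwrap is safe here.
        write!(data, "{:X}", v as u64).unwrap();
        offsets.push(data.len());
    }
    (data, offsets)
}
```

Row `i` is then `&data[offsets[i]..offsets[i + 1]]`.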

[I] Add tests for yielding / cancelling in SpillManager [datafusion]

2025-06-20 Thread via GitHub
alamb opened a new issue, #16482: URL: https://github.com/apache/datafusion/issues/16482 We also need to add a spill merge test case, since we already added the yield to the spill manager. _Originally posted by @zhuqi-lucas in https://github.com/apache/datafusion/pull/16398#discussion_r215

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
alamb commented on code in PR #16398: URL: https://github.com/apache/datafusion/pull/16398#discussion_r2159146937 ## datafusion/core/tests/execution/coop.rs: ## @@ -0,0 +1,722 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agree

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
pepijnve commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2991894796 @alamb thanks for taking the time to verify a bit further on your end. Just FYI (and some shameless PR promotion), #16476 and #16477 will help a bit in making investigation of indiv

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
zhuqi-lucas commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2991901283 @alamb I agree with merging this PR, the change should not affect the performance since we already use MPSC for SPM, etc, thanks!

Re: [PR] feat: derive `Debug` and `Clone` for `ScalarFunctionArgs` [datafusion]

2025-06-20 Thread via GitHub
crepererum merged PR #16471: URL: https://github.com/apache/datafusion/pull/16471

Re: [PR] fix: conflict between #1905 and #1892. [datafusion-comet]

2025-06-20 Thread via GitHub
codecov-commenter commented on PR #1919: URL: https://github.com/apache/datafusion-comet/pull/1919#issuecomment-2991924758 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1919?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] Minor: Add more links to cooperative / scheduling docs [datafusion]

2025-06-20 Thread via GitHub
alamb opened a new pull request, #16484: URL: https://github.com/apache/datafusion/pull/16484 - draft as it builds on https://github.com/apache/datafusion/pull/16398 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/16398 from @pepijnve

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
alamb commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159067305 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

[PR] Record `output_bytes` in `EXPLAIN ANALYZE` [datafusion]

2025-06-20 Thread via GitHub
hendrikmakait opened a new pull request, #16481: URL: https://github.com/apache/datafusion/pull/16481 ## Which issue does this PR close? - Closes #16244. ## Rationale for this change ## What changes are included in this PR? ## Are these chan

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16395: URL: https://github.com/apache/datafusion/pull/16395#issuecomment-2991749429 > The example prints logs, it's good, thanks! This is so cool!

Re: [I] Panic in FFI UDWF when using wrapping lead function [datafusion-python]

2025-06-20 Thread via GitHub
timsaucer commented on issue #1144: URL: https://github.com/apache/datafusion-python/issues/1144#issuecomment-2991776198 Sorry it took so long to respond. I went to reproduce the error and took a few minutes to dig into this. I found the bug and will be putting up a PR in the upstream repo

[PR] fix: column indices in FFI partition evaluator [datafusion]

2025-06-20 Thread via GitHub
timsaucer opened a new pull request, #16480: URL: https://github.com/apache/datafusion/pull/16480 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion-python/issues/1144 - ## Rationale for this change There is a bug in how we compute the indice

Re: [PR] Fix constant window for evaluate stateful [datafusion]

2025-06-20 Thread via GitHub
andygrove commented on PR #16430: URL: https://github.com/apache/datafusion/pull/16430#issuecomment-2991855995 > [apache/datafusion-comet#1913](https://github.com/apache/datafusion-comet/pull/1913) @andygrove CI seems to have passed.🎉 Yes, I confirmed that the test passes now:

Re: [PR] [ignore] test DataFusion PR: Fix constant window for evaluate stateful [datafusion-comet]

2025-06-20 Thread via GitHub
andygrove closed pull request #1913: [ignore] test DataFusion PR: Fix constant window for evaluate stateful URL: https://github.com/apache/datafusion-comet/pull/1913

Re: [I] 404 for indexed docs page [datafusion]

2025-06-20 Thread via GitHub
adriangb closed issue #16438: 404 for indexed docs page URL: https://github.com/apache/datafusion/issues/16438

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159136194 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-20 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2991723331 > Do you think we need to do `48.0.1` to include fix for [#16444](https://github.com/apache/datafusion/issues/16444)? I think we can argue if that was a regression or not (i

Re: [I] Add `output_bytes` metrics to Explain Analyze [datafusion]

2025-06-20 Thread via GitHub
hendrikmakait commented on issue #16244: URL: https://github.com/apache/datafusion/issues/16244#issuecomment-2991828951 It looks like there are other sets of metrics that would benefit from adding `output_bytes` for improved data in `EXPLAIN ANALYZE`, namely `UnnestMetrics`, `SortMergeJoinM

[PR] fix: conflict between #1905 and #1892. [datafusion-comet]

2025-06-20 Thread via GitHub
mbutrovich opened a new pull request, #1919: URL: https://github.com/apache/datafusion-comet/pull/1919 ## Which issue does this PR close? Closes #. ## Rationale for this change fix compilation on main ## What changes are included in this PR?

Re: [PR] Add an example of embedding indexes inside a parquet file [datafusion]

2025-06-20 Thread via GitHub
zhuqi-lucas commented on code in PR #16395: URL: https://github.com/apache/datafusion/pull/16395#discussion_r2159119104 ## datafusion-examples/examples/embedding_parquet_indexes.rs: ## @@ -0,0 +1,380 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

[PR] Support for Map values in ClickHouse settings [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
solontsev opened a new pull request, #1896: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1896 https://clickhouse.com/docs/operations/settings/settings#additional_table_filters The current implementation only parses simple settings in `key = value` format.

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2991881151 Thank you also @zhuqi-lucas for all your help and attention to this PR

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2991878307 > Zooming in on query 4 (it would be super convenient if these had 1-based indices BTW to match line numbers in the file). Yes, I also find the ClickHouse numbering schemes confu

Re: [PR] Fix parsing error when having fields after nested struct in BigQuery [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
iffyio commented on code in PR #1897: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1897#discussion_r2158909944 ## tests/sqlparser_bigquery.rs: ## @@ -2472,3 +2472,52 @@ fn test_struct_field_options() { ")", )); } + +#[test] +fn test_struct_trailing

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-06-20 Thread via GitHub
adriangb commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2991542430 > Is there a duplication with push_down_filter in the logical optimizer? I don't think so. The logical optimizer pushes things down into the table scan phase, the physica

Re: [I] Advanced Interval Analysis [datafusion]

2025-06-20 Thread via GitHub
rutvik-max commented on issue #14515: URL: https://github.com/apache/datafusion/issues/14515#issuecomment-2991372461 Great to connect, excited to see what we can build together!

Re: [PR] Enable schema evolution for nested structs via adapt_column and custom adapter support in ListingTable [datafusion]

2025-06-20 Thread via GitHub
adriangb commented on code in PR #16371: URL: https://github.com/apache/datafusion/pull/16371#discussion_r2158922999 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -83,17 +85,16 @@ pub struct ListingTableConfig { pub options: Option, /// Tracks the source of

Re: [PR] Fix `impl Ord for Ident` [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
iffyio commented on PR #1893: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1893#issuecomment-2991529806 Thanks @eliaperantoni! Could you add some test cases to demonstrate the behavior being fixed? (Also, it seems there are some CI failures on the branch that need addressing.)

[PR] fix `limit` in subqueries [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
Dimchikkk opened a new pull request, #1899: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1899 Fixes https://github.com/apache/datafusion-sqlparser-rs/issues/1898. The bug was introduced in https://github.com/apache/datafusion-sqlparser-rs/pull/1793 I think.

Re: [PR] chore(deps): Update sqlparser to 0.56.0 [datafusion]

2025-06-20 Thread via GitHub
Dimchikkk commented on PR #16456: URL: https://github.com/apache/datafusion/pull/16456#issuecomment-2991498323 (one of the test failures is related to [#1898](https://github.com/apache/datafusion-sqlparser-rs/issues/1898) and would be fixed if that patch is backported to 0.56)

Re: [PR] Fix parsing error when having fields after nested struct in BigQuery [datafusion-sqlparser-rs]

2025-06-20 Thread via GitHub
git-hulk commented on code in PR #1897: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1897#discussion_r2158957718 ## tests/sqlparser_bigquery.rs: ## @@ -2472,3 +2472,52 @@ fn test_struct_field_options() { ")", )); } + +#[test] +fn test_struct_traili

Re: [PR] Redirect user defined function webpage [datafusion]

2025-06-20 Thread via GitHub
adriangb merged PR #16475: URL: https://github.com/apache/datafusion/pull/16475

Re: [PR] Revert "Ignore `sort_query_fuzzer_runner` (#16462)" [datafusion]

2025-06-20 Thread via GitHub
alamb merged PR #16470: URL: https://github.com/apache/datafusion/pull/16470

Re: [PR] Revert "Ignore `sort_query_fuzzer_runner` (#16462)" [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16470: URL: https://github.com/apache/datafusion/pull/16470#issuecomment-2991746450 Thank you everyone

Re: [PR] feat: support array_max [datafusion-comet]

2025-06-20 Thread via GitHub
mbutrovich merged PR #1892: URL: https://github.com/apache/datafusion-comet/pull/1892

Re: [PR] Fix duplicate field name error in Join::try_new_with_project_input during physical planning [datafusion]

2025-06-20 Thread via GitHub
LiaCastaneda commented on code in PR #16454: URL: https://github.com/apache/datafusion/pull/16454#discussion_r2158853002 ## datafusion/core/src/physical_planner.rs: ## @@ -969,8 +972,24 @@ impl DefaultPhysicalPlanner { // Remove temporary projected columns

Re: [I] 404 for indexed docs page [datafusion]

2025-06-20 Thread via GitHub
alamb commented on issue #16438: URL: https://github.com/apache/datafusion/issues/16438#issuecomment-2991716663 I just tried accessing - https://datafusion.apache.org/library-user-guide/adding-udfs.html And it correctly goes to the new page now - https://datafusion.apache.org/lib

Re: [PR] Fix constant window for evaluate stateful [datafusion]

2025-06-20 Thread via GitHub
suibianwanwank commented on PR #16430: URL: https://github.com/apache/datafusion/pull/16430#issuecomment-2991732448 > Thank you very much @suibianwanwank -- this looks great to me > > I took the liberty of merging up from main and adding a link to the github issue in the test but I th

Re: [PR] Temporarily fix bug in dynamic top-k optimization [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16465: URL: https://github.com/apache/datafusion/pull/16465#issuecomment-2991746831 Thanks @adriangb and @blaginin for sorting this out

Re: [PR] minor: fix kube/Dockerfile build failed [datafusion-comet]

2025-06-20 Thread via GitHub
codecov-commenter commented on PR #1918: URL: https://github.com/apache/datafusion-comet/pull/1918#issuecomment-2991761756 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1918?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: conflict between #1905 and #1892. [datafusion-comet]

2025-06-20 Thread via GitHub
mbutrovich merged PR #1919: URL: https://github.com/apache/datafusion-comet/pull/1919

Re: [PR] Fix constant window for evaluate stateful [datafusion]

2025-06-20 Thread via GitHub
alamb commented on code in PR #16430: URL: https://github.com/apache/datafusion/pull/16430#discussion_r2158877685 ## datafusion/core/tests/physical_optimizer/window_optimize.rs: ## @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] Add dynamic filter (bounds) pushdown to HashJoinExec [datafusion]

2025-06-20 Thread via GitHub
adriangb commented on PR #16445: URL: https://github.com/apache/datafusion/pull/16445#issuecomment-2991563630 > It's hard to say generally, but a hashtable lookup which fits into cache on a `u64` key can be really fast. I guess only benchmarks can tell. But I still think the scalar bo
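A minimal sketch of the bounds idea under discussion, assuming `u64` join keys and an inner join: compute min/max over the build-side keys and drop probe-side keys outside that range before any hash lookup. The function names are hypothetical, not DataFusion APIs. The bounds check is only a necessary condition, so surviving rows still need the hash-table probe.

```rust
/// Min/max of the build-side keys, or None if the build side is empty.
fn build_bounds(build_keys: &[u64]) -> Option<(u64, u64)> {
    let min = *build_keys.iter().min()?;
    let max = *build_keys.iter().max()?;
    Some((min, max))
}

/// Keep only probe keys that could match: anything outside [min, max]
/// cannot find a partner in the hash table.
fn prefilter_probe(probe_keys: &[u64], bounds: Option<(u64, u64)>) -> Vec<u64> {
    match bounds {
        // Empty build side: no probe row can match in an inner join.
        None => Vec::new(),
        Some((lo, hi)) => probe_keys
            .iter()
            .copied()
            .filter(|k| (lo..=hi).contains(k))
            .collect(),
    }
}
```

Two comparisons per row are cheap, but as the comment notes, a cache-resident hash lookup on a `u64` key is also fast, so only benchmarks can settle the trade-off.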

[PR] feat: Finalize support for `RightMark` join [datafusion]

2025-06-20 Thread via GitHub
jonathanc-n opened a new pull request, #16488: URL: https://github.com/apache/datafusion/pull/16488 ## Which issue does this PR close? - Closes #13138 + #16385. ## Rationale for this change We need to support Sort Merge Join, Symmetric Hash Join and swapping join si

[PR] feat: Encapsulate Parquet objects [datafusion-comet]

2025-06-20 Thread via GitHub
huaxingao opened a new pull request, #1920: URL: https://github.com/apache/datafusion-comet/pull/1920 ## Which issue does this PR close? Closes #. ## Rationale for this change Iceberg shades Parquet. We can't pass Parquet objects from Iceberg to Comet. In order to ge

[I] [DISCUSS] Distinguish between "performance" and "efficiency" cores [datafusion]

2025-06-20 Thread via GitHub
alamb opened a new issue, #16490: URL: https://github.com/apache/datafusion/issues/16490 ### Is your feature request related to a problem or challenge? Modern Macs / Apple Silicon (and Intel 12th gen processors) make a distinction between Performance (`P`) and Efficiency (`E`) cores.

Re: [PR] Example for using a separate threadpool for CPU bound work (try 3) [datafusion]

2025-06-20 Thread via GitHub
adriangb commented on code in PR #16331: URL: https://github.com/apache/datafusion/pull/16331#discussion_r2159381725 ## datafusion-examples/examples/thread_pools.rs: ## @@ -0,0 +1,349 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Optimize allocation rate for `int64` array in `hex` function [datafusion]

2025-06-20 Thread via GitHub
Dandandan commented on PR #16483: URL: https://github.com/apache/datafusion/pull/16483#issuecomment-2992269293 I think this is probably faster! Do we have any benchmarks for this to show it? -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] fix: Add continue after append_null when casting float to decimal [datafusion-comet]

2025-06-20 Thread via GitHub
leung-ming commented on PR #1914: URL: https://github.com/apache/datafusion-comet/pull/1914#issuecomment-2992275004 > Would you like to add a unit test to cover this case? Ok, do it later.

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2992232572 🚀

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
pepijnve commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2992237210 Oh my, it landed 🎉. Thanks @alamb, great start of my weekend! Still working on the Tokio PR. I'll create a follow-up issue referring to the todo I left in the code.

[I] Update cooperative scheduling code to use `poll_proceed` when/if it becomes available in Tokio [datafusion]

2025-06-20 Thread via GitHub
pepijnve opened a new issue, #16489: URL: https://github.com/apache/datafusion/issues/16489 ### Is your feature request related to a problem or challenge? PR #16398 revised the cooperative scheduling support in DataFusion to make use of Tokio's task budget system. This was implemented

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-06-20 Thread via GitHub
alamb commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2992278637 > Is there a duplication with push_down_filter in the logical optimizer? In my opinion there is duplication in the sense that they both do similar things (push filters down

Re: [I] Document DataFusion Threading / tokio runtimes (how to separate IO and CPU bound work) [datafusion]

2025-06-20 Thread via GitHub
alamb commented on issue #12393: URL: https://github.com/apache/datafusion/issues/12393#issuecomment-2992238567 I am still looking for a committer to help review the next round of this example (which uses the object_store APIs) - https://github.com/apache/datafusion/pull/16331

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2992260580 > 🤔 now that I write '10-core MacBook', I'm wondering if the 10 core part is where my variability is coming from. That's 6 performance and 4 efficiency cores. Ideally DataFusion keeps

Re: [I] Result from `array_has` depends on runtime array type [datafusion]

2025-06-20 Thread via GitHub
bert-beyondloops commented on issue #16459: URL: https://github.com/apache/datafusion/issues/16459#issuecomment-2992321190 As I understood, this issue is triggered when the whole ListVector has no item entries at all. So every "row" has a null or empty list (empty offsets buffer).

Re: [PR] test: Trigger Spark 3.4.3 SQL tests for iceberg-compat [datafusion-comet]

2025-06-20 Thread via GitHub
parthchandra merged PR #1912: URL: https://github.com/apache/datafusion-comet/pull/1912

Re: [I] [DISCUSS] Distinguish between "performance" and "efficiency" cores [datafusion]

2025-06-20 Thread via GitHub
alamb commented on issue #16490: URL: https://github.com/apache/datafusion/issues/16490#issuecomment-2992289216 A good quote: https://developer.apple.com/news/?id=vk3m204o > Statically pre-assigning pieces of a parallel workload to worker threads will leave threads idle before th

Re: [PR] Consolidate DataFrame Docs: Merge HTML Rendering Section as Subpage [datafusion-python]

2025-06-20 Thread via GitHub
timsaucer commented on PR #1161: URL: https://github.com/apache/datafusion-python/pull/1161#issuecomment-2992538234 This is good, but I still see a couple of issues when I build it locally. In the screen shot I am getting two different "API Reference" sections. https://github.c

Re: [I] Make `datafusion` read parquet folders if non parquet files exists [datafusion]

2025-06-20 Thread via GitHub
hendrikmakait commented on issue #16460: URL: https://github.com/apache/datafusion/issues/16460#issuecomment-2992553343 As a side note, the docs at https://github.com/apache/datafusion/blob/a91e0421ebadf3a155508e28e272f5fb8356bca1/docs/source/user-guide/cli/datasources.md#parquet don't matc

Re: [PR] Use Tokio's task budget consistently, better APIs to support task cancellation [datafusion]

2025-06-20 Thread via GitHub
alamb commented on PR #16398: URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2991892466 Actually, @zhuqi-lucas it wasn't 100% clear to me from your comments -- are you ok with this PR being merged?

Re: [PR] feat: support array_max [datafusion-comet]

2025-06-20 Thread via GitHub
drexler-sky commented on PR #1892: URL: https://github.com/apache/datafusion-comet/pull/1892#issuecomment-2992460842 Thanks @mbutrovich @andygrove @parthchandra

[PR] Split clickbench query set into one file per query [datafusion]

2025-06-20 Thread via GitHub
pepijnve opened a new pull request, #16476: URL: https://github.com/apache/datafusion/pull/16476 ## Which issue does this PR close? None ## Rationale for this change Clickbench query IDs are zero-based while most editors are one-based wrt line numbers. This causes a litt

Re: [I] [Potentially] Release DataFusion `48.0.1` [datafusion]

2025-06-20 Thread via GitHub
timsaucer commented on issue #16486: URL: https://github.com/apache/datafusion/issues/16486#issuecomment-2992456109 If we do a patch release I would like to get in this bugfix that I found today: https://github.com/apache/datafusion/pull/16480
