Re: [PR] fix: modulo op with negative zero divisor produces Nan [datafusion-comet]

2024-07-26 Thread via GitHub
vaibhawvipul commented on PR #585: URL: https://github.com/apache/datafusion-comet/pull/585#issuecomment-2252340865 @andygrove / @kazuyukitanimura can we please get CI triggered? and also a review? -- This is an automated message from the Apache Git Service. To respond to the message, ple

[PR] AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs contain input types [datafusion]

2024-07-26 Thread via GitHub
lewiszlw opened a new pull request, #11666: URL: https://github.com/apache/datafusion/pull/11666 ## Which issue does this PR close? Closes #. ## Rationale for this change It confused me when I read this code that AggregateFunctionExpr / AccumulatorArgs /

Re: [PR] [Bug] fix bug in return type inference of `utf8_to_int_type` [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11662: URL: https://github.com/apache/datafusion/pull/11662#issuecomment-2252459254 I think clippy is failing because the string-view2 branch needs to get the same fixes we have applied to main. I will make a PR to do this shortly -- This is an automated mes

Re: [I] Allow comparison of Timestamps with different Timezones [datafusion]

2024-07-26 Thread via GitHub
alamb commented on issue #11653: URL: https://github.com/apache/datafusion/issues/11653#issuecomment-2252474666 > can the comparison operator be a function that accepts timestamps in different zones? It could, but I think that would just postpone the conversation to execution (and it

Re: [PR] [Bug] fix bug in return type inference of `utf8_to_int_type` [datafusion]

2024-07-26 Thread via GitHub
alamb commented on code in PR #11662: URL: https://github.com/apache/datafusion/pull/11662#discussion_r1692871493 ## datafusion/functions/src/utils.rs: ## @@ -41,8 +41,8 @@ macro_rules! get_optimal_return_type { DataType::LargeUtf8 | DataType::LargeBinary => $la

[PR] Merge string-view2 branch to main [datafusion]

2024-07-26 Thread via GitHub
alamb opened a new pull request, #11667: URL: https://github.com/apache/datafusion/pull/11667 Draft until arrow `52.2.0` is released to crates.io (expected Sat July 27) ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10918 ## Rationale f

Re: [PR] [Bug] fix bug in return type inference of `utf8_to_int_type` [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11662: URL: https://github.com/apache/datafusion/pull/11662#issuecomment-2252500434 I merged up from main to try and get CI passing on this PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] Add LimitPushdown optimization rule and CoalesceBatchesExec fetch [datafusion]

2024-07-26 Thread via GitHub
ozankabak commented on code in PR #11652: URL: https://github.com/apache/datafusion/pull/11652#discussion_r1692893313 ## datafusion/core/src/physical_optimizer/limit_pushdown.rs: ## @@ -166,37 +154,111 @@ fn extract_limit(plan: &Arc) -> Option { } } -/// Merge the limit

Re: [PR] chore(deps): update sqlparser requirement from 0.48 to 0.49 [datafusion]

2024-07-26 Thread via GitHub
jonahgao commented on PR #11630: URL: https://github.com/apache/datafusion/pull/11630#issuecomment-2252580831 CI [failed](https://github.com/apache/datafusion/actions/runs/10110013078/job/27959063798) when checking circular dependency. It might be caused by https://github.com/Byron/gitoxid

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-26 Thread via GitHub
timsaucer commented on code in PR #771: URL: https://github.com/apache/datafusion-python/pull/771#discussion_r1693012963 ## python/datafusion/functions.py: ## @@ -1480,31 +1481,26 @@ def last_value( ) -def bit_and(*args: Expr, distinct: bool = False) -> Expr: +def bit_a

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-26 Thread via GitHub
Michael-J-Ward commented on code in PR #771: URL: https://github.com/apache/datafusion-python/pull/771#discussion_r1693037454 ## src/functions.rs: ## @@ -30,13 +31,147 @@ use datafusion::functions_aggregate; use datafusion_common::{Column, ScalarValue, TableReference}; use dat

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-26 Thread via GitHub
Michael-J-Ward commented on code in PR #771: URL: https://github.com/apache/datafusion-python/pull/771#discussion_r1693040247 ## src/functions.rs: ## @@ -293,21 +569,23 @@ fn col(name: &str) -> PyResult { }) } +// TODO: should we just expose this in python? /// Create a

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-26 Thread via GitHub
Michael-J-Ward commented on code in PR #771: URL: https://github.com/apache/datafusion-python/pull/771#discussion_r1693046030 ## python/datafusion/functions.py: ## @@ -1480,31 +1481,26 @@ def last_value( ) -def bit_and(*args: Expr, distinct: bool = False) -> Expr: +def

Re: [PR] [Bug] fix bug in return type inference of `utf8_to_int_type` [datafusion]

2024-07-26 Thread via GitHub
XiangpengHao commented on code in PR #11662: URL: https://github.com/apache/datafusion/pull/11662#discussion_r1693060380 ## datafusion/functions/src/utils.rs: ## @@ -41,8 +41,8 @@ macro_rules! get_optimal_return_type { DataType::LargeUtf8 | DataType::LargeBinary

[I] Use `AccumulatorArgs::is_reversed` in `NthValueAgg` [datafusion]

2024-07-26 Thread via GitHub
jcsherin opened a new issue, #11668: URL: https://github.com/apache/datafusion/issues/11668 ### Is your feature request related to a problem or challenge? The changes in #11564 introduced `AccumulatorArgs::is_reversed`. This indicates that the sort order of the aggregation is reversed

Re: [PR] Ensure statistic defaults in parquet writers are in sync [datafusion]

2024-07-26 Thread via GitHub
alamb commented on code in PR #11656: URL: https://github.com/apache/datafusion/pull/11656#discussion_r1693130571 ## datafusion/sqllogictest/test_files/information_schema.slt: ## @@ -202,7 +202,7 @@ datafusion.execution.parquet.pruning true datafusion.execution.parquet.pushdown

[PR] Reverse expr nth value [datafusion]

2024-07-26 Thread via GitHub
jcsherin opened a new pull request, #11669: URL: https://github.com/apache/datafusion/pull/11669 ## Which issue does this PR close? Closes #11668. ## Rationale for this change The `AccumulatorArgs::is_reversed` was introduced in #11564 for indicating that

[PR] Docs: adding explicit mention of test_utils to docs [datafusion]

2024-07-26 Thread via GitHub
edmondop opened a new pull request, #11670: URL: https://github.com/apache/datafusion/pull/11670 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

[I] circular dependency check CI check is failing with compile error [datafusion]

2024-07-26 Thread via GitHub
alamb opened a new issue, #11671: URL: https://github.com/apache/datafusion/issues/11671 ### Describe the bug CI runs a dependencies check like https://github.com/apache/datafusion/blob/f715d8c6e52ede26ff5b260ad724c7f0c4608cc7/.github/workflows/dependencies.yml#L37-L36

Re: [PR] Merge string-view2 branch to main [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11667: URL: https://github.com/apache/datafusion/pull/11667#issuecomment-2252864547 CI appears to be failing due to https://github.com/apache/datafusion/issues/11671 -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[PR] Fix depcheck by updating dependency to cargo 0.81.0 [datafusion]

2024-07-26 Thread via GitHub
alamb opened a new pull request, #11672: URL: https://github.com/apache/datafusion/pull/11672 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/11671 ## Rationale for this change Fix https://github.com/apache/datafusion/issues/11671

Re: [I] circular dependency check CI check is failing with compile error [datafusion]

2024-07-26 Thread via GitHub
alamb commented on issue #11671: URL: https://github.com/apache/datafusion/issues/11671#issuecomment-2252886450 I made https://github.com/apache/datafusion/pull/11672 to track this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] Fix depcheck by updating dependency to cargo 0.81.0 [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11672: URL: https://github.com/apache/datafusion/pull/11672#issuecomment-2252887410 The dependencies check now passes on this PR: https://github.com/apache/datafusion/actions/runs/10112845692/job/27967849703?pr=11672 -- This is an automated message from the Apache G

Re: [PR] chore(deps): update sqlparser requirement from 0.48 to 0.49 [datafusion]

2024-07-26 Thread via GitHub
alamb merged PR #11630: URL: https://github.com/apache/datafusion/pull/11630 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] circular dependency check CI check is failing with compile error [datafusion]

2024-07-26 Thread via GitHub
alamb closed issue #11671: circular dependency check CI check is failing with compile error URL: https://github.com/apache/datafusion/issues/11671 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Fix depcheck by updating to cargo `0.81.0` [datafusion]

2024-07-26 Thread via GitHub
alamb closed pull request #11672: Fix depcheck by updating to cargo `0.81.0` URL: https://github.com/apache/datafusion/pull/11672 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] chore(deps): update sqlparser requirement from 0.48 to 0.49 [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11630: URL: https://github.com/apache/datafusion/pull/11630#issuecomment-2252892519 Thanks @jonahgao -- I hadn't seen this PR yet when I filed https://github.com/apache/datafusion/issues/11671 I should have known you would have already found a fix ❤️ -- Thi

Re: [I] Optimize CASE expression for "expr or expr" usage [datafusion]

2024-07-26 Thread via GitHub
jatin510 commented on issue #11638: URL: https://github.com/apache/datafusion/issues/11638#issuecomment-2252896141 can I work on this issue? @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Run CI with latest (Rust 1.80), add ticket references to commented out tests [datafusion]

2024-07-26 Thread via GitHub
findepi commented on PR #11661: URL: https://github.com/apache/datafusion/pull/11661#issuecomment-2252912467 @jayzhan211 can you please approve & merge? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Rename `ColumnOptions` to `ParquetColumnOptions` [datafusion]

2024-07-26 Thread via GitHub
findepi commented on code in PR #11512: URL: https://github.com/apache/datafusion/pull/11512#discussion_r1693204221 ## datafusion/common/src/config.rs: ## @@ -1552,7 +1552,7 @@ config_namespace_with_hashmap! { /// Options controlling parquet format for individual columns.

Re: [PR] Minor: Rename `RepartitionMetrics::repartition_time` to `RepartitionMetrics::repart_time` to match metric [datafusion]

2024-07-26 Thread via GitHub
findepi commented on code in PR #11478: URL: https://github.com/apache/datafusion/pull/11478#discussion_r1693205467 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -415,7 +415,7 @@ struct RepartitionMetrics { /// Time in nanos to execute child operator and fetch b

Re: [PR] Minor: Rename `RepartitionMetrics::repartition_time` to `RepartitionMetrics::repart_time` to match metric [datafusion]

2024-07-26 Thread via GitHub
alamb commented on code in PR #11478: URL: https://github.com/apache/datafusion/pull/11478#discussion_r1693216328 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -415,7 +415,7 @@ struct RepartitionMetrics { /// Time in nanos to execute child operator and fetch bat

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2252947090 Let's do it. go go go 🚀 Thanks again everyone for your comments and help. I think this PR finally breaks open the path to separate out the last monolithic knot -- This is a

Re: [PR] doc: why nullable of list item is set to true [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11626: URL: https://github.com/apache/datafusion/pull/11626#issuecomment-2252948355 Thanks again - we can iterate on the docs in follow on PRs if there is more to do -- This is an automated message from the Apache Git Service. To respond to the message, please log o

Re: [I] Document why nullable of list item does not map to schema of first argument [datafusion]

2024-07-26 Thread via GitHub
alamb closed issue #11625: Document why nullable of list item does not map to schema of first argument URL: https://github.com/apache/datafusion/issues/11625 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [I] Get Clippy clean for Rust 1.80 and run it on CI [datafusion]

2024-07-26 Thread via GitHub
comphead closed issue #11657: Get Clippy clean for Rust 1.80 and run it on CI URL: https://github.com/apache/datafusion/issues/11657 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Add support for USING to SQL unparser [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11636: URL: https://github.com/apache/datafusion/pull/11636#issuecomment-2252948911 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] Run CI with latest (Rust 1.80), add ticket references to commented out tests [datafusion]

2024-07-26 Thread via GitHub
comphead merged PR #11661: URL: https://github.com/apache/datafusion/pull/11661 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-26 Thread via GitHub
alamb merged PR #11516: URL: https://github.com/apache/datafusion/pull/11516 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add support for USING to SQL unparser [datafusion]

2024-07-26 Thread via GitHub
alamb merged PR #11636: URL: https://github.com/apache/datafusion/pull/11636 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Support convert LogicalPlan JOIN with `Using` constraint to SQL String [datafusion]

2024-07-26 Thread via GitHub
alamb closed issue #10652: Support convert LogicalPlan JOIN with `Using` constraint to SQL String URL: https://github.com/apache/datafusion/issues/10652 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] doc: why nullable of list item is set to true [datafusion]

2024-07-26 Thread via GitHub
alamb merged PR #11626: URL: https://github.com/apache/datafusion/pull/11626 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-26 Thread via GitHub
findepi commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2252949921 🎉 thanks for all the review feedback and the merge! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [I] Internal error when regex operator `~` is used with `List`s (SQLancer) [datafusion]

2024-07-26 Thread via GitHub
alamb closed issue #11622: Internal error when regex operator `~` is used with `List`s (SQLancer) URL: https://github.com/apache/datafusion/issues/11622 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] fix: dont try to coerce list for regex match [datafusion]

2024-07-26 Thread via GitHub
alamb merged PR #11646: URL: https://github.com/apache/datafusion/pull/11646 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] fix: dont try to coerce list for regex match [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11646: URL: https://github.com/apache/datafusion/pull/11646#issuecomment-2252949474 Thanks again @tshauck and @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [Bug] fix bug in return type inference of `utf8_to_int_type` [datafusion]

2024-07-26 Thread via GitHub
alamb merged PR #11662: URL: https://github.com/apache/datafusion/pull/11662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] doc: why nullable of list item is set to true [datafusion]

2024-07-26 Thread via GitHub
jcsherin commented on PR #11626: URL: https://github.com/apache/datafusion/pull/11626#issuecomment-2252959216 Thanks for the review feedback - @alamb, @comphead and for prior discussions @jayzhan211. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [I] Allow comparison of Timestamps with different Timezones [datafusion]

2024-07-26 Thread via GitHub
jeffreyssmith2nd commented on issue #11653: URL: https://github.com/apache/datafusion/issues/11653#issuecomment-2252964106 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[PR] Custom planning behavior for selecting wildcard expression [datafusion]

2024-07-26 Thread via GitHub
goldmedal opened a new pull request, #11673: URL: https://github.com/apache/datafusion/pull/11673 ## Which issue does this PR close? Closes #11639 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

Re: [I] Optimize CASE expression for "expr or expr" usage [datafusion]

2024-07-26 Thread via GitHub
jatin510 commented on issue #11638: URL: https://github.com/apache/datafusion/issues/11638#issuecomment-2253000896 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [I] Rewrite UDAF reversed expression name [datafusion]

2024-07-26 Thread via GitHub
dharanad commented on issue #11629: URL: https://github.com/apache/datafusion/issues/11629#issuecomment-2253008779 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [I] Rewrite UDAF reversed expression name [datafusion]

2024-07-26 Thread via GitHub
dharanad commented on issue #11629: URL: https://github.com/apache/datafusion/issues/11629#issuecomment-2253012006 I plan to close this by 10th Aug. I hope that's okay? Feel free to reassign if this is a priority -- This is an automated message from the Apache Git Service. To respond to t

Re: [PR] Change --string-view to only apply to parquet formats [datafusion]

2024-07-26 Thread via GitHub
XiangpengHao commented on PR #11663: URL: https://github.com/apache/datafusion/pull/11663#issuecomment-2253034170 >Maybe what is needed is to do the same Utf8 --> Utf8View transformation on the file schema (rather than using the table schema) Absolutely! I've updated the related code

[I] clarify separation between rust code and python wrappers [datafusion-python]

2024-07-26 Thread via GitHub
Michael-J-Ward opened a new issue, #779: URL: https://github.com/apache/datafusion-python/issues/779 #750 added `python` wrappers instead of directly exposing `pyo3` generated code. We should leverage that to clean up and improve the rust codebase by: 1) Letting `python` wrappe

[I] Incorporate unified Aggregate / Window function builder [datafusion-python]

2024-07-26 Thread via GitHub
Michael-J-Ward opened a new issue, #780: URL: https://github.com/apache/datafusion-python/issues/780 @timsaucer contributed a unified `AggregateUDF`/`WindowUDF` builder to upstream datafusion: https://github.com/apache/datafusion/pull/11550 We should expose that and leverage that to c

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-26 Thread via GitHub
Michael-J-Ward commented on PR #771: URL: https://github.com/apache/datafusion-python/pull/771#issuecomment-2253056031 @timsaucer I've captured the follow-on issues and added your two to the tracking issue https://github.com/apache/datafusion-python/issues/776 I'd prefer to merge lar

Re: [I] Add remaining non-wrapped functions [datafusion-python]

2024-07-26 Thread via GitHub
Michael-J-Ward commented on issue #767: URL: https://github.com/apache/datafusion-python/issues/767#issuecomment-2253060657 Question: Have you ever used or do you know of a tool to run queries over python / rust codebases? It would be nice if we could generate a concrete report of

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-26 Thread via GitHub
jcsherin closed issue #11433: Add nullable in `StateFieldArgs` URL: https://github.com/apache/datafusion/issues/11433 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-26 Thread via GitHub
jcsherin commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2253076728 Thanks again @jayzhan211. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[PR] chore: Remove TPC-DS benchmark results [datafusion-comet]

2024-07-26 Thread via GitHub
andygrove opened a new pull request, #728: URL: https://github.com/apache/datafusion-comet/pull/728 ## Which issue does this PR close? N/A ## Rationale for this change I re-ran all of the benchmarks using the official 0.1.0 release. I was able to reproduc

Re: [PR] Minor: Rename `RepartitionMetrics::repartition_time` to `RepartitionMetrics::repart_time` to match metric [datafusion]

2024-07-26 Thread via GitHub
findepi commented on code in PR #11478: URL: https://github.com/apache/datafusion/pull/11478#discussion_r1693318551 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -415,7 +415,7 @@ struct RepartitionMetrics { /// Time in nanos to execute child operator and fetch b

[PR] Increase ByteViewMap block size to 2MB [datafusion]

2024-07-26 Thread via GitHub
XiangpengHao opened a new pull request, #11674: URL: https://github.com/apache/datafusion/pull/11674 ## Which issue does this PR close? Closes #. ## Rationale for this change Increase the default block size from 8KB to 2MB; this significantly improves the aggrega

Re: [PR] Increase ByteViewMap block size to 2MB [datafusion]

2024-07-26 Thread via GitHub
XiangpengHao commented on PR #11674: URL: https://github.com/apache/datafusion/pull/11674#issuecomment-2253148423 This is the last piece to get the initial StringView support; testing on my machine shows that it can increase the string-intensive ClickBench performance by 20%-200%. -- This

Re: [PR] fix: skip negative scale checks for creating decimals [datafusion-comet]

2024-07-26 Thread via GitHub
kazuyukitanimura commented on PR #723: URL: https://github.com/apache/datafusion-comet/pull/723#issuecomment-2253289076 SF=100 ## getDecimal(Long) ### Before ![Screenshot 2024-07-26 at 11 39 30  AM](https://github.com/user-attachments/assets/b01b7efb-828d-410a-a2c0-d70b4513368b)

Re: [PR] Provide actionable error messaging due to resource exhaustion. [datafusion]

2024-07-26 Thread via GitHub
wiedld commented on code in PR #11665: URL: https://github.com/apache/datafusion/pull/11665#discussion_r1692419610 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -231,6 +235,10 @@ impl MemoryPool for FairSpillPool { } } +/// Constructs a resources error based upo

Re: [PR] Provide actionable error messaging due to resource exhaustion. [datafusion]

2024-07-26 Thread via GitHub
wiedld commented on code in PR #11665: URL: https://github.com/apache/datafusion/pull/11665#discussion_r1693465283 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -231,6 +235,11 @@ impl MemoryPool for FairSpillPool { } } +/// Constructs a resources error based upo

Re: [PR] Provide actionable error messaging due to resource exhaustion. [datafusion]

2024-07-26 Thread via GitHub
wiedld commented on code in PR #11665: URL: https://github.com/apache/datafusion/pull/11665#discussion_r1692421398 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -311,4 +423,56 @@ mod tests { let err = r4.try_grow(30).unwrap_err().strip_backtrace(); ass

Re: [PR] Provide actionable error messaging due to resource exhaustion. [datafusion]

2024-07-26 Thread via GitHub
wiedld commented on code in PR #11665: URL: https://github.com/apache/datafusion/pull/11665#discussion_r1692420852 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -311,4 +423,56 @@ mod tests { let err = r4.try_grow(30).unwrap_err().strip_backtrace(); ass

Re: [PR] Provide actionable error messaging due to resource exhaustion. [datafusion]

2024-07-26 Thread via GitHub
wiedld commented on code in PR #11665: URL: https://github.com/apache/datafusion/pull/11665#discussion_r1693467375 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -311,4 +458,179 @@ mod tests { let err = r4.try_grow(30).unwrap_err().strip_backtrace(); as

[I] `pushdown_sorts` pushes a SortExec through a node in violation of its stated input ordering requirements [datafusion]

2024-07-26 Thread via GitHub
alamb opened a new issue, #11675: URL: https://github.com/apache/datafusion/issues/11675 ### Describe the bug The `SanityChecker` added in https://github.com/apache/datafusion/pull/11196 from @mustafasrepo is triggering on some of our plans. I believe the SanityChecker is correctly r

Re: [PR] Add Optimizer Sanity Checker, improve sortedness equivalence properties [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11196: URL: https://github.com/apache/datafusion/pull/11196#issuecomment-2253300784 This check was being triggered in our downstream tests, but I think it actually found a real bug: https://github.com/apache/datafusion/issues/11675 -- This is an automated message fr

[PR] Implement native support StringView for character length [datafusion]

2024-07-26 Thread via GitHub
XiangpengHao opened a new pull request, #11676: URL: https://github.com/apache/datafusion/pull/11676 ## Which issue does this PR close? Closes #. ## Rationale for this change Currently we rely on auto coerce rule to support StringViewArrays, which requires a cast (co

Re: [PR] Provide actionable error messaging due to resource exhaustion. [datafusion]

2024-07-26 Thread via GitHub
wiedld commented on code in PR #11665: URL: https://github.com/apache/datafusion/pull/11665#discussion_r1693480102 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -231,6 +235,11 @@ impl MemoryPool for FairSpillPool { } } +/// Constructs a resources error based upo

Re: [I] Implement native `StringView` support for CharacterLength [datafusion]

2024-07-26 Thread via GitHub
XiangpengHao commented on issue #11677: URL: https://github.com/apache/datafusion/issues/11677#issuecomment-225230 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[I] Implement native `StringView` support for CharacterLength [datafusion]

2024-07-26 Thread via GitHub
XiangpengHao opened a new issue, #11677: URL: https://github.com/apache/datafusion/issues/11677 ### Is your feature request related to a problem or challenge? Part of #10918 Initial `StringView` is supported by #11667, which covers some of the most performance-critical workloads, such

Re: [I] `pushdown_sorts` pushes a SortExec through a node in violation of its stated input ordering requirements [datafusion]

2024-07-26 Thread via GitHub
alamb commented on issue #11675: URL: https://github.com/apache/datafusion/issues/11675#issuecomment-2253339740 I made a PR that solves our problem, but I need to write some tests in datafusion: https://github.com/apache/datafusion/pull/11678 -- This is an automated message from the Apach

[PR] Do not push down Sorts if it violates the sort requirements [datafusion]

2024-07-26 Thread via GitHub
alamb opened a new pull request, #11678: URL: https://github.com/apache/datafusion/pull/11678 (Draft as I need to write tests for this) ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/11675 ## Rationale for this change See http

Re: [PR] Do not push down Sorts if it violates the sort requirements [datafusion]

2024-07-26 Thread via GitHub
alamb commented on code in PR #11678: URL: https://github.com/apache/datafusion/pull/11678#discussion_r1693499951 ## datafusion/core/src/physical_optimizer/sort_pushdown.rs: ## @@ -176,6 +176,7 @@ fn pushdown_requirement_to_children( || plan.as_any().is::() ||

Re: [PR] Provide actionable error messaging due to resource exhaustion. [datafusion]

2024-07-26 Thread via GitHub
wiedld commented on code in PR #11665: URL: https://github.com/apache/datafusion/pull/11665#discussion_r1693517734 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -311,4 +458,179 @@ mod tests { let err = r4.try_grow(30).unwrap_err().strip_backtrace(); as

[I] [Epic] High cardinality aggregation performance wishlist [datafusion]

2024-07-26 Thread via GitHub
alamb opened a new issue, #11679: URL: https://github.com/apache/datafusion/issues/11679 ### Is your feature request related to a problem or challenge? This is my wishlist for improving high cardinality aggregates (ideally for the next blog post in a few months #11631 ) Togethe

Re: [PR] Provide actionable error messaging due to resource exhaustion. [datafusion]

2024-07-26 Thread via GitHub
wiedld commented on code in PR #11665: URL: https://github.com/apache/datafusion/pull/11665#discussion_r1693521876 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -311,4 +458,179 @@ mod tests { let err = r4.try_grow(30).unwrap_err().strip_backtrace(); as

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-26 Thread via GitHub
timsaucer commented on code in PR #771: URL: https://github.com/apache/datafusion-python/pull/771#discussion_r1693521807 ## src/functions.rs: ## @@ -293,21 +569,23 @@ fn col(name: &str) -> PyResult { }) } +// TODO: should we just expose this in python? /// Create a COUN

[I] Improve performance of high cardinality grouping by reusing hash values [datafusion]

2024-07-26 Thread via GitHub
alamb opened a new issue, #11680: URL: https://github.com/apache/datafusion/issues/11680 ### Is your feature request related to a problem or challenge? As described on https://github.com/apache/datafusion/issues/11679, we can do better for high cardinality aggregates One thing

[I] CI should run rat tests on PRs that only have docs changes [datafusion-comet]

2024-07-26 Thread via GitHub
andygrove opened a new issue, #729: URL: https://github.com/apache/datafusion-comet/issues/729 ### What is the problem the feature request solves? We currently skip most CI checks for PRs that only touch files under the `docs` directory, making it easy to merge changes that would fail

Re: [I] Add remaining non-wrapped functions [datafusion-python]

2024-07-26 Thread via GitHub
timsaucer commented on issue #767: URL: https://github.com/apache/datafusion-python/issues/767#issuecomment-2253478528 No, but I did write a small script to check and this is what I see missing: ``` attribute,datafusion,Catalog, attribute,datafusion,Database, attribute,datafu

Re: [I] Add remaining non-wrapped functions [datafusion-python]

2024-07-26 Thread via GitHub
timsaucer commented on issue #767: URL: https://github.com/apache/datafusion-python/issues/767#issuecomment-2253480280 FWIW I don't know if all of these need to be exported. It's probably worth looking through each one. -- This is an automated message from the Apache Git Service. To resp

Re: [PR] fix: modulo op with negative zero divisor produces Nan [datafusion-comet]

2024-07-26 Thread via GitHub
kazuyukitanimura commented on code in PR #585: URL: https://github.com/apache/datafusion-comet/pull/585#discussion_r1693605259 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -985,23 +984,82 @@ object QueryPlanSerde extends Logging with ShimQueryPlanS

Re: [PR] Minor: Rename `RepartitionMetrics::repartition_time` to `RepartitionMetrics::repart_time` to match metric [datafusion]

2024-07-26 Thread via GitHub
alamb commented on PR #11478: URL: https://github.com/apache/datafusion/pull/11478#issuecomment-2253520850 Good plan -- how about we merge this PR (which is just code reorg) and then I'll make a follow on to propose renaming the public metric name for separate consideration -- This is an

Re: [I] Allow sorting to improve `FixedSizeBinary` filtering [datafusion]

2024-07-26 Thread via GitHub
samuelcolvin commented on issue #11170: URL: https://github.com/apache/datafusion/issues/11170#issuecomment-2253527396 I think the problem is that [here](https://github.com/apache/arrow-rs/blob/f42d2420525a05a9b55461d83b359779ca5cc2a3/arrow-select/src/filter.rs#L320-L383) `arrow-rs` has spe

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-26 Thread via GitHub
edmondop commented on PR #11013: URL: https://github.com/apache/datafusion/pull/11013#issuecomment-2253542464 @jayzhan211 the change is dropping the limit in the physical plan node, and I wasn't able to find the source of it. Do you have any hints? -- This is an automated message from the

Re: [PR] Add LimitPushdown optimization rule and CoalesceBatchesExec fetch [datafusion]

2024-07-26 Thread via GitHub
alamb commented on code in PR #11652: URL: https://github.com/apache/datafusion/pull/11652#discussion_r1693622103 ## datafusion/physical-plan/src/lib.rs: ## @@ -428,6 +428,18 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { fn statistics(&self) -> Result {

[PR] feat: Add GetStructField expression [datafusion-comet]

2024-07-26 Thread via GitHub
Kimahriman opened a new pull request, #731: URL: https://github.com/apache/datafusion-comet/pull/731 ## Which issue does this PR close? Closes #730 ## Rationale for this change To support struct types in expressions, you need to be able to pull out values fr

Re: [PR] feat: Add GetStructField expression [datafusion-comet]

2024-07-26 Thread via GitHub
Kimahriman commented on code in PR #731: URL: https://github.com/apache/datafusion-comet/pull/731#discussion_r1693649707 ## spark/src/main/scala/org/apache/spark/sql/comet/CometRowToColumnarExec.scala: ## @@ -60,8 +62,17 @@ case class CometRowToColumnarExec(child: SparkPlan)
