[PR] Move wildcard expansions to the analyzer [datafusion]

2024-07-27 Thread via GitHub
goldmedal opened a new pull request, #11681: URL: https://github.com/apache/datafusion/pull/11681 ## Which issue does this PR close? Closes #11639. ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

Re: [PR] Move wildcard expansions to the analyzer [datafusion]

2024-07-27 Thread via GitHub
goldmedal commented on code in PR #11681: URL: https://github.com/apache/datafusion/pull/11681#discussion_r1693934130 ## datafusion/sql/src/select.rs: ## @@ -590,44 +589,35 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { if empty_from { re

Re: [I] Allow custom planning behavior for selecting wildcard expression [datafusion]

2024-07-27 Thread via GitHub
goldmedal commented on issue #11639: URL: https://github.com/apache/datafusion/issues/11639#issuecomment-2254103928 I have drafted a version https://github.com/apache/datafusion/pull/11681 for moving wildcard expansions to the analyzer. I think it's better than #11673. I might change the pu

Re: [PR] rfc: optional skipping partial aggregation [datafusion]

2024-07-27 Thread via GitHub
alamb commented on PR #11627: URL: https://github.com/apache/datafusion/pull/11627#issuecomment-2254113651 > @alamb thank you for sharing benchmark results -- I'll check out if any of them benefited from this feature (I suppose it shouldn't be triggered in many of them) and will look for th

Re: [PR] Increase ByteViewMap block size to 2MB [datafusion]

2024-07-27 Thread via GitHub
alamb merged PR #11674: URL: https://github.com/apache/datafusion/pull/11674 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Increase ByteViewMap block size to 2MB [datafusion]

2024-07-27 Thread via GitHub
alamb commented on code in PR #11674: URL: https://github.com/apache/datafusion/pull/11674#discussion_r1693940405 ## datafusion/physical-expr-common/src/binary_view_map.rs: ## @@ -149,7 +149,7 @@ where output_type, map: hashbrown::raw::RawTable::with_ca

Re: [I] use StringViewArray when reading String columns from Parquet [datafusion]

2024-07-27 Thread via GitHub
alamb commented on issue #10921: URL: https://github.com/apache/datafusion/issues/10921#issuecomment-2254115879 This is done on the string-view2 branch. Once we merge https://github.com/apache/datafusion/pull/11667 we can close this ticket, I think.

[I] Enable `datafusion.execution.parquet.schema_force_string_view` by default [datafusion]

2024-07-27 Thread via GitHub
alamb opened a new issue, #11682: URL: https://github.com/apache/datafusion/issues/11682 ### Is your feature request related to a problem or challenge? As part of https://github.com/apache/datafusion/issues/10918, @XiangpengHao has threaded the use of `StringView` through parquet, ar

Re: [PR] Change `--string-view` to only apply to parquet formats [datafusion]

2024-07-27 Thread via GitHub
alamb merged PR #11663: URL: https://github.com/apache/datafusion/pull/11663

Re: [PR] Merge string-view2 branch to main [datafusion]

2024-07-27 Thread via GitHub
alamb commented on PR #11667: URL: https://github.com/apache/datafusion/pull/11667#issuecomment-2254118604 Update here is that we are on track to release arrow `52.2.0` to crates.io tomorrow Saturday July 28. (thank you to @waynexia @viirya @wjones127 for verifying / voting 🙏 ). The

Re: [PR] Docs: adding explicit mention of test_utils to docs [datafusion]

2024-07-27 Thread via GitHub
alamb merged PR #11670: URL: https://github.com/apache/datafusion/pull/11670

Re: [PR] Ensure statistic defaults in parquet writers are in sync [datafusion]

2024-07-27 Thread via GitHub
alamb merged PR #11656: URL: https://github.com/apache/datafusion/pull/11656

Re: [I] Inconsistent value for `data_page_max_rows` setting in DataFusion `ParquetOptions` and in `ArrowWriterOptions` [datafusion]

2024-07-27 Thread via GitHub
alamb closed issue #11367: Inconsistent value for `data_page_max_rows` setting in DataFusion `ParquetOptions` and in `ArrowWriterOptions` URL: https://github.com/apache/datafusion/issues/11367

Re: [PR] Add LimitPushdown optimization rule and CoalesceBatchesExec fetch [datafusion]

2024-07-27 Thread via GitHub
alamb commented on code in PR #11652: URL: https://github.com/apache/datafusion/pull/11652#discussion_r1693942386 ## datafusion/sqllogictest/test_files/group_by.slt: ## @@ -4334,8 +4335,9 @@ physical_plan 01)GlobalLimitExec: skip=0, fetch=5 02)--SortPreservingMergeExec: [name@

Re: [PR] Implement physical plan serialization for json Copy plans [datafusion]

2024-07-27 Thread via GitHub
alamb merged PR #11645: URL: https://github.com/apache/datafusion/pull/11645

Re: [PR] Rename `input_type` --> `input_types` on AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs [datafusion]

2024-07-27 Thread via GitHub
lewiszlw commented on PR #11666: URL: https://github.com/apache/datafusion/pull/11666#issuecomment-2254121549 Thanks for pointing that out. I'll update the PR in a few days.

Re: [PR] Rename `input_type` --> `input_types` on AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs [datafusion]

2024-07-27 Thread via GitHub
alamb commented on PR #11666: URL: https://github.com/apache/datafusion/pull/11666#issuecomment-2254122031 I happened to have this PR opened locally (I get anxious with PRs that are open too long 😅 ) so I took the liberty of updating the docs as well in 1a3c5ca7b while I was merging up from

Re: [PR] Rename `input_type` --> `input_types` on AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs [datafusion]

2024-07-27 Thread via GitHub
jcsherin commented on code in PR #11666: URL: https://github.com/apache/datafusion/pull/11666#discussion_r1693945138 ## datafusion/functions-aggregate/COMMENTS.md: ## @@ -54,7 +54,7 @@ first argument and the definition looks like this: // `input_type` : data type of the first

Re: [PR] Implement native support StringView for character length [datafusion]

2024-07-27 Thread via GitHub
alamb commented on code in PR #11676: URL: https://github.com/apache/datafusion/pull/11676#discussion_r1693945106 ## datafusion/functions/src/unicode/character_length.rs: ## @@ -92,15 +81,32 @@ impl ScalarUDFImpl for CharacterLengthFunc { /// Returns number of characters in the

Re: [PR] Implement native support StringView for character length [datafusion]

2024-07-27 Thread via GitHub
alamb commented on code in PR #11676: URL: https://github.com/apache/datafusion/pull/11676#discussion_r1693945276 ## datafusion/functions/src/unicode/character_length.rs: ## @@ -116,55 +122,54 @@ where mod tests { use crate::unicode::character_length::CharacterLengthFunc;

Re: [PR] Implement native support StringView for character length [datafusion]

2024-07-27 Thread via GitHub
alamb commented on code in PR #11676: URL: https://github.com/apache/datafusion/pull/11676#discussion_r1693945295 ## datafusion/functions/src/unicode/character_length.rs: ## @@ -116,55 +122,54 @@ where mod tests { use crate::unicode::character_length::CharacterLengthFunc;

Re: [PR] Implement native support StringView for character length [datafusion]

2024-07-27 Thread via GitHub
alamb merged PR #11676: URL: https://github.com/apache/datafusion/pull/11676

[PR] Make `CommonSubexprEliminate` top-down like [datafusion]

2024-07-27 Thread via GitHub
peter-toth opened a new pull request, #11683: URL: https://github.com/apache/datafusion/pull/11683 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/11194. ## Rationale for this change This PR contains 2 ideas: 1. Unfortunately http

Re: [PR] chore: Remove TPC-DS benchmark results [datafusion-comet]

2024-07-27 Thread via GitHub
andygrove merged PR #728: URL: https://github.com/apache/datafusion-comet/pull/728

Re: [PR] Implement native support StringView for character length [datafusion]

2024-07-27 Thread via GitHub
alamb commented on PR #11676: URL: https://github.com/apache/datafusion/pull/11676#issuecomment-2254137639 This is my attempt to improve the arrow-rs docs: https://github.com/apache/arrow-rs/pull/6141

Re: [PR] chore: make Cast's logic reusable for other projects [datafusion-comet]

2024-07-27 Thread via GitHub
andygrove merged PR #716: URL: https://github.com/apache/datafusion-comet/pull/716

Re: [PR] rfc: optional skipping partial aggregation [datafusion]

2024-07-27 Thread via GitHub
alamb commented on PR #11627: URL: https://github.com/apache/datafusion/pull/11627#issuecomment-2254139121 My plan here is to spend time tomorrow morning doing some additional investigation / testing on the branch and unless I find any blockers I think we should proceed with it. Wha

Re: [PR] Make `CommonSubexprEliminate` top-down like [datafusion]

2024-07-27 Thread via GitHub
peter-toth commented on code in PR #11683: URL: https://github.com/apache/datafusion/pull/11683#discussion_r1693953169 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -454,136 +435,169 @@ impl CommonSubexprEliminate { group_expr, aggr_exp

Re: [PR] Make `CommonSubexprEliminate` top-down like [datafusion]

2024-07-27 Thread via GitHub
peter-toth commented on PR #11683: URL: https://github.com/apache/datafusion/pull/11683#issuecomment-2254141455 cc @alamb

Re: [PR] Make `CommonSubexprEliminate` top-down like [datafusion]

2024-07-27 Thread via GitHub
peter-toth commented on code in PR #11683: URL: https://github.com/apache/datafusion/pull/11683#discussion_r1693953502 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -353,96 +349,81 @@ impl CommonSubexprEliminate { window: Window, config: &dyn O

Re: [PR] Make `CommonSubexprEliminate` top-down like [datafusion]

2024-07-27 Thread via GitHub
peter-toth commented on code in PR #11683: URL: https://github.com/apache/datafusion/pull/11683#discussion_r1693953666 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1963,6 +1944,52 @@ mod test { Ok(()) } +#[test] +fn test_non_top_level_c

Re: [PR] Make `CommonSubexprEliminate` top-down like [datafusion]

2024-07-27 Thread via GitHub
peter-toth commented on code in PR #11683: URL: https://github.com/apache/datafusion/pull/11683#discussion_r1693953710 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1963,6 +1944,52 @@ mod test { Ok(()) } +#[test] +fn test_non_top_level_c

Re: [PR] Rename `input_type` --> `input_types` on AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs [datafusion]

2024-07-27 Thread via GitHub
alamb commented on code in PR #11666: URL: https://github.com/apache/datafusion/pull/11666#discussion_r1693954393 ## datafusion/functions-aggregate/COMMENTS.md: ## @@ -54,7 +54,7 @@ first argument and the definition looks like this: // `input_type` : data type of the first arg

[PR] Add missing exports for wrapper modules [datafusion-python]

2024-07-27 Thread via GitHub
timsaucer opened a new pull request, #782: URL: https://github.com/apache/datafusion-python/pull/782 # Which issue does this PR close? This addresses part of https://github.com/apache/datafusion-python/issues/767 but does not close the issue. We still should add wrapper classes for s

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-27 Thread via GitHub
timsaucer commented on PR #771: URL: https://github.com/apache/datafusion-python/pull/771#issuecomment-2254149933 I've pushed this PR that just ensures we haven't missed any exports with the new wrappers. It might make sense to get it into the 40.0 release. https://github.com/apache/

Re: [I] Allow custom planning behavior for selecting wildcard expression [datafusion]

2024-07-27 Thread via GitHub
jayzhan211 commented on issue #11639: URL: https://github.com/apache/datafusion/issues/11639#issuecomment-2254154452 I think handling replace item in options is a good idea. We store the replace item in Expr::Wildcard in planning stage and convert it to expanded columns with expected replac

Re: [PR] feat: Add GetStructField expression [datafusion-comet]

2024-07-27 Thread via GitHub
Kimahriman commented on code in PR #731: URL: https://github.com/apache/datafusion-comet/pull/731#discussion_r1693962017 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -59,12 +59,13 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wit

Re: [PR] chore: add more aggregate functions to benchmark test [datafusion-comet]

2024-07-27 Thread via GitHub
andygrove commented on code in PR #706: URL: https://github.com/apache/datafusion-comet/pull/706#discussion_r1693965490 ## spark/benchmarks/CometAggregateBenchmark-jdk11-results.txt: ## @@ -0,0 +1,464 @@ +==

Re: [PR] feat: Add GetStructField expression [datafusion-comet]

2024-07-27 Thread via GitHub
andygrove commented on code in PR #731: URL: https://github.com/apache/datafusion-comet/pull/731#discussion_r1693967998 ## spark/src/main/scala/org/apache/spark/sql/comet/CometRowToColumnarExec.scala: ## @@ -60,8 +62,17 @@ case class CometRowToColumnarExec(child: SparkPlan)

Re: [PR] feat: Add GetStructField expression [datafusion-comet]

2024-07-27 Thread via GitHub
andygrove commented on code in PR #731: URL: https://github.com/apache/datafusion-comet/pull/731#discussion_r1693967256 ## native/core/src/execution/datafusion/expressions/structs.rs: ## @@ -125,3 +125,103 @@ impl PartialEq for CreateNamedStruct { .unwrap_or(false)

Re: [PR] feat: Add GetStructField expression [datafusion-comet]

2024-07-27 Thread via GitHub
andygrove commented on code in PR #731: URL: https://github.com/apache/datafusion-comet/pull/731#discussion_r1693967380 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -1128,7 +1133,7 @@ object CometSparkSessionExtensions extends Logging {

Re: [PR] chore: move scalar_funcs into spark-expr [datafusion-comet]

2024-07-27 Thread via GitHub
codecov-commenter commented on PR #712: URL: https://github.com/apache/datafusion-comet/pull/712#issuecomment-2254174686 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/712?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. [datafusion-comet]

2024-07-27 Thread via GitHub
codecov-commenter commented on PR #704: URL: https://github.com/apache/datafusion-comet/pull/704#issuecomment-2254174646 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/704?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] fix: window function range offset should be long instead of int [datafusion-comet]

2024-07-27 Thread via GitHub
huaxingao commented on code in PR #733: URL: https://github.com/apache/datafusion-comet/pull/733#discussion_r1693979682 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -275,12 +284,20 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wi

Re: [PR] fix: window function range offset should be long instead of int [datafusion-comet]

2024-07-27 Thread via GitHub
huaxingao commented on code in PR #733: URL: https://github.com/apache/datafusion-comet/pull/733#discussion_r1693980607 ## spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala: ## @@ -63,23 +63,35 @@ class CometExecSuite extends CometTestBase { } } - test("

Re: [PR] chore: add more aggregate functions to benchmark test [datafusion-comet]

2024-07-27 Thread via GitHub
huaxingao commented on code in PR #706: URL: https://github.com/apache/datafusion-comet/pull/706#discussion_r1693981204 ## spark/benchmarks/CometAggregateBenchmark-jdk11-results.txt: ## @@ -0,0 +1,464 @@ +==

Re: [PR] feat: Add GetStructField expression [datafusion-comet]

2024-07-27 Thread via GitHub
Kimahriman commented on code in PR #731: URL: https://github.com/apache/datafusion-comet/pull/731#discussion_r1693981199 ## native/core/src/execution/datafusion/expressions/structs.rs: ## @@ -125,3 +125,103 @@ impl PartialEq for CreateNamedStruct { .unwrap_or(false)

Re: [PR] feat: Add GetStructField expression [datafusion-comet]

2024-07-27 Thread via GitHub
Kimahriman commented on code in PR #731: URL: https://github.com/apache/datafusion-comet/pull/731#discussion_r1693981470 ## spark/src/main/scala/org/apache/spark/sql/comet/CometRowToColumnarExec.scala: ## @@ -60,8 +62,17 @@ case class CometRowToColumnarExec(child: SparkPlan)

Re: [PR] Minor: improve documentation on `SessionState` [datafusion]

2024-07-27 Thread via GitHub
comphead merged PR #11642: URL: https://github.com/apache/datafusion/pull/11642

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-27 Thread via GitHub
Michael-J-Ward commented on PR #771: URL: https://github.com/apache/datafusion-python/pull/771#issuecomment-2254203418 An aside: To me, the process of releasing a new `datafusion-python` version: 1) Upgrade the `datafusion` deps and migrate code so new code compiles 2) Integ

Re: [PR] Add missing exports for wrapper modules [datafusion-python]

2024-07-27 Thread via GitHub
Michael-J-Ward commented on code in PR #782: URL: https://github.com/apache/datafusion-python/pull/782#discussion_r1693992444 ## python/datafusion/tests/test_wrapper_coverage.py: ## @@ -0,0 +1,49 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contrib

[PR] Better multi-column aggregation support with StringView [datafusion]

2024-07-27 Thread via GitHub
XiangpengHao opened a new pull request, #11684: URL: https://github.com/apache/datafusion/pull/11684 ## Which issue does this PR close? Related to #7000. ## Rationale for this change I got some time to implement multi-column aggregation with StringView, the impl

Re: [PR] Upgrade Datafusion 40 [datafusion-python]

2024-07-27 Thread via GitHub
timsaucer commented on PR #771: URL: https://github.com/apache/datafusion-python/pull/771#issuecomment-2254217131 Thank you. That is very helpful!

[PR] build(deps): bump datafusion-functions-array from 39.0.0 to 40.0.0 [datafusion-python]

2024-07-27 Thread via GitHub
dependabot[bot] opened a new pull request, #783: URL: https://github.com/apache/datafusion-python/pull/783 Bumps [datafusion-functions-array](https://github.com/apache/datafusion) from 39.0.0 to 40.0.0. Commits https://github.com/apache/datafusion/commit/4cae81363e29f011c6602a7

[PR] build(deps): bump tokio from 1.39.1 to 1.39.2 [datafusion-python]

2024-07-27 Thread via GitHub
dependabot[bot] opened a new pull request, #785: URL: https://github.com/apache/datafusion-python/pull/785 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.39.1 to 1.39.2. Release notes Sourced from tokio's releases (https://github.com/tokio-rs/tokio/releases). Tokio v

[PR] build(deps): bump datafusion-common from 39.0.0 to 40.0.0 [datafusion-python]

2024-07-27 Thread via GitHub
dependabot[bot] opened a new pull request, #787: URL: https://github.com/apache/datafusion-python/pull/787 Bumps [datafusion-common](https://github.com/apache/datafusion) from 39.0.0 to 40.0.0. Commits https://github.com/apache/datafusion/commit/4cae81363e29f011c6602a7a7a54e1aa

[PR] build(deps): bump datafusion-substrait from 39.0.0 to 40.0.0 [datafusion-python]

2024-07-27 Thread via GitHub
dependabot[bot] opened a new pull request, #784: URL: https://github.com/apache/datafusion-python/pull/784 Bumps [datafusion-substrait](https://github.com/apache/datafusion) from 39.0.0 to 40.0.0. Commits https://github.com/apache/datafusion/commit/4cae81363e29f011c6602a7a7a54e

[PR] build(deps): bump datafusion-sql from 39.0.0 to 40.0.0 [datafusion-python]

2024-07-27 Thread via GitHub
dependabot[bot] opened a new pull request, #788: URL: https://github.com/apache/datafusion-python/pull/788 Bumps [datafusion-sql](https://github.com/apache/datafusion) from 39.0.0 to 40.0.0. Commits https://github.com/apache/datafusion/commit/4cae81363e29f011c6602a7a7a54e1aaee8

[PR] build(deps): bump datafusion-optimizer from 39.0.0 to 40.0.0 [datafusion-python]

2024-07-27 Thread via GitHub
dependabot[bot] opened a new pull request, #789: URL: https://github.com/apache/datafusion-python/pull/789 Bumps [datafusion-optimizer](https://github.com/apache/datafusion) from 39.0.0 to 40.0.0. Commits https://github.com/apache/datafusion/commit/4cae81363e29f011c6602a7a7a54e

[PR] build(deps): bump datafusion-expr from 39.0.0 to 40.0.0 [datafusion-python]

2024-07-27 Thread via GitHub
dependabot[bot] opened a new pull request, #790: URL: https://github.com/apache/datafusion-python/pull/790 Bumps [datafusion-expr](https://github.com/apache/datafusion) from 39.0.0 to 40.0.0. Commits https://github.com/apache/datafusion/commit/4cae81363e29f011c6602a7a7a54e1aaee

[PR] build(deps): bump datafusion from 39.0.0 to 40.0.0 [datafusion-python]

2024-07-27 Thread via GitHub
dependabot[bot] opened a new pull request, #786: URL: https://github.com/apache/datafusion-python/pull/786 Bumps [datafusion](https://github.com/apache/datafusion) from 39.0.0 to 40.0.0. Commits https://github.com/apache/datafusion/commit/4cae81363e29f011c6602a7a7a54e1aaee84104

Re: [PR] Update cache key used in rust CI script [datafusion]

2024-07-27 Thread via GitHub
findepi commented on PR #11641: URL: https://github.com/apache/datafusion/pull/11641#issuecomment-2254238859 > But I'm not sure what is the real benefit of it technically none other than removing misleading comment & value

Re: [PR] Minor: Rename `RepartitionMetrics::repartition_time` to `RepartitionMetrics::repart_time` to match metric [datafusion]

2024-07-27 Thread via GitHub
findepi commented on PR #11478: URL: https://github.com/apache/datafusion/pull/11478#issuecomment-2254243788 SGTM

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-27 Thread via GitHub
edmondop commented on PR #11013: URL: https://github.com/apache/datafusion/pull/11013#issuecomment-2254253666 @jayzhan211 I am a little confused about the test case here https://github.com/apache/datafusion/blob/a721be1b1d863b5b15a7a945c37ec051c449c46f/datafusion/sqllogictest/test_files/aggr

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-07-27 Thread via GitHub
BurntSushi commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2254274781 Author of Jiff here. I don't have a ton of context on the specific problem in this issue, but I'd be happy to field questions about DST safe arithmetic. Jiff in particular is

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-27 Thread via GitHub
jayzhan211 commented on code in PR #11013: URL: https://github.com/apache/datafusion/pull/11013#discussion_r1694065887 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -414,25 +484,6 @@ macro_rules! min_max_batch { $OP )

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-27 Thread via GitHub
jayzhan211 commented on code in PR #11013: URL: https://github.com/apache/datafusion/pull/11013#discussion_r1694066849 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -123,170 +201,163 @@ macro_rules! instantiate_max_accumulator { /// /// [`ArrowPrimitiveType`]: arrow:

[I] Remove unnecessary allocations in `struct` and `named_struct` [datafusion]

2024-07-27 Thread via GitHub
Rafferty97 opened a new issue, #11685: URL: https://github.com/apache/datafusion/issues/11685 ## Summary Improve the performance of the `struct` and `named_struct` functions by eliminating unnecessary heap allocations. ## Detail In the implementations of the `struct` and

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-27 Thread via GitHub
jayzhan211 commented on PR #11013: URL: https://github.com/apache/datafusion/pull/11013#issuecomment-2254351344 I would like to add `eliminate min/max` to the optimizer; I think it could simplify the optimizer a bit.

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-27 Thread via GitHub
jayzhan211 commented on code in PR #11013: URL: https://github.com/apache/datafusion/pull/11013#discussion_r1694098591 ## datafusion/expr/src/type_coercion/aggregates.rs: ## @@ -163,7 +142,7 @@ pub fn check_arg_count( Ok(()) } -fn get_min_max_result_type(input_types: &[D

[I] Eliminate distinct of min/max with ExprBuilder [datafusion]

2024-07-27 Thread via GitHub
jayzhan211 opened a new issue, #11686: URL: https://github.com/apache/datafusion/issues/11686 ### Is your feature request related to a problem or challenge? Given that distinct min/max is the same as non-distinct, it is easier for datafusion if we eliminate distinct as early as possib
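
The equivalence this issue relies on can be checked with a minimal sketch in plain Rust. It does not use DataFusion's API or the `ExprBuilder` named in the title; the function names `min_all`, `min_distinct`, `max_all`, and `max_distinct` are hypothetical and exist only for illustration. Removing duplicates cannot change the smallest or largest element, so the distinct and non-distinct forms always agree.

```rust
use std::collections::BTreeSet;

/// MIN over every value, duplicates included.
pub fn min_all(values: &[i64]) -> Option<i64> {
    values.iter().copied().min()
}

/// MIN(DISTINCT ...): deduplicate first, then take the minimum.
pub fn min_distinct(values: &[i64]) -> Option<i64> {
    let distinct: BTreeSet<i64> = values.iter().copied().collect();
    distinct.into_iter().min()
}

/// MAX over every value, duplicates included.
pub fn max_all(values: &[i64]) -> Option<i64> {
    values.iter().copied().max()
}

/// MAX(DISTINCT ...): deduplicate first, then take the maximum.
pub fn max_distinct(values: &[i64]) -> Option<i64> {
    let distinct: BTreeSet<i64> = values.iter().copied().collect();
    distinct.into_iter().max()
}
```

Because the results match for any input, dropping the deduplication step early is a pure simplification: later planning stages then only have to handle the non-distinct form.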

[I] Replace `OnceLock` with `LazyLock` [datafusion]

2024-07-27 Thread via GitHub
jayzhan211 opened a new issue, #11687: URL: https://github.com/apache/datafusion/issues/11687 ### Is your feature request related to a problem or challenge? `LazyLock` was stabilized in Rust 1.80 🚀 It is more ergonomic than `OnceLock`, so it would be nice to switch to `LazyLock` ### Descri
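
For readers unfamiliar with the two types, here is a small sketch of the ergonomic difference, using only the standard library. The alias table and the names `build_aliases`, `aliases_once`, and `resolve` are hypothetical and are not taken from the DataFusion code base. With `OnceLock`, the initializer is supplied at the access site through `get_or_init`; with `LazyLock`, it is attached to the static itself and runs on first dereference.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, OnceLock};

// Hypothetical initializer shared by both variants.
fn build_aliases() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("char_length", "character_length"),
        ("length", "character_length"),
    ])
}

// OnceLock: the static starts empty, and every accessor has to name
// the initializer via `get_or_init`.
static ALIASES_ONCE: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();

pub fn aliases_once() -> &'static HashMap<&'static str, &'static str> {
    ALIASES_ONCE.get_or_init(build_aliases)
}

// LazyLock (stable since Rust 1.80): the initializer lives with the
// static, so call sites simply dereference it.
static ALIASES_LAZY: LazyLock<HashMap<&'static str, &'static str>> =
    LazyLock::new(|| build_aliases());

pub fn resolve(name: &str) -> Option<&'static str> {
    // Auto-deref runs the initializer on first use, then reuses the map.
    ALIASES_LAZY.get(name).copied()
}
```

One trade-off to keep in mind: `OnceLock` remains the better fit when the initial value depends on runtime arguments, since `LazyLock` fixes its initializer when the static is declared.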