[PR] Remove element's nullability of array_agg function [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 opened a new pull request, #11447: URL: https://github.com/apache/datafusion/pull/11447 ## Which issue does this PR close? Closes #. ## Rationale for this change I think the nullability of element in array_agg_* function doesn't help much about optimiz

Re: [I] Convert `ArrayAgg` to UDAF [datafusion]

2024-07-12 Thread via GitHub
eejbyfeldt commented on issue #10999: URL: https://github.com/apache/datafusion/issues/10999#issuecomment-2226771665 @jayzhan211 I will not be able to work on it for the next two weeks due to being on vacation. So, someone else should feel free to pick it up before then. -- This is an automated mes

Re: [PR] Refactor: more clearly delineate btwn writer options vs session configuration [datafusion]

2024-07-12 Thread via GitHub
wiedld commented on PR #11444: URL: https://github.com/apache/datafusion/pull/11444#issuecomment-2226755050 The SessionState contains multiple copies of the ParquetOptions: * (`⊃` denotes "contained within") *SessionState.config ⊃ SessionConfig ⊃ ConfigOptions ⊃ ExecutionOpt

Re: [I] Review use of logical expressions in physical AggregateFunctionExpr [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on issue #11359: URL: https://github.com/apache/datafusion/issues/11359#issuecomment-2226745321 Cool, maybe I could think about *pulling down* functions trait from `expr` instead of *pulling up* common things to `expr-common` 🤔

Re: [I] [Epic] Extract catalog functionality from the core to make it more modular [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on issue #10782: URL: https://github.com/apache/datafusion/issues/10782#issuecomment-2226741942 > datafusion-catalog: contains traits datafusion-catalog-basic ~datafusion-catalog~ datafusion-catalog-common: contains traits ~datafusion-catalog-basic~ datafusio

Re: [PR] Refactor: more clearly delineate btwn writer options vs session configuration [datafusion]

2024-07-12 Thread via GitHub
wiedld commented on PR #11444: URL: https://github.com/apache/datafusion/pull/11444#issuecomment-2226741174 > I wonder if we should make `ParquetOptions` be consistent with the `WriterOptions`? It seems like missing kv_metadata and column overrides might be an oversight rather than an inten

Re: [I] Convert `ArrayAgg` to UDAF [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on issue #10999: URL: https://github.com/apache/datafusion/issues/10999#issuecomment-2226737918 @eejbyfeldt Do you plan to work on `array_agg`?

Re: [PR] Minor: change internal error to not supported error for nested field … [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 merged PR #11446: URL: https://github.com/apache/datafusion/pull/11446

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2226704815 My guess is that the query is not a multi-phase aggregate, so there is no result that is checked against `state_field`. Maybe we should figure out some complex query that has stat

Re: [PR] Standardize the separator in name [datafusion]

2024-07-12 Thread via GitHub
github-actions[bot] commented on PR #10363: URL: https://github.com/apache/datafusion/pull/10363#issuecomment-2226699774 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feat: add raw aggregate udf planner [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on code in PR #11371: URL: https://github.com/apache/datafusion/pull/11371#discussion_r1676601717 ## datafusion/expr/src/planner.rs: ## @@ -161,6 +162,28 @@ pub trait ExprPlanner: Send + Sync { ) -> Result>> { Ok(PlannerResult::Original(args))

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1676554150 ## datafusion/physical-plan/src/lib.rs: ## @@ -1005,6 +1035,71 @@ mod tests { assert_eq!(RenamedEmptyExec::static_name(), "MyRenamedEmptyExec"); } +

Re: [PR] chore: Move temporal kernels and expressions to spark-expr crate [datafusion-comet]

2024-07-12 Thread via GitHub
codecov-commenter commented on PR #660: URL: https://github.com/apache/datafusion-comet/pull/660#issuecomment-2226573453 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/660?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] Extract parquet statistics for `StructArray` [datafusion]

2024-07-12 Thread via GitHub
efredine commented on code in PR #11289: URL: https://github.com/apache/datafusion/pull/11289#discussion_r1676560183 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -1984,7 +1981,96 @@ async fn test_struct() { } .run(); } +// test nested struct +#[tokio::t

Re: [PR] Extract parquet statistics for `StructArray` [datafusion]

2024-07-12 Thread via GitHub
efredine commented on code in PR #11289: URL: https://github.com/apache/datafusion/pull/11289#discussion_r1676555493 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -1984,7 +1981,96 @@ async fn test_struct() { } .run(); } +// test nested struct +#[tokio::t

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
Omega359 commented on PR #11403: URL: https://github.com/apache/datafusion/pull/11403#issuecomment-2226514789 I've updated the PR with the latest changes that I hope reflect all the feedback received thus far.

Re: [PR] fix: Spark-4.0 widening type support [datafusion-comet]

2024-07-12 Thread via GitHub
kazuyukitanimura commented on PR #604: URL: https://github.com/apache/datafusion-comet/pull/604#issuecomment-2226513285 This is ready for review @andygrove @comphead @huaxingao @viirya @parthchandra already approved it.

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-12 Thread via GitHub
comphead commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1676536818 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -182,6 +182,24 @@ async fn merge_join() { .await } +#[tokio::test] +async fn sort_merge_join_spill

Re: [PR] Extract parquet statistics for `StructArray` [datafusion]

2024-07-12 Thread via GitHub
Lordworms commented on code in PR #11289: URL: https://github.com/apache/datafusion/pull/11289#discussion_r1676519805 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -1984,7 +1981,96 @@ async fn test_struct() { } .run(); } +// test nested struct +#[tokio::

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-12 Thread via GitHub
comphead commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1676518816 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -182,6 +182,24 @@ async fn merge_join() { .await } +#[tokio::test] +async fn sort_merge_join_spill

Re: [PR] feat: Comet windows functions support [datafusion-comet]

2024-07-12 Thread via GitHub
comphead commented on PR #200: URL: https://github.com/apache/datafusion-comet/pull/200#issuecomment-2226440886 Closing it as this PR was moved to #599

Re: [PR] feat: Comet windows functions support [datafusion-comet]

2024-07-12 Thread via GitHub
comphead closed pull request #200: feat: Comet windows functions support URL: https://github.com/apache/datafusion-comet/pull/200

Re: [PR] chore: Remove utils crate and move utils into spark-expr crate [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove merged PR #658: URL: https://github.com/apache/datafusion-comet/pull/658

Re: [PR] Use IfExpr to check when input to log2 is <=0 and return null [datafusion-comet]

2024-07-12 Thread via GitHub
kazuyukitanimura commented on PR #506: URL: https://github.com/apache/datafusion-comet/pull/506#issuecomment-2226418991 I would rather do this in QueryPlanSerde, E.g. ``` case Log2(child) => val childExpr = exprToProtoInternal(nullIfNegative(child), inputs) d
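
A minimal Rust sketch of the behaviour this thread discusses (return null when the input to log2 is <= 0), assuming SQL NULL is modelled as `Option<f64>`. The names `null_if_non_positive` and `log2_or_null` are hypothetical and are not taken from the PR or from Comet's API.

```rust
/// Hypothetical helper: maps zero or negative input to None (standing in for SQL NULL).
/// NaN is passed through unchanged, because `NaN <= 0.0` evaluates to false.
fn null_if_non_positive(value: Option<f64>) -> Option<f64> {
    match value {
        Some(x) if x <= 0.0 => None,
        other => other,
    }
}

/// Hypothetical wrapper: base-2 logarithm that yields None for
/// missing, zero, or negative input instead of -inf or NaN.
fn log2_or_null(value: Option<f64>) -> Option<f64> {
    null_if_non_positive(value).map(f64::log2)
}
```

With this shape, `log2_or_null(Some(8.0))` evaluates to approximately 3.0, while zero, negative, and missing inputs all come back as `None`.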

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11218: URL: https://github.com/apache/datafusion/pull/11218#issuecomment-2226410209 I plan to review this PR later today

[PR] chore: Move temporal kernels and expressions to spark-expr crate [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove opened a new pull request, #660: URL: https://github.com/apache/datafusion-comet/pull/660 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

[I] [EPIC] Move remaining scalar functions to datafusion-comet-spark-expr crate [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove opened a new issue, #659: URL: https://github.com/apache/datafusion-comet/issues/659 ### What is the problem the feature request solves? We should move the following expressions to the new crate. - [ ] NegativeExpr - [ ] UnboundColumn - [ ] HourExec/MinuteExec/Se

Re: [I] Minimize the dependency on `SessionState` [datafusion]

2024-07-12 Thread via GitHub
cisaacson commented on issue #11420: URL: https://github.com/apache/datafusion/issues/11420#issuecomment-2226398313 @alamb I agree, removing it or using `Any` would make lots of things challenging. The idea of state for a SessionContext that is accessible from a variety of places is importa

Re: [I] Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc) [datafusion]

2024-07-12 Thread via GitHub
westonpace commented on issue #7955: URL: https://github.com/apache/datafusion/issues/7955#issuecomment-2226370295 > I wonder if TakeExec or something quite similar could also be used for dynamic join predicates? If there is a secondary index on `l_partkey` then I think a `TakeExec` c

Re: [PR] Use IfExpr to check when input to log2 is <=0 and return null [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #506: URL: https://github.com/apache/datafusion-comet/pull/506#discussion_r1676469979 ## core/src/execution/datafusion/planner.rs: ## @@ -1397,7 +1397,18 @@ impl PhysicalPlanner { args.is_empty(), )); -Ok(scalar_e

Re: [PR] feat: Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on PR #615: URL: https://github.com/apache/datafusion-comet/pull/615#issuecomment-2226366973 This is looking good @sujithjay. CI is failing due to clippy warnings. If you run clippy locally you should be able to see the same warnings as well as suggestions for fixing.

Re: [PR] feat: Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #615: URL: https://github.com/apache/datafusion-comet/pull/615#discussion_r1676467290 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1208,6 +1277,260 @@ fn do_cast_string_to_int< Ok(Some(result)) } +fn cast_string_to_decima

Re: [PR] feat: Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #615: URL: https://github.com/apache/datafusion-comet/pull/615#discussion_r1676466184 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1208,6 +1277,260 @@ fn do_cast_string_to_int< Ok(Some(result)) } +fn cast_string_to_decima

Re: [PR] feat: Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #615: URL: https://github.com/apache/datafusion-comet/pull/615#discussion_r1676465710 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1208,6 +1277,260 @@ fn do_cast_string_to_int< Ok(Some(result)) } +fn cast_string_to_decima

Re: [PR] feat: Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #615: URL: https://github.com/apache/datafusion-comet/pull/615#discussion_r1676465313 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1208,6 +1277,260 @@ fn do_cast_string_to_int< Ok(Some(result)) } +fn cast_string_to_decima

Re: [I] Expose inner field of struct within list-array [datafusion]

2024-07-12 Thread via GitHub
jleibs commented on issue #11419: URL: https://github.com/apache/datafusion/issues/11419#issuecomment-2226357791 Here's the proof-of-concept I wrote to handle this for one level of struct field extraction: https://gist.github.com/jleibs/853a8f2eae2445d5bcdf9198e08ea6a0

Re: [PR] feat: Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on PR #615: URL: https://github.com/apache/datafusion-comet/pull/615#issuecomment-2226350136 > @andygrove @vaibhawvipul Could you please take a look? Apologies @sujithjay I had missed this ping. I will review this early next week. -- This is an automated message

Re: [PR] feat: Comet windows functions support [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on PR #200: URL: https://github.com/apache/datafusion-comet/pull/200#issuecomment-2226346415 Should we close this PR @comphead or are you still planning on working on this? I wasn't sure if this is still needed now that some window function support has been implemented b

[PR] Minor: change internal error to not supported error for nested field … [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new pull request, #11446: URL: https://github.com/apache/datafusion/pull/11446 …access ## Which issue does this PR close? Related to https://github.com/apache/datafusion/issues/11445 ## Rationale for this change the reproducer on https://github.com/apach

Re: [PR] Profile spark3.5.1 and centos7 for compatible on spark 3.5.1 and centos7 old glic 2.7 [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on PR #491: URL: https://github.com/apache/datafusion-comet/pull/491#issuecomment-2226345212 I will go ahead and close this PR since there hasn't been any activity in a few weeks, but feel free to reopen it @awol2005ex if you are planning on picking this up again

Re: [PR] Profile spark3.5.1 and centos7 for compatible on spark 3.5.1 and centos7 old glic 2.7 [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove closed pull request #491: Profile spark3.5.1 and centos7 for compatible on spark 3.5.1 and centos7 old glic 2.7 URL: https://github.com/apache/datafusion-comet/pull/491

Re: [I] Expose inner field of struct within list-array [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11419: URL: https://github.com/apache/datafusion/issues/11419#issuecomment-2226339500 FYI @duongcongtoai and @jayzhan211 who might have some pointers / suggestions

Re: [I] Expose inner field of struct within list-array [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11419: URL: https://github.com/apache/datafusion/issues/11419#issuecomment-2226339750 https://github.com/alamb/datafusion/blob/ea92ae72f7ec2e941d35aa077c6a39f74523ab63/datafusion/functions/src/core/getfield.rs#L141-L214 is how the current field access code works

[I] "Nested identifiers not yet supported" error [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new issue, #11445: URL: https://github.com/apache/datafusion/issues/11445 ### Is your feature request related to a problem or challenge? This came up on discord: https://discord.com/channels/885562378132000778/885562378132000781/1261359404197089443 ### De

Re: [I] Inconsistent value for `data_page_max_rows` setting in DataFusion `ParquetOptions` and in `ArrowWriterOptions` [datafusion]

2024-07-12 Thread via GitHub
wiedld commented on issue #11367: URL: https://github.com/apache/datafusion/issues/11367#issuecomment-2226326990 Ah, I should be clear (not trying to step on toes 😅). My goal was to delineate the differences btwn the APIs more, but to leave the actual fixing of the defaults (& tests to enf

Re: [PR] feat: Show user a more intuitive message when queries fall back to Spark [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove merged PR #656: URL: https://github.com/apache/datafusion-comet/pull/656

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-12 Thread via GitHub
comphead commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1676438298 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -182,6 +182,24 @@ async fn merge_join() { .await } +#[tokio::test] +async fn sort_merge_join_spill

Re: [I] Inconsistent value for `data_page_max_rows` setting in DataFusion `ParquetOptions` and in `ArrowWriterOptions` [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11367: URL: https://github.com/apache/datafusion/issues/11367#issuecomment-2226319371 https://github.com/apache/datafusion/pull/11444 is similar

Re: [PR] Refactor: more clearly delineate btwn writer options vs session configuration [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11444: URL: https://github.com/apache/datafusion/pull/11444#discussion_r1676430454 ## datafusion/common/src/config.rs: ## @@ -454,6 +470,80 @@ config_namespace! { } } +#[cfg(feature = "parquet")] +impl ParquetOptions { +/// Convert the

Re: [PR] Refactor: more clearly delineate btwn writer options vs session configuration [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11444: URL: https://github.com/apache/datafusion/pull/11444#issuecomment-2226306003 Possibly related: https://github.com/apache/datafusion/issues/11367

[PR] Refactor: more clearly delineate btwn writer options vs session configuration [datafusion]

2024-07-12 Thread via GitHub
wiedld opened a new pull request, #11444: URL: https://github.com/apache/datafusion/pull/11444 ## Which issue does this PR close? Here's a proposed cleanup. **I'm not sure yet if this should be done**, so it's a draft. ## Rationale for this change * We have two session-

Re: [I] Make SQL strings generated from Exprs even "prettier" [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #10633: URL: https://github.com/apache/datafusion/issues/10633#issuecomment-2226299178 Good call -- thanks @MohamedAbdeen21

Re: [I] Make SQL strings generated from Exprs even "prettier" [datafusion]

2024-07-12 Thread via GitHub
alamb closed issue #10633: Make SQL strings generated from Exprs even "prettier" URL: https://github.com/apache/datafusion/issues/10633

Re: [I] Feature request: Support for lateral joins [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #10048: URL: https://github.com/apache/datafusion/issues/10048#issuecomment-2226298681 > > I am not familiar enough with lateral joins to be sure without some more research > > The `LATERAL` join syntax from Postgres

Re: [I] Review use of logical expressions in physical AggregateFunctionExpr [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11359: URL: https://github.com/apache/datafusion/issues/11359#issuecomment-2226298111 I agree we should document what the fields are used for now. I personally recommend we finish #8708 before we try to do some other crate refactor. We are close with that one

Re: [I] Review use of logical expressions in physical AggregateFunctionExpr [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11359: URL: https://github.com/apache/datafusion/issues/11359#issuecomment-2226297000 I agree with @jayzhan211 that the core of the problem is that the user defined API for aggregates is in datafusion_expr so can only use `Expr` but is invoked / instantiated as pa

Re: [I] Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc) [datafusion]

2024-07-12 Thread via GitHub
ahirner commented on issue #7955: URL: https://github.com/apache/datafusion/issues/7955#issuecomment-2226292167 From: https://github.com/apache/datafusion/discussions/9963 > TakeExec (index lookup) -- really like an indexed scan somehow> I wonder if `TakeExec` or something quite

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-12 Thread via GitHub
comphead commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1676416986 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -867,12 +951,17 @@ impl SMJStream { while !self.buffered_data.batches.is_empt

Re: [PR] feat: add raw aggregate udf planner [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11371: URL: https://github.com/apache/datafusion/pull/11371#discussion_r1676414149 ## datafusion/expr/src/planner.rs: ## @@ -161,6 +162,28 @@ pub trait ExprPlanner: Send + Sync { ) -> Result>> { Ok(PlannerResult::Original(args))

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-12 Thread via GitHub
comphead commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1676409356 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -182,6 +182,24 @@ async fn merge_join() { .await } +#[tokio::test] +async fn sort_merge_join_spill

Re: [PR] Docs: Document creating new extension APIs [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11425: URL: https://github.com/apache/datafusion/pull/11425#issuecomment-2226277835 > I'm not sure if we need to add such detail to the doc, but I thought leaving an example here could be helpful to future new collaborators. Thanks @ozankabak -- I think it is h

Re: [PR] Docs: Document creating new extension APIs [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11425: URL: https://github.com/apache/datafusion/pull/11425#issuecomment-2226278452 I hope to leave this PR open for a few more days to gather any more comments other community members might have on this content

Re: [PR] Minor: fix giuthub action labeler rules [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11428: URL: https://github.com/apache/datafusion/pull/11428#discussion_r1676394474 ## .github/workflows/dev_pr/labeler.yml: ## @@ -17,11 +17,11 @@ development-process: - changed-files: - - any-glob-to-any-file: ['dev/**.*', '.github/**.*', 'ci

Re: [PR] Minor: fix giuthub action labeler rules [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11428: URL: https://github.com/apache/datafusion/pull/11428#issuecomment-2226258208 Thanks @jonahgao -- I'll watch and see if it works better

Re: [PR] Minor: fix giuthub action labeler rules [datafusion]

2024-07-12 Thread via GitHub
alamb merged PR #11428: URL: https://github.com/apache/datafusion/pull/11428

Re: [PR] Implement TPCH substrait integration test, support tpch_13, tpch_14,16 [datafusion]

2024-07-12 Thread via GitHub
alamb merged PR #11405: URL: https://github.com/apache/datafusion/pull/11405

Re: [PR] Implement TPCH substrait integration test, support tpch_13, tpch_14,16 [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11405: URL: https://github.com/apache/datafusion/pull/11405#issuecomment-2226257169 Thanks again @Blizzara and @Lordworms 🚀

Re: [PR] Short term way to make `AggregateStatistics` still work when min/max is converted to udaf [datafusion]

2024-07-12 Thread via GitHub
alamb merged PR #11261: URL: https://github.com/apache/datafusion/pull/11261

Re: [I] Minimize the dependency on `SessionState` [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11420: URL: https://github.com/apache/datafusion/issues/11420#issuecomment-2226253535 In theory I think you are right. However, I am worried about removing SessionState from the `scan` method as it is so widely used and has everything people need. In fact her

Re: [I] [Epic] Extract catalog functionality from the core to make it more modular [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #10782: URL: https://github.com/apache/datafusion/issues/10782#issuecomment-2226248126 > To move TableProvider out we need to avoid dependency on SessionState for scan function, but there is CatalogProviderList in SessionState 🤔 we could potentially move Sess

[PR] chore: Remove utils crate and move utils into spark-expr crate [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove opened a new pull request, #658: URL: https://github.com/apache/datafusion-comet/pull/658 ## Which issue does this PR close? N/A ## Rationale for this change This is a follow on to https://github.com/apache/datafusion-comet/pull/654 related to t

Re: [PR] Avoid calling shutdown after failed write of AsyncWrite [datafusion]

2024-07-12 Thread via GitHub
alamb merged PR #11415: URL: https://github.com/apache/datafusion/pull/11415

Re: [PR] Avoid calling shutdown after failed write of AsyncWrite [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11415: URL: https://github.com/apache/datafusion/pull/11415#discussion_r1676379315 ## datafusion/core/src/datasource/file_format/write/orchestration.rs: ## @@ -50,7 +50,7 @@ pub(crate) async fn serialize_rb_stream_to_object_store( mut data_rx:

Re: [PR] Avoid calling shutdown after failed write of AsyncWrite [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11415: URL: https://github.com/apache/datafusion/pull/11415#issuecomment-2226238495 Thanks again @joroKr21 and @comphead

[I] Add some structure to write orchestration [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new issue, #11443: URL: https://github.com/apache/datafusion/issues/11443 Perhaps it would be good for this PR to describe in docs what is what in return megatype and later we can factor this out into separate strict type _Originally posted by @comphead i

Re: [PR] Combine the Roadmap / Quarterly Roadmap sections [datafusion]

2024-07-12 Thread via GitHub
alamb merged PR #11426: URL: https://github.com/apache/datafusion/pull/11426

Re: [PR] Combine the Roadmap / Quarterly Roadmap sections [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11426: URL: https://github.com/apache/datafusion/pull/11426#issuecomment-2226229730 Since this PR just moves some content I will merge it in. Let's have the roadmap discussion on https://github.com/apache/datafusion/issues/11442

Re: [PR] Combine the Roadmap / Quarterly Roadmap sections [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11426: URL: https://github.com/apache/datafusion/pull/11426#issuecomment-2226228993 Thank you @comphead and @wjones127

Re: [PR] Combine the Roadmap / Quarterly Roadmap sections [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11426: URL: https://github.com/apache/datafusion/pull/11426#discussion_r1676372760 ## docs/source/contributor-guide/roadmap.md: ## @@ -43,3 +43,84 @@ start a conversation using a github issue or the make review efficient and avoid surprises. [T

Re: [I] 2024 Q3-Q4 Roadmap? [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11442: URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2226228146 🤔 I have stuff I would like to do. But that doesn't really count as a roadmap for the project 😆 Here are some things I might guess 1. streaming stuff https://github.co

Re: [PR] chore: Move `cast` to `spark-expr` crate [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #654: URL: https://github.com/apache/datafusion-comet/pull/654#discussion_r1676368801 ## native/utils/src/lib.rs: ## @@ -34,3 +48,151 @@ pub fn down_cast_any_ref(any: &dyn Any) -> &dyn Any { any } } + +/// Preprocesses input array

Re: [PR] chore: Move `cast` to `spark-expr` crate [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove merged PR #654: URL: https://github.com/apache/datafusion-comet/pull/654

[I] 2024 Q3-Q4 Roadmap? [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new issue, #11442: URL: https://github.com/apache/datafusion/issues/11442 ### Is your feature request related to a problem or challenge? @comphead asked https://github.com/apache/datafusion/pull/11426#discussion_r1676137598 > do we have a roadmap for 2024?

Re: [PR] chore: Move `cast` to `spark-expr` crate [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #654: URL: https://github.com/apache/datafusion-comet/pull/654#discussion_r1676367663 ## native/utils/src/lib.rs: ## @@ -34,3 +48,151 @@ pub fn down_cast_any_ref(any: &dyn Any) -> &dyn Any { any } } + +/// Preprocesses input array

Re: [PR] Add extension hooks for encoding and decoding UDAFs and UDWFs [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11417: URL: https://github.com/apache/datafusion/pull/11417#discussion_r1676357997 ## datafusion-examples/examples/composed_extension_codec.rs: ## @@ -239,53 +240,52 @@ struct ComposedPhysicalExtensionCodec { codecs: Vec>, } +impl ComposedP

Re: [I] [DISCUSSION] Support for Streaming in DataFusion [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11404: URL: https://github.com/apache/datafusion/issues/11404#issuecomment-2226181938 Also related blog post: https://www.linkedin.com/pulse/future-datafusion-streaming-matt-green-ril7c from @emgeee

[I] Support `to_json` function on `struct` type [datafusion]

2024-07-12 Thread via GitHub
dharanad opened a new issue, #11441: URL: https://github.com/apache/datafusion/issues/11441 ### Is your feature request related to a problem or challenge? This request is to support `to_json` in comet https://github.com/apache/datafusion-comet/issues/631. Rather than implementing this in

Re: [PR] Improved unparser documentation [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11395: URL: https://github.com/apache/datafusion/pull/11395#issuecomment-2226174754 > lgtm thanks @alamb pretty useful thing this unparser Yes, kudos belong to @backkem and @devinjdangelo for bringing the idea into DataFusion at first (from datafusion-federati

Re: [PR] fix: make sure JOIN ON expression is boolean type [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11423: URL: https://github.com/apache/datafusion/pull/11423#discussion_r1676334942 ## datafusion/sqllogictest/test_files/join.slt: ## @@ -998,11 +997,22 @@ CREATE TABLE t2 (v0 DOUBLE) AS VALUES (-1.663563947387); statement ok CREATE TABLE t3 (v0

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1676329493 ## datafusion/core/src/execution/session_state.rs: ## @@ -195,122 +196,10 @@ impl SessionState { runtime: Arc, catalog_list: Arc, ) -> Self {

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1676328034 ## datafusion/core/src/execution/session_state.rs: ## @@ -976,6 +837,482 @@ impl SessionState { } } +/// A builder to be used for building [`SessionState`]'s

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1676326033 ## datafusion/core/src/execution/session_state.rs: ## @@ -976,6 +837,482 @@ impl SessionState { } } +/// A builder to be used for building [`SessionState`]'s

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1676326553 ## datafusion/core/src/execution/session_state.rs: ## @@ -976,6 +837,482 @@ impl SessionState { } } +/// A builder to be used for building [`SessionState`]'s

Re: [I] [EPIC] A collection of issues for supporting the `MAP` DataType [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11429: URL: https://github.com/apache/datafusion/issues/11429#issuecomment-2226154134 > I have created the follow-up issues for #11268: Awesome -- thank you @goldmedal -- I added them to the ticket description too

[PR] minor: split repartition time and send time metrics [datafusion]

2024-07-12 Thread via GitHub
korowa opened a new pull request, #11440: URL: https://github.com/apache/datafusion/pull/11440 ## Which issue does this PR close? Closes #. ## Rationale for this change Currently `repart_time` and `send_time` metrics for `RepartitionExec` may have signifi

Re: [I] [Epic] Prepared Statement Support [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #4539: URL: https://github.com/apache/datafusion/issues/4539#issuecomment-2226145983 > What's the best way to handle `?` placeholder in datafusion? For example, `select * from t where a = ?`, it could be converted to a logical plan in datafusion, but the plan can't

Re: [PR] Move configuration information out of example usage page [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11300: URL: https://github.com/apache/datafusion/pull/11300#discussion_r1676315200 ## docs/source/user-guide/crate-configuration.md: ## @@ -0,0 +1,146 @@ + + +# Crate Configuration + +This section contains information on how to configure DataFusion

Re: [PR] Move configuration information out of example usage page [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11300: URL: https://github.com/apache/datafusion/pull/11300#issuecomment-2226142551 Thanks for the review @jonahgao and merge @comphead 🙏
