Re: [PR] Minor: Csv Options Clean-up [datafusion]

2024-05-25 Thread via GitHub
ozankabak commented on code in PR #10650: URL: https://github.com/apache/datafusion/pull/10650#discussion_r1614446423 ## datafusion/physical-expr/src/expressions/cast.rs: ## @@ -170,7 +170,8 @@ impl PhysicalExpr for CastExpr { let target_type = &self.cast_type;

Re: [PR] Minor: Csv Options Clean-up [datafusion]

2024-05-25 Thread via GitHub
ozankabak commented on PR #10650: URL: https://github.com/apache/datafusion/pull/10650#issuecomment-2131118238 Maybe I can help clarify things a little bit. The purpose of the PR is not to change the current fallback behavior, just simplify its implementation. AFAICT @berkaysynnada is movin

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children` to return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614462365 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
ozankabak commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614484172 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&self

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-25 Thread via GitHub
ozankabak commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2131154079 The aim has been to make the `COPY` and `CREATE EXTERNAL TABLE` statements use the same `OPTIONS` syntax. The PR you mention was actually a part of multi-PR effort to make this so

[PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal opened a new pull request, #10661: URL: https://github.com/apache/datafusion/pull/10661 ## Which issue does this PR close? Closes #10658 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614503459 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614503459 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614503459 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [PR] Convert first, last aggregate function to UDAF [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10648: URL: https://github.com/apache/datafusion/pull/10648#discussion_r1614514043 ## datafusion/physical-expr-common/src/utils.rs: ## @@ -100,15 +103,37 @@ pub fn reverse_order_bys(order_bys: &[PhysicalSortExpr]) -> Vec`. +/// If conversion is no

[PR] fix: substring with negative indices should produce correct result [datafusion-comet]

2024-05-25 Thread via GitHub
sonhmai opened a new pull request, #470: URL: https://github.com/apache/datafusion-comet/pull/470 ## Which issue does this PR close? Closes #463 ## Rationale for this change Fix for substring with negative indices produces incorrect results ## What changes are

Re: [I] [Epic] Unify `AggregateFunction` Interface (remove built in list of `AggregateFunction` s), improve the system [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #8708: URL: https://github.com/apache/datafusion/issues/8708#issuecomment-2131167747 After https://github.com/apache/datafusion/pull/10648 and https://github.com/apache/datafusion/issues/10389 I think we have a pretty good set of examples of how to move aggregates o

Re: [PR] Fix incorrect statistics read for binary columns in parquet [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10645: URL: https://github.com/apache/datafusion/pull/10645#issuecomment-2131168957 > The gate issue is not related to the PR.. could we rerun? I restarted it FYI another trick we have found is if you close the PR and then reopen it, github will rerun al

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-25 Thread via GitHub
advancedxy commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1614516643 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or m

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614522462 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

[PR] Fix typo in Cargo.toml [datafusion]

2024-05-25 Thread via GitHub
alamb opened a new pull request, #10662: URL: https://github.com/apache/datafusion/pull/10662 ## Which issue does this PR close? Closes #. ## Rationale for this change When running `cargo build` locally I see ``` warning: /Users/andrewlamb/Software/data

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614522462 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614522462 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2131171739 > If this is the case then there is divergence between postgres and arrow-rs. Which takes priority? I would personally suggest we do whatever is consistent with arrow (and easie

Re: [PR] fix: substring with negative indices should produce correct result [datafusion-comet]

2024-05-25 Thread via GitHub
sonhmai commented on PR #470: URL: https://github.com/apache/datafusion-comet/pull/470#issuecomment-2131173613 @viirya @andygrove can you take a look. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614530863 ## datafusion/sqllogictest/test_files/select.slt: ## @@ -1473,7 +1473,7 @@ DROP TABLE t; # related to https://github.com/apache/datafusion/issues/8814 statement

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614503459 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614522462 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-25 Thread via GitHub
advancedxy commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1614532699 ## spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala: ## @@ -217,6 +325,25 @@ class CometExpressionCoverageSuite extends CometTestBase

Re: [PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10646: URL: https://github.com/apache/datafusion/pull/10646#discussion_r1614534288 ## datafusion/substrait/src/variation_const.rs: ## @@ -37,3 +38,58 @@ pub const DEFAULT_CONTAINER_TYPE_REF: u32 = 0; pub const LARGE_CONTAINER_TYPE_REF: u32 = 1; p

Re: [PR] fix wrong type validation on unnest expr [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10657: URL: https://github.com/apache/datafusion/pull/10657 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Wrong error thrown when unnesting a list of struct [datafusion]

2024-05-25 Thread via GitHub
alamb closed issue #10656: Wrong error thrown when unnesting a list of struct URL: https://github.com/apache/datafusion/issues/10656 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Fix incorrect statistics read for binary columns in parquet [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10645: URL: https://github.com/apache/datafusion/pull/10645 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Incorrect statistics read for binary columns in parquet [datafusion]

2024-05-25 Thread via GitHub
alamb closed issue #10605: Incorrect statistics read for binary columns in parquet URL: https://github.com/apache/datafusion/issues/10605 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] Fix `NULL["field"]` for expr_API [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10655: URL: https://github.com/apache/datafusion/pull/10655#discussion_r1614536504 ## datafusion/functions/src/core/getfield.rs: ## @@ -106,6 +106,9 @@ impl ScalarUDFImpl for GetFieldFunc { }; let access_schema = GetFieldAccessSch

Re: [I] Error on `NULL["field_name"]`: The expression to get an indexed field is only valid for `List`, `Struct`, or `Map` types, got Null [datafusion]

2024-05-25 Thread via GitHub
alamb closed issue #10654: Error on `NULL["field_name"]`: The expression to get an indexed field is only valid for `List`, `Struct`, or `Map` types, got Null URL: https://github.com/apache/datafusion/issues/10654 -- This is an automated message from the Apache Git Service. To respond to the m

Re: [PR] Fix `NULL["field"]` for expr_API [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10655: URL: https://github.com/apache/datafusion/pull/10655#issuecomment-2131186851 Thank you for the speedy review @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] Fix `NULL["field"]` for expr_API [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10655: URL: https://github.com/apache/datafusion/pull/10655 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on code in PR #10661: URL: https://github.com/apache/datafusion/pull/10661#discussion_r1614539494 ## datafusion/optimizer/src/push_down_limit.rs: ## @@ -65,7 +65,6 @@ impl OptimizerRule for PushDownLimit { }; let Limit { skip, fetch, input

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on code in PR #10661: URL: https://github.com/apache/datafusion/pull/10661#discussion_r1614539989 ## datafusion/core/src/execution/context/mod.rs: ## @@ -2893,21 +2893,21 @@ mod tests { for function in all_default_functions() { let udf =

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on code in PR #10661: URL: https://github.com/apache/datafusion/pull/10661#discussion_r1614539989 ## datafusion/core/src/execution/context/mod.rs: ## @@ -2893,21 +2893,21 @@ mod tests { for function in all_default_functions() { let udf =

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1614539590 ## datafusion/functions-aggregate/src/median.rs: ## @@ -15,71 +15,105 @@ // specific language governing permissions and limitations // under the License. -//! #

Re: [PR] Update substrait requirement from 0.33.3 to 0.34.0 [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10632: URL: https://github.com/apache/datafusion/pull/10632 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10661: URL: https://github.com/apache/datafusion/pull/10661#discussion_r1614546056 ## datafusion/core/src/execution/context/mod.rs: ## @@ -2860,6 +2863,57 @@ mod tests { Ok(()) } +#[tokio::test] +async fn test_register_defau

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10661: URL: https://github.com/apache/datafusion/pull/10661#issuecomment-2131195735 Thank you @goldmedal 🙏 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [I] Implement protobuf serialization for LogicalPlan::Unnest [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #10639: URL: https://github.com/apache/datafusion/issues/10639#issuecomment-2131196191 Thanks @jamesmcm -- this would be a good first issue I think for someone with Rust experience looking to help with DataFusion. The protobuf serialization code is a bit hairy I fi

Re: [I] Make TaskContext wrap SessionState [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #10631: URL: https://github.com/apache/datafusion/issues/10631#issuecomment-2131197268 > I see TaskContext is just part of SessionState. Could we just make TaskContext wrap SessionState? I can't remember why TaskContext doesn't wrap SessionState -- maybe @tu

Re: [I] Make TaskContext wrap SessionState [datafusion]

2024-05-25 Thread via GitHub
tustvold commented on issue #10631: URL: https://github.com/apache/datafusion/issues/10631#issuecomment-2131201943 IIRC SessionConfig is the static configuration used to create a SessionContext, which is an interior mutable wrapper around SessionState. The idea was a query is planned

Re: [PR] Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) [datafusion]

2024-05-25 Thread via GitHub
jonahgao merged PR #10662: URL: https://github.com/apache/datafusion/pull/10662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

[I] Suport unparsing `LogicalPlan::Distinct` to `DISTINCT` [datafusion]

2024-05-25 Thread via GitHub
alamb opened a new issue, #10663: URL: https://github.com/apache/datafusion/issues/10663 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/9726 to complete the LogialPlan --> SQL conversion Converting `LogicalPlan

[I] Suport unparsing `LogicalPlan::Window` to SQL [datafusion]

2024-05-25 Thread via GitHub
alamb opened a new issue, #10664: URL: https://github.com/apache/datafusion/issues/10664 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/9726 to complete the LogialPlan --> SQL conversion Converting `LogicalPlan

Re: [I] Support convert LogicalPlan JOIN with `Using` constraint to SQL String [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #10652: URL: https://github.com/apache/datafusion/issues/10652#issuecomment-2131211386 > Is there any plan to support them? I think we should make plans to support them! I started collecting issues on https://github.com/apache/datafusion/issues/8661

Re: [PR] Add `FileScanConfig::new()` API [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10623: URL: https://github.com/apache/datafusion/pull/10623 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add `FileScanConfig::new()` API [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10623: URL: https://github.com/apache/datafusion/pull/10623#issuecomment-2131213174 Thanks everyone for the reviews! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [I] [EPIC] Improved support for nested / structured types (`Struct` , `List`, `ListArray`, and other Composite types) [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #2326: URL: https://github.com/apache/datafusion/issues/2326#issuecomment-2131214534 > I added an issue to support recursive unnest: #10660, i think it shoul belong to this epic Added -- This is an automated message from the Apache Git Service. To respond t

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-25 Thread via GitHub
Abdullahsab3 commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2131223669 Thanks for filing the ticket and for all the detailed explanations! very enriching I wonder whether the Postgres behavior is actually that bad. Though it looks weird

Re: [I] Selecting struct field within field produces unexpected results [datafusion-python]

2024-05-25 Thread via GitHub
timsaucer commented on issue #715: URL: https://github.com/apache/datafusion-python/issues/715#issuecomment-2131229316 My statement above about testing on rust side is likely incorrect. I ran the same test above but loading the dataframe from a parquet file instead of creating in memory an

Re: [I] Library Guide: Building LogicalPlans [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #7306: URL: https://github.com/apache/datafusion/issues/7306#issuecomment-2131234075 > What do you think we should do? I can add that paragraph, but maybe we also want to link the examples? What else do you think it's needed to close this Those both sound like

Re: [PR] Minor: Csv Options Clean-up [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10650: URL: https://github.com/apache/datafusion/pull/10650#issuecomment-2131235103 > Maybe we can do the following: Instead of making every TableProviderFactory responsible for implementing this behavior in create, we can add some general mechanism at the trait level

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2131235504 > This shouldn't have passed checks. > > ``` > + cargo fmt --all -- --check > `cargo metadata` exited with an error: error: failed to load manifest for workspace member `/o

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10640: URL: https://github.com/apache/datafusion/pull/10640#issuecomment-2131235829 Thanks @Blizzara and @jonahgao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] Minor: Add tests showing aggregate behavior for NaNs [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10634: URL: https://github.com/apache/datafusion/pull/10634#discussion_r1614628009 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -4374,6 +4374,42 @@ GROUP BY dummy text1, text1, text1 +# Tests for aggregating with NaN values +

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on PR #10661: URL: https://github.com/apache/datafusion/pull/10661#issuecomment-2131238078 Thanks, @alamb! Actually, I tried to add some tests for `MemoryFunctionRegistry` before I fixed it, but I found everything was fine. I guess the reason is that we don't inse

[PR] Remove `GetFieldAccessSchema` [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 opened a new pull request, #10665: URL: https://github.com/apache/datafusion/pull/10665 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1614638361 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode, from_type,

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1614639502 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode, from_type,

Re: [PR] change version to 38.0.1 [datafusion-python]

2024-05-25 Thread via GitHub
andygrove merged PR #716: URL: https://github.com/apache/datafusion-python/pull/716 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@d

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1614640283 ## datafusion/functions-aggregate/Cargo.toml: ## @@ -39,6 +39,7 @@ path = "src/lib.rs" [dependencies] arrow = { workspace = true } +arrow-schema = { workspa

[PR] build(deps): bump syn from 2.0.63 to 2.0.66 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #717: URL: https://github.com/apache/datafusion-python/pull/717 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.63 to 2.0.66. Release notes Sourced from https://github.com/dtolnay/syn/releases";>syn's releases. 2.0.66 Al

Re: [PR] build(deps): bump syn from 2.0.63 to 2.0.64 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] closed pull request #706: build(deps): bump syn from 2.0.63 to 2.0.64 URL: https://github.com/apache/datafusion-python/pull/706 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] build(deps): bump syn from 2.0.63 to 2.0.64 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] commented on PR #706: URL: https://github.com/apache/datafusion-python/pull/706#issuecomment-2131243514 Superseded by #717. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10662: URL: https://github.com/apache/datafusion/pull/10662#issuecomment-2131244635 > oops.. my mistake. :P Many thanks. No worries! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614654486 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1778,7 +1778,7 @@ AS VALUES ('BB', 6, 1); query TII -select col1, col2, coalesce(sum_col3, 0) as sum_col3

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2131254591 Thank you @tustvold and @Abdullahsab3 and @mhilton and @appletreeisyellow for the thoughts. From my perspective, the current (non timezone aware) `date_bin` function has t

[PR] Minor: Use slice in `ConcreteTreeNode` [datafusion]

2024-05-25 Thread via GitHub
peter-toth opened a new pull request, #10666: URL: https://github.com/apache/datafusion/pull/10666 The idea of using slices to return a node's children came up here: https://github.com/apache/datafusion/pull/10543#discussion_r1614057653. While it is not possible in all `TreeNode` implemen

Re: [PR] Minor: Remove `GetFieldAccessSchema` [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10665: URL: https://github.com/apache/datafusion/pull/10665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614656536 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1778,7 +1778,7 @@ AS VALUES ('BB', 6, 1); query TII -select col1, col2, coalesce(sum_col3, 0) as sum

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614656536 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1778,7 +1778,7 @@ AS VALUES ('BB', 6, 1); query TII -select col1, col2, coalesce(sum_col3, 0) as sum

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614657172 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&sel

Re: [I] [Epic] Unify `AggregateFunction` Interface (remove built in list of `AggregateFunction` s), improve the system [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on issue #8708: URL: https://github.com/apache/datafusion/issues/8708#issuecomment-2131265763 > After #10648 and #10389 I think we have a pretty good set of examples of how to move aggregates out of the core (thanks to all the foundations layed by @jayzhan211 ) >

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-25 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1614666495 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -512,6 +572,62 @@ pub fn to_substrait_rel( } } +fn to_substrait_named_struct(schema: &DFSchema

Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10647: URL: https://github.com/apache/datafusion/pull/10647#discussion_r1614666750 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -642,11 +714,22 @@ fn should_enable_page_index( .unwrap_or(false) } -/// Factor

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2131268529 Awesome! I plan to merge this PR tomorrow unless anyone else would like time to review. -- This is an automated message from the Apache Git Service. To respond to the message

[I] Convert `Variance Sample` to UDAF [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 opened a new issue, #10667: URL: https://github.com/apache/datafusion/issues/10667 ### Is your feature request related to a problem or challenge? Part of #8708 1. Move variance aggregate expression in `datafusion/physical-expr/src/aggregate/variance.rs` to `functions-ag

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-25 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1614671030 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -685,6 +685,19 @@ async fn roundtrip_literal_struct() -> Result<()> { .await } +#[tok

[I] Convert Variance Population to UDAF [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 opened a new issue, #10668: URL: https://github.com/apache/datafusion/issues/10668 ### Is your feature request related to a problem or challenge? Similar to #10667 but for Variance Population function ### Describe the solution you'd like _No response_ ##

[PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-05-25 Thread via GitHub
vaibhawvipul opened a new pull request, #471: URL: https://github.com/apache/datafusion-comet/pull/471 ## Which issue does this PR close? Closes #465 . ## Rationale for this change ## What changes are included in this PR? ## How are these ch

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-25 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1614673020 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -685,6 +685,19 @@ async fn roundtrip_literal_struct() -> Result<()> { .await } +#[tok

Re: [I] Library Guide: Building LogicalPlans [datafusion]

2024-05-25 Thread via GitHub
edmondop commented on issue #7306: URL: https://github.com/apache/datafusion/issues/7306#issuecomment-2131289288 > > What do you think we should do? I can add that paragraph, but maybe we also want to link the examples? What else do you think it's needed to close this > > Those both s

Re: [PR] Minor: Improve ObjectStoreUrl docs + examples [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on PR #10619: URL: https://github.com/apache/datafusion/pull/10619#issuecomment-2131302454 On a related note, I found that the usage flow of ObjectStore in DataFusion involves registering the source as a table and then querying the table. This approach makes sense. Howev

[I] Add quote-style parameter for CSV options [datafusion]

2024-05-25 Thread via GitHub
DDtKey opened a new issue, #10669: URL: https://github.com/apache/datafusion/issues/10669 ### Is your feature request related to a problem or challenge? CSV writers usually supports configuration of quote style/mode with the following options: - `Always` - `Necessary` - `Neve

[I] CSV `quote` parameter is ignored [datafusion]

2024-05-25 Thread via GitHub
DDtKey opened a new issue, #10670: URL: https://github.com/apache/datafusion/issues/10670 ### Describe the bug quote` parameter of CSV writer is not getting passed to `arrow-csv` writer: https://github.com/apache/datafusion/blob/ea92ae72f7ec2e941d35aa077c6a39f74523ab63/datafusion/c

[PR] feat: Add "Comet Fuzz" fuzz-testing utility [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove opened a new pull request, #472: URL: https://github.com/apache/datafusion-comet/pull/472 ## Which issue does this PR close? N/A ## Rationale for this change Comet Fuzz is a standalone project for generating random data and queries and executing

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1614788921 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode, from_type,

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1614789127 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode, from_type,

[PR] fix: pass `quote` parameter to CSV writer [datafusion]

2024-05-25 Thread via GitHub
DDtKey opened a new pull request, #10671: URL: https://github.com/apache/datafusion/pull/10671 ## Which issue does this PR close? Closes #10670 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

[I] ArrayList data is leaked to outside unnest logical plan from the inner unnest logical plan with limit applied [datafusion]

2024-05-25 Thread via GitHub
duongcongtoai opened a new issue, #10672: URL: https://github.com/apache/datafusion/issues/10672 ### Describe the bug Given this query ``` select unnest(struct_c1c0), unnest(list_c2c0) from ( select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as list_c2c

Re: [I] ArrayList data is leaked to outside unnest logical plan from the inner unnest logical plan that comes before a Limit [datafusion]

2024-05-25 Thread via GitHub
duongcongtoai commented on issue #10672: URL: https://github.com/apache/datafusion/issues/10672#issuecomment-2131375997 I'm taking it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [I] Add quote-style parameter for CSV options [datafusion]

2024-05-25 Thread via GitHub
DDtKey commented on issue #10669: URL: https://github.com/apache/datafusion/issues/10669#issuecomment-2131388179 I think this might be labeled with `good first issue`, there are links to the code that needs to be changed and it is also possible to write `sqllogictest` similar to https://git

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove merged PR #461: URL: https://github.com/apache/datafusion-comet/pull/461 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

[PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya opened a new pull request, #473: URL: https://github.com/apache/datafusion-comet/pull/473 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

[I] ColumnReader.loadVector should initiate CometDictionary after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya opened a new issue, #474: URL: https://github.com/apache/datafusion-comet/issues/474 ### Describe the bug During debugging some test failures in #437, I found that `CometScanExec` returns empty dictionary values after first batch in failed queries. It is because in `Data.impor

Re: [PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya commented on PR #473: URL: https://github.com/apache/datafusion-comet/pull/473#issuecomment-2131403965 cc @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya commented on code in PR #473: URL: https://github.com/apache/datafusion-comet/pull/473#discussion_r1614819511 ## common/src/main/java/org/apache/comet/parquet/ColumnReader.java: ## @@ -230,15 +230,15 @@ public CometDecodedVector loadVector() { // return plain vec

Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-25 Thread via GitHub
comphead commented on code in PR #10647: URL: https://github.com/apache/datafusion/pull/10647#discussion_r1614830679 ## datafusion/core/src/datasource/physical_plan/parquet/schema_adapter.rs: ## @@ -20,35 +20,38 @@ use arrow_schema::{Schema, SchemaRef}; use std::fmt::Debug; us

  1   2   >