Re: [PR] Add was_valid parameter to NullState callbacks [datafusion]

2024-07-22 Thread via GitHub
joroKr21 commented on PR #11592: URL: https://github.com/apache/datafusion/pull/11592#issuecomment-2244358083 /benchmark -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Fix : `signum` function bug when `0.0` input [datafusion]

2024-07-22 Thread via GitHub
Throne3d commented on PR #11580: URL: https://github.com/apache/datafusion/pull/11580#issuecomment-2244205992 Here are the values I get for Spark 3.5.1: ```python data = ["-1", "1", "0", "-0.0", "0.0", "1.0", "-1.0"] df = spark.createDataFrame([(datum,) for datum in data], "col1

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-22 Thread via GitHub
phillipleblanc commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1687384579 ## _posts/2024-07-23-datafusion-40.0.0.md: ## @@ -0,0 +1,492 @@ +--- +layout: post +title: "Apache DataFusion 40.0.0 Released" +date: "2024-07-21 00:00:00" +au

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-22 Thread via GitHub
phillipleblanc commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1687352908 ## _posts/2024-07-23-datafusion-40.0.0.md: ## @@ -0,0 +1,492 @@ +--- +layout: post +title: "Apache DataFusion 40.0.0 Released" +date: "2024-07-21 00:00:00" +au

Re: [PR] Implement physical plan serialization for csv COPY plans , add `as_any`, `Debug` to `FileFormatFactory` [datafusion]

2024-07-22 Thread via GitHub
Lordworms commented on PR #11588: URL: https://github.com/apache/datafusion/pull/11588#issuecomment-2244176742 The fuzz test failure seems to be unrelated to this PR

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-22 Thread via GitHub
Throne3d commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1687365607 ## _posts/2024-07-23-datafusion-40.0.0.md: ## @@ -0,0 +1,492 @@ +--- +layout: post +title: "Apache DataFusion 40.0.0 Released" +date: "2024-07-21 00:00:00" +author:

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-07-22 Thread via GitHub
doki23 commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2244171578 > > I have a question: How do users specify the underlying physical types? FYI, ClickHouse exposes physical types to users like [this](https://clickhouse.com/docs/en/sql-reference/d

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2244155142 > I have a question: How do users specify the underlying physical types? FYI, ClickHouse exposes physical types to users like [this](https://clickhouse.com/docs/en/sql-reference

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on code in PR #11516: URL: https://github.com/apache/datafusion/pull/11516#discussion_r1687354649 ## datafusion/core/src/datasource/listing_table_factory.rs: ## @@ -49,16 +49,18 @@ impl ListingTableFactory { impl TableProviderFactory for ListingTableFactory

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-07-22 Thread via GitHub
doki23 commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2244145992 I have a question: How do users specify the underlying physical types? FYI, ClickHouse exposes physical types to users like [this](https://clickhouse.com/docs/en/sql-reference/data-

Re: [PR] Remove ArrayAgg Builtin [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on code in PR #11611: URL: https://github.com/apache/datafusion/pull/11611#discussion_r1687345718 ## datafusion/physical-expr-common/src/aggregate/mod.rs: ## @@ -573,8 +573,9 @@ impl AggregateExpr for AggregateFunctionExpr { })

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-07-22 Thread via GitHub
doki23 commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2244134428 I found a previous discussion and am referencing it here: #7421

Re: [PR] Fix typo in doc of Partitioning [datafusion]

2024-07-22 Thread via GitHub
waruto210 commented on PR #11612: URL: https://github.com/apache/datafusion/pull/11612#issuecomment-2244125929 PTAL @alamb.

[PR] Fix typo in doc of Partitioning [datafusion]

2024-07-22 Thread via GitHub
waruto210 opened a new pull request, #11612: URL: https://github.com/apache/datafusion/pull/11612 ## Which issue does this PR close? Fix typo in doc of Partitioning Closes #11593 . ## Rationale for this change ## What changes are included in this PR?

Re: [PR] Using Union's input schema when recompute schema [datafusion]

2024-07-22 Thread via GitHub
github-actions[bot] closed pull request #10494: Using Union's input schema when recompute schema URL: https://github.com/apache/datafusion/pull/10494

Re: [PR] Converted internal representation of LogicalPlanBuilder from LogicalPlan to Arc #10485 [datafusion]

2024-07-22 Thread via GitHub
github-actions[bot] closed pull request #10487: Converted internal representation of LogicalPlanBuilder from LogicalPlan to Arc #10485 URL: https://github.com/apache/datafusion/pull/10487

Re: [PR] feat: Add `ProgressiveEval` operator [datafusion]

2024-07-22 Thread via GitHub
github-actions[bot] closed pull request #10490: feat: Add `ProgressiveEval` operator URL: https://github.com/apache/datafusion/pull/10490

Re: [PR] fix: dictionary decimal vector optimization [datafusion-comet]

2024-07-22 Thread via GitHub
kazuyukitanimura commented on code in PR #705: URL: https://github.com/apache/datafusion-comet/pull/705#discussion_r1687313607 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -100,17 +102,21 @@ public void close() { } private void initialize()

[PR] chore: add more aggregate functions to benchmark test [datafusion-comet]

2024-07-22 Thread via GitHub
huaxingao opened a new pull request, #706: URL: https://github.com/apache/datafusion-comet/pull/706 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [I] Reduce repetition in `try_process_group_by_unnest` and `try_process_unnest` [datafusion]

2024-07-22 Thread via GitHub
JasonLi-cn commented on issue #11498: URL: https://github.com/apache/datafusion/issues/11498#issuecomment-2244089441 take

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2244076648 @findepi I see your PR plans to close #10782; isn't it only the first step? Given that I didn't see `datafusion-catalog-common`, I guess there is some misunderstanding about the next steps

Re: [I] Typo in doc of datafusion::physical_plan::Partitioning [datafusion]

2024-07-22 Thread via GitHub
waruto210 commented on issue #11593: URL: https://github.com/apache/datafusion/issues/11593#issuecomment-2244076139 > Yes I agree with this @waruto210 -- thank you > > > > Perhaps the easiest change would be to update the example description to say "RepartitionExec with 1 in

Re: [PR] perf: Improve performance of CASE .. WHEN expressions [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove commented on PR #703: URL: https://github.com/apache/datafusion-comet/pull/703#issuecomment-2244067022 TPC-DS queries 2, 43, 59, 62, and 99 all benefited from the `CASE` optimizations. ![Screenshot from 2024-07-22 19-09-50](https://github.com/user-attachments/assets/737c2b5

Re: [PR] Rename `functions-array` to `functions-nested` [datafusion]

2024-07-22 Thread via GitHub
goldmedal commented on code in PR #11602: URL: https://github.com/apache/datafusion/pull/11602#discussion_r1687295016 ## datafusion/core/src/lib.rs: ## @@ -569,10 +569,10 @@ pub mod functions { pub use datafusion_functions::*; } -/// re-export of [`datafusion_functions_a

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2244060536 > the one about catalog names is applied, the one about separate traits is replied to (https://github.com/apache/datafusion/pull/11516#issuecomment-2243062441) and it looks we're

Re: [PR] feat: support Map literals in Substrait consumer and producer [datafusion]

2024-07-22 Thread via GitHub
goldmedal commented on PR #11547: URL: https://github.com/apache/datafusion/pull/11547#issuecomment-2244058251 > I wonder if @goldmedal you might have some time to review this PR as well? Sure, I'll review this tonight.

Re: [I] Expression Simplifier doesn't consider associativity (`(i + 1) + 2)` is not simplified to `i + 3`) [datafusion]

2024-07-22 Thread via GitHub
alamb commented on issue #11594: URL: https://github.com/apache/datafusion/issues/11594#issuecomment-2244055525 Thank you @timsaucer -- very cool I think there would be a tradeoff between the size of the dependency / how mature it is and additional benefit in DataFusion So if

Re: [PR] Migrate `OrderSensitiveArrayAgg` to be a user defined aggregate [datafusion]

2024-07-22 Thread via GitHub
alamb commented on PR #11564: URL: https://github.com/apache/datafusion/pull/11564#issuecomment-2244054205 > Thanks @alamb ! Thank you -- it is so close I can *feel* it

Re: [PR] DRAFT: module hierarchy [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 closed pull request #11376: DRAFT: module hierarchy URL: https://github.com/apache/datafusion/pull/11376

Re: [PR] DRAFT: map-with-vec-exprs [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 closed pull request #11526: DRAFT: map-with-vec-exprs URL: https://github.com/apache/datafusion/pull/11526

Re: [PR] Minor:Disable flaky SMJ antijoin filtered test until the fix [datafusion]

2024-07-22 Thread via GitHub
alamb merged PR #11608: URL: https://github.com/apache/datafusion/pull/11608

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on code in PR #11013: URL: https://github.com/apache/datafusion/pull/11013#discussion_r1687282898 ## datafusion/physical-expr/src/aggregate/groups_accumulator/mod.rs: ## @@ -26,6 +26,4 @@ pub(crate) mod accumulate { pub use datafusion_physical_expr_commo

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on code in PR #11013: URL: https://github.com/apache/datafusion/pull/11013#discussion_r1687283552 ## datafusion/proto/proto/datafusion.proto: ## @@ -466,8 +466,8 @@ message InListNode { } enum AggregateFunction { Review Comment: we can delete it!

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on code in PR #11013: URL: https://github.com/apache/datafusion/pull/11013#discussion_r1687283314 ## datafusion/physical-expr/src/aggregate/groups_accumulator/mod.rs: ## @@ -26,6 +26,4 @@ pub(crate) mod accumulate { pub use datafusion_physical_expr_commo

Re: [PR] fix: dictionary decimal vector optimization [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove commented on code in PR #705: URL: https://github.com/apache/datafusion-comet/pull/705#discussion_r1687282806 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -100,17 +102,21 @@ public void close() { } private void initialize() { -

Re: [PR] Move min and max to user defined aggregate function [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on code in PR #11013: URL: https://github.com/apache/datafusion/pull/11013#discussion_r1687281829 ## datafusion/physical-expr/src/aggregate/build_in.rs: ## @@ -232,54 +222,6 @@ mod tests { Ok(()) } -#[test] -fn test_min_max_expr() ->

Re: [PR] Apache DataFusion Comet Logo [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove merged PR #697: URL: https://github.com/apache/datafusion-comet/pull/697

Re: [PR] perf: Improve performance of CASE .. WHEN expressions [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove commented on code in PR #703: URL: https://github.com/apache/datafusion-comet/pull/703#discussion_r1687278717 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometTPCDSMicroBenchmark.scala: ## @@ -53,23 +53,24 @@ import org.apache.comet.CometConf object CometTP

[PR] Remove ArrayAgg Builtin [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 opened a new pull request, #11611: URL: https://github.com/apache/datafusion/pull/11611 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] fix: dictionary decimal vector optimization [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove commented on code in PR #705: URL: https://github.com/apache/datafusion-comet/pull/705#discussion_r1687278083 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -100,17 +102,21 @@ public void close() { } private void initialize() { -

Re: [PR] fix: dictionary decimal vector optimization [datafusion-comet]

2024-07-22 Thread via GitHub
kazuyukitanimura commented on code in PR #705: URL: https://github.com/apache/datafusion-comet/pull/705#discussion_r1687275670 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -100,17 +102,21 @@ public void close() { } private void initialize()

Re: [PR] fix: dictionary decimal vector optimization [datafusion-comet]

2024-07-22 Thread via GitHub
parthchandra commented on code in PR #705: URL: https://github.com/apache/datafusion-comet/pull/705#discussion_r1687267060 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -100,17 +102,21 @@ public void close() { } private void initialize() { -

Re: [PR] Migrate `OrderSensitiveArrayAgg` to be a user defined aggregate [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 merged PR #11564: URL: https://github.com/apache/datafusion/pull/11564

Re: [PR] Migrate `OrderSensitiveArrayAgg` to be a user defined aggregate [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on PR #11564: URL: https://github.com/apache/datafusion/pull/11564#issuecomment-2244011361 Thanks @alamb !

Re: [PR] Migrate `OrderSensitiveArrayAgg` to be a user defined aggregate [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on PR #11564: URL: https://github.com/apache/datafusion/pull/11564#issuecomment-2244011048 TODO: 1. Remove ArrayAgg Builtin definition 2. Generalize order-by clause rewrite 3. Expr Builder (not sure before the refactor or after 🤔 )

Re: [PR] Migrate `OrderSensitiveArrayAgg` to be a user defined aggregate [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on code in PR #11564: URL: https://github.com/apache/datafusion/pull/11564#discussion_r1687259830 ## datafusion/expr/src/function.rs: ## @@ -57,6 +57,9 @@ pub struct AccumulatorArgs<'a> { /// The schema of the input arguments pub schema: &'a Schema

Re: [PR] Migrate `OrderSensitiveArrayAgg` to be a user defined aggregate [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on PR #11564: URL: https://github.com/apache/datafusion/pull/11564#issuecomment-2244005936 Both `limited_convert_logical_expr_to_physical_expr_with_dfschema` and `limited_convert_logical_expr_to_physical_expr` are temporary until we refactor #11359

Re: [PR] Migrate `OrderSensitiveArrayAgg` to be a user defined aggregate [datafusion]

2024-07-22 Thread via GitHub
jayzhan211 commented on code in PR #11564: URL: https://github.com/apache/datafusion/pull/11564#discussion_r1687256081 ## datafusion/physical-expr-common/src/aggregate/mod.rs: ## @@ -81,6 +86,61 @@ pub fn create_aggregate_expr( .map(|e| e.expr.data_type(schema))

Re: [I] Expression Simplifier doesn't consider associativity (`(i + 1) + 2)` is not simplified to `i + 3`) [datafusion]

2024-07-22 Thread via GitHub
timsaucer commented on issue #11594: URL: https://github.com/apache/datafusion/issues/11594#issuecomment-2243996984 A couple of resources for using an existing CAS (Computer Algebra System). SymPy is a Python package, so not as useful but appears to be well built and has a good descri

Re: [PR] Implement physical plan serialization for csv COPY plans , add `as_any`, `Debug` to `FileFormatFactory` [datafusion]

2024-07-22 Thread via GitHub
Lordworms commented on code in PR #11588: URL: https://github.com/apache/datafusion/pull/11588#discussion_r1687247415 ## datafusion/core/src/datasource/file_format/mod.rs: ## @@ -149,6 +154,11 @@ impl DefaultFileType { file_format_factory, } } + +/

[PR] fix: dictionary decimal vector optimization [datafusion-comet]

2024-07-22 Thread via GitHub
kazuyukitanimura opened a new pull request, #705: URL: https://github.com/apache/datafusion-comet/pull/705 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/679 and https://github.com/apache/datafusion-comet/issues/670 Related https://gith

Re: [PR] perf: Improve performance of CASE .. WHEN expressions [datafusion-comet]

2024-07-22 Thread via GitHub
parthchandra commented on code in PR #703: URL: https://github.com/apache/datafusion-comet/pull/703#discussion_r1687229468 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometTPCDSMicroBenchmark.scala: ## @@ -53,23 +53,24 @@ import org.apache.comet.CometConf object Come

Re: [PR] Apache DataFusion Comet Logo [datafusion-comet]

2024-07-22 Thread via GitHub
parthchandra commented on PR #697: URL: https://github.com/apache/datafusion-comet/pull/697#issuecomment-2243949707 +1 These look great!

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-22 Thread via GitHub
cisaacson commented on code in PR #11516: URL: https://github.com/apache/datafusion/pull/11516#discussion_r1687194943 ## datafusion/catalog/src/session.rs: ## @@ -0,0 +1,102 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreeme

Re: [PR] perf: Optimize IsNotNullExpr [datafusion]

2024-07-22 Thread via GitHub
andygrove commented on code in PR #11586: URL: https://github.com/apache/datafusion/pull/11586#discussion_r1687165784 ## datafusion/physical-expr/src/expressions/is_null.rs: ## @@ -117,6 +117,16 @@ pub(crate) fn compute_is_null(array: ArrayRef) -> Result { } } +/// work

Re: [PR] perf: Optimize IsNotNullExpr [datafusion]

2024-07-22 Thread via GitHub
andygrove commented on code in PR #11586: URL: https://github.com/apache/datafusion/pull/11586#discussion_r1687165511 ## datafusion/physical-expr/src/expressions/is_null.rs: ## @@ -117,6 +117,21 @@ pub(crate) fn compute_is_null(array: ArrayRef) -> Result { } } +/// work

Re: [PR] chore(native): Make sure all targets in workspace been covered by clippy [datafusion-comet]

2024-07-22 Thread via GitHub
viirya commented on PR #702: URL: https://github.com/apache/datafusion-comet/pull/702#issuecomment-2243818827 Merged. Thanks @Xuanwo @andygrove

Re: [I] Clippy is not happy now [datafusion-comet]

2024-07-22 Thread via GitHub
viirya closed issue #700: Clippy is not happy now URL: https://github.com/apache/datafusion-comet/issues/700

Re: [PR] Extract CoalesceBatchesStream to a struct [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11610: URL: https://github.com/apache/datafusion/pull/11610#discussion_r1687143990 ## datafusion/physical-plan/src/coalesce_batches.rs: ## @@ -167,14 +164,8 @@ impl ExecutionPlan for CoalesceBatchesExec { struct CoalesceBatchesStream { /// Th

Re: [PR] chore(native): Make sure all targets in workspace been covered by clippy [datafusion-comet]

2024-07-22 Thread via GitHub
viirya merged PR #702: URL: https://github.com/apache/datafusion-comet/pull/702

Re: [I] Improve `SingleDistinctToGroupBy` to get the same plan as the `group by` query [datafusion]

2024-07-22 Thread via GitHub
alamb commented on issue #11360: URL: https://github.com/apache/datafusion/issues/11360#issuecomment-2243812261 > What is the rationale of single distinct to group by? Take count for example, I think distinct accumulator could be way more efficient than normal accumulator with group by 🤔

[PR] Extract CoalesceBatchesStream to a struct [datafusion]

2024-07-22 Thread via GitHub
alamb opened a new pull request, #11610: URL: https://github.com/apache/datafusion/pull/11610 Draft as I need to - [ ] add tests - [ ] Run benchmarks to ensure no regression ## Which issue does this PR close? Related to https://github.com/apache/datafusion/issues/9370

Re: [I] `SanityCheckPlan` Error during planning: ... does not satisfy parent order requirements: ... [datafusion]

2024-07-22 Thread via GitHub
ozankabak closed issue #11492: `SanityCheckPlan` Error during planning: ... does not satisfy parent order requirements: ... URL: https://github.com/apache/datafusion/issues/11492

Re: [PR] Improve Union Equivalence Propagation [datafusion]

2024-07-22 Thread via GitHub
ozankabak commented on code in PR #11506: URL: https://github.com/apache/datafusion/pull/11506#discussion_r1687127355 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -191,6 +193,33 @@ pub fn with_new_children_if_necessary( } } +/// Rewrites an expression a

Re: [PR] Improve Union Equivalence Propagation [datafusion]

2024-07-22 Thread via GitHub
ozankabak merged PR #11506: URL: https://github.com/apache/datafusion/pull/11506

Re: [PR] Improve Union Equivalence Propagation [datafusion]

2024-07-22 Thread via GitHub
ozankabak commented on code in PR #11506: URL: https://github.com/apache/datafusion/pull/11506#discussion_r1687128217 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -1484,6 +1527,84 @@ impl Hash for ExprWrapper { } } +/// Calculates the union (in the se

Re: [PR] Improve Union Equivalence Propagation [datafusion]

2024-07-22 Thread via GitHub
ozankabak commented on code in PR #11506: URL: https://github.com/apache/datafusion/pull/11506#discussion_r1687123288 ## datafusion/physical-plan/src/union.rs: ## @@ -99,7 +99,12 @@ impl UnionExec { /// Create a new UnionExec pub fn new(inputs: Vec>) -> Self {

[I] Ability to chunk download from object store [datafusion]

2024-07-22 Thread via GitHub
trungda opened a new issue, #11609: URL: https://github.com/apache/datafusion/issues/11609 ### Is your feature request related to a problem or challenge? When downloading large objects (> 300 MB) using the `object_store` crate, I often hit a timeout with the default configuration (30 seconds

Re: [PR] GC `StringViewArray` in `CoalesceBatchesStream` [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11587: URL: https://github.com/apache/datafusion/pull/11587#discussion_r1687100275 ## datafusion/physical-plan/src/coalesce_batches.rs: ## @@ -290,6 +294,46 @@ pub fn concat_batches( arrow::compute::concat_batches(schema, batches) } +/// `S

Re: [PR] feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp, compatibility guide update. [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove commented on PR #704: URL: https://github.com/apache/datafusion-comet/pull/704#issuecomment-2243757594 Thanks @akhilss99. Could you run `make format` to fix the code formatting?

Re: [I] Use rust-toolchain.toml for better ecosystem support [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove closed issue #698: Use rust-toolchain.toml for better ecosystem support URL: https://github.com/apache/datafusion-comet/issues/698

Re: [PR] chore: Use rust-toolchain.toml for better toolchain support [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove merged PR #699: URL: https://github.com/apache/datafusion-comet/pull/699

Re: [PR] chore: Update version to 0.2.0 and add 0.1.0 changelog [datafusion-comet]

2024-07-22 Thread via GitHub
andygrove merged PR #696: URL: https://github.com/apache/datafusion-comet/pull/696

Re: [PR] perf: Optimize IsNotNullExpr [datafusion]

2024-07-22 Thread via GitHub
andygrove commented on code in PR #11586: URL: https://github.com/apache/datafusion/pull/11586#discussion_r1687090064 ## datafusion/physical-expr/src/expressions/is_null.rs: ## @@ -117,6 +117,21 @@ pub(crate) fn compute_is_null(array: ArrayRef) -> Result { } } +/// work

Re: [I] Intermittent failures in `fuzz_cases::join_fuzz::test_anti_join_1k_filtered` [datafusion]

2024-07-22 Thread via GitHub
comphead commented on issue #11555: URL: https://github.com/apache/datafusion/issues/11555#issuecomment-2243707186 I have disabled the test for now. I'll spend more time investigating why this happens

[PR] Minor:Disable flaky SMJ antijoin filtered test until the perm solution [datafusion]

2024-07-22 Thread via GitHub
comphead opened a new pull request, #11608: URL: https://github.com/apache/datafusion/pull/11608 ## Which issue does this PR close? Closes #. ## Rationale for this change Disable flaky SMJ antijoin filtered test until the perm solution ## What changes are i

Re: [PR] feat: add bounds for unary math scalar functions [datafusion]

2024-07-22 Thread via GitHub
tshauck commented on code in PR #11584: URL: https://github.com/apache/datafusion/pull/11584#discussion_r1687069516 ## datafusion/functions/src/math/bounds.rs: ## @@ -0,0 +1,137 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Add NullState::is_valid and NullState::is_null [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11592: URL: https://github.com/apache/datafusion/pull/11592#discussion_r1687065841 ## datafusion/physical-expr-common/src/aggregate/groups_accumulator/accumulate.rs: ## @@ -132,7 +132,7 @@ impl NullState { mut value_fn: F, ) where

Re: [PR] Improve Union Equivalence Propagation [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11506: URL: https://github.com/apache/datafusion/pull/11506#discussion_r1687056920 ## datafusion/sqllogictest/test_files/order.slt: ## @@ -1132,3 +1132,10 @@ physical_plan 02)--ProjectionExec: expr=[CAST(inc_col@0 > desc_col@1 AS Int32) as c] 03)

Re: [PR] Add NullState::is_valid and NullState::is_null [datafusion]

2024-07-22 Thread via GitHub
joroKr21 commented on PR #11592: URL: https://github.com/apache/datafusion/pull/11592#issuecomment-2243686252 @alamb I implemented the version with an additional callback parameter, LMK what you think?

Re: [PR] Add NullState::is_valid and NullState::is_null [datafusion]

2024-07-22 Thread via GitHub
joroKr21 commented on PR #11592: URL: https://github.com/apache/datafusion/pull/11592#issuecomment-2243667468 Yeah good point, I converted it to a draft. I'm not sure what to do. Adding a boolean flag to the callback is not great either...

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11516: URL: https://github.com/apache/datafusion/pull/11516#discussion_r1687045497 ## datafusion/catalog/src/session.rs: ## @@ -0,0 +1,102 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] Plan `LATERAL` subqueries [datafusion]

2024-07-22 Thread via GitHub
alamb commented on PR #11456: URL: https://github.com/apache/datafusion/pull/11456#issuecomment-2243655010 Still on my list... I am not sure if @jackwener or @mingmwang might have time to review this (or if they are familiar with this particular rewrite)

Re: [PR] Implement `DynamicFileSchemaProvider` in the core [datafusion]

2024-07-22 Thread via GitHub
alamb commented on PR #11035: URL: https://github.com/apache/datafusion/pull/11035#issuecomment-2243654072 BTW I think https://github.com/apache/datafusion/pull/11516 is very related to this PR -- maybe once we get that one in then the API changes needed for this feature will become more na

Re: [PR] Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState` [datafusion]

2024-07-22 Thread via GitHub
Omega359 commented on code in PR #11516: URL: https://github.com/apache/datafusion/pull/11516#discussion_r1687040884 ## datafusion/catalog/src/catalog.rs: ## @@ -0,0 +1,173 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] Implement `DynamicFileSchemaProvider` in the core [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11035: URL: https://github.com/apache/datafusion/pull/11035#discussion_r1687039780 ## datafusion/core/src/catalog/dynamic_file_schema.rs: ## @@ -0,0 +1,197 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Feature/alternate function extension [datafusion]

2024-07-22 Thread via GitHub
timsaucer commented on PR #11582: URL: https://github.com/apache/datafusion/pull/11582#issuecomment-2243644463 Closing in favor of #11550

Re: [PR] Feature/alternate function extension [datafusion]

2024-07-22 Thread via GitHub
timsaucer closed pull request #11582: Feature/alternate function extension URL: https://github.com/apache/datafusion/pull/11582

Re: [PR] Implement physical plan serialization for csv COPY plans , add `as_any`, `Debug` to `FileFormatFactory` [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11588: URL: https://github.com/apache/datafusion/pull/11588#discussion_r1687025318 ## datafusion/core/src/datasource/file_format/mod.rs: ## @@ -63,6 +63,10 @@ pub trait FileFormatFactory: Sync + Send + GetExt { /// Initialize a [FileFormat]

Re: [PR] Improve unparser MySQL compatibility [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11589: URL: https://github.com/apache/datafusion/pull/11589#discussion_r1687023298 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -36,8 +42,8 @@ pub trait Dialect: Send + Sync { true } -// Does the dialect use TIMESTAMP to rep

Re: [PR] Move `OrderSensitiveArrayAgg` to UDAF [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11564: URL: https://github.com/apache/datafusion/pull/11564#discussion_r1687016111 ## datafusion/expr/src/function.rs: ## @@ -57,6 +57,9 @@ pub struct AccumulatorArgs<'a> { /// The schema of the input arguments pub schema: &'a Schema, +

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-22 Thread via GitHub
Omega359 commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1687020525 ## _posts/2024-07-23-datafusion-40.0.0.md: ## @@ -0,0 +1,492 @@ +--- +layout: post +title: "Apache DataFusion 40.0.0 Released" +date: "2024-07-21 00:00:00" +author:

Re: [I] Typo in doc of datafusion::physical_plan::Partitioning [datafusion]

2024-07-22 Thread via GitHub
alamb commented on issue #11593: URL: https://github.com/apache/datafusion/issues/11593#issuecomment-2243611997 Yes I agree with this @waruto210 -- thank you. Perhaps the easiest change would be to update the example description to say "RepartitionExec with 1 input partition and 3 ou

Re: [PR] Parsing SQL strings to Exprs with the qualified schema [datafusion]

2024-07-22 Thread via GitHub
alamb commented on PR #11562: URL: https://github.com/apache/datafusion/pull/11562#issuecomment-2243607969 Thanks @Lordworms and @goldmedal. Since this changes the logic to resolve identifiers, I think it should have a more careful review. Maybe @jonahgao has time. If not I will try

Re: [PR] Fix Internal Error for an INNER JOIN query [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11578: URL: https://github.com/apache/datafusion/pull/11578#discussion_r1687007005 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -518,6 +510,14 @@ impl LogicalPlan { Ok(using_columns) } +fn as_col(expr: Expr) -> Result { R

Re: [I] bug: signum produces different values than Spark in some cases [datafusion-comet]

2024-07-22 Thread via GitHub
abrassel commented on issue #486: URL: https://github.com/apache/datafusion-comet/issues/486#issuecomment-2243587968 Hi @andygrove, sorry for not getting around to it! Work and personal life sharply picked up. Glad you were able to get the issue resolved.

Re: [PR] feat: support Map literals in Substrait consumer and producer [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11547: URL: https://github.com/apache/datafusion/pull/11547#discussion_r1686995057 ## datafusion/common/src/hash_utils.rs: ## @@ -692,6 +730,48 @@ mod tests { assert_eq!(hashes[0], hashes[1]); } +#[test] +// Tests actual val

Re: [PR] perf: Optimize IsNotNullExpr [datafusion]

2024-07-22 Thread via GitHub
alamb commented on code in PR #11586: URL: https://github.com/apache/datafusion/pull/11586#discussion_r1686982702 ## datafusion/physical-expr/src/expressions/is_null.rs: ## @@ -117,6 +117,21 @@ pub(crate) fn compute_is_null(array: ArrayRef) -> Result { } } +/// workarou
