Re: [PR] feat: Optimize CASE expression for "column or null" use case [datafusion]

2024-07-19 Thread via GitHub
andygrove commented on code in PR #11534: URL: https://github.com/apache/datafusion/pull/11534#discussion_r1683957487 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -256,6 +300,36 @@ impl CaseExpr { Ok(ColumnarValue::Array(current_value)) } + +///
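For readers without the diff: the "column or null" case presumably refers to `CASE WHEN <predicate> THEN <column> END` (equivalently with `ELSE NULL`). Below is a minimal sketch of its row-wise semantics. It assumes arrays are modelled as `Vec<Option<T>>` instead of Arrow arrays, and `case_column_or_null` is a hypothetical name, not the function added in the PR:

```rust
/// Row-wise semantics of `CASE WHEN <predicate> THEN <column> ELSE NULL END`.
///
/// Hypothetical helper: arrays are modelled as slices of `Option<T>`
/// (None = SQL NULL), not as Arrow/DataFusion arrays.
fn case_column_or_null<T: Clone>(predicate: &[Option<bool>], column: &[Option<T>]) -> Vec<Option<T>> {
    assert_eq!(predicate.len(), column.len(), "inputs must have the same row count");
    predicate
        .iter()
        .zip(column)
        .map(|(p, v)| match p {
            // Only a TRUE predicate selects the column value (which may itself be NULL) ...
            Some(true) => v.clone(),
            // ... while FALSE and NULL predicates both fall through to ELSE NULL.
            _ => None,
        })
        .collect()
}
```

Because each output row is either the input value or NULL, one pass over the predicate is enough and no separate ELSE expression has to be evaluated.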

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-19 Thread via GitHub
jayzhan211 commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2238576514 cc @eejbyfeldt and @alamb I think @jcsherin found something interesting: nullable for the list element should be `true` in order to allow computing with null elements. I think

Re: [PR] fix: make `UnKnownColumn`s not equal to each other [datafusion]

2024-07-19 Thread via GitHub
jonahgao commented on code in PR #11536: URL: https://github.com/apache/datafusion/pull/11536#discussion_r1683975286 ## datafusion/physical-plan/src/union.rs: ## @@ -431,7 +431,12 @@ impl ExecutionPlan for InterleaveExec { self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> R
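The title of PR #11536 describes making `UnKnownColumn`s compare unequal to each other. Below is a minimal sketch of that equality rule on a hypothetical, simplified expression enum. `PhysExpr` is illustrative only and is not DataFusion's type:

```rust
/// Hypothetical, simplified stand-in for a physical expression type.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum PhysExpr {
    Column { name: String, index: usize },
    /// Placeholder for a column that could not be resolved.
    UnknownColumn { name: String },
}

impl PartialEq for PhysExpr {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (PhysExpr::Column { name: a, index: i }, PhysExpr::Column { name: b, index: j }) => {
                a == b && i == j
            }
            // An unknown column never equals anything, not even a copy of itself
            // (much like NaN for floats). Equality is therefore not reflexive,
            // so this type must NOT implement `Eq`.
            _ => false,
        }
    }
}
```

The cost of this rule is that equality stops being reflexive, which is why the sketch implements only `PartialEq`.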

Re: [PR] feat: Optimize CASE expression for "column or null" use case [datafusion]

2024-07-19 Thread via GitHub
andygrove commented on code in PR #11534: URL: https://github.com/apache/datafusion/pull/11534#discussion_r1683956579 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -998,6 +1080,53 @@ mod tests { Ok(()) } +#[test] Review Comment: I added the s

Re: [PR] Minor: fix some new warnings, fix `force_hash_collisions` flag propagation + comment out some tests [datafusion]

2024-07-19 Thread via GitHub
nrc commented on PR #11467: URL: https://github.com/apache/datafusion/pull/11467#issuecomment-2238179605 @alamb thanks for reviewing and looking into this! This has got weirder and I may be doing something fundamentally wrong, but here's my investigation. For reference, I'm running on

Re: [PR] feat: Optimize CASE expression for "column or null" use case [datafusion]

2024-07-19 Thread via GitHub
andygrove commented on code in PR #11534: URL: https://github.com/apache/datafusion/pull/11534#discussion_r1683957851 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -86,13 +108,35 @@ impl CaseExpr { when_then_expr: Vec<WhenThen>, else_expr: Option<Arc<dyn PhysicalExpr>>, ) ->

Re: [PR] fix: make `UnKnownColumn`s not equal to others physical exprs [datafusion]

2024-07-19 Thread via GitHub
mustafasrepo commented on code in PR #11536: URL: https://github.com/apache/datafusion/pull/11536#discussion_r1684015535 ## datafusion/physical-plan/src/union.rs: ## @@ -431,7 +431,12 @@ impl ExecutionPlan for InterleaveExec { self: Arc<Self>, children: Vec<Arc<dyn ExecutionPlan>>, )

Re: [PR] feat: Optimize CASE expression for "column or null" use case [datafusion]

2024-07-19 Thread via GitHub
Dandandan commented on code in PR #11534: URL: https://github.com/apache/datafusion/pull/11534#discussion_r1683928562 ## datafusion/physical-expr/src/expressions/case.rs: ## @@ -256,6 +300,36 @@ impl CaseExpr { Ok(ColumnarValue::Array(current_value)) } + +///

Re: [PR] fix: make `UnKnownColumn`s not equal to each other [datafusion]

2024-07-19 Thread via GitHub
jonahgao commented on PR #11536: URL: https://github.com/apache/datafusion/pull/11536#issuecomment-2238594917 > However, I think the proposed fix might be treating the symptom rather than the root cause and I think more investigation is warranted. We could merge this PR and file a follow on

Re: [PR] Move `sql_compound_identifier_to_expr ` to `ExprPlanner` [datafusion]

2024-07-19 Thread via GitHub
dharanad commented on code in PR #11487: URL: https://github.com/apache/datafusion/pull/11487#discussion_r1683790742 ## datafusion/expr/src/planner.rs: ## @@ -173,6 +173,24 @@ pub trait ExprPlanner: Send + Sync { fn plan_overlay(&self, args: Vec<Expr>) -> Result<PlannerResult<Vec<Expr>>> { Ok(

Re: [PR] Move `sql_compound_identifier_to_expr ` to `ExprPlanner` [datafusion]

2024-07-19 Thread via GitHub
dharanad commented on code in PR #11487: URL: https://github.com/apache/datafusion/pull/11487#discussion_r1683791632 ## datafusion/expr/src/planner.rs: ## @@ -173,6 +173,24 @@ pub trait ExprPlanner: Send + Sync { fn plan_overlay(&self, args: Vec<Expr>) -> Result<PlannerResult<Vec<Expr>>> { Ok(

[I] Use SimpleExtensions for Substrait type variations [datafusion]

2024-07-19 Thread via GitHub
Blizzara opened a new issue, #11544: URL: https://github.com/apache/datafusion/issues/11544 ### Describe the bug Substrait has an extension mechanism for defining things not included in the protobuf format. See https://substrait.io/extensions/#simple-extensions for definition and htt

Re: [PR] feat: consume and produce Substrait type extensions [datafusion]

2024-07-19 Thread via GitHub
Blizzara commented on code in PR #11510: URL: https://github.com/apache/datafusion/pull/11510#discussion_r1684094263 ## datafusion/substrait/src/extensions.rs: ## @@ -0,0 +1,146 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-19 Thread via GitHub
jayzhan211 commented on code in PR #11452: URL: https://github.com/apache/datafusion/pull/11452#discussion_r1684115054 ## datafusion/sqllogictest/test_files/map.slt: ## @@ -131,17 +131,23 @@ SELECT MAKE_MAP([1,2], ['a', 'b'], [3,4], ['b']); {[1, 2]: [a, b], [3, 4]: [b]}
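The sqllogictest case in this comment shows `MAKE_MAP` reading its arguments as alternating key/value pairs. Below is a sketch of that pairing step, assuming keys and values share one type for simplicity. `make_map_pairs` and its error text are hypothetical; an odd argument count is presumably rejected, since it leaves a key without a value:

```rust
/// Pair up alternating arguments as `MAKE_MAP(k1, v1, k2, v2, ...)` reads them.
/// Hypothetical helper: returns an error for an odd number of arguments and
/// preserves argument order in the returned pairs.
fn make_map_pairs<T: Clone>(args: &[T]) -> Result<Vec<(T, T)>, String> {
    if args.len() % 2 != 0 {
        return Err(format!(
            "MAKE_MAP expects an even number of arguments, got {}",
            args.len()
        ));
    }
    Ok(args
        .chunks_exact(2) // each chunk is [key, value]
        .map(|kv| (kv[0].clone(), kv[1].clone()))
        .collect())
}
```

Applied to string stand-ins for the four arguments in the test (`[1, 2]`, `[a, b]`, `[3, 4]`, `[b]`), this yields the two pairs shown in the expected output.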

Re: [PR] Support `newlines_in_values` CSV option [datafusion]

2024-07-19 Thread via GitHub
connec commented on PR #11533: URL: https://github.com/apache/datafusion/pull/11533#issuecomment-2238756965 Thanks both for the rapid feedback. I'll take a look and update the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-19 Thread via GitHub
jayzhan211 commented on PR #11452: URL: https://github.com/apache/datafusion/pull/11452#issuecomment-2238757741 Thanks @goldmedal . I will file an issue about the `map` API

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-19 Thread via GitHub
jayzhan211 merged PR #11452: URL: https://github.com/apache/datafusion/pull/11452

Re: [PR] Support `newlines_in_values` CSV option [datafusion]

2024-07-19 Thread via GitHub
connec commented on code in PR #11533: URL: https://github.com/apache/datafusion/pull/11533#discussion_r1684123766 ## datafusion/common/src/config.rs: ## @@ -1665,6 +1670,14 @@ impl CsvOptions { self } +/// Set true to ensure that newlines in (quoted) values
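The `newlines_in_values` option concerns newlines inside quoted CSV fields. Below is a minimal sketch of quote-aware record splitting in plain Rust. It assumes a simplified dialect (double quotes only, `""` as the escape, no special handling of `\r`), and `split_csv_records` is hypothetical; it is not how DataFusion or arrow-csv actually parse files:

```rust
/// Split CSV text into records, treating a newline inside a double-quoted
/// field as part of the value rather than as a record terminator.
fn split_csv_records(input: &str) -> Vec<String> {
    let mut records = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for ch in input.chars() {
        match ch {
            // Toggling on every quote also handles `""` escapes: the pair flips the state twice.
            '"' => {
                in_quotes = !in_quotes;
                current.push(ch);
            }
            // A newline outside quotes ends the current record.
            '\n' if !in_quotes => records.push(std::mem::take(&mut current)),
            // Everything else, including newlines inside quotes, belongs to the record.
            _ => current.push(ch),
        }
    }
    if !current.is_empty() {
        records.push(current);
    }
    records
}
```

A reader that simply splits on `\n` would cut a quoted multi-line value in two, and a reader that starts at an arbitrary byte offset (for example, when scanning one file in parallel ranges) cannot tell whether it is inside a quoted field. That is presumably why the option is opt-in.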

[I] Easier Dataframe API for `map` [datafusion]

2024-07-19 Thread via GitHub
jayzhan211 opened a new issue, #11546: URL: https://github.com/apache/datafusion/issues/11546 Dataframe API for `map` expects us to pass args with `make_array` i.e. `map(vec![make_array(vec![lit("a"), lit("b")]), make_array(vec![lit("1"), lit("2")])]) I think we could have e

Re: [PR] Move `MAKE_MAP` to ExprPlanner [datafusion]

2024-07-19 Thread via GitHub
goldmedal commented on PR #11452: URL: https://github.com/apache/datafusion/pull/11452#issuecomment-2238780395 Thanks @jayzhan211 and @dharanad for reviewing

Re: [I] Implement the rewrite from the Map literal to Map function [datafusion]

2024-07-19 Thread via GitHub
goldmedal commented on issue #11434: URL: https://github.com/apache/datafusion/issues/11434#issuecomment-2238802333 The part of moving `MAKE_MAP` has been solved by #11452. I'll start to implement the MAP literal after https://github.com/sqlparser-rs/sqlparser-rs/pull/1344

Re: [PR] Add a config to force using string view in benchmark [datafusion]

2024-07-19 Thread via GitHub
alamb merged PR #11514: URL: https://github.com/apache/datafusion/pull/11514

Re: [PR] Add support for Utf8View for date/temporal codepaths [datafusion]

2024-07-19 Thread via GitHub
alamb commented on PR #11518: URL: https://github.com/apache/datafusion/pull/11518#issuecomment-2238826038 > Hey @alamb, I added some sqllogictest cases. However, they will fail currently, I'd forgotten this change is dependent on the fixes in [apache/arrow-rs#6077](https://github.com/apach

Re: [PR] Extract catalog API to separate crate [datafusion]

2024-07-19 Thread via GitHub
findepi commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2238870621 > I think this is similar to minimum dependencies what I'm thinking. The only difference is that I propose `struct TableProviderContext`, but you have `trait CatalogSession` >

Re: [I] Easier Dataframe API for `map` [datafusion]

2024-07-19 Thread via GitHub
goldmedal commented on issue #11546: URL: https://github.com/apache/datafusion/issues/11546#issuecomment-2238922779 take

Re: [I] [EPIC] A collection of issues for supporting the `MAP` DataType [datafusion]

2024-07-19 Thread via GitHub
goldmedal commented on issue #11429: URL: https://github.com/apache/datafusion/issues/11429#issuecomment-2238924621 I guess it's also related to this epic - [ ] https://github.com/apache/datafusion/issues/11546

Re: [PR] Extract catalog API to separate crate [datafusion]

2024-07-19 Thread via GitHub
jayzhan211 commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2238946024 Given the current `FileFormatFactory` implementation in DataFusion, we only need `TableOptions`, so we can change SessionState to a `FileFormatContext` with `TableOptions`, and it can

Re: [PR] Support `newlines_in_values` CSV option [datafusion]

2024-07-19 Thread via GitHub
connec commented on code in PR #11533: URL: https://github.com/apache/datafusion/pull/11533#discussion_r1684242064 ## datafusion/common/src/config.rs: ## @@ -184,6 +184,10 @@ config_namespace! { /// Default value for `format.has_header` for `CREATE EXTERNAL TABLE`

Re: [PR] Support `newlines_in_values` CSV option [datafusion]

2024-07-19 Thread via GitHub
connec commented on code in PR #11533: URL: https://github.com/apache/datafusion/pull/11533#discussion_r1684240801 ## datafusion/common/src/config.rs: ## @@ -1665,6 +1670,14 @@ impl CsvOptions { self } +/// Set true to ensure that newlines in (quoted) values

Re: [PR] Support `newlines_in_values` CSV option [datafusion]

2024-07-19 Thread via GitHub
connec commented on PR #11533: URL: https://github.com/apache/datafusion/pull/11533#issuecomment-2238962418 I've made some documentation updates and added/fixed sqllogictests. There's the new clippy issue around having too many arguments for `CsvExec::new`, which I can either fix with

Re: [PR] Extract catalog API to separate crate [datafusion]

2024-07-19 Thread via GitHub
findepi commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2238975365 At runtime there are circular dependencies, unless we further limit the amount of functionality available to the `TableProvider`. However, I think we actually solve the circular depen

Re: [PR] feat: consume and produce Substrait type extensions [datafusion]

2024-07-19 Thread via GitHub
Blizzara commented on code in PR #11510: URL: https://github.com/apache/datafusion/pull/11510#discussion_r1684267072 ## datafusion/substrait/src/extensions.rs: ## @@ -0,0 +1,146 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] feat: consume and produce Substrait type extensions [datafusion]

2024-07-19 Thread via GitHub
Blizzara commented on code in PR #11510: URL: https://github.com/apache/datafusion/pull/11510#discussion_r1684274036 ## datafusion/substrait/src/variation_const.rs: ## @@ -55,6 +55,7 @@ pub const DECIMAL_256_TYPE_VARIATION_REF: u32 = 1; /// [`DataType::Interval`]: datafusion::a

Re: [PR] Extract catalog API to separate crate [datafusion]

2024-07-19 Thread via GitHub
jayzhan211 commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2239003792 I doubt that `CatalogSession` solves all the dependency issues. `CatalogSession` is just one higher-level trait moved out of core; the other implementations are still left inside core. If we

[PR] Avo/substrait map literals [datafusion]

2024-07-19 Thread via GitHub
Blizzara opened a new pull request, #11547: URL: https://github.com/apache/datafusion/pull/11547 ## Which issue does this PR close? Related to Map epic https://github.com/apache/datafusion/issues/11434 ## Rationale for this change Substrait didn't support Map literals, si

Re: [PR] Avo/substrait map literals [datafusion]

2024-07-19 Thread via GitHub
Blizzara commented on code in PR #11547: URL: https://github.com/apache/datafusion/pull/11547#discussion_r1684290508 ## datafusion/sqllogictest/test_files/map.slt: ## @@ -302,3 +302,9 @@ SELECT MAP(arrow_cast(make_array('POST', 'HEAD', 'PATCH'), 'LargeList(Utf8)'), a {POST: 41

Re: [PR] feat: Support Map literals in Substrait consumer and producer [datafusion]

2024-07-19 Thread via GitHub
Blizzara commented on PR #11547: URL: https://github.com/apache/datafusion/pull/11547#issuecomment-2239027784 This and https://github.com/apache/datafusion/pull/11510 will conflict, once either one is merged I'll be happy to fix the other one.

[PR] fix: typos of sql, sqllogictest and substrait packages [datafusion]

2024-07-19 Thread via GitHub
JasonLi-cn opened a new pull request, #11548: URL: https://github.com/apache/datafusion/pull/11548 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/11521 ## Rationale for this change ## What changes are included in this

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-19 Thread via GitHub
findepi commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1684319212 ## _posts/2024-07-09-datafusion-40.0.0.md: ## @@ -0,0 +1,450 @@ +--- +layout: post +title: "Apache Arrow DataFusion 40.0.0 Released" +date: "2024-07-09 00:00:00" +aut

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-19 Thread via GitHub
findepi commented on PR #6: URL: https://github.com/apache/datafusion-site/pull/6#issuecomment-2239051762 Is there a way to deploy a rendered preview?

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-19 Thread via GitHub
mustafasrepo commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1684320344 ## _posts/2024-07-09-datafusion-40.0.0.md: ## @@ -0,0 +1,450 @@ +--- +layout: post +title: "Apache Arrow DataFusion 40.0.0 Released" +date: "2024-07-09 00:00:00"

Re: [PR] Extract catalog API to separate crate [datafusion]

2024-07-19 Thread via GitHub
jayzhan211 commented on PR #11516: URL: https://github.com/apache/datafusion/pull/11516#issuecomment-2239052594 I think there is a clear dependency chain, something close to `Catalog` -> `Schema` -> `Table` -> `FileFormat` -> `QueryPlanner`. They all have a trait and a trait implementation and we c

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-19 Thread via GitHub
findepi commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1684321134 ## _posts/2024-07-09-datafusion-40.0.0.md: ## @@ -0,0 +1,450 @@ +--- +layout: post +title: "Apache Arrow DataFusion 40.0.0 Released" +date: "2024-07-09 00:00:00" +aut

Re: [PR] feat: Optimize CASE expression for "column or null" use case [datafusion]

2024-07-19 Thread via GitHub
andygrove merged PR #11534: URL: https://github.com/apache/datafusion/pull/11534

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-19 Thread via GitHub
findepi commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1684332130 ## _posts/2024-07-09-datafusion-40.0.0.md: ## @@ -0,0 +1,450 @@ +--- +layout: post +title: "Apache Arrow DataFusion 40.0.0 Released" +date: "2024-07-09 00:00:00" +aut

[I] Crash bug when `log()` is used in `order by` clause (SQLancer) [datafusion]

2024-07-19 Thread via GitHub
2010YOUY01 opened a new issue, #11549: URL: https://github.com/apache/datafusion/issues/11549 ### Describe the bug Reproducer in datafusion-cli ``` DataFusion CLI v40.0.0 > create table t3(v1 int); 0 row(s) fetched. Elapsed 0.065 seconds. > select * from t

[PR] Consistent approach to setting parameters on aggregate functions and window functions [datafusion]

2024-07-19 Thread via GitHub
timsaucer opened a new pull request, #11550: URL: https://github.com/apache/datafusion/pull/11550 ## Which issue does this PR close? Closes #6747. ## Rationale for this change There was excellent work done on https://github.com/apache/datafusion/pull/10560 that makes a m

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-07-19 Thread via GitHub
timsaucer commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2239126528 I started a new branch off `main` with these changes. Tomorrow I'll review the previous branch @shanretoo was working on to make sure I didn't miss any unit tests he added. Othe

[I] Parsing SQL strings to Exprs with the qualified schema [datafusion]

2024-07-19 Thread via GitHub
goldmedal opened a new issue, #11551: URL: https://github.com/apache/datafusion/issues/11551 ### Is your feature request related to a problem or challenge? After #10995, we have `parse_sql_expr()` to create a logical expression from the SQL string. This API allows creating the express

Re: [PR] Consistent approach to setting parameters on aggregate functions and window functions [datafusion]

2024-07-19 Thread via GitHub
timsaucer commented on PR #11550: URL: https://github.com/apache/datafusion/pull/11550#issuecomment-2239199650 I don't think I have permission on this repo to add labels to this PR so I wasn't able to add one for having user facing changes

Re: [PR] doc: Add memory tuning section to user guide [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove commented on code in PR #684: URL: https://github.com/apache/datafusion-comet/pull/684#discussion_r1684397109 ## docs/source/user-guide/tuning.md: ## @@ -21,6 +21,17 @@ under the License. Comet provides some tuning options to help you get the best performance from

Re: [PR] doc: Add memory tuning section to user guide [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove commented on code in PR #684: URL: https://github.com/apache/datafusion-comet/pull/684#discussion_r1684397942 ## docs/source/user-guide/tuning.md: ## @@ -21,6 +21,17 @@ under the License. Comet provides some tuning options to help you get the best performance from

[I] Query with `order by acos(sin(v1))` panic (SQLancer) [datafusion]

2024-07-19 Thread via GitHub
2010YOUY01 opened a new issue, #11552: URL: https://github.com/apache/datafusion/issues/11552 ### Describe the bug Reproducer in datafusion-cli: ``` DataFusion CLI v40.0.0 > create table t1(v1 int); 0 row(s) fetched. Elapsed 0.073 seconds. > SELECT * FROM t1 O

Re: [PR] chore: fix typos of sql, sqllogictest and substrait packages [datafusion]

2024-07-19 Thread via GitHub
jonahgao merged PR #11548: URL: https://github.com/apache/datafusion/pull/11548

Re: [PR] chore: fix typos of sql, sqllogictest and substrait packages [datafusion]

2024-07-19 Thread via GitHub
jonahgao commented on PR #11548: URL: https://github.com/apache/datafusion/pull/11548#issuecomment-2239241951 Thanks @JasonLi-cn

Re: [I] Create a logo for the Comet project [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove commented on issue #596: URL: https://github.com/apache/datafusion-comet/issues/596#issuecomment-2239253667 @aocsa I think that looks great. Do you want to create a PR to add this to the repo?

Re: [I] 1TB TPCDS benchmark over Vanilla Spark, suffer performance slowdown [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove commented on issue #588: URL: https://github.com/apache/datafusion-comet/issues/588#issuecomment-2239257871 @DamonZhao-sfu could you also provide the configs you used for the Spark run? I am seeing most queries running faster with Comet (but at 100GB) and would like to try and re

Re: [I] Query with `order by acos(sin(v1))` panic (SQLancer) [datafusion]

2024-07-19 Thread via GitHub
2010YOUY01 commented on issue #11552: URL: https://github.com/apache/datafusion/issues/11552#issuecomment-2239309766 All of the queries below panic similarly ``` SELECT * FROM t1 ORDER BY ACOS(SIN(v1)); SELECT * FROM t1 ORDER BY ACOSH(SIN(v1)); SELECT * FROM t1 ORDER BY ASIN(SIN(v1)

Re: [PR] fix: CASE with NULL [datafusion]

2024-07-19 Thread via GitHub
jonahgao commented on code in PR #11542: URL: https://github.com/apache/datafusion/pull/11542#discussion_r1684484209 ## datafusion/expr/src/expr_schema.rs: ## @@ -112,7 +112,23 @@ impl ExprSchemable for Expr { Expr::OuterReferenceColumn(ty, _) => Ok(ty.clone()),

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-19 Thread via GitHub
phillipleblanc commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1684550959 ## _posts/2024-07-09-datafusion-40.0.0.md: ## @@ -0,0 +1,450 @@ +--- +layout: post +title: "Apache Arrow DataFusion 40.0.0 Released" +date: "2024-07-09 00:00:0

Re: [I] Create a logo for the Comet project [datafusion-comet]

2024-07-19 Thread via GitHub
aocsa commented on issue #596: URL: https://github.com/apache/datafusion-comet/issues/596#issuecomment-2239488984 Sure I can do that. 👍

[PR] docs: Update benchmark results [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove opened a new pull request, #687: URL: https://github.com/apache/datafusion-comet/pull/687 ## Which issue does this PR close? N/A ## Rationale for this change We have made many improvements in Comet since we last published benchmark results.

Re: [PR] fix: change the not exists base image apache/spark:3.4.3 to 3.4.2 [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove merged PR #686: URL: https://github.com/apache/datafusion-comet/pull/686

Re: [I] Dockerfile: base (build) image not exists [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove closed issue #685: Dockerfile: base (build) image not exists URL: https://github.com/apache/datafusion-comet/issues/685

Re: [I] Plan Comet 0.1.0 Release [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove commented on issue #369: URL: https://github.com/apache/datafusion-comet/issues/369#issuecomment-2239554664 @viirya @parthchandra I no longer think that it is critical to fix https://github.com/apache/datafusion-comet/issues/387 before we release, because users can already enable

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-07-19 Thread via GitHub
alamb commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2239614500 Thank you @notfilippo -- I think this proposal is well thought out and makes a lot of sense to me. If we were to implement it I think the benefits for DataFusion would be

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-07-19 Thread via GitHub
alamb commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2239617843 Thoughts on the technical details > This would also mean the introduction of new Field-compatible structure (LogicalPhysicalField) Since `LogicalPlans` already use DF

[PR] fix: Spark 4.0 SparkArithmeticException test [datafusion-comet]

2024-07-19 Thread via GitHub
kazuyukitanimura opened a new pull request, #688: URL: https://github.com/apache/datafusion-comet/pull/688 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/372 and https://github.com/apache/datafusion-comet/issues/551 ## Rationale for

Re: [I] 1TB TPCDS benchmark over Vanilla Spark, suffer performance slowdown [datafusion-comet]

2024-07-19 Thread via GitHub
DamonZhao-sfu commented on issue #588: URL: https://github.com/apache/datafusion-comet/issues/588#issuecomment-2239737484 > Hi @DamonZhao-sfu. For query 72, are you enabling CBO in Spark or using any form of join reordering or are you using the official version of the query that joins cata

[PR] Optimize CASE expression for usage where then and else values are literals [datafusion]

2024-07-19 Thread via GitHub
andygrove opened a new pull request, #11553: URL: https://github.com/apache/datafusion/pull/11553 ## Which issue does this PR close? Closes #. ## Rationale for this change ``` case_when: scalar or scalar time: [5.6794 µs 5.7
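The benchmark name `case_when: scalar or scalar` refers to a CASE whose THEN and ELSE branches are both literals. Below is a sketch of the row-wise semantics under a simplified model (predicate as `&[Option<bool>]`, non-NULL literals); `case_scalar_or_scalar` is hypothetical and not the PR's implementation:

```rust
/// Row-wise semantics of `CASE WHEN <predicate> THEN <lit_a> ELSE <lit_b> END`
/// when both branches are literals. Hypothetical helper for illustration.
fn case_scalar_or_scalar<T: Clone>(predicate: &[Option<bool>], then_value: &T, else_value: &T) -> Vec<T> {
    predicate
        .iter()
        // Only TRUE picks THEN; FALSE and NULL predicates both pick ELSE, as in SQL.
        .map(|p| if *p == Some(true) { then_value.clone() } else { else_value.clone() })
        .collect()
}
```

Since neither branch depends on the row, each output value is one of two constants chosen by the predicate, so neither branch has to be evaluated into a full array first.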

Re: [I] 1TB TPCDS benchmark over Vanilla Spark, suffer performance slowdown [datafusion-comet]

2024-07-19 Thread via GitHub
DamonZhao-sfu commented on issue #588: URL: https://github.com/apache/datafusion-comet/issues/588#issuecomment-2239746176 > @DamonZhao-sfu could you also provide the configs you used for the Spark run? I am seeing most queries running faster with Comet (but at 100GB) and would like to try

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-07-19 Thread via GitHub
andygrove commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2239779529 This proposal makes sense to me. Thanks for driving this @notfilippo.

Re: [PR] Add String view helper functions [datafusion]

2024-07-19 Thread via GitHub
XiangpengHao commented on code in PR #11517: URL: https://github.com/apache/datafusion/pull/11517#discussion_r1684718754 ## datafusion/physical-expr/src/aggregate/min_max.rs: ## @@ -453,6 +454,14 @@ fn min_batch(values: &ArrayRef) -> Result<ScalarValue> { DataType::LargeUtf8 => {

Re: [PR] Add String view helper functions [datafusion]

2024-07-19 Thread via GitHub
XiangpengHao commented on PR #11517: URL: https://github.com/apache/datafusion/pull/11517#issuecomment-2239801320 I rebased the branch so that it only contains the relevant changes

Re: [PR] docs: Update benchmark results [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove merged PR #687: URL: https://github.com/apache/datafusion-comet/pull/687

Re: [I] 1TB TPCDS benchmark over Vanilla Spark, suffer performance slowdown [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove commented on issue #588: URL: https://github.com/apache/datafusion-comet/issues/588#issuecomment-2239844234 Thanks @DamonZhao-sfu. We just updated our [benchmarking guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html) with the currently recommended conf

[I] Add reservoir sampling [datafusion]

2024-07-19 Thread via GitHub
brancz opened a new issue, #11554: URL: https://github.com/apache/datafusion/issues/11554 ### Is your feature request related to a problem or challenge? We have a large sample of statistical data. All we need is a subset of the data that maintains statistical significance while being
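Reservoir sampling keeps a uniform random sample of `k` rows from a stream whose length is not known in advance, which matches the need described in the issue. Below is a sketch of the classic Algorithm R in std-only Rust. The tiny xorshift PRNG stands in for a real RNG crate (seed must be non-zero; not cryptographic), and nothing here reflects how DataFusion would expose the feature (for example, as an aggregate function):

```rust
/// Tiny xorshift64 PRNG so the sketch needs only `std`. The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in `0..bound` (modulo bias is ignored for brevity).
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Algorithm R: uniform random sample of up to `k` items from a stream of unknown length.
fn reservoir_sample<T, I: IntoIterator<Item = T>>(items: I, k: usize, rng: &mut XorShift64) -> Vec<T> {
    let mut reservoir = Vec::with_capacity(k);
    for (i, item) in items.into_iter().enumerate() {
        if i < k {
            // Fill the reservoir with the first k items.
            reservoir.push(item);
        } else {
            // Item i (0-based) replaces a random slot with probability k / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}
```

Memory stays at `k` items regardless of input size, and every input row ends up in the sample with the same probability `k / n`.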

[I] Document how to use Comet in Kubernetes environment [datafusion-comet]

2024-07-19 Thread via GitHub
comphead opened a new issue, #689: URL: https://github.com/apache/datafusion-comet/issues/689 ### What is the problem the feature request solves? Comet has a Dockerfile, and users usually want to try Comet within a Kubernetes environment. ### Describe the potential solution Pr

Re: [I] `spark.comet.memory.overhead.min` not respected when submitting jobs with Comet with Spark on Kubernetes [datafusion-comet]

2024-07-19 Thread via GitHub
comphead commented on issue #605: URL: https://github.com/apache/datafusion-comet/issues/605#issuecomment-2239917819 Depends on #689

[PR] docs: Update percentage speedups in benchmarking guide [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove opened a new pull request, #691: URL: https://github.com/apache/datafusion-comet/pull/691 ## Which issue does this PR close? Closes #. ## Rationale for this change I used the wrong formula for calculating the speedups when updating the benchmark
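The PR says the wrong formula was used for the speedups but does not say which one; the snippet is cut off before the details. For reference, below are three commonly confused quantities computed from a baseline time and a new time. These are standard definitions, not necessarily the formula the Comet guide settled on:

```rust
/// Speedup factor: how many times faster the new run is (2.0 = twice as fast).
fn speedup_factor(baseline_secs: f64, new_secs: f64) -> f64 {
    baseline_secs / new_secs
}

/// Percentage of baseline time saved (50.0 = new run takes half the time).
fn time_reduction_pct(baseline_secs: f64, new_secs: f64) -> f64 {
    (baseline_secs - new_secs) / baseline_secs * 100.0
}

/// "Percent faster" derived from the speedup factor (100.0 = twice as fast).
fn percent_faster(baseline_secs: f64, new_secs: f64) -> f64 {
    (speedup_factor(baseline_secs, new_secs) - 1.0) * 100.0
}
```

For a run that drops from 100 s to 50 s, these give 2.0x, 50% less time, and 100% faster respectively. Mixing the second and third is an easy way to understate or overstate a result.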

Re: [PR] Add String view helper functions [datafusion]

2024-07-19 Thread via GitHub
alamb merged PR #11517: URL: https://github.com/apache/datafusion/pull/11517

Re: [I] [Proposal] Decouple logical from physical types [datafusion]

2024-07-19 Thread via GitHub
alamb commented on issue #11513: URL: https://github.com/apache/datafusion/issues/11513#issuecomment-2239965592 > This issue originates from the fact that TableSource and TableProvider (the "native" sources of schemas) would have to return a DFSchema to include the LogicalPhysicalTypes

Re: [PR] Add support for Utf8View for date/temporal codepaths [datafusion]

2024-07-19 Thread via GitHub
alamb commented on PR #11518: URL: https://github.com/apache/datafusion/pull/11518#issuecomment-2239969446 Ok, sorry for the delay -- I just merged https://github.com/apache/arrow-rs/pull/6077 I think if you updated the pin to Cargo.toml here https://github.com/apache/datafusio

Re: [PR] doc: Add memory tuning section to user guide [datafusion-comet]

2024-07-19 Thread via GitHub
viirya commented on code in PR #684: URL: https://github.com/apache/datafusion-comet/pull/684#discussion_r1684893189 ## docs/source/user-guide/tuning.md: ## @@ -21,6 +21,17 @@ under the License. Comet provides some tuning options to help you get the best performance from you

Re: [PR] doc: Add memory tuning section to user guide [datafusion-comet]

2024-07-19 Thread via GitHub
viirya commented on code in PR #684: URL: https://github.com/apache/datafusion-comet/pull/684#discussion_r1684894745 ## docs/source/user-guide/tuning.md: ## @@ -21,6 +21,17 @@ under the License. Comet provides some tuning options to help you get the best performance from you

Re: [PR] Add ArrowBytesViewMap and ArrowBytesViewSet [datafusion]

2024-07-19 Thread via GitHub
XiangpengHao commented on PR #11515: URL: https://github.com/apache/datafusion/pull/11515#issuecomment-2239977162 I rebased the branch to reflect new changes

Re: [PR] Add ArrowBytesViewMap and ArrowBytesViewSet [datafusion]

2024-07-19 Thread via GitHub
XiangpengHao commented on code in PR #11515: URL: https://github.com/apache/datafusion/pull/11515#discussion_r1684898492 ## datafusion/physical-expr-common/src/binary_view_map.rs: ## @@ -0,0 +1,683 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more con

Re: [PR] Extract parquet statistics for `StructArray` [datafusion]

2024-07-19 Thread via GitHub
alamb commented on PR #11289: URL: https://github.com/apache/datafusion/pull/11289#issuecomment-2239982043 Thank you @Lordworms - sorry for the delay / runaround. I just haven't had a chance to focus on this PR. I was hoping someone with more structured type experience would be able to hel

Re: [PR] feat: consume and produce Substrait type extensions [datafusion]

2024-07-19 Thread via GitHub
alamb merged PR #11510: URL: https://github.com/apache/datafusion/pull/11510

Re: [PR] Add ArrowBytesViewMap and ArrowBytesViewSet [datafusion]

2024-07-19 Thread via GitHub
alamb commented on code in PR #11515: URL: https://github.com/apache/datafusion/pull/11515#discussion_r1684911508 ## datafusion/physical-expr-common/src/binary_view_map.rs: ## @@ -0,0 +1,683 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [I] Plan Comet 0.1.0 Release [datafusion-comet]

2024-07-19 Thread via GitHub
viirya commented on issue #369: URL: https://github.com/apache/datafusion-comet/issues/369#issuecomment-2240013614 +1

Re: [PR] docs: Update percentage speedups in benchmarking guide [datafusion-comet]

2024-07-19 Thread via GitHub
andygrove merged PR #691: URL: https://github.com/apache/datafusion-comet/pull/691

Re: [PR] Add ArrowBytesViewMap and ArrowBytesViewSet [datafusion]

2024-07-19 Thread via GitHub
XiangpengHao commented on PR #11515: URL: https://github.com/apache/datafusion/pull/11515#issuecomment-2240021874 > I wonder if you have some missing commits that haven't been pushed yet? It looks like you addressed the comments, but the code hasn't been changed. Sorry, I probably cho

Re: [PR] make unparser `Dialect` trait `Send` + `Sync` [datafusion]

2024-07-19 Thread via GitHub
alamb commented on PR #11504: URL: https://github.com/apache/datafusion/pull/11504#issuecomment-2240023933 🚀

Re: [PR] make unparser `Dialect` trait `Send` + `Sync` [datafusion]

2024-07-19 Thread via GitHub
alamb merged PR #11504: URL: https://github.com/apache/datafusion/pull/11504

Re: [PR] make unparser `Dialect` trait `Send` + `Sync` [datafusion]

2024-07-19 Thread via GitHub
alamb commented on PR #11504: URL: https://github.com/apache/datafusion/pull/11504#issuecomment-2240024079 Thanks again @y-f-u

Re: [PR] fix: unparser generates wrong sql for derived table with columns [datafusion]

2024-07-19 Thread via GitHub
alamb commented on PR #11505: URL: https://github.com/apache/datafusion/pull/11505#issuecomment-2240024758 Thanks again @y-f-u

Re: [PR] fix: unparser generates wrong sql for derived table with columns [datafusion]

2024-07-19 Thread via GitHub
alamb commented on code in PR #11505: URL: https://github.com/apache/datafusion/pull/11505#discussion_r1684920814 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -240,6 +240,35 @@ fn roundtrip_statement_with_dialect() -> Result<()> { parser_dialect: Box::new(Gen

Re: [PR] fix: unparser generates wrong sql for derived table with columns [datafusion]

2024-07-19 Thread via GitHub
alamb merged PR #11505: URL: https://github.com/apache/datafusion/pull/11505

Re: [PR] Prevent bigger files from being checked in [datafusion]

2024-07-19 Thread via GitHub
alamb commented on PR #11508: URL: https://github.com/apache/datafusion/pull/11508#issuecomment-2240031572 > @alamb there is a small side effect described above which may give a false positive and basically may require PR recreate, which doesn't seem a big problem for me but for large comp
