[I] UNION ALL should strip table identifiers in its resulting schema [datafusion]

2024-05-29 Thread via GitHub
phillipleblanc opened a new issue, #10706: URL: https://github.com/apache/datafusion/issues/10706 ### Describe the bug The schema that is the result of a UNION ALL should not have any table qualifiers, as the table information has effectively been erased and is no longer a valid refe

[PR] Strip table qualifiers from schema in `UNION ALL` [datafusion]

2024-05-29 Thread via GitHub
phillipleblanc opened a new pull request, #10707: URL: https://github.com/apache/datafusion/pull/10707 ## Which issue does this PR close? Closes #10706 ## Rationale for this change The schema that is the result of a UNION ALL should not have any table qualifiers, as
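The idea in #10706 / #10707 can be illustrated with a small, self-contained sketch. It does not use DataFusion's real schema types: `QualifiedField` and `union_output_fields` are hypothetical names invented for this illustration, and the sketch assumes the output column names come from the left input.

```rust
/// Hypothetical stand-in for a schema field: a column name plus an
/// optional table qualifier (this is NOT a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct QualifiedField {
    qualifier: Option<String>,
    name: String,
}

/// Build the output fields of a UNION ALL from the left input,
/// dropping every table qualifier (assumption: names follow the left side).
fn union_output_fields(left: &[QualifiedField]) -> Vec<QualifiedField> {
    left.iter()
        .map(|f| QualifiedField {
            qualifier: None,
            name: f.name.clone(),
        })
        .collect()
}

fn main() {
    let left = vec![QualifiedField {
        qualifier: Some("t1".to_string()),
        name: "a".to_string(),
    }];
    // After the union, column "a" no longer claims to belong to table "t1".
    let out = union_output_fields(&left);
    println!("{:?}", out);
}
```

Dropping the qualifier reflects the point made in the issue: once rows from several inputs are merged, a reference such as `t1.a` no longer identifies a valid source.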

Re: [PR] chore: Removing copying data from dictionary values into CometDictionary [datafusion-comet]

2024-05-29 Thread via GitHub
viirya commented on code in PR #490: URL: https://github.com/apache/datafusion-comet/pull/490#discussion_r1618312815 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -59,121 +46,57 @@ public ValueVector getValueVector() { } public boolean decod

[PR] Separate proto partitioning [datafusion]

2024-05-29 Thread via GitHub
lewiszlw opened a new pull request, #10708: URL: https://github.com/apache/datafusion/pull/10708 ## Which issue does this PR close? Closes #. ## Rationale for this change Separate proto partitioning so it can be used in user defined shuffle operator.

Re: [PR] chore: Removing copying data from dictionary values into CometDictionary [datafusion-comet]

2024-05-29 Thread via GitHub
viirya commented on code in PR #490: URL: https://github.com/apache/datafusion-comet/pull/490#discussion_r1618321281 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -59,121 +46,57 @@ public ValueVector getValueVector() { } public boolean decod

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-29 Thread via GitHub
waynexia commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1618334298 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { &self.cache } -fn children(&self)

Re: [PR] Strip table qualifiers from schema in `UNION ALL` [datafusion]

2024-05-29 Thread via GitHub
phillipleblanc commented on code in PR #10707: URL: https://github.com/apache/datafusion/pull/10707#discussion_r1618338309 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1373,33 +1373,32 @@ pub fn union(left_plan: LogicalPlan, right_plan: LogicalPlan) -> Result

Re: [PR] Strip table qualifiers from schema in `UNION ALL` [datafusion]

2024-05-29 Thread via GitHub
phillipleblanc commented on code in PR #10707: URL: https://github.com/apache/datafusion/pull/10707#discussion_r1618341153 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1373,33 +1373,32 @@ pub fn union(left_plan: LogicalPlan, right_plan: LogicalPlan) -> Result

Re: [PR] Minor: improve Expr documentation [datafusion]

2024-05-29 Thread via GitHub
jonahgao merged PR #10685: URL: https://github.com/apache/datafusion/pull/10685 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-29 Thread via GitHub
Blizzara commented on code in PR #10646: URL: https://github.com/apache/datafusion/pull/10646#discussion_r1618379915 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1160,6 +1162,24 @@ pub(crate) fn from_substrait_type(dt: &substrait::proto::Type) -> Result { +

Re: [PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-29 Thread via GitHub
waynexia commented on code in PR #10646: URL: https://github.com/apache/datafusion/pull/10646#discussion_r1618465409 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1160,6 +1162,24 @@ pub(crate) fn from_substrait_type(dt: &substrait::proto::Type) -> Result { +

Re: [PR] Minor: Add examples of using TreeNode with `LogicalPlan` [datafusion]

2024-05-29 Thread via GitHub
jonahgao commented on code in PR #10687: URL: https://github.com/apache/datafusion/pull/10687#discussion_r1618480297 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -56,19 +56,139 @@ use crate::logical_plan::tree_node::unwrap_arc; pub use datafusion_common::display::{PlanTyp

Re: [PR] Minor: Add examples of using TreeNode with `LogicalPlan` [datafusion]

2024-05-29 Thread via GitHub
jonahgao commented on code in PR #10687: URL: https://github.com/apache/datafusion/pull/10687#discussion_r1618479343 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -56,19 +56,139 @@ use crate::logical_plan::tree_node::unwrap_arc; pub use datafusion_common::display::{PlanTyp

Re: [PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-29 Thread via GitHub
waynexia commented on code in PR #10646: URL: https://github.com/apache/datafusion/pull/10646#discussion_r1618540788 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1160,6 +1162,24 @@ pub(crate) fn from_substrait_type(dt: &substrait::proto::Type) -> Result { +

Re: [PR] Fix typo in bench.sh [datafusion]

2024-05-29 Thread via GitHub
alamb merged PR #10698: URL: https://github.com/apache/datafusion/pull/10698

Re: [PR] chore: align re-exports in functions-aggregate [datafusion]

2024-05-29 Thread via GitHub
alamb merged PR #10705: URL: https://github.com/apache/datafusion/pull/10705

Re: [PR] Make swap_hash_join public API [datafusion]

2024-05-29 Thread via GitHub
alamb commented on code in PR #10702: URL: https://github.com/apache/datafusion/pull/10702#discussion_r1618607123 ## datafusion/core/src/physical_optimizer/join_selection.rs: ## @@ -157,7 +157,9 @@ fn swap_join_projection( } /// This function swaps the inputs of the given jo

Re: [PR] Fix incorrect statistics read for unsigned integers columns in parquet [datafusion]

2024-05-29 Thread via GitHub
alamb merged PR #10704: URL: https://github.com/apache/datafusion/pull/10704

Re: [I] Incorrect statistics read for unsigned integer columns in parquet [datafusion]

2024-05-29 Thread via GitHub
alamb closed issue #10604: Incorrect statistics read for unsigned integer columns in parquet URL: https://github.com/apache/datafusion/issues/10604

[I] Google Cloud Storage requests during query execution being performed in series [datafusion]

2024-05-29 Thread via GitHub
davidhewitt opened a new issue, #10709: URL: https://github.com/apache/datafusion/issues/10709 ### Describe the bug I'm experimenting with datafusion 38 (using as a library, not the cli) to query against parquet files in Google Cloud Storage. For now I'm doing a basic test, just `sel

Re: [I] Google Cloud Storage requests during query execution being performed in series [datafusion]

2024-05-29 Thread via GitHub
samuelcolvin commented on issue #10709: URL: https://github.com/apache/datafusion/issues/10709#issuecomment-2137048388 Thanks @davidhewitt, one extra note - if it would help anyone looking into this to see the Logfire live view of the trace, as well as the screenshot, happy to send some dat

Re: [PR] Push down filter as table partition list prefix [datafusion]

2024-05-29 Thread via GitHub
alamb commented on code in PR #10693: URL: https://github.com/apache/datafusion/pull/10693#discussion_r1618624379 ## datafusion/core/src/datasource/listing/helpers.rs: ## @@ -675,4 +764,127 @@ mod tests { // this helper function assert!(expr_applicable_for_cols

Re: [PR] Separate `Partitioning` protobuf serialization code [datafusion]

2024-05-29 Thread via GitHub
alamb merged PR #10708: URL: https://github.com/apache/datafusion/pull/10708

[I] Integrate with the substrait integration test [datafusion]

2024-05-29 Thread via GitHub
alamb opened a new issue, #10710: URL: https://github.com/apache/datafusion/issues/10710 ### Is your feature request related to a problem or challenge? In https://github.com/apache/datafusion/pull/10653, @Blizzara added the beginnings of testing for substrait plans that came from othe

Re: [PR] Support consuming Substrait with compound signature function names [datafusion]

2024-05-29 Thread via GitHub
alamb commented on PR #10653: URL: https://github.com/apache/datafusion/pull/10653#issuecomment-2137098889 > > > though curious if you have any thoughts on how to make it simpler - given DF doesn't yet produce those compound names, I cannot use a roundtrip test, and writing the substrait ma

Re: [PR] Support consuming Substrait with compound signature function names [datafusion]

2024-05-29 Thread via GitHub
alamb commented on PR #10653: URL: https://github.com/apache/datafusion/pull/10653#issuecomment-2137099101 Thanks again everyone!

Re: [PR] Support consuming Substrait with compound signature function names [datafusion]

2024-05-29 Thread via GitHub
alamb merged PR #10653: URL: https://github.com/apache/datafusion/pull/10653

Re: [I] Substrait integration doesn't recognize typed functions [datafusion]

2024-05-29 Thread via GitHub
alamb closed issue #10412: Substrait integration doesn't recognize typed functions URL: https://github.com/apache/datafusion/issues/10412

Re: [PR] Minor: Add examples of using TreeNode with `Expr` [datafusion]

2024-05-29 Thread via GitHub
alamb commented on PR #10686: URL: https://github.com/apache/datafusion/pull/10686#issuecomment-2137100092 Thanks for the review @jonahgao 🙏

Re: [PR] Minor: Add examples of using TreeNode with `LogicalPlan` [datafusion]

2024-05-29 Thread via GitHub
alamb commented on code in PR #10687: URL: https://github.com/apache/datafusion/pull/10687#discussion_r1618661393 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -56,19 +56,139 @@ use crate::logical_plan::tree_node::unwrap_arc; pub use datafusion_common::display::{PlanType,

Re: [PR] Minor: Add examples of using TreeNode with `Expr` [datafusion]

2024-05-29 Thread via GitHub
alamb merged PR #10686: URL: https://github.com/apache/datafusion/pull/10686

Re: [I] Suport unparsing `LogicalPlan::Window` to SQL [datafusion]

2024-05-29 Thread via GitHub
alamb commented on issue #10664: URL: https://github.com/apache/datafusion/issues/10664#issuecomment-2137108974 > In my understanding, window function is a special case of aggregate function, why would these two have different plan struct? I think window functions and aggregate functio

Re: [I] [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-05-29 Thread via GitHub
xinlifoobar commented on issue #10453: URL: https://github.com/apache/datafusion/issues/10453#issuecomment-2137115446 @alamb just a hint: #10605 is also closed.

Re: [I] Support "Tracing" / Spans [datafusion]

2024-05-29 Thread via GitHub
alamb commented on issue #9415: URL: https://github.com/apache/datafusion/issues/9415#issuecomment-2137119619 Since the internal metrics in datafusion -- aka https://docs.rs/datafusion/latest/datafusion/physical_plan/metrics/index.html have start/stop timestamps on them already, we have

[PR] Try to Improve performance of extracting statistics from parquet files [datafusion]

2024-05-29 Thread via GitHub
xinlifoobar opened a new pull request, #10711: URL: https://github.com/apache/datafusion/pull/10711 ## Which issue does this PR close? Closes #10626 ## Rationale for this change ## What changes are included in this PR? Replace the `get_statatistics`

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-29 Thread via GitHub
alamb commented on code in PR #10600: URL: https://github.com/apache/datafusion/pull/10600#discussion_r1618678318 ## datafusion/core/src/datasource/stream.rs: ## @@ -103,19 +105,46 @@ impl FromStr for StreamEncoding { } } -/// The configuration for a [`StreamTable`] +///

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-29 Thread via GitHub
ozankabak commented on PR #10600: URL: https://github.com/apache/datafusion/pull/10600#issuecomment-2137168944 @berkaysynnada PTAL

Re: [PR] Make swap_hash_join public API [datafusion]

2024-05-29 Thread via GitHub
Dandandan commented on code in PR #10702: URL: https://github.com/apache/datafusion/pull/10702#discussion_r1618726396 ## datafusion/core/src/physical_optimizer/join_selection.rs: ## @@ -157,7 +157,9 @@ fn swap_join_projection( } /// This function swaps the inputs of the give

[I] `COPY ... PARTITIONED BY` with parquet causes "out of bounds" panic [datafusion]

2024-05-29 Thread via GitHub
samuelcolvin opened a new issue, #10712: URL: https://github.com/apache/datafusion/issues/10712 ### Describe the bug While investigating #10709, I tried using datafusion CLI to require parquet files to a better size. But I got a panic: ``` thread 'tokio-runtime-worker

Re: [PR] Minor: Add examples of using TreeNode with `LogicalPlan` [datafusion]

2024-05-29 Thread via GitHub
alamb merged PR #10687: URL: https://github.com/apache/datafusion/pull/10687

Re: [PR] Minor: Add examples of using TreeNode with `LogicalPlan` [datafusion]

2024-05-29 Thread via GitHub
alamb commented on PR #10687: URL: https://github.com/apache/datafusion/pull/10687#issuecomment-2137240615 Thanks again for the review @jonahgao

[PR] Convert variance sample to udaf [datafusion]

2024-05-29 Thread via GitHub
yyin-dev opened a new pull request, #10713: URL: https://github.com/apache/datafusion/pull/10713 ## Which issue does this PR close? Closes #10667 . ## Are these changes tested? - [x] New proto roundtrip tests. - [ ] Moved tests into sqllogictests.

Re: [PR] Reduce repetition in math and functions modules with macros [datafusion]

2024-05-29 Thread via GitHub
jayzhan211 commented on code in PR #10700: URL: https://github.com/apache/datafusion/pull/10700#discussion_r1618774900 ## datafusion/functions/src/macros.rs: ## @@ -59,6 +59,30 @@ macro_rules! export_functions { }; } +macro_rules! make_function { +// single vector ar

Re: [PR] feature: Add a WindowUDFImpl::simplify() API [datafusion]

2024-05-29 Thread via GitHub
jayzhan211 commented on code in PR #9906: URL: https://github.com/apache/datafusion/pull/9906#discussion_r1618798950 ## datafusion-examples/examples/simplify_udwf_expression.rs: ## @@ -0,0 +1,142 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Avoid the usage of intermediate ScalarValue to improve performance of extracting statistics from parquet files [datafusion]

2024-05-29 Thread via GitHub
alamb commented on code in PR #10711: URL: https://github.com/apache/datafusion/pull/10711#discussion_r1618800964 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -52,131 +55,311 @@ fn sign_extend_be(b: &[u8]) -> [u8; 16] { result } -/// Extra

Re: [PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-29 Thread via GitHub
waynexia commented on code in PR #10646: URL: https://github.com/apache/datafusion/pull/10646#discussion_r1618822660 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1160,6 +1162,24 @@ pub(crate) fn from_substrait_type(dt: &substrait::proto::Type) -> Result { +

Re: [PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-29 Thread via GitHub
Blizzara commented on code in PR #10646: URL: https://github.com/apache/datafusion/pull/10646#discussion_r1618834391 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1160,6 +1162,24 @@ pub(crate) fn from_substrait_type(dt: &substrait::proto::Type) -> Result { +

Re: [I] [EPIC] Tracking issue of support substrait logical plan [datafusion]

2024-05-29 Thread via GitHub
Blizzara commented on issue #8149: URL: https://github.com/apache/datafusion/issues/8149#issuecomment-2137345554 Values (and EmptyTable) is supported as of #10531 :)

Re: [I] [EPIC] Tracking issue of support substrait logical plan [datafusion]

2024-05-29 Thread via GitHub
waynexia commented on issue #8149: URL: https://github.com/apache/datafusion/issues/8149#issuecomment-2137362019 > Values (and EmptyTable) is supported as of #10531 :) Updated. Thanks for implementing it!

Re: [PR] Avoid the usage of intermediate ScalarValue to improve performance of extracting statistics from parquet files [datafusion]

2024-05-29 Thread via GitHub
xinlifoobar commented on code in PR #10711: URL: https://github.com/apache/datafusion/pull/10711#discussion_r1618850434 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -211,32 +394,25 @@ pub(crate) fn min_statistics<'a, I: Iterator Result { -let

Re: [PR] Convert variance sample to udaf [datafusion]

2024-05-29 Thread via GitHub
yyin-dev commented on PR #10713: URL: https://github.com/apache/datafusion/pull/10713#issuecomment-2137394530 @jayzhan211 I'm working on a change, but can you help me understand the semantics here: ``` # csv_query_distinct_variance query R SELECT var(distinct c2) FROM aggreg

Re: [PR] chore: Removing copying data from dictionary values into CometDictionary [datafusion-comet]

2024-05-29 Thread via GitHub
viirya commented on code in PR #490: URL: https://github.com/apache/datafusion-comet/pull/490#discussion_r1618887167 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -59,121 +46,57 @@ public ValueVector getValueVector() { } public boolean decod

Re: [PR] Profile spark3.5.1 and centos7 for compatible on spark 3.5.1 and centos7 old glic 2.7 [datafusion-comet]

2024-05-29 Thread via GitHub
xhumanoid commented on code in PR #491: URL: https://github.com/apache/datafusion-comet/pull/491#discussion_r1618965703 ## spark/src/main/scala/org/apache/comet/parquet/CometParquetFileFormat.scala.old: ## Review Comment: do we really need to keep old files? maybe you c

[PR] Introduce FunctionRegistry dependency to optimize and rewrite rule [datafusion]

2024-05-29 Thread via GitHub
jayzhan211 opened a new pull request, #10714: URL: https://github.com/apache/datafusion/pull/10714 ## Which issue does this PR close? Closes #10703. ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

Re: [I] Google Cloud Storage requests during query execution being performed in series [datafusion]

2024-05-29 Thread via GitHub
davidhewitt closed issue #10709: Google Cloud Storage requests during query execution being performed in series URL: https://github.com/apache/datafusion/issues/10709

Re: [I] Google Cloud Storage requests during query execution being performed in series [datafusion]

2024-05-29 Thread via GitHub
davidhewitt commented on issue #10709: URL: https://github.com/apache/datafusion/issues/10709#issuecomment-2137573303 Ok great, so this is happily user error! After digging around with the query plans and understanding that each file group is loaded sequentially by design, I found `L

Re: [I] UNION ALL not correctly projects the floating numbers [datafusion]

2024-05-29 Thread via GitHub
goldmedal commented on issue #10688: URL: https://github.com/apache/datafusion/issues/10688#issuecomment-2137613195 take

Re: [PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-05-29 Thread via GitHub
vaibhawvipul commented on code in PR #471: URL: https://github.com/apache/datafusion-comet/pull/471#discussion_r1618206450 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1469,5 +1469,36 @@ class CometExpressionSuite extends CometTestBase with Adapti

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-29 Thread via GitHub
berkaysynnada commented on PR #10600: URL: https://github.com/apache/datafusion/pull/10600#issuecomment-2137637228 I will review it in detail tomorrow

Re: [PR] Make swap_hash_join public API [datafusion]

2024-05-29 Thread via GitHub
comphead commented on code in PR #10702: URL: https://github.com/apache/datafusion/pull/10702#discussion_r1619055381 ## datafusion/core/src/physical_optimizer/join_selection.rs: ## @@ -157,7 +157,9 @@ fn swap_join_projection( } /// This function swaps the inputs of the given

Re: [PR] Make swap_hash_join public API [datafusion]

2024-05-29 Thread via GitHub
comphead commented on code in PR #10702: URL: https://github.com/apache/datafusion/pull/10702#discussion_r1619056151 ## datafusion/core/src/physical_optimizer/join_selection.rs: ## @@ -157,7 +157,9 @@ fn swap_join_projection( } /// This function swaps the inputs of the given

Re: [PR] Convert variance sample to udaf [datafusion]

2024-05-29 Thread via GitHub
jayzhan211 commented on PR #10713: URL: https://github.com/apache/datafusion/pull/10713#issuecomment-2137673821 > @jayzhan211 I'm working on a change, but can you help me understand the semantics here: > > ``` > # csv_query_distinct_variance > query R > SELECT var(distinct c2

[PR] chore: Use Spark's ParquetFilters [datafusion-comet]

2024-05-29 Thread via GitHub
huaxingao opened a new pull request, #492: URL: https://github.com/apache/datafusion-comet/pull/492 ## Which issue does this PR close? Closes [#36](https://github.com/apache/datafusion-comet/issues/36). ## Rationale for this change ## What changes are incl

Re: [PR] Introduce FunctionRegistry dependency to optimize and rewrite rule [datafusion]

2024-05-29 Thread via GitHub
jayzhan211 commented on code in PR #10714: URL: https://github.com/apache/datafusion/pull/10714#discussion_r1619082432 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -163,53 +173,3 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { Some(BottomUp)

Re: [PR] Introduce FunctionRegistry dependency to optimize and rewrite rule [datafusion]

2024-05-29 Thread via GitHub
jayzhan211 commented on code in PR #10714: URL: https://github.com/apache/datafusion/pull/10714#discussion_r1619081626 ## datafusion/optimizer/src/lib.rs: ## @@ -56,7 +56,7 @@ pub mod single_distinct_to_groupby; pub mod unwrap_cast_in_comparison; pub mod utils; -#[cfg(test)]

Re: [I] UNION ALL not correctly projects the floating numbers [datafusion]

2024-05-29 Thread via GitHub
viirya commented on issue #10688: URL: https://github.com/apache/datafusion/issues/10688#issuecomment-2137699009 > Thats weird. > > > explain select -128.2::float union all select -128.2; I think it is because the first `-128.2` is read as float 32, then treated as float 64. I
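The effect described in this comment can be reproduced outside DataFusion with a minimal sketch. It only shows the general float behaviour (a value rounded to 32 bits and then widened to 64 bits differs from the same literal parsed directly as 64 bits); it makes no claim about where in the planner the widening happens. The helper name `widen` is made up for this illustration.

```rust
/// Widen an f32 to f64. The conversion itself is exact: it keeps
/// whatever rounding error the f32 value already carried.
fn widen(x: f32) -> f64 {
    f64::from(x)
}

fn main() {
    let as_f32: f32 = -128.2; // rounded to the nearest 32-bit float
    let as_f64: f64 = -128.2; // rounded to the nearest 64-bit float
    let widened = widen(as_f32);

    println!("f32 literal widened to f64: {widened}");
    println!("f64 literal:                {as_f64}");
    // The two values differ slightly, so the widened value no longer
    // prints as -128.2.
    println!("equal: {}", widened == as_f64);
}
```

An explicit cast that is applied to both sides of the union at the same precision would avoid the mismatch, which is consistent with the comment's observation that no cast expression appears in the plan.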

Re: [PR] Make swap_hash_join public API [datafusion]

2024-05-29 Thread via GitHub
viirya commented on code in PR #10702: URL: https://github.com/apache/datafusion/pull/10702#discussion_r1619096578 ## datafusion/core/src/physical_optimizer/join_selection.rs: ## @@ -157,7 +157,9 @@ fn swap_join_projection( } /// This function swaps the inputs of the given j

Re: [I] UNION ALL not correctly projects the floating numbers [datafusion]

2024-05-29 Thread via GitHub
goldmedal commented on issue #10688: URL: https://github.com/apache/datafusion/issues/10688#issuecomment-2137711858 > I think it is because the first `-128.2` is read as float 32, then treated as float 64. I said "treated" because I don't see there is cast expression. It may be happened whe

Re: [I] UNION ALL not correctly projects the floating numbers [datafusion]

2024-05-29 Thread via GitHub
goldmedal commented on issue #10688: URL: https://github.com/apache/datafusion/issues/10688#issuecomment-2137807135 I wanted to share something I found. I tried to observe the plan transformation step by step: sql_to_plan -> analyze -> optimize ``` SQL: select 128.2 union all selec

Re: [PR] chore: Use Spark's ParquetFilters [datafusion-comet]

2024-05-29 Thread via GitHub
huaxingao commented on PR #492: URL: https://github.com/apache/datafusion-comet/pull/492#issuecomment-2137815303 retest this please

Re: [PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-05-29 Thread via GitHub
vaibhawvipul commented on code in PR #471: URL: https://github.com/apache/datafusion-comet/pull/471#discussion_r1619176000 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1469,5 +1469,36 @@ class CometExpressionSuite extends CometTestBase with Adapti

Re: [PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-05-29 Thread via GitHub
vaibhawvipul commented on PR #471: URL: https://github.com/apache/datafusion-comet/pull/471#issuecomment-2137819580 @kazuyukitanimura requesting a review.

Re: [PR] feat: Add "Comet Fuzz" fuzz-testing utility [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on PR #472: URL: https://github.com/apache/datafusion-comet/pull/472#issuecomment-2137823739 Thanks for the review @kazuyukitanimura. I have addressed your comments and also added support for decimal type. I removed the TODO comments and added a roadmap section to the R

Re: [PR] chore: Use Spark's ParquetFilters [datafusion-comet]

2024-05-29 Thread via GitHub
huaxingao closed pull request #492: chore: Use Spark's ParquetFilters URL: https://github.com/apache/datafusion-comet/pull/492

Re: [PR] chore: Removing copying data from dictionary values into CometDictionary [datafusion-comet]

2024-05-29 Thread via GitHub
sunchao commented on code in PR #490: URL: https://github.com/apache/datafusion-comet/pull/490#discussion_r1619195722 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -59,121 +46,57 @@ public ValueVector getValueVector() { } public boolean deco

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619197931 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,475 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [x] any + - [x] any_value + - [ ] approx_cou

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619208014 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,475 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [x] any + - [x] any_value + - [ ] approx_cou

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619210130 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,475 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [x] any + - [x] any_value + - [ ] approx_cou

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619211189 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,475 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [x] any + - [x] any_value + - [ ] approx_cou

[PR] build: Enable comet tests with spark-4.0 profile [datafusion-comet]

2024-05-29 Thread via GitHub
kazuyukitanimura opened a new pull request, #493: URL: https://github.com/apache/datafusion-comet/pull/493 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/372 ## Rationale for this change To be ready for Spark

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub
viirya commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619233974 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,475 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [x] any + - [x] any_value + - [ ] approx_count_

Re: [PR] Convert variance sample to udaf [datafusion]

2024-05-29 Thread via GitHub
yyin-dev commented on PR #10713: URL: https://github.com/apache/datafusion/pull/10713#issuecomment-2137921026 > > @jayzhan211 I'm working on a change, but can you help me understand the semantics here: > > ``` > > # csv_query_distinct_variance > > query R > > SELECT var(distinct

Re: [I] bug: CAST timestamp to string ignores timezone prior to Spark 3.4 [datafusion-comet]

2024-05-29 Thread via GitHub
parthchandra commented on issue #468: URL: https://github.com/apache/datafusion-comet/issues/468#issuecomment-2137936443 IIRC there were differences in output between Spark 3.2 and Spark 3.4 for the timestamp_ntz type. Taking a closer look, the definition of timestamp_ntz (in Spark) ess

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619275258 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,475 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [x] any + - [x] any_value + - [ ] approx_cou

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1619275529 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,475 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [x] any + - [x] any_value + - [ ] approx_cou

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-29 Thread via GitHub
Abdullahsab3 commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2137977511 I was looking into this issue again. I am still validating this, but I found that using `to_char(time AT TIME ZONE 'UTC' AT TIME ZONE 'Europe/Brussels', '%F %X')` also resu

Re: [PR] chore: Removing copying data from dictionary values into CometDictionary [datafusion-comet]

2024-05-29 Thread via GitHub
viirya commented on code in PR #490: URL: https://github.com/apache/datafusion-comet/pull/490#discussion_r1619295940 ## common/src/main/java/org/apache/comet/vector/CometDictionary.java: ## @@ -59,121 +46,57 @@ public ValueVector getValueVector() { } public boolean decod

Re: [PR] Make swap_hash_join public API [datafusion]

2024-05-29 Thread via GitHub
viirya commented on code in PR #10702: URL: https://github.com/apache/datafusion/pull/10702#discussion_r1619347294 ## datafusion/core/src/physical_optimizer/join_selection.rs: ## @@ -157,7 +157,9 @@ fn swap_join_projection( } /// This function swaps the inputs of the given j

Re: [PR] Make swap_hash_join public API [datafusion]

2024-05-29 Thread via GitHub
viirya commented on code in PR #10702: URL: https://github.com/apache/datafusion/pull/10702#discussion_r1619353656 ## datafusion/core/src/physical_optimizer/join_selection.rs: ## @@ -157,7 +157,9 @@ fn swap_join_projection( } /// This function swaps the inputs of the given j

Re: [PR] feat: Add HashJoin support for BuildRight [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on code in PR #437: URL: https://github.com/apache/datafusion-comet/pull/437#discussion_r1619362298 ## core/src/execution/datafusion/planner.rs: ## @@ -978,7 +979,15 @@ impl PhysicalPlanner { // `EqualNullSafe`, Spark will rewrite it duri

[PR] RFC: Prototype statistics extraction iterators [datafusion]

2024-05-29 Thread via GitHub
alamb opened a new pull request, #10715: URL: https://github.com/apache/datafusion/pull/10715 ## Which issue does this PR close? re https://github.com/apache/datafusion/issues/10626 ## Rationale for this change I wanted to prototype another approach that @xinlifo

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-29 Thread via GitHub
andygrove commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1619370071 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,296 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-29 Thread via GitHub
viirya commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1619376557 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,296 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

Re: [PR] RFC: Prototype statistics extraction iterators [datafusion]

2024-05-29 Thread via GitHub
alamb commented on code in PR #10715: URL: https://github.com/apache/datafusion/pull/10715#discussion_r1619383361 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -211,9 +287,29 @@ pub(crate) fn min_statistics<'a, I: Iterator Result { -let scalars

Re: [PR] Avoid the usage of intermediate ScalarValue to improve performance of extracting statistics from parquet files [datafusion]

2024-05-29 Thread via GitHub
alamb commented on code in PR #10711: URL: https://github.com/apache/datafusion/pull/10711#discussion_r1619384338 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -211,32 +394,25 @@ pub(crate) fn min_statistics<'a, I: Iterator Result { -let scalar
