Re: [PR] bug fix: Fix fuzz testcase for cast string to integer [datafusion-comet]

2024-05-20 Thread via GitHub
vaibhawvipul commented on code in PR #450: URL: https://github.com/apache/datafusion-comet/pull/450#discussion_r1607440985 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -533,11 +533,16 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHe

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-20 Thread via GitHub
comphead merged PR #331: URL: https://github.com/apache/datafusion-comet/pull/331 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-20 Thread via GitHub
comphead commented on PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#issuecomment-2121507170 Thanks everyone for the review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-20 Thread via GitHub
comphead opened a new pull request, #455: URL: https://github.com/apache/datafusion-comet/pull/455 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes te

Re: [PR] Use stabalized aws-sdk and clap versions in `datafusion-cli` [datafusion]

2024-05-20 Thread via GitHub
github-actions[bot] commented on PR #9659: URL: https://github.com/apache/datafusion/pull/9659#issuecomment-2121548399 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Count distinct support multiple expressions [datafusion]

2024-05-20 Thread via GitHub
github-actions[bot] closed pull request #5939: Count distinct support multiple expressions URL: https://github.com/apache/datafusion/pull/5939 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [I] bug: CAST string to integer does not handle all invalid inputs [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove closed issue #431: bug: CAST string to integer does not handle all invalid inputs URL: https://github.com/apache/datafusion-comet/issues/431 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] fix: Enable cast string to int tests and fix compatibility issue [datafusion-comet]

2024-05-20 Thread via GitHub
andygrove merged PR #453: URL: https://github.com/apache/datafusion-comet/pull/453 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-20 Thread via GitHub
advancedxy commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1607484041 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,55 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.nextLong()

Re: [I] PR build for Linux Java 11 with Spark 3.4 is not running [datafusion-comet]

2024-05-20 Thread via GitHub
advancedxy commented on issue #389: URL: https://github.com/apache/datafusion-comet/issues/389#issuecomment-2121583062 Java 11 is excluded on purpose on pull request events, as there are already too many combinations. Java 8 and Java 17 with Spark 3.4 on Linux are tested, which should be suffi

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-20 Thread via GitHub
advancedxy commented on code in PR #433: URL: https://github.com/apache/datafusion-comet/pull/433#discussion_r1607506665 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1452,17 +1452,55 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

[PR] Fixes bug expect `Date32Array` but returns Int32Array [datafusion]

2024-05-20 Thread via GitHub
xinlifoobar opened a new pull request, #10593: URL: https://github.com/apache/datafusion/pull/10593 ## Which issue does this PR close? Closes #10587 ## Rationale for this change This is to fix a bug when reading a Date32 or Date64 column from a parquet f

Re: [PR] Migrate testing optimizer rules to use `rewrite` API [datafusion]

2024-05-20 Thread via GitHub
lewiszlw commented on PR #10576: URL: https://github.com/apache/datafusion/pull/10576#issuecomment-2121613747 The `CommonSubexprEliminate` rule has not been migrated yet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[PR] Fix compilation of datafusion-cli on 32bit targets [datafusion]

2024-05-20 Thread via GitHub
nathaniel-daniel opened a new pull request, #10594: URL: https://github.com/apache/datafusion/pull/10594 ## Which issue does this PR close? Closes #10552. ## Rationale for this change This PR fixes compilation of the datafusion-cli crate on 32bit targets. ## What changes

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-20 Thread via GitHub
lewiszlw commented on code in PR #10575: URL: https://github.com/apache/datafusion/pull/10575#discussion_r1607537219 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -76,27 +76,20 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { /// For example: `TopK:

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-20 Thread via GitHub
lewiszlw commented on code in PR #10575: URL: https://github.com/apache/datafusion/pull/10575#discussion_r1607537742 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -76,27 +76,20 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { /// For example: `TopK:

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-20 Thread via GitHub
lewiszlw commented on PR #10575: URL: https://github.com/apache/datafusion/pull/10575#issuecomment-2121658693 If we want to apply the same change to the `UserDefinedLogicalNodeCore` trait, we should add a `Sized` trait bound to it. Do you need me to change it in this PR? @alamb -- This is an automat

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-20 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1607637374 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -607,6 +608,15 @@ async fn qualified_catalog_schema_table_reference() -> Result<()> { r

Re: [I] Support unnest for struct data type [datafusion]

2024-05-20 Thread via GitHub
duongcongtoai commented on issue #10264: URL: https://github.com/apache/datafusion/issues/10264#issuecomment-212133 Please help me review this PR everyone https://github.com/apache/datafusion/pull/10429 -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #433: URL: https://github.com/apache/datafusion-comet/pull/433#discussion_r1607685866 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1452,17 +1452,55 @@ class CometExpressionSuite extends CometTestBase with A

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1607689388 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,55 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.next

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1607690262 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,55 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.next

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-20 Thread via GitHub
advancedxy commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1607703859 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or m

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-20 Thread via GitHub
advancedxy commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1607709720 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,55 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.nextLong()

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-20 Thread via GitHub
advancedxy commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1607716091 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,55 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.nextLong()

Re: [I] Implement a way to preserve partitioning through `UnionExec` without losing ordering [datafusion]

2024-05-21 Thread via GitHub
xinlifoobar commented on issue #10314: URL: https://github.com/apache/datafusion/issues/10314#issuecomment-2122201906 Hi @alamb, found another interesting case while testing, do you think this could apply `InterleaveExec` with same order by sets? ``` explain select count(*) from (

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on PR #10531: URL: https://github.com/apache/datafusion/pull/10531#issuecomment-2122279768 @jonahgao @alamb I think this is ready by me :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608039288 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -607,6 +608,15 @@ async fn qualified_catalog_schema_table_reference() -> Result<()> { r

Re: [PR] Minor: Fix `ArrayFunctionRewriter` name reporting [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10581: URL: https://github.com/apache/datafusion/pull/10581#issuecomment-2122285943 Thanks for the review @jonahgao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] Minor: Fix `ArrayFunctionRewriter` name reporting [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10581: URL: https://github.com/apache/datafusion/pull/10581 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10575: URL: https://github.com/apache/datafusion/pull/10575 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] UserDefindedLogicalNode::from_template does not return a Result<...>. [datafusion]

2024-05-21 Thread via GitHub
alamb closed issue #10571: UserDefindedLogicalNode::from_template does not return a Result<...>. URL: https://github.com/apache/datafusion/issues/10571 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10575: URL: https://github.com/apache/datafusion/pull/10575#discussion_r1608050237 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -76,27 +76,31 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { /// For example: `TopK: k=

Re: [PR] Migrate testing optimizer rules to use `rewrite` API [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10576: URL: https://github.com/apache/datafusion/pull/10576#issuecomment-2122297757 > The `CommonSubexprEliminate` rule has not been migrated yet. That is a good point -- though we could do something like `#[allow(deprecated)]` until it is. I hope to work on it

Re: [PR] Migrate testing optimizer rules to use `rewrite` API [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10576: URL: https://github.com/apache/datafusion/pull/10576 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Migrate testing optimizer rules to use `rewrite` API [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10576: URL: https://github.com/apache/datafusion/pull/10576#issuecomment-2122298090 Thanks again @lewiszlw -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608069310 ## datafusion/expr/src/udf.rs: ## @@ -426,6 +467,59 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { false } +/// Computes the output interval f

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608079456 ## datafusion/functions/src/math/monotonicity.rs: ## @@ -0,0 +1,241 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] improve monotonicity api [datafusion]

2024-05-21 Thread via GitHub
alamb closed pull request #10117: improve monotonicity api URL: https://github.com/apache/datafusion/pull/10117 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] improve monotonicity api [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10117: URL: https://github.com/apache/datafusion/pull/10117#issuecomment-2122333403 I believe this PR has been superseded by https://github.com/apache/datafusion/pull/10504 which removed the monotonicity API in favor of a more expressive bounds analysis, so closing thi

Re: [I] Request: Improve Monotoniciy API [datafusion]

2024-05-21 Thread via GitHub
alamb closed issue #9879: Request: Improve Monotoniciy API URL: https://github.com/apache/datafusion/issues/9879 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [I] Request: Improve Monotoniciy API [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #9879: URL: https://github.com/apache/datafusion/issues/9879#issuecomment-2122335700 https://github.com/apache/datafusion/pull/10504 introduces a new API for boundary propagation, so the API challenge described in this ticket is no longer relevant. Thus closing --

[I] Request: Improve Monotoniciy API [datafusion]

2024-05-21 Thread via GitHub
alamb opened a new issue, #9879: URL: https://github.com/apache/datafusion/issues/9879 ### Is your feature request related to a problem or challenge? While reviewing https://github.com/apache/arrow-datafusion/pull/9869 from @tinfoil-knight I was confused about the [`ScalarUDFImpl::mo

Re: [I] Request: Improve Monotoniciy API [datafusion]

2024-05-21 Thread via GitHub
alamb closed issue #9879: Request: Improve Monotoniciy API URL: https://github.com/apache/datafusion/issues/9879 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608102197 ## datafusion/expr/src/udf.rs: ## @@ -426,6 +467,59 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { false } +/// Computes the output in

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608112791 ## datafusion/functions/src/math/monotonicity.rs: ## @@ -0,0 +1,241 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608116040 ## datafusion/expr/src/udf.rs: ## @@ -426,6 +467,59 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { false } +/// Computes the output interval f

Re: [PR] improve monotonicity api [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada commented on PR #10117: URL: https://github.com/apache/datafusion/pull/10117#issuecomment-2122357671 > I believe this PR has been superseded by #10504 which removed the monotonicity API in favor of a more expressive bounds analysis, so closing this PR > > Thanks for the

Re: [PR] improve monotonicity api [datafusion]

2024-05-21 Thread via GitHub
tinfoil-knight commented on PR #10117: URL: https://github.com/apache/datafusion/pull/10117#issuecomment-2122366778 No worries @berkaysynnada and @alamb. It's difficult to keep track of everyone's work in such a large project. It was fun to work on this and I learnt a few things from

Re: [I] Cast String to Date ANSI Mode - Spark 3.2 - Mismatch between Spark and Comet Errors [datafusion-comet]

2024-05-21 Thread via GitHub
vidyasankarv commented on issue #440: URL: https://github.com/apache/datafusion-comet/issues/440#issuecomment-2122378853 > Is this an issue of just a mismatch between error messages? Or is the cast actually not doing the right thing with Spark 3.2? It is an issue with a mismatch between e

[PR] tsaucer/run TPC-H examples in CI [datafusion-python]

2024-05-21 Thread via GitHub
timsaucer opened a new pull request, #711: URL: https://github.com/apache/datafusion-python/pull/711 # Which issue does this PR close? Closes #696 # Rationale for this change This PR sets up a workflow to generate the TPC-H 1 GB data set in CI, runs the 22 examples, and com

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-21 Thread via GitHub
timsaucer closed pull request #710: Tsaucer/prepare tpch examples for ci URL: https://github.com/apache/datafusion-python/pull/710 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-21 Thread via GitHub
timsaucer commented on PR #710: URL: https://github.com/apache/datafusion-python/pull/710#issuecomment-2122385949 Closing in favor of https://github.com/apache/datafusion-python/pull/711 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] tsaucer/run TPC-H examples in CI [datafusion-python]

2024-05-21 Thread via GitHub
timsaucer commented on PR #711: URL: https://github.com/apache/datafusion-python/pull/711#issuecomment-2122437247 @Michael-J-Ward It looks like we have a _potential_ regression between 37.1.0 and 38.0.0. Namely `substr` on 37.1.0 would accept a start and length, the parameters that should

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1608186038 ## datafusion/sql/Cargo.toml: ## @@ -47,6 +47,7 @@ arrow-schema = { workspace = true } datafusion-common = { workspace = true, default-features = true } datafu

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1608186725 ## datafusion/sql/src/unparser/expr.rs: ## @@ -504,6 +508,14 @@ impl Unparser<'_> { .collect::>>() } +pub(super) fn new_ident_quoted_if_n

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2122464482 > Thanks @goldmedal I'm thinking how this will work with whitespaces columns like > > ``` > select 1 as "a a"; > ``` Thanks @comphead :) I'm not sure what you

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2122480005 I'm not sure, but I think we can merge #10573 first because it also fixes many unparsing tests. Then I'll create a PR for sqlparser to add the check rule in the dialect. -- This is

[I] Expand Test Coverage for ScalarUDF's [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada opened a new issue, #10595: URL: https://github.com/apache/datafusion/issues/10595 ### Is your feature request related to a problem or challenge? After merging PR #10504, a new file [monotonicity.rs](https://github.com/apache/datafusion/blob/main/datafusion/functions/src

[PR] Rename monotonicity as output_ordering in ScalarUDF's [datafusion]

2024-05-21 Thread via GitHub
berkaysynnada opened a new pull request, #10596: URL: https://github.com/apache/datafusion/pull/10596 ## Which issue does this PR close? Closes #. ## Rationale for this change The signature and usage of the monotonicity API have significantly changed. The

Re: [I] DataFusion to run SQL queries on Parquet files with error No suitable object store found for file [datafusion]

2024-05-21 Thread via GitHub
aditanase commented on issue #9280: URL: https://github.com/apache/datafusion/issues/9280#issuecomment-2122637271 @alamb thanks for the very quick reply! Just tested with `datafusion-cli`, you're right that it's working. I was trying from a test deployment of Ballista. Will add the object

[PR] Improve `UserDefinedLogicalNodeCore::from_template` API to return Result [datafusion]

2024-05-21 Thread via GitHub
lewiszlw opened a new pull request, #10597: URL: https://github.com/apache/datafusion/pull/10597 ## Which issue does this PR close? follow up of https://github.com/apache/datafusion/pull/10575. ## Rationale for this change ## What changes are included in t

Re: [PR] Fixes bug expect `Date32Array` but returns Int32Array [datafusion]

2024-05-21 Thread via GitHub
crepererum commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1608400609 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -605,9 +611,6 @@ async fn test_dates_32_diff_rg_sizes() { .run("date32"); } -// BUG: same as

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608408244 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,7 +1404,84 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608413505 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,6 +1407,56 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608431571 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -505,7 +507,35 @@ pub async fn from_substrait_rel( _ => Ok(t), }

[I] Make the configuration for `StreamTable` more generic to support more stream sources [datafusion]

2024-05-21 Thread via GitHub
matthewmturner opened a new issue, #10599: URL: https://github.com/apache/datafusion/issues/10599 ### Is your feature request related to a problem or challenge? I am working on a websocket `TableProvider` and initially I went about creating my own `TableProvider` but then after review

[PR] Start setting up new StreamTable config [datafusion]

2024-05-21 Thread via GitHub
matthewmturner opened a new pull request, #10600: URL: https://github.com/apache/datafusion/pull/10600 ## Which issue does this PR close? Closes #10599 ## Rationale for this change ## What changes are included in this PR? ## Are these chang

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-21 Thread via GitHub
matthewmturner commented on PR #10600: URL: https://github.com/apache/datafusion/pull/10600#issuecomment-2122787433 @metesynnada @mustafasrepo I believe you were both involved in the `StreamTable` implementation, so I'm interested in getting your views on whether this is going in the right direction

[PR] Add to_date function to scalar functions doc [datafusion]

2024-05-21 Thread via GitHub
Omega359 opened a new pull request, #10601: URL: https://github.com/apache/datafusion/pull/10601 ## Which issue does this PR close? Closes #10461 ## Rationale for this change Adding missing documentation ## What changes are included in this PR? doc

[I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb opened a new issue, #10602: URL: https://github.com/apache/datafusion/issues/10602 ### Is your feature request related to a problem or challenge? Broken out from @Abdullahsab3's great ticket https://github.com/apache/datafusion/issues/10368 We would like to apply date

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122811703 The way you can perform this binning in postgres is somewhat paradoxically to convert a timestamp with a timezone back to a timestamp without timezone and then apply `date_bin`.

Re: [PR] Fixes bug expect `Date32Array` but returns Int32Array [datafusion]

2024-05-21 Thread via GitHub
edmondop commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1608470231 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -75,6 +75,12 @@ macro_rules! get_statistic { *scale,

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122823905 If we cast using arrow_cast back to `Timestamp(Nanosecond, None)` the binning does appear to work correctly ```sql > create or replace view t_roundtrip as select arrow_ca

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-21 Thread via GitHub
peter-toth commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2122829132 > What do we think about merging this PR and filing a follow on ticket to unify the APIs? I'm ok with merging the current state of the PR. But I was also thinking about how

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608493460 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_cou

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608498362 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_cou

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
andygrove commented on PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2122853631 This is very cool @comphead but it looks like it is not detecting any of the aggregate functions that we support? -- This is an automated message from the Apache Git Service. T

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608510461 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-21 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2122870652 @timsaucer I have fixed the calls of `expr::WindowFunction` to meet the changes and added tests for those window functions in `dataframe_functions.rs`. Let me know if I missed a

[PR] Implement Unparser for `UNION ALL` [datafusion]

2024-05-21 Thread via GitHub
phillipleblanc opened a new pull request, #10603: URL: https://github.com/apache/datafusion/pull/10603 ## Which issue does this PR close? It doesn't close this issue, but is part of the work for #9494 ## Rationale for this change Adds support for turning LogicalPlans that

Re: [PR] build: bump spark version to 3.4.3 [datafusion-comet]

2024-05-21 Thread via GitHub
codecov-commenter commented on PR #292: URL: https://github.com/apache/datafusion-comet/pull/292#issuecomment-2122931674 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/292?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[I] Incorrect statistics read for unsigned integer columns in parquet [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN opened a new issue, #10604: URL: https://github.com/apache/datafusion/issues/10604 ### Describe the bug I found this bug while adding tests for reading parquet statistics https://github.com/apache/datafusion/pull/10592/. Instead of getting corresponding UInt8Array, UInt16Arr

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608584669 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSpa

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122959190 Given the statement in the description, here is the best I can come up with using `arrow_cast` ```sql -- Times in brussels WITH t_brussels AS ( SELECT c

[PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao opened a new pull request, #456: URL: https://github.com/apache/datafusion-comet/pull/456 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608597228 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

[I] Incorrect statistics read for binary columns in parquet [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN opened a new issue, #10605: URL: https://github.com/apache/datafusion/issues/10605 ### Describe the bug I found this while adding tests for reading parquet statistics https://github.com/apache/datafusion/pull/10592. Instead of getting back `BinaryArray`, we get `StringArray`

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608598651 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-21 Thread via GitHub
viirya merged PR #395: URL: https://github.com/apache/datafusion-comet/pull/395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122975595 @mhilton and I agree that if we had the functionality suggested by @Abdullahsab3 on https://github.com/apache/datafusion/issues/10368#issue-2277903243 > given a UTC ti

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-21 Thread via GitHub
viirya commented on PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#issuecomment-2122976226 Merged. Thanks @huaxingao @advancedxy @comphead @parthchandra -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao commented on PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#issuecomment-2122977103 Thanks, everyone! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2122979923 My suggested next steps for this ticket: 1. Someone prototypes the "strip_timezone" function as a ScalarUDF and verifies that we can in fact achieve the expected result from

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
viirya commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608605897 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_count_

Re: [I] [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN commented on issue #10453: URL: https://github.com/apache/datafusion/issues/10453#issuecomment-2122986254 @alamb I have created 2 more bug tickets, but I cannot edit the description to add them to the subtasks. Can you help with that? 1. https://github.com/apache/datafusion/i
