[PR] Fix Union Equivalence Propogation Bug [datafusion]

2024-07-16 Thread via GitHub
mustafasrepo opened a new pull request, #11506: URL: https://github.com/apache/datafusion/pull/11506 ## Which issue does this PR close? Closes [#11492](https://github.com/apache/datafusion/issues/11492) ## Rationale for this change ## What changes are incl

Re: [PR] `ArrayAgg` UDAF [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11448: URL: https://github.com/apache/datafusion/pull/11448#discussion_r1680467219 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -702,6 +702,7 @@ async fn roundtrip_expr_api() -> Result<()> { string_agg(col("a").cas

[PR] fix: unparser generates wrong sql for derived table with columns [datafusion]

2024-07-16 Thread via GitHub
y-f-u opened a new pull request, #11505: URL: https://github.com/apache/datafusion/pull/11505 ## Which issue does this PR close? The unparser creates invalid SQL for a LogicalPlan generated from SQL with a derived table with columns. ## Rationale for this change See "Which i

Re: [PR] `ArrayAgg` UDAF [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11448: URL: https://github.com/apache/datafusion/pull/11448#discussion_r1680464606 ## datafusion/physical-expr/src/aggregate/array_agg_distinct.rs: ## @@ -1,433 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more

[PR] make unparser Dialect Send + Sync [datafusion]

2024-07-16 Thread via GitHub
y-f-u opened a new pull request, #11504: URL: https://github.com/apache/datafusion/pull/11504 ## Which issue does this PR close? This makes unparser trait `Dialect` easier to work within DataFusion context for federated queries (datafusion-federation). ## Rationale for this cha

Re: [PR] implement retract_batch for xor accumulator [datafusion]

2024-07-16 Thread via GitHub
mustafasrepo commented on code in PR #11500: URL: https://github.com/apache/datafusion/pull/11500#discussion_r1680447853 ## datafusion/functions-aggregate/src/bit_and_or_xor.rs: ## @@ -358,6 +358,14 @@ where Ok(()) } +fn retract_batch(&mut self, values: &[Arr

Re: [PR] feat: Optimize for CASE WHEN .. THEN column ELSE null END [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on code in PR #672: URL: https://github.com/apache/datafusion-comet/pull/672#discussion_r1680431753 ## native/core/src/execution/datafusion/planner.rs: ## @@ -541,6 +542,24 @@ impl PhysicalPlanner { Some(self.create_expr(case_when.el

Re: [PR] Remove element's nullability of array_agg function [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on PR #11447: URL: https://github.com/apache/datafusion/pull/11447#issuecomment-2232454745 Thanks @alamb and @eejbyfeldt for your suggestion -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Remove element's nullability of array_agg function [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 merged PR #11447: URL: https://github.com/apache/datafusion/pull/11447

Re: [PR] Remove element's nullability of array_agg function [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on PR #11447: URL: https://github.com/apache/datafusion/pull/11447#issuecomment-2232453938 I would like to merge this, since the builtin function code will eventually be removed, we can add it for UDAF later on

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2232452105 > Hmmm we have microbenchmarks at https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark > > Wondering if we should follow

[I] Leverage dictionary-encode when turning a scalar columnar value into an array [datafusion]

2024-07-16 Thread via GitHub
doki23 opened a new issue, #11503: URL: https://github.com/apache/datafusion/issues/11503 ### Is your feature request related to a problem or challenge? We have a `into_array` function of `ColumnarValue` which converts it into an arrow array like this: ```rust pub fn into_array(

Re: [I] Potential memory issue when using COPY with PARTITIONED BY [datafusion]

2024-07-16 Thread via GitHub
hveiga commented on issue #11042: URL: https://github.com/apache/datafusion/issues/11042#issuecomment-2232227943 I finally have some time to continue investigating this issue. I have not been able to make heaptrack work (yet!) but I did try using [dhat](https://docs.rs/dhat/latest/dhat/) an

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-16 Thread via GitHub
Michael-J-Ward commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1680305302 ## benchmarks/db-benchmark/join-datafusion.py: ## @@ -74,7 +74,8 @@ def ans_shape(batches): ctx = df.SessionContext() print(ctx) -# TODO we should be

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-16 Thread via GitHub
Michael-J-Ward commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1680309889 ## python/datafusion/__init__.py: ## @@ -15,206 +15,74 @@ # specific language governing permissions and limitations # under the License. -from abc im

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-16 Thread via GitHub
timsaucer commented on PR #750: URL: https://github.com/apache/datafusion-python/pull/750#issuecomment-2232207709 Added a temporary work around for CI issues and added an issue to fix once the upstream is resolved. https://github.com/apache/datafusion-python/issues/763

[I] Remove unnecessary google test import in CI once upstream is resolved [datafusion-python]

2024-07-16 Thread via GitHub
timsaucer opened a new issue, #763: URL: https://github.com/apache/datafusion-python/issues/763 **Describe the bug** After https://github.com/MaterializeInc/rust-protobuf-native/issues/20 is resolved, remove the temporary CI lines marked in - `.github/workflows/build.yml` - `.g

Re: [PR] CI Testing [datafusion-python]

2024-07-16 Thread via GitHub
timsaucer closed pull request #762: CI Testing URL: https://github.com/apache/datafusion-python/pull/762

Re: [PR] Optimization: make the most of Hint::AcceptsSingular when call make_scalar_function to Improve performance [datafusion]

2024-07-16 Thread via GitHub
github-actions[bot] commented on PR #10054: URL: https://github.com/apache/datafusion/pull/10054#issuecomment-2232171216 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feat: support `unnest` in GROUP BY clause [datafusion]

2024-07-16 Thread via GitHub
jonahgao commented on PR #11469: URL: https://github.com/apache/datafusion/pull/11469#issuecomment-2232144289 Thanks @JasonLi-cn @alamb

Re: [I] Feat: Support `GROUP BY unnest expr` [datafusion]

2024-07-16 Thread via GitHub
jonahgao closed issue #11470: Feat: Support `GROUP BY unnest expr` URL: https://github.com/apache/datafusion/issues/11470

Re: [PR] feat: support `unnest` in GROUP BY clause [datafusion]

2024-07-16 Thread via GitHub
jonahgao merged PR #11469: URL: https://github.com/apache/datafusion/pull/11469

[I] [EPIC] Extract physical optimizer out of core [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 opened a new issue, #11502: URL: https://github.com/apache/datafusion/issues/11502 ### Is your feature request related to a problem or challenge? Given the research from #11207, we need to pull out physical optimizer first before extracting catalog. ### Describe the

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-16 Thread via GitHub
parthchandra commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2232116879 > Hmmm we have microbenchmarks at https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark Wondering if we should follow the ex

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
parthchandra commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1680213976 ## kube/Dockerfile: ## @@ -0,0 +1,45 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the

Re: [PR] Support alternate format for Utf8 unparsing (CHAR) [datafusion]

2024-07-16 Thread via GitHub
sgrebnov commented on code in PR #11494: URL: https://github.com/apache/datafusion/pull/11494#discussion_r1680208153 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -45,6 +45,13 @@ pub trait Dialect { fn interval_style(&self) -> IntervalStyle { IntervalStyle::Post

Re: [PR] Get expr planners when creating new planner [datafusion]

2024-07-16 Thread via GitHub
jayzhan211 commented on code in PR #11485: URL: https://github.com/apache/datafusion/pull/11485#discussion_r1680215474 ## datafusion/core/src/execution/session_state.rs: ## @@ -1597,12 +1583,20 @@ impl SessionStateDefaults { } } +/// Adapter that implements the [`Context

Re: [PR] Support alternate format for Utf8 unparsing (CHAR) [datafusion]

2024-07-16 Thread via GitHub
sgrebnov commented on code in PR #11494: URL: https://github.com/apache/datafusion/pull/11494#discussion_r1680213787 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -45,6 +45,13 @@ pub trait Dialect { fn interval_style(&self) -> IntervalStyle { IntervalStyle::Post

Re: [PR] feat: Optimize for CASE WHEN .. THEN column ELSE null END [datafusion-comet]

2024-07-16 Thread via GitHub
parthchandra commented on code in PR #672: URL: https://github.com/apache/datafusion-comet/pull/672#discussion_r1680210075 ## native/core/src/execution/datafusion/planner.rs: ## @@ -541,6 +542,24 @@ impl PhysicalPlanner { Some(self.create_expr(case_when

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-16 Thread via GitHub
tshauck commented on PR #11229: URL: https://github.com/apache/datafusion/pull/11229#issuecomment-2232043137 Thanks! A lot of the discussion is good for context, but not as relevant to the actual changes.

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
kazuyukitanimura commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1680192507 ## kube/Dockerfile: ## @@ -0,0 +1,45 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-16 Thread via GitHub
kazuyukitanimura commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2232034311 Hmmm we have microbenchmarks at https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark Wondering if we should follow th

Re: [PR] chore: Improve fuzz testing coverage [datafusion-comet]

2024-07-16 Thread via GitHub
kazuyukitanimura commented on code in PR #668: URL: https://github.com/apache/datafusion-comet/pull/668#discussion_r1680186720 ## fuzz-testing/src/main/scala/org/apache/comet/fuzz/Meta.scala: ## @@ -108,6 +124,8 @@ object Meta { val unaryArithmeticOps: Seq[String] = Seq("+"

Re: [PR] Support alternate format for Utf8 unparsing (CHAR) [datafusion]

2024-07-16 Thread via GitHub
sgrebnov commented on PR #11494: URL: https://github.com/apache/datafusion/pull/11494#issuecomment-2232027730 > I had a suggestion on the API -- let me know what you think @alamb - 👍 I was thinking about this as well, but was unable to come up with other good potential types for strin

Re: [PR] WIP: Enable remaining Spark 3.5.1 tests [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on code in PR #676: URL: https://github.com/apache/datafusion-comet/pull/676#discussion_r1680185877 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -289,7 +289,7 @@ object CometConf extends ShimCometConf { "why a query stage cannot

Re: [PR] Support SortMergeJoin spilling [datafusion]

2024-07-16 Thread via GitHub
viirya commented on PR #11218: URL: https://github.com/apache/datafusion/pull/11218#issuecomment-2231980775 I'm blocked by fixing customer issue. I will take another look later.

Re: [PR] feat: support `unnest` in GROUP BY clause [datafusion]

2024-07-16 Thread via GitHub
JasonLi-cn commented on PR #11469: URL: https://github.com/apache/datafusion/pull/11469#issuecomment-2231980745 > I merged up to resolve a conflict as well Thank you @alamb for your help to modify the code, I learned a lot from it.

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
viirya commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1680165588 ## kube/Dockerfile: ## @@ -0,0 +1,45 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTIC

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
viirya commented on code in PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#discussion_r1680165928 ## kube/Dockerfile: ## @@ -0,0 +1,45 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTIC

[PR] Support `extract` on intervals [datafusion]

2024-07-16 Thread via GitHub
nrc opened a new pull request, #11501: URL: https://github.com/apache/datafusion/pull/11501 ## Which issue does this PR close? Closes 6327. ## What changes are included in this PR? This PR simply allows `Interval` types to be passed through to Arrow. It will require an A

Re: [I] Implement physical plan serialization for COPY plans `CsvLogicalExtensionCodec` [datafusion]

2024-07-16 Thread via GitHub
Lordworms commented on issue #11150: URL: https://github.com/apache/datafusion/issues/11150#issuecomment-2231969997 starts to work on this one, was delayed by the substrait one.

Re: [PR] chore: Fix some regressions with Spark 3.5.1 [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove merged PR #674: URL: https://github.com/apache/datafusion-comet/pull/674

Re: [I] Integrate with the substrait integration test [datafusion]

2024-07-16 Thread via GitHub
Lordworms commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2231946846 Hi, @richtia I just wonder how you get the original substrait plan, since I use duckdb to be the producer and generate the plan, it didn't have some structure like ```

[PR] WIP: Enable remaining Spark 3.5.1 tests [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove opened a new pull request, #676: URL: https://github.com/apache/datafusion-comet/pull/676 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/617 Follows on from https://github.com/apache/datafusion-comet/pull/674

Re: [PR] chore: Fix some regressions with Spark 3.5.1 [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on PR #674: URL: https://github.com/apache/datafusion-comet/pull/674#issuecomment-2231935067 @kazuyukitanimura @huaxingao Could I get a committer review? Parth has already approved.

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#issuecomment-2231932832 > Thanks @andygrove I'll add docs later when I wrap up with local kubectl. I'm not sure how useful just a raw docker file, the user most likely wants to run it in the cluster?

Re: [PR] Add dialect param to use double precision for float64 in Postgres [datafusion]

2024-07-16 Thread via GitHub
Sevenannn commented on code in PR #11495: URL: https://github.com/apache/datafusion/pull/11495#discussion_r1680132675 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -45,6 +45,12 @@ pub trait Dialect { fn interval_style(&self) -> IntervalStyle { IntervalStyle::Pos

Re: [PR] Refactor: more clearly delineate btwn writer options vs session configuration [datafusion]

2024-07-16 Thread via GitHub
wiedld commented on code in PR #11444: URL: https://github.com/apache/datafusion/pull/11444#discussion_r1680071170 ## datafusion/common/src/config.rs: ## @@ -1874,4 +2042,193 @@ mod tests { let parsed_metadata = table_config.parquet.key_value_metadata; assert_e

Re: [PR] chore: Fix some regressions with Spark 3.5.1 [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on code in PR #674: URL: https://github.com/apache/datafusion-comet/pull/674#discussion_r1680126583 ## spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala: ## @@ -49,16 +48,14 @@ trait ShimCometScanExec { filePartitions,

Re: [PR] Create Comet docker file [datafusion-comet]

2024-07-16 Thread via GitHub
comphead commented on PR #675: URL: https://github.com/apache/datafusion-comet/pull/675#issuecomment-2231916502 Thanks @andygrove I'll add docs later when I wrap up with local kubectl. I'm not sure how useful just a raw docker file, the user most likely wants to run it in the cluster?

Re: [I] DataFusion repo got 40MB larger [datafusion]

2024-07-16 Thread via GitHub
comphead commented on issue #10422: URL: https://github.com/apache/datafusion/issues/10422#issuecomment-2231914988 Git action looks even more promising. As you said, the precommit checks are not reliable because they run locally. IMHO I like the way https://github.com/james-callahan/gha-

Re: [I] Improve performance for grouping by variable length columns (strings) [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #9403: URL: https://github.com/apache/datafusion/issues/9403#issuecomment-2231888693 > dictionary-encoded parquet, then the underlying buffers are unique (i.e., no duplicated values) That is probably correct for arrays that share the same data page. But once

Re: [PR] Add extension hooks for encoding and decoding UDAFs and UDWFs [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11417: URL: https://github.com/apache/datafusion/pull/11417#issuecomment-2231886271 Thanks again @joroKr21 and @lewiszlw

Re: [I] Handle Serde for Custom AggregateUDFImpl traits [datafusion]

2024-07-16 Thread via GitHub
alamb closed issue #11422: Handle Serde for Custom AggregateUDFImpl traits URL: https://github.com/apache/datafusion/issues/11422

Re: [PR] Add extension hooks for encoding and decoding UDAFs and UDWFs [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11417: URL: https://github.com/apache/datafusion/pull/11417

Re: [PR] Add extension hooks for encoding and decoding UDAFs and UDWFs [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11417: URL: https://github.com/apache/datafusion/pull/11417#issuecomment-2231886145 > I got this error before, running `cargo update` fixed it. 🤔 looks like we probably need to increase the minimum version of chrono required to one that has `MappedLocalTime`

Re: [I] EnforceDistribution fails, seems to turn all the types of the schema to UInt64 [datafusion]

2024-07-16 Thread via GitHub
alamb closed issue #10421: EnforceDistribution fails, seems to turn all the types of the schema to UInt64 URL: https://github.com/apache/datafusion/issues/10421

Re: [I] EnforceDistribution fails, seems to turn all the types of the schema to UInt64 [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #10421: URL: https://github.com/apache/datafusion/issues/10421#issuecomment-2231884710 Perhaps this function would help you: https://docs.rs/datafusion/latest/datafusion/common/fn.project_schema.html (it is what the line pointed to by @mustafasrepo above uses)

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11229: URL: https://github.com/apache/datafusion/pull/11229#issuecomment-2231883012 I will try and find time to review this PR tomorrow

Re: [I] Move `sql_compound_identifier_to_expr` to `ExprPlanner` [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #11473: URL: https://github.com/apache/datafusion/issues/11473#issuecomment-2231882008 > I think ideally we should move get_field to ExprPlanner. Rest of code as dependecy on SqlToRel struct Makes sense to me 👍

Re: [PR] Fix parse `'1'::interval` as month by default [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11454: URL: https://github.com/apache/datafusion/pull/11454#discussion_r1680103048 ## datafusion/expr/src/columnar_value.rs: ## @@ -195,19 +195,39 @@ impl ColumnarValue { kernels::cast::cast_with_options(array, cast_type, &cast_op

Re: [PR] Add parser option enable_options_value_normalization [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11330: URL: https://github.com/apache/datafusion/pull/11330#issuecomment-2231878134 Hi @xinlifoobar -- I haven't had a chance to review this PR unfortunately. Perhaps @tinfoil-knight has some time to review the design as the original filer of https://github.com/apa

[PR] CI Testing [datafusion-python]

2024-07-16 Thread via GitHub
timsaucer opened a new pull request, #762: URL: https://github.com/apache/datafusion-python/pull/762 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes

Re: [PR] chore: Fix some regressions with Spark 3.5.1 [datafusion-comet]

2024-07-16 Thread via GitHub
parthchandra commented on code in PR #674: URL: https://github.com/apache/datafusion-comet/pull/674#discussion_r1680092045 ## spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala: ## @@ -49,16 +48,14 @@ trait ShimCometScanExec { filePartitions,

Re: [PR] feat: support `unnest` in GROUP BY clause [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11469: URL: https://github.com/apache/datafusion/pull/11469#issuecomment-2231854255 I merged up to resolve a conflict as well

Re: [PR] Introduce user defined SQL planner API [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11180: URL: https://github.com/apache/datafusion/pull/11180#discussion_r1680082243 ## datafusion/sql/src/expr/mod.rs: ## @@ -341,7 +278,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } }; -

Re: [I] ExprPlanner not propagated to SqlToRel [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #11477: URL: https://github.com/apache/datafusion/issues/11477#issuecomment-2231851660 Cross referencing -- a related discussion on https://github.com/apache/datafusion/pull/11180 https://github.com/apache/datafusion/pull/11180#discussion_r1667729599

Re: [PR] Get expr planners when creating new planner [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11485: URL: https://github.com/apache/datafusion/pull/11485#discussion_r1680059944 ## datafusion/sql/src/planner.rs: ## @@ -186,8 +185,6 @@ pub struct SqlToRel<'a, S: ContextProvider> { pub(crate) context_provider: &'a S, pub(crate) optio

Re: [PR] implement retract_batch for xor accumulator [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11500: URL: https://github.com/apache/datafusion/pull/11500#issuecomment-2231849997 Thanks @drewhayward -- @mustafasrepo do you have time to review this PR?

[PR] implement retract_batch for xor accumulator [datafusion]

2024-07-16 Thread via GitHub
drewhayward opened a new pull request, #11500: URL: https://github.com/apache/datafusion/pull/11500 ## Which issue does this PR close? Closes #7666. ## What changes are included in this PR? Implements the `retract_batch()` method for the `BitwiseXorAccumulator` by callin
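
The preview above is cut off, so how the PR actually implements `retract_batch()` is not shown here. As a rough illustration only: XOR is its own inverse (`a ^ b ^ b == a`), so retracting a value from a running XOR can be done by XOR-ing it in again. The sketch below uses a hypothetical, simplified `XorAccumulator` over `Option<u64>` values; it is not DataFusion's `BitwiseXorAccumulator` and does not use the real `Accumulator` trait or Arrow arrays.

```rust
/// Hypothetical, simplified stand-in for a bitwise-XOR accumulator.
/// Not DataFusion's type; it only illustrates the self-inverse property.
#[derive(Debug, Default)]
struct XorAccumulator {
    /// Running XOR of all non-null values seen so far (None until the first one).
    value: Option<u64>,
}

impl XorAccumulator {
    /// Fold a batch of nullable values into the running XOR, skipping nulls.
    fn update_batch(&mut self, values: &[Option<u64>]) {
        for v in values.iter().flatten() {
            let current = self.value.unwrap_or(0);
            self.value = Some(current ^ v);
        }
    }

    /// Remove previously added values. Because x ^ v ^ v == x,
    /// retracting is the same operation as updating.
    /// Simplification: after retracting everything, the state is Some(0),
    /// not None, since this sketch does not count rows.
    fn retract_batch(&mut self, values: &[Option<u64>]) {
        self.update_batch(values);
    }

    /// Current aggregate value.
    fn evaluate(&self) -> Option<u64> {
        self.value
    }
}
```

In a sliding-window setting, rows leaving the window would be passed to `retract_batch` and rows entering it to `update_batch`, so the aggregate never has to be recomputed from scratch.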

Re: [I] DataFusion repo got 40MB larger [datafusion]

2024-07-16 Thread via GitHub
findepi commented on issue #10422: URL: https://github.com/apache/datafusion/issues/10422#issuecomment-2231830117 > I also think if we force push to main all outstanding PRs will become quite messed up until after a rebase that's correct. also, editing release tags is something

Re: [PR] chore: Fix some regressions with Spark 3.5.1 [datafusion-comet]

2024-07-16 Thread via GitHub
parthchandra commented on code in PR #674: URL: https://github.com/apache/datafusion-comet/pull/674#discussion_r1680059722 ## spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala: ## @@ -49,16 +48,14 @@ trait ShimCometScanExec { filePartitions,

Re: [PR] feat: support UDWFs in Substrait [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11489: URL: https://github.com/apache/datafusion/pull/11489

Re: [PR] Followup Support NULL literals in where clause [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11491: URL: https://github.com/apache/datafusion/pull/11491#discussion_r1680053247 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -86,6 +87,18 @@ fn analyze_internal( external_schema: &DFSchema, plan: LogicalPlan, ) -> Resu

Re: [PR] Add dialect param to use double precision for float64 in Postgres [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11495: URL: https://github.com/apache/datafusion/pull/11495#discussion_r1680049275 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -45,6 +45,12 @@ pub trait Dialect { fn interval_style(&self) -> IntervalStyle { IntervalStyle::Postgre

Re: [PR] Support alternate format for Utf8 unparsing (CHAR) [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11494: URL: https://github.com/apache/datafusion/pull/11494#discussion_r1680046876 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -45,6 +45,13 @@ pub trait Dialect { fn interval_style(&self) -> IntervalStyle { IntervalStyle::Postgre

Re: [I] Trailing comma output misleading error message [datafusion]

2024-07-16 Thread via GitHub
alamb closed issue #9949: Trailing comma output misleading error message URL: https://github.com/apache/datafusion/issues/9949

Re: [PR] Update sqlparser requirement from 0.47 to 0.48 [datafusion]

2024-07-16 Thread via GitHub
dependabot[bot] commented on PR #11377: URL: https://github.com/apache/datafusion/pull/11377#issuecomment-2231792353 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [PR] Update sqlparser requirement from 0.47 to 0.48 [datafusion]

2024-07-16 Thread via GitHub
alamb closed pull request #11377: Update sqlparser requirement from 0.47 to 0.48 URL: https://github.com/apache/datafusion/pull/11377

Re: [PR] upgrade sqlparser 0.47 -> 0.48 [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11453: URL: https://github.com/apache/datafusion/pull/11453

Re: [PR] upgrade sqlparser 0.47 -> 0.48 [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11453: URL: https://github.com/apache/datafusion/pull/11453#issuecomment-2231792135 Filed https://github.com/apache/datafusion/issues/11499 to track looking into the stack overflow. Thanks everyone

[I] Investigate memory use in debug builds for deeply nested array constants [datafusion]

2024-07-16 Thread via GitHub
alamb opened a new issue, #11499: URL: https://github.com/apache/datafusion/issues/11499 ### Is your feature request related to a problem or challenge? While upgrading to a new version of SQL parser, @MohamedAbdeen21 in https://github.com/apache/datafusion/pull/11453 we found that t

Re: [PR] Remove element's nullability of array_agg function [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11447: URL: https://github.com/apache/datafusion/pull/11447#issuecomment-2231783925 I defer to @jayzhan211 with what to do with this PR. I am fine either way (keep the API in case it might be helpful in the future or remove it and we can add it again if it is needed)

Re: [PR] Enable `clone_on_ref_ptr` clippy lints on proto [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11465: URL: https://github.com/apache/datafusion/pull/11465

Re: [PR] Enable `clone_on_ref_ptr` clippy lints on proto [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11465: URL: https://github.com/apache/datafusion/pull/11465#issuecomment-2231782244 I merged this PR to main locally and ran clippy to check for logical conflicts. All is good. Thank you @lewiszlw

Re: [PR] Minor: Make execute_input_stream Accessible for Any Sinking Operators [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11449: URL: https://github.com/apache/datafusion/pull/11449#issuecomment-2231780865 Thanks again @berkaysynnada and @ozankabak

Re: [PR] Minor: Make execute_input_stream Accessible for Any Sinking Operators [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11449: URL: https://github.com/apache/datafusion/pull/11449

Re: [PR] feat: switch to using proper Substrait types for IntervalYearMonth and IntervalDayTime [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11471: URL: https://github.com/apache/datafusion/pull/11471#issuecomment-2231779975 Thanks again @Blizzara

Re: [PR] feat: switch to using proper Substrait types for IntervalYearMonth and IntervalDayTime [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11471: URL: https://github.com/apache/datafusion/pull/11471

Re: [PR] Replace to_lowercase with to_string in sql example [datafusion]

2024-07-16 Thread via GitHub
alamb merged PR #11486: URL: https://github.com/apache/datafusion/pull/11486

Re: [PR] Minor: Make execute_input_stream Accessible for Any Sinking Operators [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11449: URL: https://github.com/apache/datafusion/pull/11449#issuecomment-2231780732 Let's move the code around in a subsequent PR if we think that is useful

Re: [PR] Replace to_lowercase with to_string in sql example [datafusion]

2024-07-16 Thread via GitHub
alamb commented on PR #11486: URL: https://github.com/apache/datafusion/pull/11486#issuecomment-2231779637 Thank you @lewiszlw and @comphead

Re: [PR] feat: support group by unnest [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11469: URL: https://github.com/apache/datafusion/pull/11469#discussion_r1680032211 ## datafusion/sql/src/select.rs: ## @@ -297,6 +298,9 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { input: LogicalPlan, select_exprs: Vec,

[I] This function seems to have some code repetition with function [try_process_unnest](https://github.com/apache/datafusion/blob/f204869ff55bb3e39cf23fc0a34ebd5021e6773f/datafusion/sql/src/select.rs#

2024-07-16 Thread via GitHub
alamb opened a new issue, #11498: URL: https://github.com/apache/datafusion/issues/11498 This function seems to have some code repetition with function [try_process_unnest](https://github.com/apache/datafusion/blob/f204869ff55bb3e39cf23fc0a34ebd5021e6773f/datafusion/sql/src/sel

Re: [PR] feat: support group by unnest [datafusion]

2024-07-16 Thread via GitHub
alamb commented on code in PR #11469: URL: https://github.com/apache/datafusion/pull/11469#discussion_r1680028460 ## datafusion/sql/src/select.rs: ## @@ -354,6 +358,118 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { .build() } +fn try_process_aggregate

Re: [I] [Epic] Extract catalog functionality from the core to make it more modular [datafusion]

2024-07-16 Thread via GitHub
alamb commented on issue #10782: URL: https://github.com/apache/datafusion/issues/10782#issuecomment-2231749097 Shall we file a ticket / epic to pull physical optimizer out?
