Re: [PR] only consider main part of the url when deciding is_collection in listing table [datafusion]

2024-05-08 Thread via GitHub
phillipleblanc commented on code in PR #10419: URL: https://github.com/apache/datafusion/pull/10419#discussion_r1593494268 ## datafusion/core/src/datasource/listing/url.rs: ## @@ -187,7 +187,10 @@ impl ListingTableUrl { /// Returns `true` if `path` refers to a collection

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-08 Thread via GitHub
jayzhan211 commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1593511241 ## datafusion/sqllogictest/test_files/coalesce.slt: ## @@ -209,28 +209,20 @@ select [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id

Re: [I] DISCUSSION: remove `CREATE EXTERNAL TABLE` syntax: `DELIMITER`, `WITH HEADER ROW` and `COMPRESSION` [datafusion]

2024-05-08 Thread via GitHub
ozankabak commented on issue #10414: URL: https://github.com/apache/datafusion/issues/10414#issuecomment-2099935819 > But the nice thing about DataFusion is it's extensible, so I could imagine if someone has a strong requirement to keep the old syntax, we could write a custom parser that wra

[PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
lewiszlw opened a new pull request, #10420: URL: https://github.com/apache/datafusion/pull/10420 ## Which issue does this PR close? Simplify making information_schame tables and avoid potential bug that someone forgets adding logic for new table here. ## Rationale for t

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-08 Thread via GitHub
milenkovicm commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2099975473 > The downside is that the user needs to both define `simplify` and set `has_simplify` to true, but I think it is much simpler than the optional closure I personally try to

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-08 Thread via GitHub
milenkovicm commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2100015817 ## Option 1 - Original expression as a parameter ```rust fn simplify( &self, expr: Expr, ) -> Result> { // we know it'll alwa

Re: [I] select multiple columns in a single `Expr` [datafusion]

2024-05-08 Thread via GitHub
jayzhan211 commented on issue #10102: URL: https://github.com/apache/datafusion/issues/10102#issuecomment-2100049278 > > > How does the selection happens ? > > > > > > I think you can select with something like `my_struct_col['foo']` which returns the 'foo' field. > > However n

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-08 Thread via GitHub
jayzhan211 commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2100125239 > ## Option 1 - Original expression as a parameter > ```rust > fn simplify( > &self, > expr: Expr, > ) -> Result> { > // we know i

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-08 Thread via GitHub
tisonkun commented on PR #10392: URL: https://github.com/apache/datafusion/pull/10392#issuecomment-2100223608 Pushed a commit to handle other breaking changes. Now remains the JsonAccess and Agg func refactor. I'm still understanding how they are logically changed ... -- This is an autom

Re: [I] Boolean operators in expressions are ignored [datafusion-python]

2024-05-08 Thread via GitHub
timsaucer commented on issue #667: URL: https://github.com/apache/datafusion-python/issues/667#issuecomment-2100256593 Thank you! I tested and your answer works as expected. I'll put up a PR this morning to expand the documentation so others don't come with the same question. I appreciate

[I] EnforceDistribution fails, seems to turn all the types of the schema to UInt64 [datafusion]

2024-05-08 Thread via GitHub
fabianmurariu opened a new issue, #10421: URL: https://github.com/apache/datafusion/issues/10421 ### Describe the bug EnforceDistribution fails with `"PhysicalOptimizer rule 'EnforceDistribution' failed, due to generate a different schema, original schema:` ### To Reproduce

Re: [I] [Epic] Improved TreeNode APIs [datafusion]

2024-05-08 Thread via GitHub
backkem commented on issue #10121: URL: https://github.com/apache/datafusion/issues/10121#issuecomment-2100376590 The `ConcreteTreeNode` seems like quite a nice abstraction. Maybe it's interesting to add for the `LogicalPlan` & `Expr` as well. -- This is an automated message from the Apac

Re: [I] DISCUSSION: remove `CREATE EXTERNAL TABLE` syntax: `DELIMITER`, `WITH HEADER ROW` and `COMPRESSION` [datafusion]

2024-05-08 Thread via GitHub
andygrove commented on issue #10414: URL: https://github.com/apache/datafusion/issues/10414#issuecomment-2100469329 IIRC, some of this original syntax came from a desire to support Hive SQL, but as @phillipleblanc said, if anyone needs this then they can add it back under a specific dialect.

[PR] Add document about basics of working with expressions [datafusion-python]

2024-05-08 Thread via GitHub
timsaucer opened a new pull request, #668: URL: https://github.com/apache/datafusion-python/pull/668 # Which issue does this PR close? Closes #667. # Rationale for this change The bug report was not an actual bug, but rather a lack of documentation about how to perform

Re: [PR] Add simplify method to aggregate function [datafusion]

2024-05-08 Thread via GitHub
milenkovicm commented on PR #10354: URL: https://github.com/apache/datafusion/pull/10354#issuecomment-2100499963 I apologise for making noise, but it looks like all of the non closure options have issue with borrowing: ``` error[E0505]: cannot move out of `expr.0` because it is bor

Re: [PR] minor: Remove docs archive [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10416: URL: https://github.com/apache/datafusion/pull/10416#issuecomment-2100532636 Filed https://github.com/apache/datafusion/issues/10422 to track potentially removing it from the repo history -- This is an automated message from the Apache Git Service. To respond

Re: [PR] UDAF: Add more fields to state fields [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10391: URL: https://github.com/apache/datafusion/pull/10391#issuecomment-2100536570 > @alamb This change is based on what the built-in function does, and what we need for UDAF to have the same behavior. If the change is not clear if it is necessary, we can also change

[PR] Upgrade Datafusion to v37.1.0 [datafusion-python]

2024-05-08 Thread via GitHub
Michael-J-Ward opened a new pull request, #669: URL: https://github.com/apache/datafusion-python/pull/669 # Which issue does this PR close? Closes #663. (hopefully) This is a cleaned up version of #662. Importantly, the final failing test is *not* commented out, so there

Re: [I] [Epic] Improved TreeNode APIs [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10121: URL: https://github.com/apache/datafusion/issues/10121#issuecomment-2100549720 > The `ConcreteTreeNode` seems like quite a nice abstraction. Maybe it's interesting to add for the `LogicalPlan` & `Expr` as well. Sounds like a nice idea to me -- I suspec

Re: [PR] Draft: upgrading to datafusion 37.1.0 [datafusion-python]

2024-05-08 Thread via GitHub
Michael-J-Ward closed pull request #662: Draft: upgrading to datafusion 37.1.0 URL: https://github.com/apache/datafusion-python/pull/662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] Draft: upgrading to datafusion 37.1.0 [datafusion-python]

2024-05-08 Thread via GitHub
Michael-J-Ward commented on PR #662: URL: https://github.com/apache/datafusion-python/pull/662#issuecomment-2100552349 Closing in favor of #669 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [I] Build conda nightlies jobs are failing on main for aarch64 [datafusion-python]

2024-05-08 Thread via GitHub
Michael-J-Ward commented on issue #659: URL: https://github.com/apache/datafusion-python/issues/659#issuecomment-2100566566 Ah, `publish-docs` is the tag @andygrove used to trigger the docs generation & publication. Apparently, `conda` doesn't like that for a package/version. -- T

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-08 Thread via GitHub
alamb commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1594036090 ## datafusion/sqllogictest/test_files/coalesce.slt: ## @@ -209,28 +209,20 @@ select [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0,

Re: [PR] Upgrade Datafusion to v37.1.0 [datafusion-python]

2024-05-08 Thread via GitHub
Michael-J-Ward commented on PR #669: URL: https://github.com/apache/datafusion-python/pull/669#issuecomment-2100582530 One follow-on that I'd like to do is parametrize `test_array_functions()`. I didn't want to alter the test-cases unnecessarily while doing the upgrade. -- This

Re: [PR] feat: Implement Spark-compatible CAST from non-integral numeric types to integral types [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on code in PR #399: URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1594041255 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -191,44 +191,219 @@ macro_rules! cast_int_to_int_macro { .as_any() .downc

Re: [I] Implement Spark-compatible CAST from String to Floating Point [datafusion-comet]

2024-05-08 Thread via GitHub
psvri commented on issue #326: URL: https://github.com/apache/datafusion-comet/issues/326#issuecomment-2100628882 Hello, I am not able to get time to work on this. If anyone else wants to try go ahead. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
Weijun-H commented on code in PR #10420: URL: https://github.com/apache/datafusion/pull/10420#discussion_r1594074667 ## datafusion/core/src/catalog/information_schema.rs: ## @@ -225,34 +213,31 @@ impl InformationSchemaConfig { #[async_trait] impl SchemaProvider for Informati

Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on code in PR #342: URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1594082373 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1024,7 +1024,27 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-08 Thread via GitHub
alamb commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1594086874 ## datafusion/sqllogictest/test_files/coalesce.slt: ## @@ -209,28 +209,20 @@ select [3, 4] List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0,

[I] Support "User defined coercsion" function [datafusion]

2024-05-08 Thread via GitHub
alamb opened a new issue, #10423: URL: https://github.com/apache/datafusion/issues/10423 ### Is your feature request related to a problem or challenge? DataFusion automatically "coerces" (see [docs here](https://docs.rs/datafusion/latest/datafusion/logical_expr/type_coercion/index.htm

Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on code in PR #342: URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1594094594 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1024,7 +1024,27 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on code in PR #342: URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1594097078 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1024,7 +1024,27 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on PR #342: URL: https://github.com/apache/datafusion-comet/pull/342#issuecomment-2100675232 @viirya @kazuyukitanimura do you have any additional feedback? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-08 Thread via GitHub
vidyasankarv commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1594122726 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -107,7 +108,23 @@ macro_rules! cast_utf8_to_timestamp { result }}; } - +macro_rul

[I] docs: Document incompatibility of unhex with Spark 3.2 [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove opened a new issue, #400: URL: https://github.com/apache/datafusion-comet/issues/400 ### What is the problem the feature request solves? https://github.com/apache/datafusion-comet/pull/342 adds support for unhex, and we see compatible behavior in Comet with Spark 3.3 and abo

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-08 Thread via GitHub
vidyasankarv commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1594124981 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -274,16 +291,20 @@ impl Cast { (DataType::Utf8, DataType::Timestamp(_, _)) => {

Re: [PR] feat: Implement Spark unhex [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on code in PR #342: URL: https://github.com/apache/datafusion-comet/pull/342#discussion_r1594127098 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1024,7 +1024,27 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-08 Thread via GitHub
vidyasankarv commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2100732952 This PR is still in progress. I added support for String to Date32. Do we have to support Date64 as well. * Spark supports dates in the format and -MM and DataF

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
lewiszlw commented on code in PR #10420: URL: https://github.com/apache/datafusion/pull/10420#discussion_r1594157452 ## datafusion/core/src/catalog/information_schema.rs: ## @@ -225,34 +213,31 @@ impl InformationSchemaConfig { #[async_trait] impl SchemaProvider for Informati

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
lewiszlw commented on code in PR #10420: URL: https://github.com/apache/datafusion/pull/10420#discussion_r1594159897 ## datafusion/core/src/catalog/information_schema.rs: ## @@ -225,34 +213,31 @@ impl InformationSchemaConfig { #[async_trait] impl SchemaProvider for Informati

[PR] chore: Add criterion benchmarks for casting between integer types [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove opened a new pull request, #401: URL: https://github.com/apache/datafusion-comet/pull/401 ## Which issue does this PR close? N/A ## Rationale for this change We need benchmarks so that we can evaluate performance impacts of PRs that refactor exi

Re: [PR] feat: Implement Spark-compatible CAST from non-integral numeric types to integral types [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on code in PR #399: URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1594176785 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -191,44 +191,219 @@ macro_rules! cast_int_to_int_macro { .as_any() .downc

Re: [PR] Upgrade Datafusion to v37.1.0 [datafusion-python]

2024-05-08 Thread via GitHub
slyons commented on PR #669: URL: https://github.com/apache/datafusion-python/pull/669#issuecomment-2100809613 Note this also closes #656 , #655, #654, #652, #651, #650 and #649 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
comphead commented on code in PR #10420: URL: https://github.com/apache/datafusion/pull/10420#discussion_r1594237369 ## datafusion/core/src/catalog/information_schema.rs: ## @@ -264,10 +244,7 @@ impl SchemaProvider for InformationSchemaProvider { } fn table_exist(&se

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
comphead commented on code in PR #10420: URL: https://github.com/apache/datafusion/pull/10420#discussion_r1594238729 ## datafusion/core/src/catalog/information_schema.rs: ## @@ -225,37 +213,29 @@ impl InformationSchemaConfig { #[async_trait] impl SchemaProvider for Informati

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
Weijun-H commented on PR #10420: URL: https://github.com/apache/datafusion/pull/10420#issuecomment-2100885168 I am surprised that this pr passed the tests, how about adding some tests for it? -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
lewiszlw commented on code in PR #10420: URL: https://github.com/apache/datafusion/pull/10420#discussion_r1594269857 ## datafusion/core/src/catalog/information_schema.rs: ## @@ -264,10 +244,7 @@ impl SchemaProvider for InformationSchemaProvider { } fn table_exist(&se

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
lewiszlw commented on PR #10420: URL: https://github.com/apache/datafusion/pull/10420#issuecomment-2100923019 > I am surprised that this pr passed the tests, how about adding some tests for it? I added a test in slt. -- This is an automated message from the Apache Git Service. To r

[I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-08 Thread via GitHub
Michael-J-Ward opened a new issue, #10424: URL: https://github.com/apache/datafusion/issues/10424 ### Describe the bug The `array_slice` UDF takes 4 parameters. https://github.com/apache/datafusion/blob/96487ea0cbb7901a1e4aa18fdf6deb8961319fea/datafusion/functions-array/src/ext

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2100947892 > This PR is still in progress. I added support for String to Date32. > > * Spark supports dates in the format and -MM and DataFusion does not - supported now >

[I] `array_slice` panics with `stride=1` [datafusion]

2024-05-08 Thread via GitHub
Michael-J-Ward opened a new issue, #10425: URL: https://github.com/apache/datafusion/issues/10425 ### Describe the bug See reproduction for specific example that triggers the panic. Some combination of a column with varying size arrays, a negative start index, a positive end in

[I] pytest parameterize the tests in `test_functions.py` [datafusion-python]

2024-05-08 Thread via GitHub
Michael-J-Ward opened a new issue, #671: URL: https://github.com/apache/datafusion-python/issues/671 **Describe the solution you'd like** Each UDF in `src/functions.py` should have its own test case in `test_functions.py` instead of testing a bunch of functions in a single test case

Re: [PR] Stop copying LogicalPlan and Exprs in `OptimizeProjections` (2% faster planning) [datafusion]

2024-05-08 Thread via GitHub
alamb commented on code in PR #10405: URL: https://github.com/apache/datafusion/pull/10405#discussion_r1594332291 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -467,6 +468,200 @@ impl LogicalPlan { self.with_new_exprs(self.expressions(), inputs.to_vec()) } +

Re: [I] DataFusion weekly project plan (Andrew Lamb) - May 6, 2024 [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10395: URL: https://github.com/apache/datafusion/issues/10395#issuecomment-2101020837 Queue - [ ] https://github.com/apache/datafusion/pull/10304 - [ ] https://github.com/apache/datafusion/pull/10381 -- This is an automated message from the Apache Git Se

Re: [PR] only consider main part of the url when deciding is_collection in listing table [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10419: URL: https://github.com/apache/datafusion/pull/10419#issuecomment-2101027243 I took the liberty of running `cargo fmt` and pushing that commit to this branch -- This is an automated message from the Apache Git Service. To respond to the message, please log o

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
alamb commented on code in PR #10420: URL: https://github.com/apache/datafusion/pull/10420#discussion_r1594350463 ## datafusion/core/src/catalog/information_schema.rs: ## @@ -225,37 +213,29 @@ impl InformationSchemaConfig { #[async_trait] impl SchemaProvider for InformationS

Re: [PR] Fix incorrect Schema over aggregate function, Remove unnecessary `exprlist_to_fields_aggregate` [datafusion]

2024-05-08 Thread via GitHub
alamb merged PR #10408: URL: https://github.com/apache/datafusion/pull/10408 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Schema incorrect after select over aggregate function that returns a different type than the input [datafusion]

2024-05-08 Thread via GitHub
alamb closed issue #10346: Schema incorrect after select over aggregate function that returns a different type than the input URL: https://github.com/apache/datafusion/issues/10346 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] Fix incorrect Schema over aggregate function, Remove unnecessary `exprlist_to_fields_aggregate` [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10408: URL: https://github.com/apache/datafusion/pull/10408#issuecomment-2101031808 Thanks again @jonahgao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Enable user defined display_name for ScalarUDF [datafusion]

2024-05-08 Thread via GitHub
alamb merged PR #10417: URL: https://github.com/apache/datafusion/pull/10417 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Enable user defined display_name for ScalarUDF [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10417: URL: https://github.com/apache/datafusion/pull/10417#issuecomment-2101034440 Thanks @yyy1000 and @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [I] Support user defined display for UDF [datafusion]

2024-05-08 Thread via GitHub
alamb closed issue #10376: Support user defined display for UDF URL: https://github.com/apache/datafusion/issues/10376 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] Move Covariance (Population) covar_pop to be a User Defined Aggregate Function [datafusion]

2024-05-08 Thread via GitHub
yyy1000 commented on PR #10418: URL: https://github.com/apache/datafusion/pull/10418#issuecomment-2101036966 > I agree once we port over the tests this PR should be good to go 🙏 Would you prefer moving test in a separate PR or in this PR? I'm good either way. :) -- This is an a

[I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-08 Thread via GitHub
alamb opened a new issue, #10426: URL: https://github.com/apache/datafusion/issues/10426 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/5637 One of the optimizer passes is "common subexpression elimination" that

Re: [PR] Fix and improve `CommonSubexprEliminate` rule [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10396: URL: https://github.com/apache/datafusion/pull/10396#issuecomment-2101053281 > @alamb, IMO if this PR can be merged then the next steps should be: > 1. Fix [Incorrect results with expression resolution  #10413](https://github.com/apache/datafusion/issues/1

Re: [PR] Fix and improve `CommonSubexprEliminate` rule [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10396: URL: https://github.com/apache/datafusion/pull/10396#issuecomment-2101053800 All right, I think we have our next steps outlined and tracked with tickets. 🚀 ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Fix and improve `CommonSubexprEliminate` rule [datafusion]

2024-05-08 Thread via GitHub
alamb merged PR #10396: URL: https://github.com/apache/datafusion/pull/10396 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Fix and improve `CommonSubexprEliminate` rule [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10396: URL: https://github.com/apache/datafusion/pull/10396#issuecomment-2101054046 Thanks again @peter-toth and @MohamedAbdeen21 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] make common expression alias human-readable [datafusion]

2024-05-08 Thread via GitHub
alamb commented on PR #10333: URL: https://github.com/apache/datafusion/pull/10333#issuecomment-2101056433 Per the discussion on https://github.com/apache/datafusion/pull/10396, we merged that one first and now this one needs to be rebased / resolved. Marking it as draft until we can do tha

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-08 Thread via GitHub
peter-toth commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2101062450 I'm happy to take this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Fix and improve `CommonSubexprEliminate` rule [datafusion]

2024-05-08 Thread via GitHub
peter-toth commented on PR #10396: URL: https://github.com/apache/datafusion/pull/10396#issuecomment-2101063116 Thanks for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2101065729 Thanks @Michael-J-Ward -- what is your ideal outcome? That the UDF function can take three arguments (and default the fourth value to a constant 1)? -- This is an automated mes
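
The outcome asked about above (accept three arguments and default the fourth to a constant 1) could look roughly like the sketch below. It is not DataFusion's `array_slice` implementation: the function name `slice_with_optional_stride`, the 1-based inclusive `from`/`to` bounds, and the positive-only stride are all assumptions chosen to keep the example small.

```rust
/// Hypothetical helper, NOT DataFusion's `array_slice`.
/// Illustrates an optional `stride` that falls back to 1 when omitted.
/// Assumptions: `from` and `to` are 1-based and inclusive; stride is positive.
fn slice_with_optional_stride(
    values: &[i64],
    from: usize,
    to: usize,
    stride: Option<usize>,
) -> Vec<i64> {
    // Default the fourth argument to 1; clamp 0 to 1 so `step_by` cannot panic.
    let stride = stride.unwrap_or(1).max(1);

    // An empty or inverted range yields an empty result.
    if from == 0 || from > to {
        return Vec::new();
    }

    // Clamp the upper bound to the array length.
    let end = to.min(values.len());
    if from > end {
        return Vec::new();
    }

    values[from - 1..end]
        .iter()
        .step_by(stride)
        .copied()
        .collect()
}
```

Under these assumptions, calling the helper with `None` for the stride behaves like a three-argument call, while `Some(2)` takes every second element of the selected range.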

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-08 Thread via GitHub
parthchandra commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1594382358 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -954,13 +993,63 @@ fn parse_str_to_time_only_timestamp(value: &str) -> CometResult> { Ok(S

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-08 Thread via GitHub
parthchandra commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2101068286 > * "-0973250", "-3638-5" fuzz tests in Legacy mode should return values as mentioned [Implement Spark-compatible CAST from String to Date  #327](https://github.com/apache/data

Re: [I] EnforceDistribution fails, seems to turn all the types of the schema to UInt64 [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10421: URL: https://github.com/apache/datafusion/issues/10421#issuecomment-2101067583 Thanks for the report @fabianmurariu Is there any way we can get a self contained reproducer? I ran the query in the description and it doesn't seem to have all the tables

Re: [I] DISCUSSION: remove `CREATE EXTERNAL TABLE` syntax: `DELIMITER`, `WITH HEADER ROW` and `COMPRESSION` [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10414: URL: https://github.com/apache/datafusion/issues/10414#issuecomment-2101070772 If there is consensus here, given we just branched for the 38 release, maybe we can remove support on main now as part of https://github.com/apache/datafusion/pull/10404 and let t

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-08 Thread via GitHub
parthchandra commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1594390149 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -107,7 +108,23 @@ macro_rules! cast_utf8_to_timestamp { result }}; } - +macro_rul

Re: [I] Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-08 Thread via GitHub
vaibhawvipul commented on issue #374: URL: https://github.com/apache/datafusion-comet/issues/374#issuecomment-2101081443 I am working on this one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2101083431 Looks like these were last modified in https://github.com/apache/datafusion/pull/9788 / https://github.com/apache/datafusion/pull/9615 from @jayzhan211 and @erenavsarogullari

Re: [I] Substrait integration doesn't recognize typed functions [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10412: URL: https://github.com/apache/datafusion/issues/10412#issuecomment-2101085347 Thanks for the report @Blizzara -- this would be a great thing to fix

[I] chore: Improve CometCastSuite framework for checking for expected error messages [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove opened a new issue, #402: URL: https://github.com/apache/datafusion-comet/issues/402 ### What is the problem the feature request solves? In `CometCastSuite` we have a shared method `castTest` which executes queries against Spark and Comet and compares results. In the

Re: [I] `min_value` and `max_value` on statistics don't help avoid `ExecutionPlan.execute` [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10400: URL: https://github.com/apache/datafusion/issues/10400#issuecomment-2101087503 I think using the reported statistics to prune using [datafusion](https://docs.rs/datafusion/latest/datafusion/index.html)::[physical_optimizer](https://docs.rs/datafusion/latest/d

Re: [I] Use `min_value` and `max_value` on statistics to avoid `ExecutionPlan.execute` [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10400: URL: https://github.com/apache/datafusion/issues/10400#issuecomment-2101088327 Relabeled from bug to feature -- thanks @samuelcolvin

Re: [PR] feat: Implement Spark-compatible CAST from floating-point/double to decimal [datafusion-comet]

2024-05-08 Thread via GitHub
andygrove commented on code in PR #384: URL: https://github.com/apache/datafusion-comet/pull/384#discussion_r1594407351 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -960,11 +958,19 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelpe

Re: [PR] feat: Implement Spark-compatible CAST from floating-point/double to decimal [datafusion-comet]

2024-05-08 Thread via GitHub
vaibhawvipul commented on code in PR #384: URL: https://github.com/apache/datafusion-comet/pull/384#discussion_r1594409792 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -960,11 +958,19 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHe

Re: [I] Support custom SchemaAdapter on ParquetExec [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10398: URL: https://github.com/apache/datafusion/issues/10398#issuecomment-2101094129 Perhaps a good starting place would be to make Schema Adapter public. It seems to be an entirely private struct today https://docs.rs/datafusion/latest/datafusion/index.html?search=Sche

Re: [I] Remove `Expr::GetIndexedField` and `GetFieldAccess` and always use function `get_field` for indexing [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10374: URL: https://github.com/apache/datafusion/issues/10374#issuecomment-2101096011 One potential downside as @westonpace mentioned somewhere I can't find now, is that systems that want to look for field accesses for their own analysis (e.g. to find all nested fi

Re: [I] Better timezone functionalities [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10368: URL: https://github.com/apache/datafusion/issues/10368#issuecomment-2101097362 I haven't read this ticket in detail yet, but there is a collection of other issues on https://github.com/apache/datafusion/issues/8282

Re: [I] DISCUSSION: remove `CREATE EXTERNAL TABLE` syntax: `DELIMITER`, `WITH HEADER ROW` and `COMPRESSION` [datafusion]

2024-05-08 Thread via GitHub
ozankabak commented on issue #10414: URL: https://github.com/apache/datafusion/issues/10414#issuecomment-2101098499 Sounds good. We will update the PR tomorrow 👍

Re: [I] Auto-update mechanism for dataframe test [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10373: URL: https://github.com/apache/datafusion/issues/10373#issuecomment-2101100092 Something we have used to great effect in influxdb is https://insta.rs/ You can then do the equivalent of `sqllogictest --complete` (even for results within files) with a co

Re: [PR] feat: Implement Spark-compatible CAST from non-integral numeric types to integral types [datafusion-comet]

2024-05-08 Thread via GitHub
rohitrastogi commented on code in PR #399: URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1594419991 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -191,44 +191,219 @@ macro_rules! cast_int_to_int_macro { .as_any() .do

Re: [PR] feat: Implement Spark-compatible CAST from non-integral numeric types to integral types [datafusion-comet]

2024-05-08 Thread via GitHub
rohitrastogi commented on code in PR #399: URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1594420846 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -191,44 +191,219 @@ macro_rules! cast_int_to_int_macro { .as_any() .do

[PR] build: Switch back to released version of DataFusion and arrow-rs after Arrow Java 16 is released [datafusion-comet]

2024-05-08 Thread via GitHub
viirya opened a new pull request, #403: URL: https://github.com/apache/datafusion-comet/pull/403 ## Which issue does this PR close? Closes #248. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] build: Switch back to released version of DataFusion and arrow-rs after Arrow Java 16 is released [datafusion-comet]

2024-05-08 Thread via GitHub
viirya commented on PR #403: URL: https://github.com/apache/datafusion-comet/pull/403#issuecomment-2101108193 Note that this is blocked by #250. We need to verify Java Arrow after #250 is merged.

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-08 Thread via GitHub
Michael-J-Ward commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-210873 > what is your ideal outcome? That the UDF function can take three arguments (and default the fourth value to a constant 1)? I would expect that there exists some v

Re: [I] Support for filtered arrow datasets [datafusion]

2024-05-08 Thread via GitHub
alamb commented on issue #10267: URL: https://github.com/apache/datafusion/issues/10267#issuecomment-2101112138 Thanks for the report, @adriangb. I am not familiar with this feature, but it appears the error comes from pyarrow itself https://github.com/apache/arrow/blob/3046501

Re: [PR] Simplify making information_schame tables [datafusion]

2024-05-08 Thread via GitHub
alamb merged PR #10420: URL: https://github.com/apache/datafusion/pull/10420