Re: [PR] Fix: count aggregate function should not be nullable [datafusion]

2024-08-21 Thread via GitHub
HuSen8891 closed pull request #12085: Fix: count aggregate function should not be nullable URL: https://github.com/apache/datafusion/pull/12085 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Fix: count aggregate function should not be nullable [datafusion]

2024-08-21 Thread via GitHub
HuSen8891 commented on PR #12085: URL: https://github.com/apache/datafusion/pull/12085#issuecomment-2301359288 issue https://github.com/apache/datafusion/issues/12077 already solved, close this.

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-21 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1724580071 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -52,8 +71,248 @@ impl EmitTo { std::mem::swap(v, &mut t); t

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-21 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1724589925 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -52,8 +71,248 @@ impl EmitTo { std::mem::swap(v, &mut t); t

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-21 Thread via GitHub
2010YOUY01 commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1724605120 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -143,6 +402,25 @@ pub trait GroupsAccumulator: Send { /// [`Accumulator::state`]: crate::accumul

[I] Implement GroupsAccumulator for stddev and var aggregaters [datafusion]

2024-08-21 Thread via GitHub
eejbyfeldt opened a new issue, #12094: URL: https://github.com/apache/datafusion/issues/12094 ### Is your feature request related to a problem or challenge? In order to speed up aggregates using stddev and/or variance aggregates. ### Describe the solution you'd like The st
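A groups accumulator for variance could keep one running (count, mean, M2) triple per group index and update it with Welford's online algorithm, so a whole batch is processed in a single pass over the group indices. The sketch below is a hypothetical, std-only illustration of that idea: `VarianceGroups`, `update_batch` and `evaluate_sample` are assumed names, and it uses neither DataFusion's `GroupsAccumulator` trait nor Arrow arrays, so it is not the code proposed in PR #12095.

```rust
/// Hypothetical per-group variance state using Welford's online algorithm.
/// One slot per group index in each vector (struct-of-vectors layout).
#[derive(Default)]
struct VarianceGroups {
    counts: Vec<u64>,
    means: Vec<f64>,
    m2s: Vec<f64>,
}

impl VarianceGroups {
    /// `values[i]` belongs to group `group_indices[i]`.
    fn update_batch(&mut self, values: &[f64], group_indices: &[usize], total_num_groups: usize) {
        // Grow the per-group vectors when new groups appear in this batch.
        self.counts.resize(total_num_groups, 0);
        self.means.resize(total_num_groups, 0.0);
        self.m2s.resize(total_num_groups, 0.0);
        for (&v, &g) in values.iter().zip(group_indices) {
            self.counts[g] += 1;
            let delta = v - self.means[g];
            self.means[g] += delta / self.counts[g] as f64;
            // Uses the updated mean, as Welford's recurrence requires.
            self.m2s[g] += delta * (v - self.means[g]);
        }
    }

    /// Sample variance per group; `None` when a group has fewer than 2 values.
    fn evaluate_sample(&self) -> Vec<Option<f64>> {
        self.counts
            .iter()
            .zip(&self.m2s)
            .map(|(&n, &m2)| if n > 1 { Some(m2 / (n - 1) as f64) } else { None })
            .collect()
    }
}

fn main() {
    let mut acc = VarianceGroups::default();
    // Group 0 receives 1, 2, 3, 4; group 1 receives only 10.
    acc.update_batch(&[1.0, 10.0, 2.0, 3.0, 4.0], &[0, 1, 0, 0, 0], 2);
    println!("{:?}", acc.evaluate_sample());
}
```

Keeping flat vectors indexed by group avoids a per-group allocation, which is the usual motivation for a groups accumulator over one `Accumulator` per group.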

Re: [I] Thread panics in SpawnedTask during shutdown. [datafusion]

2024-08-21 Thread via GitHub
crepererum commented on issue #12089: URL: https://github.com/apache/datafusion/issues/12089#issuecomment-2301452144 I totally see the use case for `join_unwind` though: it resumes unwinding on another thread, which often simplifies debugging a lot. Yeeting through logs and errors that just
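The `join_unwind` mentioned above works on tokio tasks; the std-only sketch below illustrates the same idea of resuming a panic on the joining thread using `std::thread::JoinHandle::join` and `std::panic::resume_unwind`. The helpers `join_and_resume_unwind` and `payload_message` are hypothetical names, not DataFusion code.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::thread::{self, JoinHandle};

/// Hypothetical helper: join a worker thread and, if it panicked, resume
/// unwinding on the calling thread with the worker's original panic payload.
fn join_and_resume_unwind<T>(handle: JoinHandle<T>) -> T {
    match handle.join() {
        Ok(value) => value,
        // `join` hands back the payload as `Box<dyn Any + Send>`; re-raising it
        // keeps the original message instead of a generic "task failed" error.
        Err(payload) => panic::resume_unwind(payload),
    }
}

/// Extract a printable message from a panic payload.
fn payload_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

fn main() {
    let ok = thread::spawn(|| 40 + 2);
    println!("result: {}", join_and_resume_unwind(ok));

    let failing = thread::spawn(|| -> i32 { panic!("worker failed") });
    // AssertUnwindSafe is needed because JoinHandle is not UnwindSafe.
    let caught = panic::catch_unwind(AssertUnwindSafe(|| join_and_resume_unwind(failing)));
    let payload = caught.unwrap_err();
    println!("re-raised on main thread: {}", payload_message(&*payload));
}
```

`resume_unwind` does not invoke the panic hook a second time, so the worker's panic is reported once and the unwinding continues on the caller's stack.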

[PR] Implement groups accumulator for stddev and variance [datafusion]

2024-08-21 Thread via GitHub
eejbyfeldt opened a new pull request, #12095: URL: https://github.com/apache/datafusion/pull/12095 ## Which issue does this PR close? Closes #12094. ## Rationale for this change Hopefully improve performance of queries using stddev and variance aggregates.

[PR] Remove `AggregateExpr` trait [datafusion]

2024-08-21 Thread via GitHub
lewiszlw opened a new pull request, #12096: URL: https://github.com/apache/datafusion/pull/12096 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/11810. ## Rationale for this change ## What changes are included in this

Re: [I] UNNEST cannot appear inside a subquery [datafusion]

2024-08-21 Thread via GitHub
jonahgao commented on issue #11773: URL: https://github.com/apache/datafusion/issues/11773#issuecomment-2301598463 After fixing that function, it will give the following error. ```sh DataFusion CLI v41.0.0 > SELECT id, (SELECT * FROM UNNEST(arr) LIMIT 1) FROM ( SE

Re: [PR] chore: Use Git tag as Comet version when publishing Docker images [datafusion-comet]

2024-08-21 Thread via GitHub
andygrove merged PR #857: URL: https://github.com/apache/datafusion-comet/pull/857

Re: [PR] chore: Add CometColumnarToRowExec [datafusion-comet]

2024-08-21 Thread via GitHub
andygrove commented on PR #844: URL: https://github.com/apache/datafusion-comet/pull/844#issuecomment-2301822573 Some of the Spark tests are currently failing. I likely need to update them to recognize `CometColumnarToRowExec`.

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-21 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1724905332 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -143,6 +402,25 @@ pub trait GroupsAccumulator: Send { /// [`Accumulator::state`]: crate::accumula

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-21 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1724904571 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -509,6 +523,15 @@ impl GroupedHashAggregateStream { None }; +// Che

Re: [PR] feat: Implement to_json for subset of types [datafusion-comet]

2024-08-21 Thread via GitHub
edmondop commented on code in PR #805: URL: https://github.com/apache/datafusion-comet/pull/805#discussion_r1724944483 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1210,6 +1210,58 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wi

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-21 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1725009117 ## datafusion/physical-plan/src/aggregates/group_values/row.rs: ## @@ -121,16 +135,31 @@ impl GroupValues for GroupValuesRows { create_hashes(cols, &sel

Re: [PR] Fix: count aggregate function should not be nullable [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12085: URL: https://github.com/apache/datafusion/pull/12085#issuecomment-2302017338 Thanks @HuSen8891 -- I double checked and it does seem to be fixed on main. However, I don't think we have the test. Could you please update the PR to just have the additional

Re: [PR] Add additional regexp function regexp_count() [datafusion]

2024-08-21 Thread via GitHub
xinlifoobar commented on code in PR #12080: URL: https://github.com/apache/datafusion/pull/12080#discussion_r1725056193 ## datafusion/functions/src/regex/regexpcount.rs: ## @@ -0,0 +1,561 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[PR] `array_has` avoid row converter for string type [datafusion]

2024-08-21 Thread via GitHub
jayzhan211 opened a new pull request, #12097: URL: https://github.com/apache/datafusion/pull/12097 ## Which issue does this PR close? Part of #12062 ## Rationale for this change Row converter is quite slow for string types; it is also an issue for multi g

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-21 Thread via GitHub
jayzhan211 commented on code in PR #12097: URL: https://github.com/apache/datafusion/pull/12097#discussion_r1725075785 ## datafusion/functions-nested/src/array_has.rs: ## @@ -261,11 +251,13 @@ enum ComparisonType { Single, } -fn general_array_has_dispatch( +/// Public fu

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-21 Thread via GitHub
jayzhan211 commented on code in PR #12097: URL: https://github.com/apache/datafusion/pull/12097#discussion_r1725078425 ## datafusion/functions-nested/src/array_has.rs: ## @@ -323,3 +316,97 @@ fn general_array_has_dispatch( } Ok(Arc::new(boolean_builder.finish())) } +

Re: [PR] feat: Use CometPlugin as main entrypoint [datafusion-comet]

2024-08-21 Thread via GitHub
andygrove commented on code in PR #853: URL: https://github.com/apache/datafusion-comet/pull/853#discussion_r1725109058 ## spark/src/main/scala/org/apache/spark/Plugins.scala: ## @@ -44,6 +45,20 @@ class CometDriverPlugin extends DriverPlugin with Logging with ShimCometDriverPl

Re: [I] Document "how to read an explain plan" [datafusion]

2024-08-21 Thread via GitHub
jstirnaman commented on issue #12088: URL: https://github.com/apache/datafusion/issues/12088#issuecomment-2302140276 > InfluxData has internal documentation about how to read an explain plan that we would like to donate to the public docs (both to help others as well as to have help mai

[PR] fix: UDF, UDAF, UDWF with_alias(..) should wrap the inner function fully [datafusion]

2024-08-21 Thread via GitHub
Blizzara opened a new pull request, #12098: URL: https://github.com/apache/datafusion/pull/12098 ## Which issue does this PR close? ## Rationale for this change While calling `with_alias` on `MakeArray`, we noticed some weird type failures: ``` Error: Error(Inner { cause:

Re: [PR] fix: UDF, UDAF, UDWF with_alias(..) should wrap the inner function fully [datafusion]

2024-08-21 Thread via GitHub
Blizzara commented on code in PR #12098: URL: https://github.com/apache/datafusion/pull/12098#discussion_r1725125980 ## datafusion/expr/src/udaf.rs: ## @@ -442,7 +442,7 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { /// not implement the method, returns an error. Orde

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-21 Thread via GitHub
Omega359 commented on code in PR #12097: URL: https://github.com/apache/datafusion/pull/12097#discussion_r1725170498 ## datafusion/functions-nested/src/array_has.rs: ## @@ -261,11 +251,13 @@ enum ComparisonType { Single, } -fn general_array_has_dispatch( +/// Public func

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-21 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1725184008 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -52,8 +71,248 @@ impl EmitTo { std::mem::swap(v, &mut t); t

Re: [PR] Generic FlightTableFactory with a default FlightSqlDriver [datafusion]

2024-08-21 Thread via GitHub
phillipleblanc commented on PR #11938: URL: https://github.com/apache/datafusion/pull/11938#issuecomment-2302261267 @ccciudatu Just seeing this now - I'm one of the maintainers of https://github.com/datafusion-contrib/datafusion-table-providers. I think it would be useful to have a single c

Re: [PR] `array_has` avoid row converter for string type [datafusion]

2024-08-21 Thread via GitHub
Omega359 commented on code in PR #12097: URL: https://github.com/apache/datafusion/pull/12097#discussion_r1725220600 ## datafusion/functions-nested/src/array_has.rs: ## @@ -323,3 +316,97 @@ fn general_array_has_dispatch( } Ok(Arc::new(boolean_builder.finish())) } + +/

Re: [PR] Fix wildcard expansion for `HAVING` clause [datafusion]

2024-08-21 Thread via GitHub
goldmedal commented on code in PR #12046: URL: https://github.com/apache/datafusion/pull/12046#discussion_r1725272486 ## datafusion/sql/src/select.rs: ## @@ -766,9 +769,9 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { check_columns_satisfy_exprs( Review Comment:

Re: [PR] Use `schema_name` to create the `physical_name` [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #11977: URL: https://github.com/apache/datafusion/pull/11977#issuecomment-2302352682 🚀

Re: [PR] Fix wildcard expansion for `HAVING` clause [datafusion]

2024-08-21 Thread via GitHub
goldmedal commented on code in PR #12046: URL: https://github.com/apache/datafusion/pull/12046#discussion_r1725283288 ## datafusion/sql/src/utils.rs: ## @@ -119,9 +121,34 @@ pub(crate) fn check_columns_satisfy_exprs( _ => check_column_satisfies_expr(columns, e, mess

Re: [PR] Use `schema_name` to create the `physical_name` [datafusion]

2024-08-21 Thread via GitHub
alamb merged PR #11977: URL: https://github.com/apache/datafusion/pull/11977

Re: [PR] Use `schema_name` to create the `physical_name` [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #11977: URL: https://github.com/apache/datafusion/pull/11977#issuecomment-2302358257 Marked as API change and updated description to note it removes `create_function_physical_name` which is public: https://docs.rs/datafusion/latest/datafusion/logical_expr/expr/fn.creat

Re: [PR] Add new user doc to translate logical plan to physical plan [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12026: URL: https://github.com/apache/datafusion/pull/12026#issuecomment-2302359477 Thanks again @jc4x4 and @edmondop

Re: [PR] Add new user doc to translate logical plan to physical plan [datafusion]

2024-08-21 Thread via GitHub
alamb merged PR #12026: URL: https://github.com/apache/datafusion/pull/12026

Re: [I] Library Guide: Building LogicalPlans [datafusion]

2024-08-21 Thread via GitHub
alamb closed issue #7306: Library Guide: Building LogicalPlans URL: https://github.com/apache/datafusion/issues/7306

Re: [I] Speed up rpad implementation to use StringBuilder [datafusion]

2024-08-21 Thread via GitHub
alamb closed issue #11997: Speed up rpad implementation to use StringBuilder URL: https://github.com/apache/datafusion/issues/11997

Re: [PR] Improve rpad udf by using a GenericStringBuilder [datafusion]

2024-08-21 Thread via GitHub
alamb merged PR #12070: URL: https://github.com/apache/datafusion/pull/12070

Re: [PR] Improve rpad udf by using a GenericStringBuilder [datafusion]

2024-08-21 Thread via GitHub
alamb commented on code in PR #12070: URL: https://github.com/apache/datafusion/pull/12070#discussion_r1725289838 ## datafusion/functions/src/unicode/rpad.rs: ## @@ -84,170 +87,182 @@ impl ScalarUDFImpl for RPadFunc { } fn invoke(&self, args: &[ColumnarValue]) -> Res

Re: [PR] fix: Panic non-integer for the second argument of `nth_value` function [datafusion]

2024-08-21 Thread via GitHub
alamb merged PR #12076: URL: https://github.com/apache/datafusion/pull/12076

Re: [I] Panic in `NTH_VALUE()` window function (SQLancer) [datafusion]

2024-08-21 Thread via GitHub
alamb closed issue #12073: Panic in `NTH_VALUE()` window function (SQLancer) URL: https://github.com/apache/datafusion/issues/12073

[PR] Add new chart to show speedup in seconds [datafusion-benchmarks]

2024-08-21 Thread via GitHub
andygrove opened a new pull request, #13: URL: https://github.com/apache/datafusion-benchmarks/pull/13 It is useful to see speedup in absolute number of seconds as well as percentage speedup

Re: [PR] fix: Panic non-integer for the second argument of `nth_value` function [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12076: URL: https://github.com/apache/datafusion/pull/12076#issuecomment-2302366940 Thanks @Weijun-H and @crepererum

Re: [PR] Remove vestigal `datafusion-docs` module compilation [datafusion]

2024-08-21 Thread via GitHub
alamb merged PR #12081: URL: https://github.com/apache/datafusion/pull/12081

Re: [PR] Remove vestigal `datafusion-docs` module compilation [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12081: URL: https://github.com/apache/datafusion/pull/12081#issuecomment-2302367469 🚀

[I] [EPIC] Support all TPC-DS queries natively [datafusion-comet]

2024-08-21 Thread via GitHub
andygrove opened a new issue, #858: URL: https://github.com/apache/datafusion-comet/issues/858 ### What is the problem the feature request solves? _No response_ ### Describe the potential solution _No response_ ### Additional context _No response_

[PR] minor: SortExec measure elapsed_compute time when sorting [datafusion]

2024-08-21 Thread via GitHub
mhilton opened a new pull request, #12099: URL: https://github.com/apache/datafusion/pull/12099 Whilst investigating query execution performance I noticed that some SortExec nodes were reporting suspiciously short elapsed_compute times. It appears that the SortExec node wasn't running the e
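An `elapsed_compute` metric is usually accumulated with a scoped timer guard that adds the elapsed time when it is dropped, so the sort only counts if the guard is alive around it. The std-only sketch below illustrates that pattern with hypothetical `Time` and `TimerGuard` types; it is in the spirit of DataFusion's metrics timers but is not its actual API or the change in PR #12099.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical metric that accumulates elapsed compute time.
#[derive(Default)]
struct Time {
    total: Cell<Duration>,
}

impl Time {
    /// Start timing; the elapsed time is added when the guard is dropped.
    fn timer(&self) -> TimerGuard<'_> {
        TimerGuard { time: self, start: Instant::now() }
    }

    fn value(&self) -> Duration {
        self.total.get()
    }
}

/// Hypothetical guard that records time into `Time` on drop.
struct TimerGuard<'a> {
    time: &'a Time,
    start: Instant,
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        let t = self.time;
        t.total.set(t.total.get() + self.start.elapsed());
    }
}

fn main() {
    let elapsed_compute = Time::default();
    let mut data: Vec<u64> = (0..100_000).rev().collect();
    {
        // Bind to `_timer`, not `_`: `let _ = ...` would drop the guard
        // immediately and the sort would not be measured.
        let _timer = elapsed_compute.timer();
        data.sort_unstable();
    }
    println!("sorted {} rows in {:?}", data.len(), elapsed_compute.value());
}
```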

Re: [PR] Improve split_part udf by using a GenericStringBuilder [datafusion]

2024-08-21 Thread via GitHub
alamb commented on code in PR #12093: URL: https://github.com/apache/datafusion/pull/12093#discussion_r1725294671 ## datafusion/functions/src/string/split_part.rs: ## @@ -105,13 +108,70 @@ impl ScalarUDFImpl for SplitPartFunc { } } -macro_rules! process_split_part { -

Re: [PR] minor: SortExec measure elapsed_compute time when sorting [datafusion]

2024-08-21 Thread via GitHub
alamb commented on code in PR #12099: URL: https://github.com/apache/datafusion/pull/12099#discussion_r1725332206 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -552,7 +552,9 @@ impl ExternalSorter { let fetch = self.fetch; let expressions = Arc::clone(&s

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-21 Thread via GitHub
alamb commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2302442887 > Just want to highlight the existing tool authored by @matthewmturner one more time -- perhaps there is no need for another terminal app as the existing one (datafusion-tui) seem

Re: [PR] Add test to verify `count` aggregate function should not be nullable [datafusion]

2024-08-21 Thread via GitHub
HuSen8891 closed pull request #12085: Add test to verify `count` aggregate function should not be nullable URL: https://github.com/apache/datafusion/pull/12085

Re: [PR] Add test to verify `count` aggregate function should not be nullable [datafusion]

2024-08-21 Thread via GitHub
HuSen8891 commented on PR #12085: URL: https://github.com/apache/datafusion/pull/12085#issuecomment-2302452804 > Thanks @HuSen8891 -- I double checked and it does seem to be fixed on main. However, I don't think we have the test. > > Could you please update the PR to just have the add

Re: [PR] feat: Support sort merge join with a join condition [datafusion-comet]

2024-08-21 Thread via GitHub
comphead commented on code in PR #553: URL: https://github.com/apache/datafusion-comet/pull/553#discussion_r1725369561 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -336,4 +335,68 @@ class CometJoinSuite extends CometTestBase { } } } +

Re: [PR] Improve documentation on `StringArrayType` trait [datafusion]

2024-08-21 Thread via GitHub
comphead commented on code in PR #12027: URL: https://github.com/apache/datafusion/pull/12027#discussion_r1725372709 ## datafusion/functions/src/string/common.rs: ## @@ -252,7 +254,69 @@ impl<'a> ColumnarValueRef<'a> { } } +/// Abstracts iteration over different types of

[PR] Add test to verify count aggregate function should not be nullable [datafusion]

2024-08-21 Thread via GitHub
HuSen8891 opened a new pull request, #12100: URL: https://github.com/apache/datafusion/pull/12100 ## Which issue does this PR close? Closes #12077 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested? ## Are

[PR] feat: Enable `clippy::clone_on_ref_ptr` on `proto` and `spark_exprs` crates [datafusion-comet]

2024-08-21 Thread via GitHub
comphead opened a new pull request, #859: URL: https://github.com/apache/datafusion-comet/pull/859 ## Which issue does this PR close? Related #690. ## Rationale for this change Enable the lint to make it easier for the code reader/reviewer to identify if it's lightweigt

Re: [PR] minor: SortExec measure elapsed_compute time when sorting [datafusion]

2024-08-21 Thread via GitHub
mhilton commented on code in PR #12099: URL: https://github.com/apache/datafusion/pull/12099#discussion_r1725428538 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -552,7 +552,9 @@ impl ExternalSorter { let fetch = self.fetch; let expressions = Arc::clone(

Re: [I] A simple count() query caused Internal Error in PhysicalOptimizer (SQLancer) [datafusion]

2024-08-21 Thread via GitHub
alamb closed issue #12077: A simple count() query caused Internal Error in PhysicalOptimizer (SQLancer) URL: https://github.com/apache/datafusion/issues/12077

Re: [PR] Add test to verify count aggregate function should not be nullable [datafusion]

2024-08-21 Thread via GitHub
alamb merged PR #12100: URL: https://github.com/apache/datafusion/pull/12100

Re: [PR] fix: UDF, UDAF, UDWF with_alias(..) should wrap the inner function fully [datafusion]

2024-08-21 Thread via GitHub
alamb commented on code in PR #12098: URL: https://github.com/apache/datafusion/pull/12098#discussion_r1725435478 ## datafusion/expr/src/udaf.rs: ## @@ -442,7 +442,7 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { /// not implement the method, returns an error. Order i

Re: [PR] Remove `AggregateExpr` trait [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12096: URL: https://github.com/apache/datafusion/pull/12096#issuecomment-2302577356 Marking as draft as CI is currently not passing so I don't think this is ready for review

Re: [PR] Minor: Extract `BatchCoalescer` to its own module [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12047: URL: https://github.com/apache/datafusion/pull/12047#issuecomment-2302582336 Thank you for the review @ozankabak

Re: [PR] Minor: Extract `BatchCoalescer` to its own module [datafusion]

2024-08-21 Thread via GitHub
alamb merged PR #12047: URL: https://github.com/apache/datafusion/pull/12047

Re: [PR] Add test to verify `count` aggregate function should not be nullable [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12085: URL: https://github.com/apache/datafusion/pull/12085#issuecomment-2302584009 > Sure. Current branch is deleted, I'll add the test in another PR. PR was https://github.com/apache/datafusion/pull/12100

Re: [PR] minor: SortExec measure elapsed_compute time when sorting [datafusion]

2024-08-21 Thread via GitHub
alamb commented on code in PR #12099: URL: https://github.com/apache/datafusion/pull/12099#discussion_r1725449158 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -552,7 +552,9 @@ impl ExternalSorter { let fetch = self.fetch; let expressions = Arc::clone(&s

Re: [PR] chore: Add documentation on running benchmarks with Microk8s [datafusion-comet]

2024-08-21 Thread via GitHub
andygrove commented on code in PR #848: URL: https://github.com/apache/datafusion-comet/pull/848#discussion_r1725452044 ## benchmarks/README.md: ## @@ -0,0 +1,104 @@ + + +# Running Comet Benchmarks in Microk8s + +This guide explains how to run benchmarks derived from TPC-H and T

Re: [PR] Support string concat || for StringViewArray [datafusion]

2024-08-21 Thread via GitHub
dharanad commented on PR #12063: URL: https://github.com/apache/datafusion/pull/12063#issuecomment-2302594492 @alamb / @XiangpengHao Can you please help me with a review

Re: [PR] Support string concat || for StringViewArray [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12063: URL: https://github.com/apache/datafusion/pull/12063#issuecomment-2302595649 I will review this later today

Re: [I] Support `generate_series` with interval less than a day [datafusion]

2024-08-21 Thread via GitHub
Omega359 commented on issue #12052: URL: https://github.com/apache/datafusion/issues/12052#issuecomment-2302621615 related? #11822

Re: [PR] fix: Produce buffered null join row only if all joined rows are failed on join filter in SMJ full join [datafusion]

2024-08-21 Thread via GitHub
viirya commented on PR #12090: URL: https://github.com/apache/datafusion/pull/12090#issuecomment-2302649298 Thanks @comphead

Re: [PR] WIP: experiment with SMJ last buffered batch [datafusion]

2024-08-21 Thread via GitHub
alamb commented on code in PR #12082: URL: https://github.com/apache/datafusion/pull/12082#discussion_r1725492996 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1356,16 +1392,82 @@ impl SMJStream { pre_mask.clone()

Re: [PR] fix: Produce buffered null join row only if all joined rows are failed on join filter in SMJ full join [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12090: URL: https://github.com/apache/datafusion/pull/12090#issuecomment-2302651380 FYI @richox

Re: [PR] feat: Support sort merge join with a join condition [datafusion-comet]

2024-08-21 Thread via GitHub
viirya commented on code in PR #553: URL: https://github.com/apache/datafusion-comet/pull/553#discussion_r1725532454 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -336,4 +335,68 @@ class CometJoinSuite extends CometTestBase { } } } + +

Re: [PR] feat: Simplify configs for enabling/disabling operators [datafusion-comet]

2024-08-21 Thread via GitHub
viirya merged PR #855: URL: https://github.com/apache/datafusion-comet/pull/855

Re: [PR] feat: Simplify configs for enabling/disabling operators [datafusion-comet]

2024-08-21 Thread via GitHub
viirya commented on PR #855: URL: https://github.com/apache/datafusion-comet/pull/855#issuecomment-2302718874 Thanks @andygrove @parthchandra @comphead

Re: [PR] Support string concat || for StringViewArray [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12063: URL: https://github.com/apache/datafusion/pull/12063#issuecomment-2302727433 > Weird logictest are passing locally also the error cause is wrong. Likewise I found it strange the tests were passing locally for me. @dharanad, I took the liberty of m

Re: [PR] feat: Support sort merge join with a join condition [datafusion-comet]

2024-08-21 Thread via GitHub
viirya commented on code in PR #553: URL: https://github.com/apache/datafusion-comet/pull/553#discussion_r1725551073 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -336,4 +335,68 @@ class CometJoinSuite extends CometTestBase { } } } + +

Re: [PR] feat: Enable `clippy::clone_on_ref_ptr` on `proto` and `spark_exprs` crates [datafusion-comet]

2024-08-21 Thread via GitHub
comphead merged PR #859: URL: https://github.com/apache/datafusion-comet/pull/859

[I] Cannot infer common string type for string concat operation Dictionary(Int32, Utf8) || Dictionary(Int32, Utf8) [datafusion]

2024-08-21 Thread via GitHub
alamb opened a new issue, #12101: URL: https://github.com/apache/datafusion/issues/12101 ### Describe the bug When I try to concat two dictionary encoded columns it doesn't work ### To Reproduce Concat constants ```sql > select arrow_cast('foo', 'Diction

Re: [I] Cannot infer common string type for string concat operation Dictionary(Int32, Utf8) || Dictionary(Int32, Utf8) [datafusion]

2024-08-21 Thread via GitHub
alamb commented on issue #12101: URL: https://github.com/apache/datafusion/issues/12101#issuecomment-2302737828 I think this is a good first issue as there is a clear and small reproducer and the place in the code is known It should also include some new sqllogictests Here are

Re: [PR] WIP: experiment with SMJ last buffered batch [datafusion]

2024-08-21 Thread via GitHub
comphead commented on code in PR #12082: URL: https://github.com/apache/datafusion/pull/12082#discussion_r1725560272 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1356,16 +1392,82 @@ impl SMJStream { pre_mask.clone()

[PR] Minor: rename `dictionary_coercion` to `dictionary_comparison_coercion`, add comments [datafusion]

2024-08-21 Thread via GitHub
alamb opened a new pull request, #12102: URL: https://github.com/apache/datafusion/pull/12102 ## Which issue does this PR close? Closes #. ## Rationale for this change `dictionary_coercion` implies (to me) that this function applies to all dictionary coer

Re: [PR] Minor: rename `dictionary_coercion` to `dictionary_comparison_coercion`, add comments [datafusion]

2024-08-21 Thread via GitHub
alamb commented on code in PR #12102: URL: https://github.com/apache/datafusion/pull/12102#discussion_r1725571665 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -883,7 +893,7 @@ fn both_numeric_or_null_and_numeric(lhs_type: &DataType, rhs_type: &DataType) -> ///

Re: [PR] Improve documentation on `StringArrayType` trait [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12027: URL: https://github.com/apache/datafusion/pull/12027#issuecomment-2302779575 Thank you for the review @comphead

Re: [PR] Improve documentation on `StringArrayType` trait [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #12027: URL: https://github.com/apache/datafusion/pull/12027#issuecomment-2302777907 @Omega359 and @XiangpengHao -- what do you think we should do with the conversation above? https://github.com/apache/datafusion/pull/12027#issuecomment-2295332991 I can't tell if we

Re: [I] Update `STRPOS` scalar function to support Utf8View [datafusion]

2024-08-21 Thread via GitHub
alamb closed issue #11951: Update `STRPOS` scalar function to support Utf8View URL: https://github.com/apache/datafusion/issues/11951

Re: [I] [Epic] Native `StringView` support for string functions [datafusion]

2024-08-21 Thread via GitHub
alamb commented on issue #11790: URL: https://github.com/apache/datafusion/issues/11790#issuecomment-2302792275 We are making pretty good progress here -- just a few more functions left 🚀

Re: [PR] Add Utf8View support to STRPOS function [datafusion]

2024-08-21 Thread via GitHub
alamb merged PR #12087: URL: https://github.com/apache/datafusion/pull/12087

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-21 Thread via GitHub
alamb commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2302848992 I plan to take another look at this tomorrow morning (again with fresh eyes -- lol)

[PR] feat: Enable `clippy::clone_on_ref_ptr` on `core` crate [datafusion-comet]

2024-08-21 Thread via GitHub
comphead opened a new pull request, #860: URL: https://github.com/apache/datafusion-comet/pull/860 ## Which issue does this PR close? Closes #690 . ## Rationale for this change ## What changes are included in this PR? ## How are these change

Re: [PR] Update itertools requirement from 0.12 to 0.13 [datafusion]

2024-08-21 Thread via GitHub
korowa commented on PR #10556: URL: https://github.com/apache/datafusion/pull/10556#issuecomment-2302871001 It was just a rebase against main (will do merge in future) with `cargo update itertools@0.13.0` in process for datafusion-cli lock file.

Re: [PR] Update itertools requirement from 0.12 to 0.13 [datafusion]

2024-08-21 Thread via GitHub
korowa merged PR #10556: URL: https://github.com/apache/datafusion/pull/10556
