rednaxelafx commented on issue #26420: [SPARK-27986][SQL] Support ANSI SQL filter predicate for aggregate expression.
URL: https://github.com/apache/spark/pull/26420#issuecomment-552315759

I'd like to propose a solution for the codegen part that would augment this PR. The overall direction this PR is taking sounds good to me, although I haven't reviewed the full details yet (I'd like to do that some time this week). I'll prepare a separate PR for demo purposes to show how it would augment the codegen part. It's actually fairly easy, and it could also serve as a bit of code cleanup for many of the declarative aggregate functions.

The tl;dr is that I'd like to have explicit support for the user-specified filter clause in the infrastructure, instead of relying solely on a rewrite. Many aggregate functions are null-skipping by nature, e.g. `count()`, `sum()`, `avg()`. But that's not a property common to ALL aggregate functions, and some of them have interesting semantics. With `first()`/`last()`, for example, you can configure whether to include nulls in the result or to skip them and only take the non-null values. Having explicit support for the filter clause in the infrastructure ensures that we can support this feature properly, without relying on a logical rewrite that works for most aggregate functions but forces a handful of exceptional cases to be implemented in really ugly ways.
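To make the null-skipping issue concrete, here is a minimal toy model in plain Scala with no Spark dependency. Everything in it is hypothetical: `InputRow`, `FilterClauseSketch`, `viaRewrite` and `viaFilterClause` are illustrative names and not Spark's aggregate API. The rewrite is also assumed to have the shape `agg(x) FILTER (WHERE p)` becomes `agg(CASE WHEN p THEN x ELSE NULL END)`. This is only a sketch of the semantic difference, not the demo PR mentioned above.

```scala
// Toy model only: hypothetical names, not Spark's AggregateFunction API.
// Each row carries a nullable input value `x` and the result of the FILTER predicate.
final case class InputRow(x: Option[Int], predicate: Boolean)

object FilterClauseSketch {
  // An aggregate folds a column of nullable values into a nullable result.
  type Agg = Seq[Option[Int]] => Option[Int]

  // Null-skipping, like SQL sum(): ignores nulls and returns null if there is no non-null input.
  val sum: Agg = values => {
    val nonNull = values.flatten
    if (nonNull.isEmpty) None else Some(nonNull.sum)
  }

  // Like first(x, ignoreNulls): either skips nulls, or returns whatever
  // the first input is, null included.
  def first(ignoreNulls: Boolean): Agg = values =>
    if (ignoreNulls) values.flatten.headOption
    else values.headOption.flatten

  // Assumed logical rewrite: filtered-out rows still reach the aggregate, as nulls.
  def viaRewrite(rows: Seq[InputRow], agg: Agg): Option[Int] =
    agg(rows.map(r => if (r.predicate) r.x else None))

  // Explicit infrastructure support: filtered-out rows never reach the aggregate.
  def viaFilterClause(rows: Seq[InputRow], agg: Agg): Option[Int] =
    agg(rows.filter(_.predicate).map(_.x))

  // Hypothetical input: the first row fails the predicate.
  val exampleRows: Seq[InputRow] = Seq(
    InputRow(Some(1), predicate = false),
    InputRow(Some(2), predicate = true),
    InputRow(None, predicate = true),
    InputRow(Some(3), predicate = true)
  )
}
```

On `exampleRows`, both strategies give `Some(5)` for `sum`, and both give `Some(2)` for `first(ignoreNulls = true)`. With `first(ignoreNulls = false)`, however, the rewrite returns `None` because the filtered-out first row reaches the aggregate as a null. Skipping rows in the infrastructure returns `Some(2)`, which matches the FILTER semantics.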
