GitHub user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19056#discussion_r135360650

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala ---
    @@ -65,11 +66,12 @@ object PropagateEmptyRelation extends Rule[LogicalPlan] with PredicateHelper {
           case _: RepartitionByExpression => empty(p)
           // An aggregate with non-empty group expression will return one output row per group when the
           // input to the aggregate is not empty. If the input to the aggregate is empty then all groups
    -      // will be empty and thus the output will be empty.
    +      // will be empty and thus the output will be empty. If we're working on batch data, we can
    +      // then treat the aggregate as redundant.
           //
           // If the grouping expressions are empty, however, then the aggregate will always produce a
           // single output row and thus we cannot propagate the EmptyRelation.
    -      case Aggregate(ge, _, _) if ge.nonEmpty => empty(p)
    +      case Aggregate(ge, _, _) if ge.nonEmpty && !p.isStreaming => empty(p)
    --- End diff --

    Can you add to the docs above why we are avoiding this when it's streaming?
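For readers skimming the thread, the effect of the new guard can be sketched with a toy model. The types below (`Plan`, `EmptyRelation`, `Aggregate`, `PropagateEmptySketch`) are hypothetical stand-ins written for illustration and are not Spark's Catalyst classes. The sketch only shows which plans the guarded case would collapse. It does not show why streaming is excluded; one plausible reason, not stated in this thread, is that a streaming aggregate keeps state across batches, so an empty input in the current batch does not make the node redundant.

```scala
// Toy model only: simplified stand-ins, not Spark's Catalyst classes.
sealed trait Plan { def isStreaming: Boolean }

// A relation known to hold no rows; it may still belong to a streaming query.
final case class EmptyRelation(isStreaming: Boolean) extends Plan

// An aggregate over `child`, grouped by the named columns.
final case class Aggregate(groupBy: Seq[String], child: Plan) extends Plan {
  def isStreaming: Boolean = child.isStreaming
}

object PropagateEmptySketch {
  // Collapse a grouped aggregate over an empty input, but only for batch plans.
  // Global aggregates (no grouping columns) and streaming plans are left alone.
  def apply(plan: Plan): Plan = plan match {
    case Aggregate(ge, _: EmptyRelation) if ge.nonEmpty && !plan.isStreaming =>
      EmptyRelation(isStreaming = false)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val batch     = Aggregate(Seq("key"), EmptyRelation(isStreaming = false))
    val streaming = Aggregate(Seq("key"), EmptyRelation(isStreaming = true))
    val global    = Aggregate(Seq.empty, EmptyRelation(isStreaming = false))

    println(apply(batch))     // collapsed to an empty relation
    println(apply(streaming)) // left unchanged because the plan is streaming
    println(apply(global))    // left unchanged because there are no grouping columns
  }
}
```

In this model the three cases correspond to the three situations described in the code comment of the diff: a grouped batch aggregate, a grouped streaming aggregate, and an aggregate with empty grouping expressions.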