[ https://issues.apache.org/jira/browse/SPARK-20686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li updated SPARK-20686:
----------------------------
    Fix Version/s:     (was: 2.2.1)
                       (was: 2.3.0)
                       2.2.0

> PropagateEmptyRelation incorrectly handles aggregate without grouping
> expressions
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-20686
>                 URL: https://issues.apache.org/jira/browse/SPARK-20686
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>              Labels: correctness
>             Fix For: 2.1.2, 2.2.0
>
>
> The query
> {code}
> SELECT 1 FROM (SELECT COUNT(*) WHERE FALSE) t1
> {code}
> should return a single row of output, because the subquery is an aggregate
> without a GROUP BY and therefore produces exactly one row. However, Spark
> incorrectly returns zero rows.
>
> This is caused by SPARK-16208, a patch that added an optimizer rule to
> propagate EmptyRelation through operators. Its logic for handling aggregates
> is wrong: it checks whether the aggregate expressions are non-empty when
> deciding whether the output should be empty, whereas it should check the
> grouping expressions instead:
>
> An aggregate with non-empty grouping expressions returns one output row per
> group. If the input to such a grouped aggregate is empty, then every group
> is empty and the output is empty as well. Whether the SELECT list contains
> aggregate expressions is irrelevant, since that does not affect the number
> of output rows.
>
> If the grouping expressions are empty, however, the aggregate always
> produces exactly one output row, so we cannot propagate the EmptyRelation.
>
> The current implementation is therefore incorrect (it returns a wrong
> answer) and also misses an optimization opportunity by not propagating
> EmptyRelation in the case where a grouped aggregate has aggregate
> expressions.
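For illustration, here is a minimal, self-contained Scala sketch of the corrected decision described above. The plan types and the rule object are simplified stand-ins invented for this example, not Spark's actual Catalyst classes or the real PropagateEmptyRelation implementation:

{code}
// Hypothetical, simplified stand-ins for Catalyst plan nodes (illustration only).
sealed trait LogicalPlan
case class EmptyRelation(output: Seq[String]) extends LogicalPlan
case class Aggregate(
    groupingExpressions: Seq[String],
    aggregateExpressions: Seq[String],
    child: LogicalPlan) extends LogicalPlan

object PropagateEmptyRelationSketch {
  private def isEmpty(plan: LogicalPlan): Boolean = plan match {
    case _: EmptyRelation => true
    case _               => false
  }

  // The key decision: an aggregate over an empty input collapses to an
  // empty relation only when it has grouping expressions, because every
  // group is then empty. A global aggregate (no GROUP BY) always emits
  // exactly one row, so it must be left untouched.
  def apply(plan: LogicalPlan): LogicalPlan = plan match {
    case Aggregate(groupingExprs, aggExprs, child)
        if isEmpty(child) && groupingExprs.nonEmpty =>
      EmptyRelation(aggExprs) // all groups are empty => no output rows
    case other => other // includes global aggregates: keep them as-is
  }
}
{code}

Under this sketch, the subquery in the reported bug corresponds to an Aggregate with empty groupingExpressions over an empty input, so the rule leaves it alone and the single-row result is preserved. A grouped aggregate over an empty input is collapsed regardless of its aggregate expressions, which also captures the missed optimization noted above.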