rednaxelafx commented on issue #23658: [SPARK-26735][SQL] Verify plan integrity for special expressions
URL: https://github.com/apache/spark/pull/23658#issuecomment-457815273

Thanks for your reviews, @cloud-fan and @viirya! I've updated the PR to address your comments.

This PR has actually caught a genuine bug in the analyzer in one of the test cases.
For this query in `SubquerySuite`,
```sql
SELECT * FROM t1
WHERE c1 = (SELECT max(t2.c1)
            FROM t2
            GROUP BY t2.c1
            HAVING count(*) >= 1
            ORDER BY max(t2.c1))
```
the analyzer resolves it into:
```
== Analyzed Logical Plan ==
c1: int, c2: int
Project [c1#51, c2#52]
+- Filter (c1#51 = scalar-subquery#109 [])
   :  +- Project [max(c1)#137]
   :     +- Sort [max(c1#28) ASC NULLS FIRST], true // NOTE HERE!!
   :        +- Project [max(c1)#137, c1#28]
   :           +- Filter (count(1)#139L >= cast(1 as bigint))
   :              +- Aggregate [c1#28], [max(c1#28) AS max(c1)#137, count(1) AS count(1)#139L, c1#28]
   :                 +- SubqueryAlias `t2`
   :                    +- Project [_1#23 AS c1#28, _2#24 AS c2#29]
   :                       +- LocalRelation [_1#23, _2#24]
   +- SubqueryAlias `t1`
      +- Project [_1#46 AS c1#51, _2#47 AS c2#52]
         +- LocalRelation [_1#46, _2#47]
```
... where `Sort [max(c1#28) ASC NULLS FIRST], true` is an example of a `Sort` operator incorrectly hosting an aggregate expression, `max`: the sort order should refer to the already-computed `max(c1)#137` attribute instead of recomputing `max(c1#28)` above the `Aggregate`.

It's somewhat tedious to fix because we need to tweak the order a bit. Working on it.
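
For readers following along, the kind of check that flags the `Sort` above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `Expr`, `Plan`, `Attr`, `Max`, `PlanIntegrity`, and `misplacedAggregates` are hypothetical toy stand-ins rather than Catalyst classes, and the sketch assumes `Aggregate` is the only operator allowed to host an aggregate function.

```scala
// Toy model of expressions; NOT Catalyst. All names are hypothetical.
sealed trait Expr { def children: Seq[Expr] }
case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
case class Max(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }

// Toy model of logical operators.
sealed trait Plan {
  def children: Seq[Plan]
  def expressions: Seq[Expr]
}
case class Relation(output: Seq[Attr]) extends Plan {
  def children: Seq[Plan] = Nil
  def expressions: Seq[Expr] = output
}
case class Aggregate(grouping: Seq[Expr], aggregates: Seq[Expr], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def expressions: Seq[Expr] = grouping ++ aggregates
}
case class Sort(order: Seq[Expr], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def expressions: Seq[Expr] = order
}
case class Project(projectList: Seq[Expr], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def expressions: Seq[Expr] = projectList
}

object PlanIntegrity {
  // True if the expression tree contains an aggregate function (only Max in this toy).
  def containsAggregate(e: Expr): Boolean = e match {
    case _: Max => true
    case other  => other.children.exists(containsAggregate)
  }

  // Collects every operator other than Aggregate that hosts an aggregate function.
  // Assumption: Aggregate is the only legal host in this simplified model.
  def misplacedAggregates(plan: Plan): Seq[Plan] = {
    val here: Seq[Plan] = plan match {
      case _: Aggregate => Nil
      case other if other.expressions.exists(containsAggregate) => Seq(other)
      case _ => Nil
    }
    here ++ plan.children.flatMap(misplacedAggregates)
  }
}

object Example {
  private val t2  = Relation(Seq(Attr("c1"), Attr("c2")))
  private val agg = Aggregate(Seq(Attr("c1")), Seq(Max(Attr("c1")), Attr("c1")), t2)

  // Mirrors the shape of the buggy subquery plan: Sort recomputes max(c1) above the Aggregate.
  val buggy: Plan = Project(Seq(Attr("max(c1)")), Sort(Seq(Max(Attr("c1"))), agg))
  // The expected shape: Sort refers to the aggregate's output attribute instead.
  val fixed: Plan = Project(Seq(Attr("max(c1)")), Sort(Seq(Attr("max(c1)")), agg))
}
```

In this toy model, `PlanIntegrity.misplacedAggregates(Example.buggy)` reports the `Sort` node, while the same call on `Example.fixed` reports nothing. A real check would also need to account for the other special expressions and legal hosts that this sketch leaves out.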
