GitHub user henryr commented on the issue:
https://github.com/apache/spark/pull/21049
I might be a bit of a hardliner on this, but I think it's correct to
eliminate the `ORDER BY` from common table expressions (MSSQL agrees
with me, for example; see [its guidelines for common table
expressions](https://docs.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017#guidelines-for-creating-and-using-common-table-expressions)).
However, given the principle of least surprise, I agree it might be a good
idea to at least start with scalar and nested subqueries, and leave inline
views for another day. That might be a bit harder to do (I think the rule will
need a whitelist of operators below which it's OK to eliminate sorts), and in
general I think there'll be some missed opportunities, but it's a start :)
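
To make the whitelist idea concrete, here is a rough sketch in Python over a toy single-child plan tree. It is not Catalyst code: `Plan`, `ORDER_INSENSITIVE`, and `eliminate_sorts` are hypothetical names, and the whitelist contents are only illustrative. The rule starts at the root of a subquery whose output order is assumed not to matter, and it stops removing sorts as soon as it passes an operator that is not on the whitelist (a `Limit`, for instance).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Toy logical plan node with at most one child (a simplification)."""
    op: str                          # e.g. "Scan", "Sort", "Filter", "Project", "Limit"
    child: Optional["Plan"] = None


# Hypothetical whitelist: operators assumed to produce the same set of rows
# regardless of the order of their input. Illustrative only.
ORDER_INSENSITIVE = {"Filter", "Project"}


def eliminate_sorts(plan: Optional[Plan], removable: bool = True) -> Optional[Plan]:
    """Drop Sort nodes that are reachable from the subquery root through
    whitelisted operators only. `removable` says whether a Sort at this
    position can be dropped without changing the subquery's result."""
    if plan is None:
        return None
    if plan.op == "Sort" and removable:
        # Skip this Sort and keep looking for more sorts underneath it.
        return eliminate_sorts(plan.child, True)
    # Any operator off the whitelist (Limit, a kept Sort, ...) protects
    # everything below it.
    still_removable = removable and plan.op in ORDER_INSENSITIVE
    return Plan(plan.op, eliminate_sorts(plan.child, still_removable))
```

With this sketch, `Project(Sort(Filter(Sort(Scan))))` loses both sorts, while `Project(Limit(Sort(Scan)))` is left alone because `Limit` is not whitelisted. That second case is the kind of missed opportunity or, more importantly, the kind of correctness hazard the whitelist is there to handle conservatively.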
Alternatively we could extend the analyzed logical plan to explicitly mark
the different subquery types (i.e. have an `InlineView` node, a `NestedSubquery`
node and so on). That would make these optimizations easier to express, but I
have some reservations about the semantics of introducing those nodes. What do
you think @dilipbiswal / @gatorsmile ?
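
For illustration, the marker-node alternative could look roughly like the following Python sketch over a toy plan tree. Again, this is not Catalyst code: `Node`, `ORDER_IRRELEVANT_MARKERS`, `strip_top_sorts`, and `rewrite` are hypothetical, and `ScalarSubquery` as a plan marker is an assumption added alongside the `InlineView` and `NestedSubquery` nodes mentioned above. The point is only that the rule can match on the marker type instead of inferring the subquery kind from context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Toy logical plan node with at most one child (a simplification)."""
    op: str
    child: Optional["Node"] = None


# Assumption: markers under which the output order is treated as irrelevant.
# "InlineView" is deliberately left out, matching the conservative first step.
ORDER_IRRELEVANT_MARKERS = {"NestedSubquery", "ScalarSubquery"}


def strip_top_sorts(node: Optional[Node]) -> Optional[Node]:
    """Drop Sort nodes sitting directly at the top of a subquery body."""
    while node is not None and node.op == "Sort":
        node = node.child
    return node


def rewrite(node: Optional[Node]) -> Optional[Node]:
    """Rebuild the tree bottom-up, removing top-level sorts under markers."""
    if node is None:
        return None
    child = rewrite(node.child)
    if node.op in ORDER_IRRELEVANT_MARKERS:
        child = strip_top_sorts(child)
    return Node(node.op, child)
```

Here `NestedSubquery(Sort(Scan))` becomes `NestedSubquery(Scan)`, while `InlineView(Sort(Scan))` is unchanged. This version only looks at sorts directly under the marker; it would still need something like an operator whitelist to reach sorts deeper in the subquery body.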