GitHub user henryr commented on the issue:
https://github.com/apache/spark/pull/21049
In SQL, a sort in a subquery doesn't make sense under the relational model:
the output of a subquery is an unordered bag of tuples. Some engines still
allow the sort, some silently drop it, and some throw an error.
For example:
* MariaDB:
https://mariadb.com/kb/en/library/why-is-order-by-in-a-from-subquery-ignored/
* SQL Server:
https://stackoverflow.com/questions/985921/sql-error-with-order-by-in-subquery
Oracle and Postgres allow the `ORDER BY`.
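
To make the "unordered bag" point concrete, here is a minimal sketch using
SQLite through Python's standard `sqlite3` module. SQLite is not one of the
engines discussed above and is only a stand-in that accepts the `ORDER BY`;
the table `t` and its rows are made up for illustration. When the outer
query aggregates, the inner sort cannot change the answer:

```python
import sqlite3

# Hypothetical table and data, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 3), ("b", 1), ("a", 2), ("b", 5)],
)

# Same outer aggregate, with and without a sort in the subquery.
with_sort = """
    SELECT k, SUM(v) AS total
    FROM (SELECT k, v FROM t ORDER BY v DESC) AS s
    GROUP BY k
"""
without_sort = """
    SELECT k, SUM(v) AS total
    FROM (SELECT k, v FROM t) AS s
    GROUP BY k
"""

# Sort the fetched rows in Python so the comparison does not itself
# depend on the order in which the engine returns them.
rows_with_sort = sorted(conn.execute(with_sort).fetchall())
rows_without_sort = sorted(conn.execute(without_sort).fetchall())

print(rows_with_sort == rows_without_sort)
```

This only shows one query shape; it says nothing about how Spark plans or
executes the equivalent query.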
One issue is that the underlying dataframe model might not be 100%
relational: maybe dataframes _are_ sorted lists of rows, in which case this
optimization would only be valid through the SQL interface. If so, it's
probably not worth the effort to maintain. But if dataframes and SQL relations
are supposed to be equivalent, we can drop the `ORDER BY`.
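
The distinction can be sketched in plain Python. This is a toy model, not
Spark's dataframe implementation: a `Counter` stands in for a bag of tuples
and a `list` for an ordered sequence of rows, and the sample rows are made
up.

```python
from collections import Counter

# Hypothetical rows, used only to contrast the two models.
rows = [("b", 1), ("a", 3), ("a", 2)]
sorted_rows = sorted(rows)

# Bag semantics: only the rows and their multiplicities matter, so the
# sorted and unsorted results are the same relation.
same_as_bag = Counter(rows) == Counter(sorted_rows)

# Ordered-list semantics: position matters, so the sort is observable
# and removing it would change the result.
same_as_list = rows == sorted_rows

print(same_as_bag, same_as_list)
```

Under the bag model the sort is unobservable and safe to remove; under the
list model it is not.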
We may also decide not to do this because it would surprise users who have
been relying on the existing behavior.