[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...

henryr Mon, 16 Apr 2018 09:21:26 -0700

Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/21049
  
    In SQL, a sort inside a subquery has no defined meaning under the relational 
model - the output of a subquery is an unordered bag of tuples. Some engines 
still allow the sort, some silently drop it, and some throw an error.
    
    For example: 
    
    * MariaDB: 
https://mariadb.com/kb/en/library/why-is-order-by-in-a-from-subquery-ignored/
    * SQL Server: 
https://stackoverflow.com/questions/985921/sql-error-with-order-by-in-subquery
     
    Oracle and Postgres allow the `ORDER BY`.
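
To make the "unordered bag of tuples" point concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is not one of the engines discussed above; it is used only because it ships with Python and accepts an `ORDER BY` in a `FROM` subquery. The table `t` and its rows are made up for illustration. The sketch compares the outer aggregate's result with and without the inner sort, treating both results as bags.

```python
import sqlite3
from collections import Counter

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(3, 30), (1, 10), (2, 20), (1, 5)],
)

# Same outer aggregate, once over a sorted subquery and once over an unsorted one.
with_sort = conn.execute(
    "SELECT k, SUM(v) FROM (SELECT k, v FROM t ORDER BY v DESC) GROUP BY k"
).fetchall()
without_sort = conn.execute(
    "SELECT k, SUM(v) FROM (SELECT k, v FROM t) GROUP BY k"
).fetchall()

# Compare as bags (multisets) of tuples, ignoring row order.
same_bag = Counter(with_sort) == Counter(without_sort)
print(same_bag)
```

Because `SUM` does not depend on input order, the inner sort cannot change the outer result here, which is the sense in which it is redundant.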
    
    One issue is that the underlying dataframe model might not be 100% 
relational - maybe dataframes _are_ sorted lists of rows, in which case this 
optimization would only be valid through the SQL interface. If so, it's 
probably not worth the effort to maintain. But if dataframes and SQL relations 
are supposed to be equivalent, we can drop the `ORDER BY`.
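
As a rough illustration of what "dropping the `ORDER BY`" could look like on a query plan, here is a toy sketch in Python. It is not Spark's optimizer code: the `Scan`, `Sort`, `Aggregate`, and `Limit` classes and the `remove_redundant_sorts` function are hypothetical names invented for this example. It also assumes every aggregate is order-insensitive, and that the order of the top-level result is visible to the user (the "sorted list of rows" reading at the root).

```python
from dataclasses import dataclass


# Hypothetical, minimal plan nodes; not Spark classes.
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Sort:
    keys: tuple
    child: object


@dataclass(frozen=True)
class Aggregate:
    group_by: tuple
    child: object


@dataclass(frozen=True)
class Limit:
    n: int
    child: object


def remove_redundant_sorts(plan, order_matters=True):
    """Drop Sort nodes whose output order no ancestor can observe."""
    if isinstance(plan, Scan):
        return plan
    if isinstance(plan, Sort):
        # The outer sort re-orders its input, so an inner order is not
        # needed (ignoring tie-breaking between equal keys).
        child = remove_redundant_sorts(plan.child, order_matters=False)
        return Sort(plan.keys, child) if order_matters else child
    if isinstance(plan, Aggregate):
        # Assumption: the aggregate functions do not depend on input order.
        child = remove_redundant_sorts(plan.child, order_matters=False)
        return Aggregate(plan.group_by, child)
    if isinstance(plan, Limit):
        # Which rows survive a limit depends on the child's order,
        # so a sort directly feeding a limit must be kept.
        child = remove_redundant_sorts(plan.child, order_matters=True)
        return Limit(plan.n, child)
    raise TypeError(f"unknown plan node: {plan!r}")


# A sort feeding an aggregate is dropped.
plan = Aggregate(("k",), Sort(("v",), Scan("t")))
optimized = remove_redundant_sorts(plan)
print(optimized)
```

If dataframes are instead treated as ordered lists everywhere, the rewrite would have to keep every sort, which is the case where the optimization would apply only to the SQL interface.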
    
    We may also decide not to do this, because it would surprise users who 
have been relying on the existing behavior.



