hhdri opened a new pull request, #44283:
URL: https://github.com/apache/spark/pull/44283

   Hello everyone,
   
   This is my first contribution to the project. I welcome any feedback and 
edits to improve this pull request.
   
   Issue Addressed:
   Currently, both the Scala and Python APIs allow redundant, nested sort 
expressions, which can produce incorrect and confusing SQL statements. 
For example:
   
   Scala:
   ```scala
   spark.range(10).orderBy($"id".desc.asc).show()
   ```
   Python:
   ```python
   spark.range(10).orderBy(f.desc('id'), ascending=False).show()
   ```
   Such usage generates SQL like `ORDER BY id DESC NULLS LAST DESC NULLS LAST`, 
which fails with a non-descriptive error message.
   
   Solution:
   This pull request introduces a constraint in the `SortOrder` class, ensuring 
that its child cannot be another instance of `SortOrder`. This change prevents 
the creation of nested, redundant sort expressions.
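
The idea behind the constraint can be illustrated with a small, self-contained Python model. This is only a sketch: the `Column` and `SortOrder` classes below are hypothetical stand-ins written for illustration, not Spark's actual classes, and the error message is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a plain column reference."""
    name: str


@dataclass(frozen=True)
class SortOrder:
    """Toy model of a sort expression; not Spark's actual SortOrder."""
    child: object
    direction: str  # "ASC" or "DESC"

    def __post_init__(self):
        # The constraint: a sort expression may not wrap another one.
        if isinstance(self.child, SortOrder):
            raise ValueError(
                "SortOrder child must not be another SortOrder "
                f"(already ordered {self.child.direction})"
            )
```

In this model, `SortOrder(Column("id"), "DESC")` is accepted, while wrapping that result in a second `SortOrder` raises a `ValueError` at construction time, before any SQL is generated.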
   
   Additionally, PySpark's `DataFrame.sort` accepts an `ascending` keyword 
argument that can conflict with expressions that already carry a sort 
direction. I've added an exception handler that produces a more descriptive 
error message in such cases.
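
A rough sketch of the exception-handler pattern, again as a pure-Python model rather than the real PySpark code. All names here (`NestedSortError`, `Ordered`, `apply_direction`, `sort_columns`) and the message wording are assumptions made for illustration.

```python
class NestedSortError(Exception):
    """Hypothetical low-level error raised when a sort wraps a sort."""


class Ordered:
    """Hypothetical stand-in for an expression that already has a direction."""

    def __init__(self, name, direction):
        self.name = name
        self.direction = direction


def apply_direction(col, ascending):
    # Mimics the lower layer rejecting a nested sort expression.
    if isinstance(col, Ordered):
        raise NestedSortError(col.name)
    return Ordered(col, "ASC" if ascending else "DESC")


def sort_columns(cols, ascending=True):
    """Apply `ascending` to each column, translating the low-level error."""
    try:
        return [apply_direction(c, ascending) for c in cols]
    except NestedSortError as exc:
        # Re-raise with a message that tells the user what to change.
        raise ValueError(
            f"Column '{exc}' already has a sort direction; "
            "remove it or drop the 'ascending' argument."
        ) from exc
```

Chaining with `from exc` keeps the original low-level error available for debugging while the user sees the more descriptive message first.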
   
   Tests:
   A test case has been added to verify that no double ordering occurs after 
this fix.
   
   I look forward to your feedback and thank you for considering this 
contribution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

