asmello opened a new pull request #23882: [SPARK-26979][PySpark][WIP] Add missing column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882

## What changes were proposed in this pull request?

Most SQL functions defined in `spark.sql.functions` have two calling patterns: one takes a Column object as input, and the other takes a string representing a column name, which is then converted into a Column object internally.

There are, however, a few notable exceptions:

- lower()
- upper()
- abs()
- bitwiseNOT()
- ltrim()
- rtrim()
- trim()
- ascii()
- base64()
- unbase64()

While this doesn't break anything, as you can easily create a Column object yourself before passing it to one of these functions, it has two undesirable consequences:

1. It is surprising: it breaks coders' expectations when they are first starting with Spark. Every API should be as consistent as possible, to make the learning curve smoother and to reduce causes of human error;
2. It gets in the way of stylistic conventions. Most of the time, using literal names makes Python/Scala/Java code more readable, and the API provides ample support for that, but these few exceptions prevent this pattern from being universally applicable.

This is a very simple fix, and I see no reason not to apply it.

### Side effects

This PR also fixes an issue with some functions being defined multiple times by using `_create_function()`.

## How was this patch tested?

Running `./dev/run-tests` and testing manually.

(WIP)
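
To illustrate the two calling patterns described above, here is a minimal, Spark-free sketch. It is not the code from this PR: `Column` is a toy stand-in, and `to_column`, `make_function`, and `FUNCTION_NAMES` are hypothetical names chosen for illustration. It only shows how a function can accept either a Column or a column name, and how building every function from one table of names avoids defining any of them twice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Column:
    """Toy stand-in for a Spark Column; it only records an expression string."""
    expr: str


ColumnOrName = Union[Column, str]


def to_column(col: ColumnOrName) -> Column:
    """Return a Column whether the caller passed a Column or a column name."""
    if isinstance(col, Column):
        return col
    if isinstance(col, str):
        return Column(col)
    raise TypeError(f"expected Column or str, got {type(col).__name__}")


def make_function(name: str) -> Callable[[ColumnOrName], Column]:
    """Build a unary function that supports both calling patterns (hypothetical helper)."""
    def func(col: ColumnOrName) -> Column:
        return Column(f"{name}({to_column(col).expr})")
    func.__name__ = name
    return func


# The ten functions listed in the PR description, each built exactly once.
FUNCTION_NAMES = [
    "lower", "upper", "abs", "bitwiseNOT", "ltrim",
    "rtrim", "trim", "ascii", "base64", "unbase64",
]
functions: Dict[str, Callable[[ColumnOrName], Column]] = {
    name: make_function(name) for name in FUNCTION_NAMES
}

lower = functions["lower"]
by_name = lower("city")              # column name pattern
by_column = lower(Column("city"))    # Column object pattern
```

In this sketch both calls build the same expression, which is the consistency the PR asks for across the real API.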
