[GitHub] asmello opened a new pull request #23882: [SPARK-26979][PySpark][WIP] Add missing column name support for some SQL functions

GitBox Sun, 24 Feb 2019 05:05:06 -0800

asmello opened a new pull request #23882: [SPARK-26979][PySpark][WIP] Add 
missing column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882
 
 
   ## What changes were proposed in this pull request?
   
   Most SQL functions defined in `spark.sql.functions` have two calling 
patterns, one with a Column object as input, and another with a string 
representing a column name, which is then converted into a Column object 
internally.
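
The two calling patterns can be illustrated with a minimal sketch. This is not PySpark's implementation: the `Column` class below is a hypothetical stand-in that only stores an expression string, and `to_column` is an assumed helper name. It shows the general idea of normalizing a column name into a Column object before the function body runs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a Spark Column; stores only an expression string."""
    expr: str


def to_column(col: Union[Column, str]) -> Column:
    """Accept either a Column or a column name and always return a Column."""
    if isinstance(col, Column):
        return col
    if isinstance(col, str):
        return Column(col)
    raise TypeError(f"expected Column or str, got {type(col).__name__}")


def lower(col: Union[Column, str]) -> Column:
    """Illustrative 'lower' that supports both calling patterns."""
    return Column(f"lower({to_column(col).expr})")


# Both calls build the same expression.
by_name = lower("name")
by_column = lower(Column("name"))
```

With this normalization step in place, callers can pass whichever form reads better, and the function body only ever deals with Column objects.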
   
   There are, however, a few notable exceptions:
   
   - lower()
   - upper()
   - abs()
   - bitwiseNOT()
   - ltrim()
   - rtrim()
   - trim()
   - ascii()
   - base64()
   - unbase64()
   
   While this doesn't break anything, as you can easily create a Column object 
yourself prior to passing it to one of these functions, it has two undesirable 
consequences:
   
   1. It is surprising - it breaks coders' expectations when they are first 
starting with Spark. Every API should be as consistent as possible, so as to 
make the learning curve smoother and to reduce sources of human error;
   
   2. It gets in the way of stylistic conventions. Most of the time it makes 
Python/Scala/Java code more readable to use literal names, and the API provides 
ample support for that, but these few exceptions prevent this pattern from 
being universally applicable.
   
   This is a very simple fix, and I see no reason not to apply it.
   
   ### Side effects
   
   This PR also fixes an issue with some functions being defined multiple times 
by using `_create_function()`.
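
How `_create_function()` is used in this PR is not shown here, so the following is only a sketch of the general factory pattern, under stated assumptions: `_make_function`, `_FUNCTIONS`, and the docstrings are hypothetical names and text chosen for illustration, and plain strings stand in for Column objects. Generating wrappers from a single name-to-doc table means each function name can only be defined once.

```python
from typing import Callable, Dict


def _make_function(name: str, doc: str = "") -> Callable[[str], str]:
    """Hypothetical factory: build one wrapper for the given function name."""
    def _(col: str) -> str:
        # A real wrapper would delegate to the engine; here we only
        # render the call as a string for illustration.
        return f"{name}({col})"
    _.__name__ = name
    _.__doc__ = doc
    return _


# Dictionary keys are unique, so a name cannot be registered twice.
_FUNCTIONS: Dict[str, str] = {
    "lower": "Illustrative doc: convert a string column to lower case.",
    "upper": "Illustrative doc: convert a string column to upper case.",
}

generated = {name: _make_function(name, doc) for name, doc in _FUNCTIONS.items()}
```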
   
   ## How was this patch tested?
   
   Running `./dev/run-tests` and testing manually. (WIP)

