asmello edited a comment on issue #23879: [SPARK-26979][SQL] Add missing column 
name support for SQL functions
URL: https://github.com/apache/spark/pull/23879#issuecomment-466769335
 
 
   > These generics are only applied to `functions.py`. Can you then whitelist 
one by one?
   
   I had whitelisted the `_create_function()` uses between lines 257-273 one by 
one, but I forgot to do the same for line 1451. My bad, I truly apologise. It 
turns out that all the functions defined there are exceptions too.
   
   Still, the point stands: uses of `_create_function()` are the only cases 
where the JVM function is called directly without argument conversion. With 
those functions also taken care of, the SQL API should be fully consistent in 
this regard.
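
   A minimal sketch of the kind of argument conversion discussed here, 
assuming a stand-in `Column` class and hypothetical helpers `to_column()` and 
`create_function()`. None of these are PySpark's actual implementation, and 
the JVM call is replaced by a plain string so the example runs on its own:

```python
class Column:
    """Hypothetical stand-in for a column object; not pyspark.sql.Column."""

    def __init__(self, name):
        self.name = name


def to_column(col):
    """Accept either a Column or a column name and return a Column."""
    if isinstance(col, Column):
        return col
    if isinstance(col, str):
        return Column(col)
    raise TypeError(f"expected Column or str, got {type(col).__name__}")


def create_function(name):
    """Hypothetical factory: the generated wrapper converts its argument
    before forwarding it, instead of passing it through untouched."""

    def wrapper(col):
        converted = to_column(col)
        # A real wrapper would invoke the JVM function here; this sketch
        # only returns a description of the call.
        return f"{name}({converted.name})"

    wrapper.__name__ = name
    return wrapper


sqrt = create_function("sqrt")
```

   With this shape, `sqrt("x")` and `sqrt(Column("x"))` take the same path, 
which is the consistency being argued for.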
   
   > Also, there are few exceptions like `from_json` that takes `schema` as 
`Column` but also `DataType`
   
   This is a different matter altogether. `schema` is not the column being 
operated on in this case; it's a schema specification, so there's no reason it 
should take a column name here. It could, I suppose, but that sounds like an 
anti-pattern to me.
   
   > Plus, we should make it consistent in `dataframe.py` as well.
   
   What kind of impact does this have there?
   
   > `Column` and string are completely different types. It's not iterable or 
something related with duck-typing.
   
   The same can be said about `max(1, 2, 3)` and `max([1, 2, 3])`, you know. As 
long as they are related semantically, Python encourages different calling 
patterns.
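
   For reference, the two calling patterns of the built-in `max` mentioned 
above, shown side by side (the variable names are only illustrative):

```python
values = [1, 2, 3]

# Pattern 1: the items are passed as separate positional arguments.
from_args = max(1, 2, 3)

# Pattern 2: the items are passed as a single iterable.
from_iterable = max(values)
```

   Both calls pick the largest of the same three numbers, even though an 
`int` and a `list` are unrelated types.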
   
   BTW, I've found more than one issue with `_create_function()` being used to 
define the same function more than once. This is dangerous. I feel it should be 
removed altogether.
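
   A sketch of why repeated definitions are risky, and of one possible guard. 
It assumes a hypothetical `define_all()` helper that binds generated names 
into a namespace dict; it is not how `functions.py` is actually written:

```python
def define_all(specs, namespace):
    """Hypothetical helper: bind each (name, doc) pair into `namespace`,
    refusing to overwrite a name that is already defined."""
    for name, doc in specs:
        if name in namespace:
            # Without this check the later definition would silently
            # replace the earlier one.
            raise ValueError(f"{name!r} is defined more than once")
        namespace[name] = doc


namespace = {}
define_all([("sqrt", "first definition")], namespace)

try:
    define_all([("sqrt", "second definition")], namespace)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

   The guard turns a silent override into an immediate error, so the first 
definition is never replaced unnoticed.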
   
   **I'll close this PR now, as I'm convinced the change should be made on 
PySpark's side.**

