srowen commented on issue #23882: [SPARK-26979][PySpark][WIP] Add missing column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#issuecomment-466800627
 
 
   I see. Maybe there's a good reason asc/desc should only take a named column; I'm not sure. We can leave that alone for now if in doubt.
   
   I think it's OK to focus here on fixing the duplicated definition, and the small change to `_create_function` that lets it work consistently with other similar PySpark methods.
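   Roughly the kind of change I have in mind, as an untested sketch (the helper names like `_to_java_column` are from memory of the internals and may not match the actual diff):

```python
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column


def _create_function(name, doc=""):
    """Wrap the JVM function `name` so it accepts a Column or a column-name string."""
    def _(col):
        sc = SparkContext._active_spark_context
        # _to_java_column resolves a plain string like "age" to a JVM Column,
        # so the generated function behaves the same for "age" and col("age").
        jc = getattr(sc._jvm.functions, name)(_to_java_column(col))
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _
```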
   
   On the Scala side, on second thought, I'm not sure whether it's better to add all the String-based methods or deprecate them. There the syntax is `$"col"`, so there's already just one character of difference for the caller. On the PySpark side it happens to be easy to support both versions with no additional declarations, and there's an argument that supporting strings is the closest Python can get to the `$` syntax. I still take the point that differences across the APIs aren't great, but there are going to be bigger ones anyway. This could be taken up separately, too.
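   To illustrate the call-site symmetry I mean, here's a hypothetical PySpark snippet, assuming a function like `upper` ends up going through the shared string-or-Column path after a change like this:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, upper

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# Column-based call -- roughly the Python analogue of Scala's upper($"name").
df.select(upper(col("name"))).show()

# String-based call -- what supporting plain column names buys on the Python side.
df.select(upper("name")).show()
```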
   
   For dataframe.py, I actually didn't see many places where this comes up. I found `sampleBy`, which supports a string or Column, but `approxQuantile` doesn't seem to. `drop` might make sense as taking only strings, as it does now. It could be reasonable to fix `approxQuantile` here, or in a follow-up.
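   A quick sketch of that asymmetry, as I understand the current signatures (untested):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([(i, i % 2) for i in range(100)], ["value", "key"])

# sampleBy accepts either a column name or a Column for the stratification column.
df.sampleBy("key", fractions={0: 0.1, 1: 0.2}, seed=42).count()
df.sampleBy(col("key"), fractions={0: 0.1, 1: 0.2}, seed=42).count()

# approxQuantile only accepts column names (a str, or a list of str).
df.approxQuantile("value", [0.25, 0.5, 0.75], 0.05)
```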
