srowen commented on issue #23882: [SPARK-26979][PySpark][WIP] Add missing column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#issuecomment-466800627

I see. Maybe there's a good reason asc/desc should only take a named column; not sure. Well, we can leave that alone for now if in doubt. I think it's OK to focus here on fixing the duplicated definition and the small change to `_create_function` that lets it work consistently with other similar PySpark methods.

On the Scala side, on second thought, I'm not sure whether it's better to add all the String-based methods or to deprecate them. There the syntax is `$"col"`, so there's already just one character of difference for the caller. On the PySpark side it happens to be easy to support both versions with no additional declarations, and there's an argument that supporting strings is the best attempt to mimic the `$` syntax. Still, I take the point that differences across the APIs aren't great, but there are going to be bigger ones anyway. This I could see taking up separately, too.

For dataframe.py, I actually didn't see many places where this comes up. I found `sampleBy`, which supports a string or Column, but `approxQuantile` doesn't seem to. `drop` might make sense as taking only strings, as it does now. It could be reasonable to fix `approxQuantile` here, or in a follow-up.
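For anyone skimming, here is roughly what "supporting both versions with no additional declarations" looks like on the Python side. This is a minimal sketch under my own naming (`normalize_col`, `asc_` are hypothetical helpers), not the actual `_create_function` change in `pyspark/sql/functions.py`:

```python
# Minimal sketch: a SQL-function wrapper that accepts either a Column
# or a column-name string. Helper names here are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.column import Column


def normalize_col(col):
    """Return a Column, converting a string name via F.col if needed."""
    return F.col(col) if isinstance(col, str) else col


def asc_(col):
    """Ascending sort expression that works for both strings and Columns."""
    return normalize_col(col).asc()


if __name__ == "__main__":
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    # Both calls produce the same ordering expression.
    df.orderBy(asc_("id")).show()
    df.orderBy(asc_(df.id)).show()
    spark.stop()
```

The point being that the string case just funnels into the Column case, so no duplicated per-function definitions are needed.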
