asmello commented on issue #23882: [SPARK-26979][PYTHON] Add missing string column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#issuecomment-471270276

> What do we do for something like split, regexp_extract?

Same thing: the column argument can be a name or a Column object, but the value is strictly a literal. The Scala API doesn't actually provide overloads in this case.

In `regexp_replace`, though, there is indeed an overload that accepts either column objects or literals. The reason this is not exposed by PySpark is that a Python `Column` cannot be passed directly to a JVM call, so `regexp_replace(col("foo"), col("bar"), col("x"))` fails. But there are no checks in the implementation that would otherwise prevent this usage, so it could easily be fixed. For instance, the implementation of `array_contains` has:

```python
return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
```

If we instead wrote

```python
value = value._jc if isinstance(value, Column) else value
return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
```

then `array_contains(col("foo"), col("bar"))` would be supported, while `array_contains("foo", "bar")` would still work as expected ("bar" would be handled as a literal).

This would be extremely useful, but I also find it confusing, so I won't propose a change myself in this case.
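The unwrapping step proposed above can be sketched in isolation. This is a minimal illustration, not the actual PySpark code: the stub `Column` class below only mimics the real `pyspark.sql.Column`, which wraps a JVM object in its `_jc` attribute, and `unwrap` is a hypothetical helper name.

```python
class Column:
    """Stand-in for pyspark.sql.Column: wraps a (mock) JVM column handle."""
    def __init__(self, jc):
        self._jc = jc


def unwrap(value):
    """If value is a Column, pass its underlying JVM handle to the JVM call;
    otherwise leave it alone so it is treated as a literal."""
    return value._jc if isinstance(value, Column) else value


# A Column is unwrapped to its JVM handle before the py4j call...
assert unwrap(Column("jvm-handle-for-bar")) == "jvm-handle-for-bar"
# ...while a plain string stays untouched and is handled as a literal.
assert unwrap("bar") == "bar"
```

The key property is that the dispatch happens on the Python side, so the existing string/literal behavior is unchanged and only `Column` arguments take the new path.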
