asmello commented on issue #23882: [SPARK-26979][PYTHON] Add missing string 
column name support for some SQL functions
URL: https://github.com/apache/spark/pull/23882#issuecomment-471270276
 
 
   > What do we do for something like split, regexp_extract?
   
   Same thing: the column argument can be a name or a Column object, but the value argument is strictly a literal. The Scala API doesn't actually provide overloads in this case either.
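   The name-or-Column dispatch can be sketched like this (using stand-in classes rather than the real PySpark internals; the actual `_to_java_column` lives in `pyspark.sql.column`):
   
   ```python
   # Minimal stand-ins for pyspark.sql.Column and a JVM-side column handle;
   # illustrative only, not the real PySpark implementation.
   class JavaColumn:
       def __init__(self, name):
           self.name = name
   
   class Column:
       def __init__(self, jc):
           self._jc = jc
   
   def col(name):
       return Column(JavaColumn(name))
   
   def _to_java_column(c):
       # A name (str) and a Column both resolve to a JVM-side column.
       return c._jc if isinstance(c, Column) else JavaColumn(c)
   
   # Both spellings reach the JVM as the same kind of object:
   assert _to_java_column("foo").name == "foo"
   assert _to_java_column(col("foo")).name == "foo"
   ```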
   
   In `regexp_replace`, though, there is indeed an overload that accepts column objects as well as literals. The reason PySpark doesn't expose it is that a Python Column cannot be passed directly to a JVM call, so `regexp_replace(col("foo"), col("bar"), col("x"))` fails. Other than that, there are no checks in the implementation that would prevent this usage.
   
   This could easily be fixed. For instance, the implementation of `array_contains` currently has:
   
   ```python
   return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
   ```
   
   and if we changed this to
   
   ```python
   value = value._jc if isinstance(value, Column) else value
   return Column(sc._jvm.functions.array_contains(_to_java_column(col), value))
   ```
   
   then `array_contains(col("foo"), col("bar"))` would be supported, but 
`array_contains("foo", "bar")` would still work as expected ("bar" would be 
handled as a literal).
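   That one-line dispatch can be sketched in isolation (with a stand-in class, and `_unwrap` as a hypothetical helper name, not real PySpark internals):
   
   ```python
   # Stand-in for pyspark.sql.Column (real Columns wrap a JVM object in ._jc).
   class Column:
       def __init__(self, jc):
           self._jc = jc
   
   def _unwrap(value):
       # Hypothetical helper: a Column is replaced by its JVM handle,
       # anything else passes through and is treated as a literal.
       return value._jc if isinstance(value, Column) else value
   
   jvm_handle = object()
   assert _unwrap(Column(jvm_handle)) is jvm_handle  # column reference
   assert _unwrap("bar") == "bar"                    # string stays a literal
   ```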
   
   This would be extremely useful, but I also find it confusing (a string argument would mean a column name in one position and a literal value in another), so I won't propose a change myself in this case.
