zhengruifeng commented on PR #48074:
URL: https://github.com/apache/spark/pull/48074#issuecomment-2345053140
I think we should keep in line with UDF in this case:
```
In [5]: slen = udf(lambda s: len(s), IntegerType())
In [6]: add_one = udf(lambda s: s + 1, IntegerType())
In [7]: spark.range(10).select(add_one("id")).show()
+------------+
|<lambda>(id)|
+------------+
|           1|
|           2|
|           3|
|           4|
|           5|
|           6|
|           7|
|           8|
|           9|
|          10|
+------------+
In [8]: spark.range(10).select(add_one(1)).show()
---------------------------------------------------------------------------
PySparkTypeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 spark.range(10).select(add_one(1)).show()
File ~/Dev/spark/python/pyspark/sql/udf.py:495, in UserDefinedFunction._wrapped.<locals>.wrapper(*args, **kwargs)
493 @functools.wraps(self.func, assigned=assignments)
494 def wrapper(*args: "ColumnOrName", **kwargs: "ColumnOrName") -> Column:
--> 495 return self(*args, **kwargs)
File ~/Dev/spark/python/pyspark/sql/udf.py:405, in UserDefinedFunction.__call__(self, *args, **kwargs)
402 sc = get_active_spark_context()
404 assert sc._jvm is not None
--> 405 jcols = [_to_java_column(arg) for arg in args] + [
406 sc._jvm.PythonSQLUtils.namedArgumentExpression(key, _to_java_column(value))
407 for key, value in kwargs.items()
408 ]
410 profiler_enabled = sc._conf.get("spark.python.profile", "false") == "true"
411 memory_profiler_enabled = sc._conf.get("spark.python.profile.memory", "false") == "true"
File ~/Dev/spark/python/pyspark/sql/classic/column.py:71, in _to_java_column(col)
69 jcol = _create_column_from_name(col)
70 else:
---> 71 raise PySparkTypeError(
72 errorClass="NOT_COLUMN_OR_STR",
73 messageParameters={"arg_name": "col", "arg_type": type(col).__name__},
74 )
75 return jcol
PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got int.
```
A Python UDF treats a `str` argument as a column name, and it does not accept
any other non-column argument (for example, a bare `int`).
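To make the rule concrete, here is a minimal pure-Python sketch of the argument check seen in the traceback. It does not use Spark: `Column`, `lit`, and `to_column` below are hypothetical stand-ins written for illustration, loosely modeled on `pyspark.sql.Column`, `pyspark.sql.functions.lit`, and the `_to_java_column` check, and they are not the actual implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for pyspark.sql.Column; stores only an expression string."""
    expr: str


def lit(value: object) -> Column:
    """Hypothetical stand-in for pyspark.sql.functions.lit: wrap a literal as a Column."""
    return Column(f"lit({value!r})")


def to_column(arg: Union[Column, str]) -> Column:
    """Sketch of the rule: a Column passes through, a str is a column name, anything else fails."""
    if isinstance(arg, Column):
        return arg
    if isinstance(arg, str):
        # A plain string is interpreted as a column name, not as a string literal.
        return Column(f"col({arg!r})")
    raise TypeError(
        "[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, "
        f"got {type(arg).__name__}."
    )
```

Under this rule, a caller who wants to pass a constant would presumably wrap it explicitly, e.g. `add_one(lit(1))` in real PySpark, rather than passing the bare `1`.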