Re: [PR] [SPARK-49597][PYTHON][CONNECT] Support non-column arguments in UDTF for simpler usage [spark]

via GitHub Wed, 11 Sep 2024 18:04:20 -0700


zhengruifeng commented on PR #48074:
URL: https://github.com/apache/spark/pull/48074#issuecomment-2345053140


   I think we should stay consistent with the existing UDF behavior in this case:
   ```
   In [5]: slen = udf(lambda s: len(s), IntegerType())
   
   In [6]: add_one = udf(lambda s: s + 1, IntegerType())
   
   In [7]: spark.range(10).select(add_one("id")).show()
   +------------+
   |<lambda>(id)|
   +------------+
   |           1|
   |           2|
   |           3|
   |           4|
   |           5|
   |           6|
   |           7|
   |           8|
   |           9|
   |          10|
   +------------+
   
   
   In [8]: spark.range(10).select(add_one(1)).show()
   ---------------------------------------------------------------------------
   PySparkTypeError                          Traceback (most recent call last)
   Cell In[8], line 1
   ----> 1 spark.range(10).select(add_one(1)).show()
   
   File ~/Dev/spark/python/pyspark/sql/udf.py:495, in UserDefinedFunction._wrapped.<locals>.wrapper(*args, **kwargs)
       493 @functools.wraps(self.func, assigned=assignments)
       494 def wrapper(*args: "ColumnOrName", **kwargs: "ColumnOrName") -> Column:
   --> 495     return self(*args, **kwargs)
   
   File ~/Dev/spark/python/pyspark/sql/udf.py:405, in UserDefinedFunction.__call__(self, *args, **kwargs)
       402 sc = get_active_spark_context()
       404 assert sc._jvm is not None
   --> 405 jcols = [_to_java_column(arg) for arg in args] + [
       406     sc._jvm.PythonSQLUtils.namedArgumentExpression(key, _to_java_column(value))
       407     for key, value in kwargs.items()
       408 ]
       410 profiler_enabled = sc._conf.get("spark.python.profile", "false") == "true"
       411 memory_profiler_enabled = sc._conf.get("spark.python.profile.memory", "false") == "true"
   
   File ~/Dev/spark/python/pyspark/sql/classic/column.py:71, in _to_java_column(col)
        69     jcol = _create_column_from_name(col)
        70 else:
   ---> 71     raise PySparkTypeError(
        72         errorClass="NOT_COLUMN_OR_STR",
        73         messageParameters={"arg_name": "col", "arg_type": type(col).__name__},
        74     )
        75 return jcol
   
   PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got int.
   ```
   
   Python UDFs treat a `str` argument as a column name and do not accept any
other non-column arguments, such as the `int` passed to `add_one(1)`.
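
The argument rule described here can be sketched in plain Python. This is only an illustration of the behavior seen in the traceback, not PySpark's actual implementation: `Column`, `lit`, and `to_column` below are simplified, hypothetical stand-ins that need no Spark session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a Spark Column; holds only an expression string."""
    expr: str


def lit(value: object) -> Column:
    """Stand-in for wrapping a Python literal in a Column."""
    return Column(repr(value))


def to_column(arg: object) -> Column:
    """Accept a Column or a column name (str); reject every other type."""
    if isinstance(arg, Column):
        return arg
    if isinstance(arg, str):
        # A bare string is interpreted as a column name, not as a literal value.
        return Column(arg)
    raise TypeError(
        "[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, "
        f"got {type(arg).__name__}."
    )


# A column name and an explicitly wrapped literal are both accepted.
by_name = to_column("id")
by_literal = to_column(lit(1))

# A bare int is rejected, mirroring add_one(1) in the session above.
try:
    to_column(1)
except TypeError as exc:
    rejected_message = str(exc)
```

In real PySpark, the usual way to pass a constant to a UDF is to wrap it with `pyspark.sql.functions.lit`, for example `add_one(lit(1))`, so that the argument is already a Column when it reaches this check.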

