Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

via GitHub Fri, 19 Apr 2024 21:21:27 -0700


liucao-dd commented on code in PR #46045:
URL: https://github.com/apache/spark/pull/46045#discussion_r1573144905



##########
python/pyspark/sql/connect/functions/builtin.py:
##########
@@ -2476,8 +2476,26 @@ def repeat(col: "ColumnOrName", n: Union["ColumnOrName", int]) -> Column:
 repeat.__doc__ = pysparkfuncs.repeat.__doc__
 
 
-def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column:
-    return _invoke_function("split", _to_col(str), lit(pattern), lit(limit))
+def split(
+    str: "ColumnOrName",
+    pattern: Union[Column, str],
+    limit: Union["ColumnOrName", int] = -1,
+) -> Column:
+    # work around shadowing of str in the input variable name
+    from builtins import str as py_str
+
+    if isinstance(pattern, py_str):
+        _pattern = lit(pattern)
+    elif isinstance(pattern, Column):
+        _pattern = pattern
+    else:
+        raise PySparkTypeError(
+            error_class="NOT_COLUMN_OR_STR",
+            message_parameters={"arg_name": "pattern", "arg_type": type(pattern).__name__},
+        )
+
+    limit = lit(limit) if isinstance(limit, int) else _to_col(limit)
+    return _invoke_function("split", _to_col(str), _pattern, limit)

Review Comment:
   Sure, I have removed the type check. In the future we could standardize 
this, e.g. with a decorator that inspects the function signature and performs 
the type check accordingly.
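
A rough sketch of what such a decorator might look like. This is not part of the PR or of PySpark: `check_arg_types` is a hypothetical name, `Column` is a local stand-in so the snippet runs without Spark, and the built-in `TypeError` is used in place of `PySparkTypeError`. It only handles plain classes and `typing.Union` annotations and skips anything else.

```python
import functools
import inspect
from typing import Union, get_args, get_origin, get_type_hints


class Column:
    """Hypothetical stand-in for pyspark.sql.Column (keeps the sketch Spark-free)."""

    def __init__(self, name: str) -> None:
        self.name = name


def check_arg_types(func):
    """Hypothetical decorator: validate call arguments against the annotations."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)
    hints.pop("return", None)  # only parameters are checked

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if expected is None:
                continue  # unannotated parameter
            # Union[A, B] -> (A, B); a plain class -> (cls,)
            allowed = get_args(expected) if get_origin(expected) is Union else (expected,)
            # Assumption: skip annotations this sketch cannot check with
            # isinstance (e.g. parameterized generics such as List[int]).
            if not all(isinstance(t, type) for t in allowed):
                continue
            if not isinstance(value, allowed):
                names = ", ".join(t.__name__ for t in allowed)
                raise TypeError(
                    f"Argument `{name}` should be one of ({names}), "
                    f"got {type(value).__name__}."
                )
        return func(*args, **kwargs)

    return wrapper


@check_arg_types
def split(
    src: Union[Column, str],
    pattern: Union[Column, str],
    limit: Union[Column, str, int] = -1,
) -> tuple:
    # Placeholder body: the real function would build a Spark expression.
    return (src, pattern, limit)
```

With this in place, `split(Column("a"), ",")` passes through unchanged, while `split(Column("a"), 5)` raises a `TypeError` naming `pattern`. String forward references such as `"ColumnOrName"` would have to be resolvable by `get_type_hints` for the same approach to apply to the real signatures.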



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
For additional commands, e-mail: [email protected]
