[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41867: [SPARK-43964][SQL][PYTHON] Support arrow-optimized Python UDTFs

via GitHub Mon, 10 Jul 2023 01:38:27 -0700


HyukjinKwon commented on code in PR #41867:
URL: https://github.com/apache/spark/pull/41867#discussion_r1257915251



##########
python/pyspark/sql/udtf.py:
##########
@@ -39,15 +42,98 @@ def _create_udtf(
     cls: Type,
     returnType: Union[StructType, str],
     name: Optional[str] = None,
+    evalType: int = PythonEvalType.SQL_TABLE_UDF,
     deterministic: bool = True,
 ) -> "UserDefinedTableFunction":
-    """Create a Python UDTF."""
+    """Create a Python UDTF with the given eval type."""
     udtf_obj = UserDefinedTableFunction(
-        cls, returnType=returnType, name=name, deterministic=deterministic
+        cls, returnType=returnType, name=name, evalType=evalType, deterministic=deterministic
     )
+
     return udtf_obj
 
 
+def _create_py_udtf(
+    cls: Type,
+    returnType: Union[StructType, str],
+    name: Optional[str] = None,
+    deterministic: bool = True,
+    useArrow: Optional[bool] = None,
+) -> "UserDefinedTableFunction":
+    """Create a regular or an Arrow-optimized Python UDTF."""
+    # Determine whether to create Arrow-optimized UDTFs.
+    if useArrow is not None:
+        arrow_enabled = useArrow
+    else:
+        from pyspark.sql import SparkSession
+
+        session = SparkSession._instantiatedSession
+        arrow_enabled = (
+            session.conf.get("spark.sql.execution.pythonUDTF.arrow.enabled") == "true"
+            if session is not None
+            else True
+        )
+
+    # Create a regular Python UDTF and check for invalid handler class.
+    regular_udtf = _create_udtf(cls, returnType, name, PythonEvalType.SQL_TABLE_UDF, deterministic)

Review Comment:
   Ah, okay. This is consistent with pandas UDFs. Let's fix both together in a
   separate PR then.
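
   For reference, the precedence the hunk above uses to pick `arrow_enabled` can be sketched without a Spark session. This is only an illustration, not the PR's code: `resolve_arrow_enabled` and the plain mapping standing in for `session.conf` are hypothetical names, and treating a missing key as "not enabled" is an assumption of the sketch rather than Spark's behaviour.

```python
from typing import Mapping, Optional

# Config key taken from the diff; the mapping-based lookup below is a
# hypothetical stand-in for session.conf, not a PySpark API.
ARROW_CONF_KEY = "spark.sql.execution.pythonUDTF.arrow.enabled"


def resolve_arrow_enabled(
    use_arrow: Optional[bool],
    session_conf: Optional[Mapping[str, str]],
) -> bool:
    """Explicit flag first, then the session conf, then True when no session exists."""
    if use_arrow is not None:
        # An explicit useArrow argument always wins.
        return use_arrow
    if session_conf is None:
        # No instantiated session: the diff falls back to True.
        return True
    # Assumption: a missing key compares unequal to "true" and yields False.
    return session_conf.get(ARROW_CONF_KEY) == "true"
```

   With this sketch, `resolve_arrow_enabled(False, {ARROW_CONF_KEY: "true"})` gives `False` because the explicit argument overrides the conf, while `resolve_arrow_enabled(None, None)` gives `True`.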



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


