dtenedor commented on code in PR #45375:
URL: https://github.com/apache/spark/pull/45375#discussion_r1511980443
##########
python/docs/source/user_guide/sql/python_udtf.rst:
##########
@@ -63,6 +63,7 @@ To implement a Python UDTF, you first need to define a class implementing the me
"""
...
+ @staticmethod
def analyze(self, *args: Any) -> AnalyzeResult:
Review Comment:
You're right, good catch. Updated this.
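To make the fix above concrete: once `analyze` is a `@staticmethod`, the `self` parameter is dropped and the method sees only the call's arguments. This is a minimal sketch; `AnalyzeResult` is replaced by a hypothetical stand-in dataclass (the real class comes from pyspark) so it runs without a Spark installation, and `WordSplitter` is an invented example name:

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-in for pyspark's AnalyzeResult so this sketch runs
# without Spark; the real class carries a StructType, not a DDL string.
@dataclass
class AnalyzeResult:
    schema: str

class WordSplitter:
    @staticmethod
    def analyze(*args: Any) -> AnalyzeResult:
        # Static method: no `self` parameter, only the UDTF call's arguments.
        return AnalyzeResult(schema="word: string")

    def eval(self, text: str):
        # Emit one output row per whitespace-separated word.
        for word in text.split():
            yield (word,)

print(WordSplitter.analyze().schema)     # -> word: string
print(list(WordSplitter().eval("a b")))  # -> [('a',), ('b',)]
```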
##########
python/docs/source/user_guide/sql/python_udtf.rst:
##########
@@ -285,10 +327,39 @@ To implement a Python UDTF, you first need to define a class implementing the me
"""
...
+Emitting output rows
+--------------------
+
+The return type of the UDTF defines the schema of the table it outputs. It must be either a
+``StructType``, for example ``StructType().add("c1", StringType())``, or a DDL string representing a
+struct type, for example ``c1: string``. The `eval` and `terminate` methods then emit zero or more
+output rows conforming to this schema by yielding tuples, lists, or pyspark.sql.Row objects. For
+example:
+
+```
+def eval(self, x, y, z):
+ # Here we return a row by providing a tuple of three elements.
Review Comment:
Sure, this is done.
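A filled-in version of the `eval` sketch in that hunk might look as follows. This is a hedged illustration, not the PR's final text: the class name is invented, its assumed declared output schema is `c1: int, c2: int, c3: int`, and the `pyspark.sql.Row` variant is left as a comment so the snippet runs without pyspark installed:

```python
class ThreeColumnUDTF:
    """Hypothetical UDTF whose declared output schema is 'c1: int, c2: int, c3: int'."""

    def eval(self, x, y, z):
        # A row may be emitted as a tuple of three elements...
        yield (x, y, z)
        # ...or as a list with the same number of elements.
        yield [x * 10, y * 10, z * 10]
        # A pyspark.sql.Row works too, e.g. `yield Row(c1=x, c2=y, c3=z)`;
        # omitted here so the sketch has no pyspark dependency.

rows = list(ThreeColumnUDTF().eval(1, 2, 3))
print(rows)  # -> [(1, 2, 3), [10, 20, 30]]
```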
##########
python/docs/source/user_guide/sql/python_udtf.rst:
##########
@@ -163,6 +185,28 @@ To implement a Python UDTF, you first need to define a class implementing the me
... num_articles=len((
... word for word in words
... if word == 'a' or word == 'an' or word == 'the')))
+
+    An `analyze` implementation that returns a constant output schema, and also requests
+    to select a subset of columns from the input table and for the input table to be partitioned
+    across several UDTF calls based on the values of the `date` column:
+
+ >>> @staticmethod
+ ... def analyze(*args) -> AnalyzeResult:
Review Comment:
I added some more explanation here.
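For readers following along, a complete `analyze` of that shape could look like the sketch below. The `AnalyzeResult` fields and helper classes are modeled by hypothetical stand-in dataclasses so this runs without Spark (in real code they are imported from pyspark), and `CountPerDate` is an invented name:

```python
from dataclasses import dataclass, field
from typing import Any, List

# Hypothetical stand-ins for pyspark's AnalyzeResult, PartitioningColumn,
# and SelectedColumn, kept minimal so the sketch is self-contained.
@dataclass
class PartitioningColumn:
    name: str

@dataclass
class SelectedColumn:
    name: str

@dataclass
class AnalyzeResult:
    schema: str
    partitionBy: List[PartitioningColumn] = field(default_factory=list)
    select: List[SelectedColumn] = field(default_factory=list)

class CountPerDate:
    @staticmethod
    def analyze(*args: Any) -> AnalyzeResult:
        # Constant output schema, plus two requests to the engine:
        # - project only the 'date' column from the input table, and
        # - partition the input rows by 'date', so each UDTF call
        #   processes the rows for exactly one date value.
        return AnalyzeResult(
            schema="date: string, total: int",
            partitionBy=[PartitioningColumn("date")],
            select=[SelectedColumn("date")],
        )

result = CountPerDate.analyze()
print(result.schema)                         # -> date: string, total: int
print([c.name for c in result.partitionBy])  # -> ['date']
```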
##########
python/docs/source/user_guide/sql/python_udtf.rst:
##########
@@ -75,31 +76,52 @@ To implement a Python UDTF, you first need to define a class implementing the me
        This method accepts zero or more parameters mapping 1:1 with the arguments provided to
        the particular UDTF call under consideration. Each parameter is an instance of the
-       `AnalyzeArgument` class, which contains fields including the provided argument's data
-       type and value (in the case of literal scalar arguments only). For table arguments, the
-       `isTable` field is set to true and the `dataType` field is a StructType representing
-       the table's column types:
-
- dataType: DataType
- value: Optional[Any]
- isTable: bool
+ `AnalyzeArgument` class.
+
+    `AnalyzeArgument` fields
+    ------------------------
+    dataType: DataType
+        Indicates the type of the provided input argument to this particular UDTF call.
+        For input table arguments, this is a StructType representing the table's columns.
+    value: Optional[Any]
+        The value of the provided input argument to this particular UDTF call. This is
+        `None` for table arguments, or for scalar arguments that are not constant expressions.
+    isTable: bool
+        This is true if the provided input argument to this particular UDTF call is a
+        table argument.
+    isConstantExpression: bool
+        This is true if the provided input argument to this particular UDTF call is a
+        constant scalar expression.
Review Comment:
Yes. Updated this to explicitly say "either a literal or other constant-foldable scalar expression."
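To illustrate how code might branch on the fields documented above, here is a hedged sketch. `AnalyzeArgument` is replaced by a hypothetical stand-in dataclass (real instances are constructed by Spark and passed into `analyze`), and `describe` is an invented helper:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical stand-in for pyspark's AnalyzeArgument; the real dataType
# field is a pyspark DataType, shown here as a DDL-style string.
@dataclass
class AnalyzeArgument:
    dataType: str
    value: Optional[Any] = None        # None for tables or non-constant scalars
    isTable: bool = False
    isConstantExpression: bool = False

def describe(arg: AnalyzeArgument) -> str:
    if arg.isTable:
        # For table arguments, dataType describes the table's columns.
        return f"table with columns {arg.dataType}"
    if arg.isConstantExpression:
        # Either a literal or another constant-foldable scalar expression.
        return f"constant {arg.dataType} = {arg.value!r}"
    return f"non-constant {arg.dataType}"

print(describe(AnalyzeArgument("int", value=42, isConstantExpression=True)))
# -> constant int = 42
print(describe(AnalyzeArgument("c1: string", isTable=True)))
# -> table with columns c1: string
```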
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]