Re: [PR] [SPARK-47032][Python] Add UDTF API for "analyze" method to identify pass-through columns to output table [spark]

via GitHub Wed, 28 Feb 2024 17:04:43 -0800


nickstanishadb commented on code in PR #45142:
URL: https://github.com/apache/spark/pull/45142#discussion_r1506861839



##########
python/pyspark/sql/udtf.py:
##########
@@ -123,10 +123,17 @@ class SelectedColumn:
     alias : str, default ''
         If non-empty, this is the alias for the column or expression as visible from the UDTF's
         'eval' method. This is required if the expression is not a simple column reference.
+    forwardToOutputTable : bool, default False
+        If true, the UDTF is specifying to Catalyst a metadata property wherein the function call
+        will copy the result of evaluating this column or expression from the most recent input row
+        through to the output table, to a column with the same name specified in the 'alias' field
+        (or the name of the simple column reference otherwise). This is useful because it lets the
+        optimizer push filters or other operations down through the UDTF call to the input table.
     """
 
     name: str
     alias: str = ""
+    forwardToOutputTable: bool = False
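
To make the docstring's semantics concrete, here is a minimal plain-Python sketch, not Spark code. `SelectedColumnSketch` and `run_with_forwarding` are hypothetical names that only mirror the fields in the diff. The sketch assumes every selected entry is a simple column reference. It reads "most recent input row" as meaning that rows emitted from 'terminate' take the values of the last input row, which is an interpretation rather than something the diff states.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class SelectedColumnSketch:
    # Hypothetical stand-in mirroring the fields in the diff, not pyspark's class.
    name: str
    alias: str = ""
    forwardToOutputTable: bool = False

    def visible_name(self) -> str:
        # Name seen by 'eval' and used for the forwarded output column.
        return self.alias or self.name


def run_with_forwarding(
    rows: Iterable[Row],
    select: List[SelectedColumnSketch],
    eval_fn: Callable[[Row], Iterable[Row]],
    terminate_fn: Optional[Callable[[], Iterable[Row]]] = None,
) -> Iterator[Row]:
    """Model of pass-through: each output row gets the forwarded values of the
    most recent input row, written under the alias (or column name)."""
    forwarded = [c.visible_name() for c in select if c.forwardToOutputTable]
    last: Optional[Row] = None

    def attach(out: Row, source: Row) -> Row:
        merged = dict(out)
        for name in forwarded:
            merged[name] = source[name]
        return merged

    for row in rows:
        # Assumes every selected entry is a simple column reference.
        last = {c.visible_name(): row[c.name] for c in select}
        for out in eval_fn(last):
            yield attach(out, last)
    if terminate_fn is not None and last is not None:
        # Rows emitted by 'terminate' reuse the last input row's values.
        for out in terminate_fn():
            yield attach(out, last)


# Usage: dim1 is forwarded; v1 is visible to 'eval' under the alias "value".
select = [
    SelectedColumnSketch("dim1", forwardToOutputTable=True),
    SelectedColumnSketch("v1", alias="value"),
]
seen: List[int] = []


def eval_fn(row: Row) -> Iterator[Row]:
    seen.append(row["value"])
    yield {"doubled": row["value"] * 2}


def terminate_fn() -> Iterator[Row]:
    yield {"doubled": sum(seen)}


rows = [{"dim1": "a", "v1": 1, "other": 0}, {"dim1": "b", "v1": 2, "other": 0}]
result = list(run_with_forwarding(rows, select, eval_fn, terminate_fn))
# [{'doubled': 2, 'dim1': 'a'}, {'doubled': 4, 'dim1': 'b'}, {'doubled': 3, 'dim1': 'b'}]
```

In this sketch, as in the diff, a forwarded column is also a selected column, so 'eval' always sees `dim1` even though it never uses it.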

Review Comment:
   @dtenedor would it be possible to create a second abstraction rather than an optional field in `SelectedColumn`? For AI_FORECAST, I'd like to be able to select only the value columns but pass through the group columns, e.g.:
   
   ```python
   AnalyzeResult(
       ...,
       select=["v1", "v2"],
       forward_to_output=["dim1", "dim2"]
   )
   ```
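
A plain-Python sketch of the proposed shape, assuming the two lists behave as described in the comment. `AnalyzeResultSketch` and `run_with_forward_list` are hypothetical names, not pyspark API. The `forward_to_output` field is only the shape suggested in the comment; the PR does not add it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


@dataclass
class AnalyzeResultSketch:
    # Hypothetical: only the two fields from the proposal, not pyspark's AnalyzeResult.
    select: List[str] = field(default_factory=list)
    forward_to_output: List[str] = field(default_factory=list)


def run_with_forward_list(
    rows: Iterable[Row],
    result: AnalyzeResultSketch,
    eval_fn: Callable[[Row], Iterable[Row]],
) -> Iterator[Row]:
    for row in rows:
        # 'eval' sees only the selected value columns.
        visible = {name: row[name] for name in result.select}
        for out in eval_fn(visible):
            merged = dict(out)
            # Group columns are copied from the input row without passing through 'eval'.
            for name in result.forward_to_output:
                merged[name] = row[name]
            yield merged


analyze_result = AnalyzeResultSketch(select=["v1", "v2"], forward_to_output=["dim1", "dim2"])
rows = [
    {"dim1": "us", "dim2": "web", "v1": 1.0, "v2": 2.0},
    {"dim1": "eu", "dim2": "app", "v1": 3.0, "v2": 4.0},
]


def eval_fn(row: Row) -> Iterator[Row]:
    yield {"total": row["v1"] + row["v2"]}


output = list(run_with_forward_list(rows, analyze_result, eval_fn))
# [{'total': 3.0, 'dim1': 'us', 'dim2': 'web'}, {'total': 7.0, 'dim1': 'eu', 'dim2': 'app'}]
```

Compared with a per-column flag, `dim1` and `dim2` here never reach 'eval'; they appear only in the output rows.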



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
