lucas-nelson-uiuc commented on code in PR #48947:
URL: https://github.com/apache/spark/pull/48947#discussion_r1857165004
##########
python/pyspark/sql/classic/dataframe.py:
##########
@@ -1699,6 +1700,19 @@ def transform(
         ), "Func returned an instance of type [%s], " "should have been DataFrame." % type(result)
         return result
+    def pipe(
+        self, *funcs: tuple[Callable[..., ParentDataFrame]]
+    ) -> ParentDataFrame:
+        result = functools.reduce(
+            lambda init, func: init.transform(func),
Review Comment:
I feel like the same could be said about `DataFrame.transform`.
Before `transform` was in the API, users had to write `h(g(f(input_df)))`.
With `transform`, they can now write
`input_df.transform(f).transform(g).transform(h)`. Implementing
`DataFrame.pipe` would shorten that expression to `input_df.pipe(f, g, h)`.
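
For concreteness, here is a minimal standalone sketch of the equivalence being discussed. Since the diff above is truncated, it is written as a free function rather than a method; the transformation names `keep_positive` and `add_double` are made up for illustration and are not part of the PR.

```python
import functools
from typing import Callable

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def pipe_sketch(df: DataFrame, *funcs: Callable[[DataFrame], DataFrame]) -> DataFrame:
    # Fold the functions left-to-right over the DataFrame, exactly as
    # chained .transform() calls would.
    return functools.reduce(lambda acc, func: acc.transform(func), funcs, df)


# Hypothetical transformations, used only to illustrate the call site.
def keep_positive(df: DataFrame) -> DataFrame:
    return df.filter(F.col("value") > 0)


def add_double(df: DataFrame) -> DataFrame:
    return df.withColumn("doubled", F.col("value") * 2)


spark = SparkSession.builder.getOrCreate()
input_df = spark.createDataFrame([(-1,), (2,), (3,)], ["value"])

# Today:     input_df.transform(keep_positive).transform(add_double)
# Proposed:  input_df.pipe(keep_positive, add_double)
pipe_sketch(input_df, keep_positive, add_double).show()
```

One nice property of folding over `transform` is that every intermediate result still passes through `transform`'s existing assertion that each function returns a `DataFrame`.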