lucas-nelson-uiuc opened a new pull request, #48947:
URL: https://github.com/apache/spark/pull/48947
Suggested implementation of a `pipe` method for the PySpark DataFrame. Like
the `transform` method, it can be called directly on a DataFrame to apply
custom transformation functions. However, unlike the current `transform`
method, which requires one call per custom transformation, `pipe` accepts an
arbitrary number of transformations and chains them together on the user's
behalf.
Using the example from the existing documentation for `DataFrame.transform`,
the suggested `pipe` method would be used as follows:
```python
from pyspark.sql.functions import col

# assumes an active SparkSession bound to `spark`
df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])

def cast_all_to_int(input_df):
    # cast every column to an integer type
    return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])

def sort_columns_asc(input_df):
    # reorder the columns alphabetically by name
    return input_df.select(*sorted(input_df.columns))

# with transform method
df.transform(cast_all_to_int).transform(sort_columns_asc).show()

# with pipe method
df.pipe(cast_all_to_int, sort_columns_asc).show()
```
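To make the chaining semantics concrete, here is a minimal sketch of how such a method could fold its arguments over the input. It is not the code in this pull request: the free function `pipe` and the helpers `double_all` and `sort_desc` are hypothetical, and plain Python lists stand in for a DataFrame so the example runs without a Spark session.

```python
from functools import reduce
from typing import Any, Callable


def pipe(obj: Any, *funcs: Callable[[Any], Any]) -> Any:
    """Apply each function in order, feeding each result into the next."""
    return reduce(lambda acc, func: func(acc), funcs, obj)


# Hypothetical transformations on plain lists (stand-ins for DataFrames).
def double_all(values):
    return [v * 2 for v in values]


def sort_desc(values):
    return sorted(values, reverse=True)


# Piping is equivalent to nesting the calls in the same order.
piped = pipe([1, 3, 2], double_all, sort_desc)
chained = sort_desc(double_all([1, 3, 2]))

# With no functions, the input is returned unchanged.
no_op = pipe([1, 3, 2])
```

Under this sketch, `df.pipe(f, g)` would behave like `df.transform(f).transform(g)`, and calling it with no functions would return the input as-is.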
For functions that take parameters, users can pass closures or partial
functions (for example, via `functools.partial`).
```python
import functools
from typing import Callable

from pyspark.sql import DataFrame
from pyspark.sql.functions import col

def add_n(input_df, n):
    return input_df.select([(col(col_name) + n).alias(col_name)
                            for col_name in input_df.columns])

# define a partial function
add_one = functools.partial(add_n, n=1)

# or, define a function that returns a closure
# (named make_add_n here so it does not shadow add_n above)
def make_add_n(n: int) -> Callable[[DataFrame], DataFrame]:
    def closure(input_df: DataFrame) -> DataFrame:
        return input_df.select([(col(col_name) + n).alias(col_name)
                                for col_name in input_df.columns])
    return closure

# with transform method
df.transform(add_n, 1).transform(add_n, n=10).show()

# with pipe method
df.pipe(add_one, make_add_n(n=10)).show()
```
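The two ways of binding a parameter can be compared side by side without Spark. The sketch below is illustrative only: plain lists stand in for a DataFrame, and `make_add_n` is a hypothetical name for the closure factory. Both approaches yield a single-argument callable, which is the shape `pipe` expects.

```python
import functools
from typing import Callable, List


def add_n(values: List[int], n: int) -> List[int]:
    # Two-argument transformation: cannot be passed to pipe directly.
    return [v + n for v in values]


def make_add_n(n: int) -> Callable[[List[int]], List[int]]:
    # Closure factory: captures n and returns a one-argument function.
    def closure(values: List[int]) -> List[int]:
        return add_n(values, n)
    return closure


# Both bind n=1 and leave only the data argument open.
add_one_partial = functools.partial(add_n, n=1)
add_one_closure = make_add_n(1)

data = [1, 2, 3]
result_partial = add_one_partial(data)
result_closure = add_one_closure(data)
```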