HyukjinKwon opened a new pull request #28052: [SPARK-31287][PYTHON][SQL] Ignore type hints in groupby.(cogroup.)applyInPandas and mapInPandas
URL: https://github.com/apache/spark/pull/28052
 
 
   ### What changes were proposed in this pull request?
   
   This PR proposes to make the pandas function APIs (`groupby.(cogroup.)applyInPandas` and `mapInPandas`) ignore Python type hints.
   
   ### Why are the changes needed?
   
   Python type hints are optional, so they shouldn't affect the APIs that do not use pandas UDFs.
   Supporting other type hints in these APIs is left as future work; for now, they should at least not throw an exception.
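   Type hints are optional metadata: Python records them on the function but does not enforce them when the function is called. The stdlib-only sketch below shows how an API could read hints with `inspect.signature` while tolerating functions that have none; `read_hints` is a hypothetical helper for illustration, not part of PySpark.

```python
import inspect
from typing import Any, Dict


def read_hints(func) -> Dict[str, Any]:
    """Hypothetical helper: collect parameter and return annotations of func.

    Parameters without hints are skipped instead of raising, so an
    unannotated function simply yields an empty dict.
    """
    sig = inspect.signature(func)
    hints = {
        name: param.annotation
        for name, param in sig.parameters.items()
        if param.annotation is not inspect.Parameter.empty
    }
    if sig.return_annotation is not inspect.Signature.empty:
        hints["return"] = sig.return_annotation
    return hints


def plus_one(x: int) -> int:
    return x + 1


def plus_one_plain(x):
    return x + 1


print(read_hints(plus_one))        # {'x': <class 'int'>, 'return': <class 'int'>}
print(read_hints(plus_one_plain))  # {}
# Hints are not enforced at call time: a float argument still works.
print(plus_one(1.5))               # 2.5
```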
   
   ### Does this PR introduce any user-facing change?
   
   No, it's a master-only change.
   
   ```python
   import pandas as pd
   
   def pandas_plus_one(pdf: pd.DataFrame) -> pd.DataFrame:
       return pdf + 1
   
   spark.range(10).groupby("id").applyInPandas(pandas_plus_one, schema="id long").show()
   ```
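   With this change, the grouped API should produce the same result whether or not the user function carries type hints. The stdlib-only sketch below illustrates that idea; `apply_per_group` is a hypothetical stand-in for `groupby(...).applyInPandas(...)`, and plain lists of dicts stand in for pandas DataFrames. It is an illustration under those assumptions, not PySpark's implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Plain dicts stand in for rows; lists of rows stand in for pandas DataFrames.
Row = Dict[str, int]
GroupFunc = Callable[[List[Row]], List[Row]]


def apply_per_group(rows: List[Row], key: str, func: GroupFunc) -> List[Row]:
    """Hypothetical stand-in for groupby(key).applyInPandas(func, schema=...).

    func is called on each group as-is; its annotations are never read,
    so they cannot cause an error.
    """
    groups: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result: List[Row] = []
    for group in groups.values():
        result.extend(func(group))
    return result


def plus_one_hinted(group: List[Row]) -> List[Row]:
    # Add 1 to every column of every row in the group.
    return [{col: val + 1 for col, val in row.items()} for row in group]


def plus_one_plain(group):
    # Same logic, without type hints.
    return [{col: val + 1 for col, val in row.items()} for row in group]


rows = [{"id": i} for i in range(10)]
hinted = apply_per_group(rows, "id", plus_one_hinted)
plain = apply_per_group(rows, "id", plus_one_plain)
print([row["id"] for row in hinted])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(hinted == plain)                # True
```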
   ```python
   import pandas as pd
   
   def pandas_plus_one(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
       return left + 1
   
   
   spark.range(10).groupby("id").cogroup(spark.range(10).groupby("id")).applyInPandas(
       pandas_plus_one, schema="id long").show()
   ```
   
   ```python
   from typing import Iterator
   import pandas as pd
   
   def pandas_plus_one(iterator: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
       # Lazily add 1 to each pandas.DataFrame batch.
       return map(lambda pdf: pdf + 1, iterator)
   
   spark.range(10).mapInPandas(pandas_plus_one, schema="id long").show()
   ```
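   In the `mapInPandas` example, the function receives an iterator of `pandas.DataFrame` batches and returns an iterator of batches, so it can equally be written as a generator. The stdlib-only sketch below mimics that contract; lists of ints stand in for DataFrame batches, and `run_batches` is a hypothetical driver, not PySpark's implementation.

```python
from typing import Callable, Iterable, Iterator, List

# A list of ints stands in for one pandas.DataFrame batch.
Batch = List[int]


def plus_one_map(batches: Iterator[Batch]) -> Iterator[Batch]:
    # Same shape as the example above: lazily transform each batch with map().
    return map(lambda batch: [v + 1 for v in batch], batches)


def plus_one_gen(batches: Iterator[Batch]) -> Iterator[Batch]:
    # Generator form: yield one transformed batch per input batch.
    for batch in batches:
        yield [v + 1 for v in batch]


def run_batches(func: Callable[[Iterator[Batch]], Iterator[Batch]],
                batches: Iterable[Batch]) -> List[int]:
    """Hypothetical driver: pass an iterator in, flatten the batches that come out."""
    return [v for out in func(iter(batches)) for v in out]


data = [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
print(run_batches(plus_one_map, data))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(run_batches(plus_one_gen, data) == run_batches(plus_one_map, data))  # True
```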
   
   
   **Before:**
   
   An exception is thrown.
   
   **After:**
   
   ```
   +---+
   | id|
   +---+
   |  1|
   |  2|
   |  3|
   |  4|
   |  5|
   |  6|
   |  7|
   |  8|
   |  9|
   | 10|
   +---+
   ```
   
   ### How was this patch tested?
   

