[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

BryanCutler Thu, 24 May 2018 17:29:27 -0700

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/21427
  
    At first glance, I thought this issue was slightly different from
    https://issues.apache.org/jira/browse/SPARK-23929, but it does seem to be the
    same. After reading through that discussion, I think we need to be careful
    about any changes here. I'm not used to creating DataFrames by position, but it
    is possible to do so with a list of tuples, as in this example from the doctest:
    
    ```
    >>> @pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)  # doctest: +SKIP
    ... def mean_udf(key, pdf):
    ...     # key is a tuple of one numpy.int64, which is the value
    ...     # of 'id' for the current group
    ...     return pd.DataFrame([key + (pdf.v.mean(),)])
    ```
    In that case, this would be a breaking change, so it may be best to add
    better documentation for now, as @HyukjinKwon suggested in SPARK-23929, and
    target the behavior change for Spark 3.0?
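    To make the breaking-change concern concrete, here is a minimal pure-pandas
    sketch (no Spark involved; the sample `key`/`pdf` values and the
    `schema_names` list are hypothetical stand-ins for one group and the
    "id long, v double" schema). A DataFrame built from a list of tuples, as in
    the doctest, gets integer column labels 0 and 1, so it can only be matched to
    the schema by position; matching by name would find neither "id" nor "v".
    Giving the columns explicit names works under either approach.

    ```python
    import pandas as pd

    # Hypothetical stand-in for one group handed to a grouped map UDF:
    # the grouping key is a tuple, and pdf holds that group's rows.
    key = (1,)
    pdf = pd.DataFrame({"id": [1, 1], "v": [1.0, 2.0]})
    schema_names = ["id", "v"]  # field names from "id long, v double"

    # Same construction as the doctest: a list of tuples, no column names,
    # so pandas labels the columns 0 and 1.
    result = pd.DataFrame([key + (pdf.v.mean(),)])
    print(list(result.columns))  # [0, 1]

    # Positional matching: the i-th pandas column becomes the i-th schema field.
    by_position = result.copy()
    by_position.columns = schema_names

    # Name-based matching would look each schema field up by label, and
    # none of them exist on the integer-labelled result.
    missing = [name for name in schema_names if name not in list(result.columns)]
    print(missing)  # ['id', 'v']

    # Naming the columns explicitly satisfies both position and name matching.
    named = pd.DataFrame([key + (pdf.v.mean(),)], columns=schema_names)
    print(list(named.columns))  # ['id', 'v']
    ```

    This is why recommending explicit column names in the docs seems like a safe
    interim step: such UDFs keep working whether or not assignment by name is
    adopted later.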


