GitHub user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20211#discussion_r161366590
  
    --- Diff: python/pyspark/sql/group.py ---
    @@ -233,6 +233,27 @@ def apply(self, udf):
             |  2| 1.1094003924504583|
             +---+-------------------+
     
    +        Notes on grouping column:
    --- End diff --
    
    Sounds to me like we could either stick with func(key, pdf) or follow
    whatever pandas does.
    
    (Yes, for gapply, the returned data frame is expected to have the key
    columns prepended; SPARK-16258 proposed eliminating that extra work.)
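
    To make the two options concrete, here is a minimal pure-Python sketch of
    the func(key, pdf) convention. It is not Spark's or pandas' implementation:
    `grouped_apply`, its `prepend_key` flag, and the list-of-dicts stand-in for
    a data frame are all hypothetical names invented for illustration. The
    first call leaves it to the user function to put the key columns in its
    result (the gapply expectation described above); the second has the
    framework prepend them, roughly the kind of change SPARK-16258 proposed.

```python
from collections import defaultdict


def grouped_apply(rows, key_cols, func, prepend_key=False):
    """Hypothetical helper: call func(key, group_rows) once per group.

    rows is a list of dicts standing in for a data frame.
    If prepend_key is True, the key columns are added to each result row
    by the framework instead of by the user function.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        groups[key].append(row)

    out = []
    for key, group_rows in groups.items():
        for result in func(key, group_rows):
            if prepend_key:
                # Key columns first, then whatever the function returned.
                result = {**dict(zip(key_cols, key)), **result}
            out.append(result)
    return out


rows = [{"id": 1, "v": 1.0}, {"id": 1, "v": 2.0}, {"id": 2, "v": 3.0}]


def mean_with_key(key, group_rows):
    # The user function is responsible for emitting the key column itself.
    mean = sum(r["v"] for r in group_rows) / len(group_rows)
    return [{"id": key[0], "mean_v": mean}]


def mean_only(key, group_rows):
    # The user function returns only the computed column.
    mean = sum(r["v"] for r in group_rows) / len(group_rows)
    return [{"mean_v": mean}]


explicit = grouped_apply(rows, ["id"], mean_with_key)
automatic = grouped_apply(rows, ["id"], mean_only, prepend_key=True)
```

    Both calls produce the same rows; the difference is only who does the
    work of attaching the grouping columns.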

