Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20211#discussion_r160918997
--- Diff: python/pyspark/sql/group.py ---
@@ -233,6 +233,27 @@ def apply(self, udf):
| 2| 1.1094003924504583|
+---+-------------------+
+ Notes on grouping column:
--- End diff --
Coming from a SQL background, I think we should add the grouping keys to the
input of the UDF. Users sometimes need to read the grouping keys when
aggregating, and we should give them a way to do so. BTW, this is also
consistent with the Dataset API; see `KeyValueGroupedDataset.mapGroups`.
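
The semantics being proposed can be modeled with plain pandas (a sketch only,
not the pyspark API under review): if the grouping column is kept in each
group's DataFrame, the per-group function can read the key directly. The
column names and the `subtract_mean` function here are illustrative, not from
the PR.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "v": [1.0, 2.0, 3.0]})

def subtract_mean(pdf):
    # pdf still contains the grouping column "id", so the function
    # can read the grouping key while transforming the group.
    return pdf.assign(v=pdf.v - pdf.v.mean())

# group_keys=False keeps the output flat, mirroring how a grouped-map
# UDF would return plain rows rather than a keyed result.
result = df.groupby("id", group_keys=False).apply(subtract_mean)
```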
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]