Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r143263694
--- Diff: python/pyspark/sql/group.py ---
@@ -192,7 +193,69 @@ def pivot(self, pivot_col, values=None):
             jgd = self._jgd.pivot(pivot_col)
         else:
             jgd = self._jgd.pivot(pivot_col, values)
-        return GroupedData(jgd, self.sql_ctx)
+        return GroupedData(jgd, self._df)
+
+    @since(2.3)
+    def apply(self, udf):
--- End diff ---
@rxin just to recap our discussion regarding naming:
You asked:
> What's the difference between this one and the transform function you also proposed? I'm trying to see if all the naming makes sense when considered together.
The answer is:
`transform` takes a function `pd.Series -> pd.Series` and applies it to each column (or a subset of columns). The input and output Series have the same length.
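For example (just a plain pandas sketch to illustrate the shape of such a function; `normalize` is a made-up name):

```python
import pandas as pd

# transform-style function: pd.Series -> pd.Series,
# the output has the same length (and index) as the input
def normalize(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

s = pd.Series([1.0, 2.0, 3.0])
normalize(s)  # still 3 elements, one per input element
```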
`apply` takes a function `pd.DataFrame -> pd.DataFrame` and applies it to each group. This is similar to `flatMapGroups`.
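An apply-style function operates on one whole group at a time, e.g. (again just a pandas sketch; the column name `v` and the function name are made up):

```python
import pandas as pd

# apply-style function: pd.DataFrame -> pd.DataFrame, called once per group;
# the output may have a different number of rows than the input group
def subtract_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    return pdf.assign(v=pdf.v - pdf.v.mean())

pdf = pd.DataFrame({"id": [1, 1, 2], "v": [1.0, 2.0, 10.0]})
pdf.groupby("id").apply(subtract_mean)  # plain pandas analogue of the proposed API
```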
Does this make sense to you?
---