[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-25 Thread citoubest
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 @davies, what do you think about this patch? Can you give me some advice? Thanks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-19 Thread citoubest
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 With pandas, the parameter to agg is the function itself, not a str (the function name). In [13]: df Out[13]: a b c d 0 0.068300 0.263883 0.237335 1 1 0.226992

[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15135 Pandas doesn't support this, does it? ``` >>> pd.read_csv('test.csv').groupby('a').agg('sum', 'avg') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Library/Py

[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-19 Thread citoubest
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 OK. Because pandas DataFrames support the added approach to agg, I supposed Spark DataFrames should support it as well, but they do not, so I tried to add it with this patch. If you think this patch is not ne

[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15135 I understand the reasons why you want to add this -- but I feel this is too esoteric and if we add this one, there are also a lot of other cases that can be added and I don't know where we would stop.

[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-19 Thread citoubest
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 @rxin @davies @srowen

[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-18 Thread citoubest
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 @petermaxlee In my opinion, a list comprehension reduces the code length only to some extent. It would be better if the agg method supported this shorthand at the API level.

[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-17 Thread petermaxlee
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/15135 Isn't it as simple as
```
from pyspark.sql import functions as F  # assumed alias for F.min / F.max

cols = [x for x in df.columns if x != "key"]
# agg takes column expressions as separate arguments, so unpack the list
df.groupby("key").agg(*([F.min(x) for x in cols] + [F.max(x) for x in cols]))
```
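The list-comprehension approach suggested above can be wrapped in a small helper. The following is a minimal sketch, not code from the patch: `build_agg_exprs`, `fake_min`, and `fake_max` are hypothetical names, and the two stand-in functions return strings so that the example runs without a Spark installation.

```python
# Sketch only: build_agg_exprs is a hypothetical helper, not part of PySpark.
from itertools import product


def build_agg_exprs(columns, funcs, key):
    """Apply every function in `funcs` to every column except `key`.

    The result is a flat list grouped by function, in the same order as
    [f1(c) for c in cols] + [f2(c) for c in cols].
    """
    cols = [c for c in columns if c != key]
    return [func(col) for func, col in product(funcs, cols)]


# Stand-ins for pyspark.sql.functions.min / max (assumption: they only
# mimic the call shape and return a readable string, not a Column).
def fake_min(col):
    return "min({})".format(col)


def fake_max(col):
    return "max({})".format(col)


exprs = build_agg_exprs(["key", "a", "b"], [fake_min, fake_max], key="key")
print(exprs)  # ['min(a)', 'min(b)', 'max(a)', 'max(b)']
```

With PySpark, one would presumably pass `F.min` and `F.max` from `pyspark.sql.functions` in place of the stand-ins and unpack the result, as in `df.groupby("key").agg(*exprs)`, since `agg` accepts column expressions as separate positional arguments.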

[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15135 Can one of the admins verify this patch?