[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 @davies, what do you think about this patch? Can you give me some advice? Thanks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135

With pandas, the argument to agg is the function itself, not a string (function name).

    In [13]: df
    Out[13]:
              a         b         c  d
    0  0.068300  0.263883  0.237335  1
    1  0.226992  0.573966  0.954791  2
    2  0.907550  0.930591  0.886454  1
    3  0.178581  0.440734  0.414763  2

    In [14]: df.groupby('d').agg([max,min])
    Out[14]:
              a                   b                   c
            max       min       max       min       max       min
    d
    1  0.907550  0.068300  0.930591  0.263883  0.886454  0.237335
    2  0.226992  0.178581  0.573966  0.440734  0.954791  0.414763
[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15135

Pandas doesn't support this, does it?

```
>>> pd.read_csv('test.csv').groupby('a').agg('sum', 'avg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 3597, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 3114, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/base.py", line 428, in _aggregate
    return getattr(self, arg)(*args, **kwargs), None
TypeError: f() takes exactly 1 argument (2 given)
>>> pd.read_csv('test.csv').groupby('a').agg(['sum', 'avg'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 3597, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 3114, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/base.py", line 564, in _aggregate
    return self._aggregate_multiple_funcs(arg, _level=_level), None
  File "/Library/Python/2.7/site-packages/pandas/core/base.py", line 609, in _aggregate_multiple_funcs
    results.append(colg.aggregate(arg))
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2574, in aggregate
    (_level or 0) + 1)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2636, in _aggregate_multiple_funcs
    results[name] = obj.aggregate(func)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2570, in aggregate
    return getattr(self, func_or_funcs)(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 498, in __getattr__
    (type(self).__name__, attr))
AttributeError: 'SeriesGroupBy' object has no attribute 'avg'
```
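For comparison, here is a minimal sketch of the list-of-names form with a name pandas does recognize. The second traceback suggests the failure comes from `'avg'` not being a pandas method name rather than from the list form itself; `'mean'` is the pandas equivalent. The small DataFrame is a hypothetical stand-in for `test.csv`, whose contents are not shown in the thread.

```python
import pandas as pd

# Hypothetical stand-in for test.csv (its contents are not shown in the thread).
df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})

# String names are looked up as groupby methods: 'mean' exists, 'avg' does not.
result = df.groupby("a").agg(["sum", "mean"])
print(result)
```

The result has one (column, function) pair per output column, the same layout as the `agg([max, min])` output quoted elsewhere in the thread.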
[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 OK. Since pandas DataFrames support the approach to agg that this patch adds, I assumed Spark DataFrames should support it too, but they do not, so I wrote this patch. If you think the patch is unnecessary, I will close this request later. @rxin
[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15135 I understand the reasons why you want to add this -- but I feel this is too esoteric and if we add this one, there are also a lot of other cases that can be added and I don't know where we would stop.
[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 @rxin @davies @srowen
[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user citoubest commented on the issue: https://github.com/apache/spark/pull/15135 @petermaxlee In my opinion, a list comprehension can shorten the code to some extent, but it would be better if the agg method supported the simpler form at the API level.
[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/15135

Isn't it as simple as

```
from pyspark.sql import functions as F
cols = [x for x in df.columns if x != "key"]
# agg takes Column arguments, so unpack the combined list with *
df.groupby("key").agg(*([F.min(x) for x in cols] + [F.max(x) for x in cols]))
```
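The list-comprehension approach can also be wrapped in a small helper, sketched below. `multi_agg_exprs` is a hypothetical name, not part of PySpark; it only builds the list of expressions and leaves the call to `agg` to the caller. The demo uses plain string-building stand-ins so the sketch runs without a Spark session.

```python
def multi_agg_exprs(columns, funcs, key):
    """Apply every function in `funcs` to every column except `key`.

    Hypothetical helper (not a PySpark API). The result is ordered by
    function first, then by column.
    """
    value_cols = [c for c in columns if c != key]
    return [f(c) for f in funcs for c in value_cols]


# Stand-ins for F.min / F.max so the sketch runs without Spark.
def fake_min(col):
    return "min(%s)" % col


def fake_max(col):
    return "max(%s)" % col


exprs = multi_agg_exprs(["key", "a", "b"], [fake_min, fake_max], "key")
print(exprs)

# With PySpark (assumed usage, requires a DataFrame `df`):
#   from pyspark.sql import functions as F
#   df.groupby("key").agg(*multi_agg_exprs(df.columns, [F.min, F.max], "key"))
```

Keeping the helper in user code rather than in `agg` itself matches the position taken in the thread that the API does not need to grow for this case.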
[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15135 Can one of the admins verify this patch?