[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-25 Thread citoubest
Github user citoubest commented on the issue:

https://github.com/apache/spark/pull/15135
  
@davies, what do you think about this patch? Can you give me some advice? 
Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-20 Thread citoubest
Github user citoubest commented on the issue:

https://github.com/apache/spark/pull/15135
  
With pandas, the argument to agg is the function itself, not a str (function name).
In [13]: df
Out[13]: 
          a         b         c  d
0  0.068300  0.263883  0.237335  1
1  0.226992  0.573966  0.954791  2
2  0.907550  0.930591  0.886454  1
3  0.178581  0.440734  0.414763  2

In [14]: df.groupby('d').agg([max,min])
Out[14]: 
          a                   b                   c          
        max       min       max       min       max       min
d                                                            
1  0.907550  0.068300  0.930591  0.263883  0.886454  0.237335
2  0.226992  0.178581  0.573966  0.440734  0.954791  0.414763
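
For comparison, pandas also accepts function names as strings, provided they are passed as a single list and use pandas' own names (the average is spelled "mean", not "avg"). A minimal sketch that rebuilds the frame above from the printed values; the variable names `result` and `summary` are only illustrative:

```python
import pandas as pd

# Same data as the DataFrame printed above.
df = pd.DataFrame({
    "a": [0.068300, 0.226992, 0.907550, 0.178581],
    "b": [0.263883, 0.573966, 0.930591, 0.440734],
    "c": [0.237335, 0.954791, 0.886454, 0.414763],
    "d": [1, 2, 1, 2],
})

# A list of names applies every function to every non-key column.
result = df.groupby("d").agg(["max", "min"])
print(result)

# The result has two-level columns: (original column, function name).
print(result[("a", "max")])

# "mean" is the pandas name for the average.
summary = df.groupby("d").agg(["sum", "mean"])
print(summary)
```

Passing the names as separate positional arguments, as in `agg('sum', 'avg')`, does not work because the extra arguments are forwarded to the first function.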






[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-20 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15135
  
Pandas doesn't support this, does it?

```
>>> pd.read_csv('test.csv').groupby('a').agg('sum', 'avg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 3597, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 3114, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/base.py", line 428, in _aggregate
    return getattr(self, arg)(*args, **kwargs), None
TypeError: f() takes exactly 1 argument (2 given)
>>> pd.read_csv('test.csv').groupby('a').agg(['sum', 'avg'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 3597, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 3114, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/base.py", line 564, in _aggregate
    return self._aggregate_multiple_funcs(arg, _level=_level), None
  File "/Library/Python/2.7/site-packages/pandas/core/base.py", line 609, in _aggregate_multiple_funcs
    results.append(colg.aggregate(arg))
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2574, in aggregate
    (_level or 0) + 1)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2636, in _aggregate_multiple_funcs
    results[name] = obj.aggregate(func)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 2570, in aggregate
    return getattr(self, func_or_funcs)(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/core/groupby.py", line 498, in __getattr__
    (type(self).__name__, attr))
AttributeError: 'SeriesGroupBy' object has no attribute 'avg'
```





[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-20 Thread citoubest
Github user citoubest commented on the issue:

https://github.com/apache/spark/pull/15135
  
OK. Because pandas DataFrames support this form of agg, I assumed Spark 
DataFrames should support it too, but they do not, so I wrote this patch. 
If you think the patch is unnecessary, I will close this request later. 
@rxin





[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-19 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15135
  
I understand the reasons why you want to add this -- but I feel this is too 
esoteric and if we add this one, there are also a lot of other cases that can 
be added and I don't know where we would stop.






[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-19 Thread citoubest
Github user citoubest commented on the issue:

https://github.com/apache/spark/pull/15135
  
  @rxin @davies @srowen 





[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-18 Thread citoubest
Github user citoubest commented on the issue:

https://github.com/apache/spark/pull/15135
  
@petermaxlee 
In my opinion, a list comprehension shortens the code only to some extent. 
It would be better if the agg method supported this shorthand directly at the API level.





[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-18 Thread petermaxlee
Github user petermaxlee commented on the issue:

https://github.com/apache/spark/pull/15135
  
Isn't it as simple as 
```
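from pyspark.sql import functions as F
cols = [x for x in df.columns if x != "key"]
# agg takes Column expressions as separate arguments, so unpack the combined list
df.groupby("key").agg(*([F.min(x) for x in cols] + [F.max(x) for x in cols]))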
```
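
The same comprehension extends to any number of aggregate functions. A minimal sketch, assuming a hypothetical helper `expand_aggs` (not part of PySpark); `fake_min` and `fake_max` are string-building stand-ins for `F.min` and `F.max` so the snippet runs without a Spark session:

```python
def expand_aggs(columns, funcs, key):
    """Apply every function in `funcs` to every column except `key`.

    Hypothetical helper, not part of PySpark. With PySpark, `funcs` could be
    [F.min, F.max] and the result would be unpacked into
    df.groupby(key).agg(*exprs).
    """
    value_cols = [c for c in columns if c != key]
    return [f(c) for f in funcs for c in value_cols]


# Stand-ins that return SQL-like strings instead of Spark Column objects.
def fake_min(col):
    return "min(%s)" % col


def fake_max(col):
    return "max(%s)" % col


exprs = expand_aggs(["key", "a", "b"], [fake_min, fake_max], key="key")
print(exprs)  # ['min(a)', 'min(b)', 'max(a)', 'max(b)']
```

The ordering matches the snippet above: all results of the first function, then all results of the second.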





[GitHub] spark issue #15135: [pyspark][group]pyspark GroupedData can't apply agg func...

2016-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15135
  
Can one of the admins verify this patch?

