This one works as expected:

```
>>> spark.range(10).selectExpr("id", "id as k").groupBy("k").agg({"k": "count", "id": "sum"}).show()
+---+--------+-------+
|  k|count(k)|sum(id)|
+---+--------+-------+
|  0|       1|      0|
|  7|       1|      7|
|  6|       1|      6|
|  9|       1|      9|
|  5|       1|      5|
|  1|       1|      1|
|  3|       1|      3|
|  8|       1|      8|
|  2|       1|      2|
|  4|       1|      4|
+---+--------+-------+
```
Have you tried removing the orderBy? That looks weird.

On Fri, May 27, 2016 at 4:28 AM, Andrew Vykhodtsev <yoz...@gmail.com> wrote:
> Dear list,
>
> I am trying to calculate sum and count on the same column:
>
>     user_id_books_clicks = (
>         sqlContext.read.parquet('hdfs:///projects/kaggle-expedia/input/train.parquet')
>         .groupby('user_id')
>         .agg({'is_booking': 'count', 'is_booking': 'sum'})
>         .orderBy(fn.desc('count(user_id)'))
>         .cache()
>     )
>
> If I do it like that, it only gives me one (last) aggregate -
> sum(is_booking)
>
> But if I change to .agg({'user_id': 'count', 'is_booking': 'sum'}) - it
> gives me both. I am on 1.6.1. Is it fixed in 2.+? Or should I report it
> to JIRA?
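For what it's worth, this is most likely not a Spark issue at all: a Python dict cannot hold two entries with the same key, so in `{'is_booking': 'count', 'is_booking': 'sum'}` the second entry silently overwrites the first before `agg()` ever sees it. A minimal sketch in plain Python (no Spark required) shows the behavior:

```python
# A dict literal with a duplicate key keeps only the last value,
# so agg() receives a single aggregate spec, not two.
aggs = {'is_booking': 'count', 'is_booking': 'sum'}
print(aggs)  # -> {'is_booking': 'sum'}

# Dict keys are unique by definition: only one entry survives.
print(len(aggs))  # -> 1
```

To aggregate the same column twice, the usual workaround is to pass explicit column expressions instead of a dict, e.g. `df.groupBy('user_id').agg(fn.count('is_booking'), fn.sum('is_booking'))` with `from pyspark.sql import functions as fn`. That works on 1.6 as well as 2.x.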