groupBy does not produce an actual result; it returns a construct on which you define aggregations.
So you can do:
import org.apache.spark.sql.{functions => func}
val resDF = df.groupBy("client_id").agg(func.collect_set(df("Date")))
Note that collect_set can be a little heavy in terms of performance, so if you
only need the number of distinct dates you should probably use
approxCountDistinct instead (a sketch of both is below).
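For reference, a minimal self-contained sketch of both approaches. It assumes a
SparkSession bound to a variable named spark; the inline DataFrame just mirrors
the sample data from the message below and is illustrative only:

import org.apache.spark.sql.{functions => func}
import spark.implicits._  // for .toDF on a local Seq

// Rebuild the sample data from the original question
val df = Seq(
  ("a", "2016-11-23"),
  ("b", "2016-11-18"),
  ("a", "2016-11-23"),
  ("a", "2016-11-23"),
  ("a", "2016-11-24")
).toDF("client_id", "Date")

// Distinct dates per client (exact, but shuffles the full set of values)
df.groupBy("client_id")
  .agg(func.collect_set(func.col("Date")).as("dates"))
  .show(false)

// If only the count of distinct dates is needed, the approximate
// counter avoids materializing the sets
df.groupBy("client_id")
  .agg(func.approxCountDistinct(func.col("Date")).as("n_dates"))
  .show(false)

On that sample the first query should yield a -> [2016-11-23, 2016-11-24] and
b -> [2016-11-18], matching the expected output below.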
Assaf.
From: Devi P.V [mailto:devip2...@gmail.com]
Sent: Thursday, December 08, 2016 10:38 AM
To: user @spark
Subject: How to find unique values after groupBy() in spark dataframe ?
Hi all,
I have a dataframe like the following:
+---------+----------+
|client_id|Date      |
+---------+----------+
|a        |2016-11-23|
|b        |2016-11-18|
|a        |2016-11-23|
|a        |2016-11-23|
|a        |2016-11-24|
+---------+----------+
I want to find unique dates of each client_id using spark dataframe.
Expected output:
a (2016-11-23, 2016-11-24)
b (2016-11-18)
I tried df.groupBy("client_id"), but I don't know how to find the distinct
values after groupBy(). How can I do this?
Are there any other efficient methods for doing this?
I am using Scala 2.11.8 and Spark 2.0.
Thanks