groupBy() does not produce a result by itself; it is a construct for defining aggregations.
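For example, assuming df is the DataFrame from the question quoted below, a minimal illustration of the types involved in Spark 2.0 (variable names are just for illustration):

  // groupBy alone yields a RelationalGroupedDataset, not a DataFrame:
  val grouped: org.apache.spark.sql.RelationalGroupedDataset = df.groupBy("client_id")
  // calling an aggregation on it (here a simple count) gives a DataFrame back:
  val counts: org.apache.spark.sql.DataFrame = grouped.count()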
So you can do:

  import org.apache.spark.sql.{functions => func}
  val resDF = df.groupBy("client_id").agg(func.collect_set(df("Date")))

Note that collect_set can be a little heavy in terms of performance, so if you just want to count the distinct values you should probably use approxCountDistinct.

Assaf

From: Devi P.V [mailto:devip2...@gmail.com]
Sent: Thursday, December 08, 2016 10:38 AM
To: user @spark
Subject: How to find unique values after groupBy() in spark dataframe ?

Hi all,

I have a dataframe like the following:

  +---------+----------+
  |client_id|Date      |
  +---------+----------+
  |        a|2016-11-23|
  |        b|2016-11-18|
  |        a|2016-11-23|
  |        a|2016-11-23|
  |        a|2016-11-24|
  +---------+----------+

I want to find the unique dates of each client_id using a Spark dataframe. Expected output:

  a  (2016-11-23, 2016-11-24)
  b  2016-11-18

I tried df.groupBy("client_id"), but I don't know how to find the distinct values after groupBy(). How can I do this? Are there any other efficient methods for doing this?

I am using Scala 2.11.8 and Spark 2.0.

Thanks
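P.S. For completeness, a self-contained sketch of both approaches against the sample data above; the column names come from the question, while the app name and the "dates"/"n_dates" aliases are just illustrative:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.{functions => func}

  val spark = SparkSession.builder().appName("unique-dates").master("local[*]").getOrCreate()
  import spark.implicits._

  // Rebuild the sample data from the question.
  val df = Seq(
    ("a", "2016-11-23"),
    ("b", "2016-11-18"),
    ("a", "2016-11-23"),
    ("a", "2016-11-23"),
    ("a", "2016-11-24")
  ).toDF("client_id", "Date")

  // Distinct dates per client, collected into an array column.
  // Note: the order of elements inside each set is not guaranteed.
  val resDF = df.groupBy("client_id").agg(func.collect_set(df("Date")).as("dates"))
  resDF.show(false)
  // e.g.
  // +---------+------------------------+
  // |client_id|dates                   |
  // +---------+------------------------+
  // |a        |[2016-11-23, 2016-11-24]|
  // |b        |[2016-11-18]            |
  // +---------+------------------------+

  // If only the number of unique dates matters, skip materializing the
  // sets: approxCountDistinct is cheaper (countDistinct gives exact counts).
  df.groupBy("client_id")
    .agg(func.approxCountDistinct(df("Date")).as("n_dates"))
    .show()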