RE: How to find unique values after groupBy() in spark dataframe ?

2016-12-08 Thread Mendelson, Assaf
groupBy() does not return an actual result; it is a construct that lets you define aggregations over the grouped data.

So you can do:


import org.apache.spark.sql.{functions => func}

// Collect the distinct dates for each client into a set
val resDF = df.groupBy("client_id").agg(func.collect_set(df("Date")))
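For the sample data in your mail, resDF.show() would print roughly the following (row order and the generated column name may differ):

+---------+------------------------+
|client_id|       collect_set(Date)|
+---------+------------------------+
|        a|[2016-11-23, 2016-11-24]|
|        b|            [2016-11-18]|
+---------+------------------------+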


Note that collect_set can be a little heavy in terms of performance, so if you
just want to count the distinct values, you should probably use approxCountDistinct instead.
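
A minimal sketch of the counting variant (approxCountDistinct is the Spark 2.0 name; later releases rename it to approx_count_distinct, and the output column name "distinct_dates" here is just an example):

// Approximate count of distinct dates per client; cheaper than
// materializing the full set of dates with collect_set
val countDF = df.groupBy("client_id")
  .agg(func.approxCountDistinct(df("Date")).as("distinct_dates"))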
Assaf.

From: Devi P.V [mailto:devip2...@gmail.com]
Sent: Thursday, December 08, 2016 10:38 AM
To: user @spark
Subject: How to find unique values after groupBy() in spark dataframe ?

Hi all,

I have a dataframe like the following:
+---------+----------+
|client_id|      Date|
+---------+----------+
|        a|2016-11-23|
|        b|2016-11-18|
|        a|2016-11-23|
|        a|2016-11-23|
|        a|2016-11-24|
+---------+----------+
I want to find the unique dates for each client_id using a Spark dataframe.

Expected output:

a  (2016-11-23, 2016-11-24)
b  2016-11-18
I tried df.groupBy("client_id"), but I don't know how to find the distinct
values after groupBy(). How can I do this?
Are there any other, more efficient methods for doing this?
I am using Scala 2.11.8 and Spark 2.0.

Thanks

