Hello,

How do I perform the equivalent of the following SQL query in Spark
Streaming? I will be applying it to a windowed DStream.

SELECT key, COUNT(DISTINCT value) FROM table GROUP BY key;

For example, given the following dataset in the table:

 key | value
-----+-------
 k1  | v1
 k1  | v1
 k1  | v2
 k1  | v3
 k1  | v3
 k2  | vv1
 k2  | vv1
 k2  | vv2
 k2  | vv2
 k2  | vv2
 k3  | vvv1
 k3  | vvv1

the result will be:

 key | count
-----+-------
 k1  |     3
 k2  |     2
 k3  |     1
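
One approach I've been considering is to window the stream, drop
duplicate (key, value) pairs, and then count per key. Below is a rough,
untested sketch; the socket source, host/port, and the window and slide
durations are all placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object DistinctCountPerKey {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("DistinctCountPerKey")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Placeholder source: lines of the form "key value" from a socket.
        val pairs = ssc.socketTextStream("localhost", 9999)
          .map(_.split("\\s+"))
          .map(a => (a(0), a(1)))

        val distinctCounts = pairs
          .window(Seconds(30), Seconds(10))   // placeholder window/slide
          .transform(_.distinct())            // drop duplicate (key, value) pairs
          .map { case (key, _) => (key, 1L) } // one marker per distinct value
          .reduceByKey(_ + _)                 // count distinct values per key

        distinctCounts.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

Is this a reasonable way to do it, or is there a better approach?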

Thanks
Nikunj
