Hello, how do I go about performing the equivalent of the following SQL clause in Spark Streaming? I will be using this on a windowed DStream.
SELECT key, count(distinct(value)) FROM table GROUP BY key;

So, for example, given the following dataset in the table:

 key | value
-----+-------
 k1  | v1
 k1  | v1
 k1  | v2
 k1  | v3
 k1  | v3
 k2  | vv1
 k2  | vv1
 k2  | vv2
 k2  | vv2
 k2  | vv2
 k3  | vvv1
 k3  | vvv1

the result will be:

 key | count
-----+-------
 k1  | 3
 k2  | 2
 k3  | 1

Thanks,
Nikunj
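P.S. To make the intended semantics concrete, here is a plain-Python sketch (no Spark involved, names are illustrative) of the per-key distinct count I am after; the same logic would need to run over each window of the DStream:

```python
from collections import defaultdict

# Sample (key, value) pairs matching the table above.
pairs = [
    ("k1", "v1"), ("k1", "v1"), ("k1", "v2"), ("k1", "v3"), ("k1", "v3"),
    ("k2", "vv1"), ("k2", "vv1"), ("k2", "vv2"), ("k2", "vv2"), ("k2", "vv2"),
    ("k3", "vvv1"), ("k3", "vvv1"),
]

# Collect the set of distinct values seen for each key...
distinct_values = defaultdict(set)
for key, value in pairs:
    distinct_values[key].add(value)

# ...then count the sets, mirroring
# SELECT key, count(distinct(value)) FROM table GROUP BY key;
counts = {key: len(values) for key, values in distinct_values.items()}
print(counts)  # {'k1': 3, 'k2': 2, 'k3': 1}
```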