Re: groupBy for DStream

Sean Owen Tue, 11 Nov 2014 20:02:27 -0800

A DStream is a sequence of RDDs, so just apply groupBy to each RDD, for
example inside transform() or foreachRDD().
Likewise, count() does not return a count over all history. It returns a
count for each RDD in the stream, not a single count.
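
A minimal sketch of the per-RDD approach, assuming a local Spark Streaming
setup. The queue-backed input, the sample words, and the helper name
groupByFirstLetter are made up for illustration, and
awaitTerminationOrTimeout may not be available in older Spark releases:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import scala.collection.mutable

object GroupByPerBatch {
  // Hypothetical helper: group the words of each batch by first letter.
  // transform() hands over each batch's RDD, so groups never span batches.
  def groupByFirstLetter(words: DStream[String]): DStream[(Char, Iterable[String])] =
    words.transform(rdd => rdd.filter(_.nonEmpty).groupBy(_.head))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("GroupByPerBatch")
    val ssc = new StreamingContext(conf, Seconds(1))
    val sc = ssc.sparkContext

    // Made-up input: two pre-built RDDs, consumed one per batch interval.
    val batches = mutable.Queue[RDD[String]](
      sc.parallelize(Seq("apple", "avocado", "banana")),
      sc.parallelize(Seq("blueberry", "cherry"))
    )
    val words = ssc.queueStream(batches)

    groupByFirstLetter(words).foreachRDD { rdd =>
      rdd.collect().foreach { case (letter, ws) =>
        println(s"$letter -> ${ws.mkString(", ")}")
      }
    }

    // count() is a DStream[Long]: each batch holds one Long for that batch.
    words.count().foreachRDD { rdd =>
      rdd.collect().foreach(n => println(s"batch count: $n"))
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // run briefly, then shut down
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

The plain Long the question asks for only exists inside foreachRDD (or
transform), once per batch interval; there is no single value to pull out
of the DStream itself.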


You can call first() or take() on an RDD in the stream, but it doesn't make
as much sense to talk about the first element of the entire stream. That
element may have been long gone before the streaming operation started.
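
As a rough sketch of peeking at each batch rather than at the stream as a
whole, the helper below runs the ordinary RDD actions count() and take(n)
on every batch. The object name, the helper name peekEachBatch, and the
callback shape are hypothetical, not part of the Spark API:

```scala
import org.apache.spark.streaming.dstream.DStream

object PerBatchPeek {
  // Hypothetical helper: for every batch, pass its element count and up to
  // n of its elements to `handle`. Both calls are ordinary RDD actions.
  def peekEachBatch[T](stream: DStream[T], n: Int)(handle: (Long, List[T]) => Unit): Unit =
    stream.foreachRDD { rdd =>
      val total: Long = rdd.count()
      val firstFew: List[T] = rdd.take(n).toList
      handle(total, firstFew)
    }
}
```

It would be registered before ssc.start(), for example as
PerBatchPeek.peekEachBatch(words, 1) { (total, firstFew) => println(s"$total $firstFew") }.
Note that take(0) returns an empty array, so take(1) is the call that
yields the first element of a batch, when the batch has one.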

On Tue, Nov 11, 2014 at 9:19 PM, SK <skrishna...@gmail.com> wrote:

>
> Hi.
>
> 1) I don't see a groupBy() method for a DStream object. Not sure why that is
> not supported. Currently I am using filter() to separate out the different
> groups. I would like to know if there is a way to convert a DStream object
> to a regular RDD so that I can apply the RDD methods like groupBy.
>
>
> 2) The count() method for a DStream object returns a DStream[Long] instead
> of a simple Long (like RDD does). How can I extract the simple Long count
> value? I tried dstream(0) but got a compilation error that it does not take
> parameters. I also tried dstream[0], but that also resulted in a
> compilation error. I am not able to use the head() or take(0) method for
> DStream either.
>
> thanks
