Re: groupBy for DStream
1. Use foreachRDD over the DStream; inside it you get a plain RDD on each batch, and you can call groupBy() on that.

2. DStream.count() returns a new DStream in which each RDD has a single element: the count of the elements of the corresponding RDD of the source DStream.

Thanks
Best Regards

On Wed, Nov 12, 2014 at 2:49 AM, SK wrote:
> [quoted message trimmed]
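The per-batch behavior of foreachRDD plus groupBy can be sketched without a cluster by modeling each micro-batch as a plain Scala collection (the batch data below is made up for illustration):

```scala
object GroupByPerBatch {
  def main(args: Array[String]): Unit = {
    // A DStream is a sequence of RDDs; here each micro-batch is modeled
    // as a plain collection of (key, value) pairs (illustrative data).
    val batches = Seq(
      Seq(("a", 1), ("b", 2), ("a", 3)),
      Seq(("b", 4), ("c", 5))
    )
    // foreachRDD(rdd => rdd.groupBy(...)) groups within each batch,
    // never across batches:
    batches.foreach { batch =>
      val grouped = batch.groupBy(_._1)
      println(grouped)
    }
  }
}
```

In real Spark Streaming code the same shape would be `dstream.foreachRDD { rdd => rdd.groupBy(...) }`, or `transform` if you want the result back as a DStream; either way the groups only ever span a single batch.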
Re: groupBy for DStream
A DStream is a sequence of RDDs, so just groupBy each RDD.

Likewise, count() does not return a count over all history: it returns a count of each RDD in the stream, not one count. You can head or take an individual RDD in the stream, but it doesn't make as much sense to talk about the first element of the entire stream; it may be long gone before the streaming operation even started.

On Tue, Nov 11, 2014 at 9:19 PM, SK wrote:
> [quoted message trimmed]
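To make the "one count per RDD" point concrete, here is a plain-Scala sketch of the same semantics (batch data made up; in actual Spark code you would call dstream.count(), or rdd.count() inside foreachRDD if you want a plain Long for each batch):

```scala
object CountPerBatch {
  def main(args: Array[String]): Unit = {
    // Model a DStream as a sequence of micro-batches (illustrative data).
    val batches = Seq(
      Seq("x", "y", "z"),
      Seq("p", "q")
    )
    // DStream.count() produces a DStream[Long]: one count per batch,
    // not a single Long over the whole stream.
    val countsPerBatch = batches.map(_.size.toLong)
    println(countsPerBatch) // List(3, 2)
  }
}
```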
groupBy for DStream
Hi.

1) I don't see a groupBy() method for a DStream object. Not sure why that is not supported. Currently I am using filter() to separate out the different groups. I would like to know if there is a way to convert a DStream object to a regular RDD so that I can apply the RDD methods like groupBy.

2) The count() method for a DStream object returns a DStream[Long] instead of a simple Long (like RDD does). How can I extract the simple Long count value? I tried dstream(0) but got a compilation error that it does not take parameters. I also tried dstream[0], but that also resulted in a compilation error. I am not able to use the head() or take(0) method for DStream either.

thanks

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/groupBy-for-DStream-tp18623.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org