You dont to convert JavaDStream to DStream. Even JavaDStream has foreachRDD.
On Tue, Sep 30, 2014 at 1:22 PM, Jon Gregg <jonrgr...@gmail.com> wrote: > Hi Andy > > I'm new to Spark and have been working with Scala not Java but I see > there's a dstream() method to convert from JavaDStream to DStream. Then > within > DStream > <http://people.apache.org/~pwendell/spark-1.1.0-rc4-docs/api/java/org/apache/spark/streaming/dstream/DStream.html> > there is a foreachRDD() method that allows you to do things like: > > msgConvertedToDStream.foreachRDD(rdd => println("The count is: " + > rdd.count().toInt)) > > The syntax for the casting should be changed for Java and probably the > function argument syntax is wrong too, but hopefully there's enough there > to help. > > Jon > > > On Tue, Sep 30, 2014 at 3:42 PM, Andy Davidson < > a...@santacruzintegration.com> wrote: > >> Hi >> >> I have a simple streaming app. All I want to do is figure out how many >> lines I have received in the current mini batch. If numLines was a JavaRDD >> I could simply call count(). How do you do something similar in Streaming? >> >> >> Here is my psudo code >> >> >> JavaDStream<String> msg = logs.filter(selectINFO); >> >> JavaDStream<Long> numLines = msg.count() >> >> >> Long totalCount = numLines ??? >> >> >> >> Here is what I am really trying to do. I have a python script that >> generated a graph of totalCount vs time. Python does not support streaming. >> As a work around I have a java program that does the steaming. I want to >> pass the data back to the python script. It has been suggested I can use >> rdd.pipe(). >> >> >> In python I call rdd.pipe(scriptToStartJavaSteam.sh) >> >> >> All I need to do is for each mini batch figure out how to get the the >> count of the current mini batch and write it to standard out. Seems like >> this should be simple. >> >> >> Maybe Streams do not work the way I think? In a spark core app, I am able >> to get values like count in my driver and do what ever I want with the >> local value. With streams I know I am getting mini patches because print() >> display the first 10 lines of my steam. I assume that some how print is >> executed in my driver so somehow data was sent from the workers back to >> the driver. >> >> >> Any comments or suggestions would be greatly appreciated. >> >> >> Andy >> >> >> P.s. Should I be asking a different question? >> >> >> >> >> >> >