Re: Spark Streaming - Collecting RDDs into array in the driver program
Hi,

On Thu, Feb 26, 2015 at 11:24 AM, Thanigai Vellore <thanigai.vell...@gmail.com> wrote:
> It appears that the function immediately returns even before the
> foreachRDD stage is executed. Is that possible?

Sure, that's exactly what happens. foreachRDD() schedules a computation; it does not perform it. Your streaming application might never terminate, but the function still needs to return, right?

If you remove the toArray(), you will return a reference to the ArrayBuffer, which will be appended to over time. You can then, in a different thread, check the contents of that ArrayBuffer as processing happens, or wait until processing ends.

Tobias
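A minimal sketch of the approach Tobias describes, under some assumptions: the stream is faked with queueStream() because the original topSearches source is not shown, the object and method names are hypothetical, and awaitTerminationOrTimeout() is assumed to be available (older releases may only offer awaitTermination with a timeout argument). The buffer is read only after the context has run for a while, and access is synchronized because the foreachRDD body runs on a different driver thread than the caller.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CollectInDriver {
  // Runs a small local streaming job and returns what the driver collected.
  def run(timeoutMs: Long): Array[(String, Int)] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("collect-in-driver")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical stand-in for the original topSearches stream.
    val batches = new mutable.Queue[RDD[(String, Int)]]()
    batches += ssc.sparkContext.parallelize(Seq(("spark", 3), ("scala", 2)))
    val topSearches = ssc.queueStream(batches)

    val summary = new mutable.ArrayBuffer[(String, Int)]()

    // This only registers the action; nothing runs until ssc.start().
    topSearches.foreachRDD { rdd =>
      val rows = rdd.collect()
      summary.synchronized { summary ++= rows }
      ()
    }

    ssc.start()                               // batches start being processed here
    ssc.awaitTerminationOrTimeout(timeoutMs)  // give the job time to run
    ssc.stop(stopSparkContext = true, stopGracefully = true)

    // Take a snapshot only after processing has had a chance to happen.
    summary.synchronized { summary.toArray }
  }
}
```

Calling CollectInDriver.run(5000L) should return the queued pairs, whereas reading summary right after the foreachRDD call (before start()) would see an empty buffer.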
Re: Spark Streaming - Collecting RDDs into array in the driver program
I didn't include the complete driver code, but I do run the streaming context from the main program, which calls this function. Again, I can print the RDD elements within the foreachRDD block, but the array that is returned is always empty. It appears that the function returns immediately, even before the foreachRDD stage is executed. Is that possible?

On Feb 25, 2015 5:41 PM, "Tathagata Das" wrote:
> You are just setting up the computation here using foreachRDD. You have not
> even run the streaming context to get any data.
>
> On Wed, Feb 25, 2015 at 2:21 PM, Thanigai Vellore <thanigai.vell...@gmail.com> wrote:
>
>> I have this function in the driver program which collects the results from
>> RDDs (in a stream) into an array and returns it. However, even though the RDDs
>> (in the DStream) have data, the function is returning an empty array. What
>> am I doing wrong?
>>
>> I can print the RDD values inside the foreachRDD call but the array is
>> always empty.
>>
>> def runTopFunction() : Array[(String, Int)] = {
>>   val topSearches = some function
>>   val summary = new ArrayBuffer[(String,Int)]()
>>   topSearches.foreachRDD(rdd => {
>>     summary = summary.++(rdd.collect())
>>   })
>>
>>   return summary.toArray
>> }
Re: Spark Streaming - Collecting RDDs into array in the driver program
You are just setting up the computation here using foreachRDD. You have not even run the streaming context to get any data.

On Wed, Feb 25, 2015 at 2:21 PM, Thanigai Vellore <thanigai.vell...@gmail.com> wrote:
> I have this function in the driver program which collects the results from
> RDDs (in a stream) into an array and returns it. However, even though the RDDs
> (in the DStream) have data, the function is returning an empty array. What
> am I doing wrong?
>
> I can print the RDD values inside the foreachRDD call but the array is
> always empty.
>
> def runTopFunction() : Array[(String, Int)] = {
>   val topSearches = some function
>   val summary = new ArrayBuffer[(String,Int)]()
>   topSearches.foreachRDD(rdd => {
>     summary = summary.++(rdd.collect())
>   })
>
>   return summary.toArray
> }
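The "setting up" versus "running" distinction can be illustrated without Spark. The following is only a toy model, not how Spark is implemented: ToyStream, foreachBatch and start are hypothetical names standing in for a DStream, foreachRDD and StreamingContext.start(). It shows that a buffer filled from a registered callback stays empty until something actually runs the callbacks, and that an in-place append (++=) is what a val buffer needs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy model of a stream: foreachBatch only stores the callback.
final class ToyStream[A](batches: Seq[Seq[A]]) {
  private val actions = ArrayBuffer.empty[Seq[A] => Unit]

  // Registers an action; does not run it.
  def foreachBatch(action: Seq[A] => Unit): Unit = {
    actions += action
    ()
  }

  // Plays the role of starting the context: runs every action on every batch.
  def start(): Unit =
    for (batch <- batches; action <- actions) action(batch)
}

object ToyDemo {
  // Returns the buffer size before and after the stream is started.
  def demo(): (Int, Int) = {
    val stream = new ToyStream(Seq(Seq(("spark", 3)), Seq(("scala", 2))))
    val summary = ArrayBuffer.empty[(String, Int)]

    // Append in place; summary is a val, so it cannot be reassigned.
    stream.foreachBatch(batch => summary ++= batch)

    val sizeBeforeStart = summary.size // nothing has run yet
    stream.start()
    val sizeAfterStart = summary.size  // both batches have been appended

    (sizeBeforeStart, sizeAfterStart)
  }
}
```

Here ToyDemo.demo() evaluates to (0, 2): the snapshot taken right after registration is empty, which mirrors what runTopFunction() returns.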