Re: Spark Streaming - Collecting RDDs into array in the driver program

2015-02-25 Thread Tobias Pfeiffer
Hi,

On Thu, Feb 26, 2015 at 11:24 AM, Thanigai Vellore <
thanigai.vell...@gmail.com> wrote:

> It appears that the function immediately returns even before the
> foreachrdd stage is executed. Is that possible?
>
Sure, that's exactly what happens. foreachRDD() schedules a computation, it
does not perform it. Maybe your streaming application would not ever
terminate, but still the function needs to return, right?

If you remove the toArray(), you will return a reference to the ArrayBuffer
that will be appended to over time. You can then, in a different thread,
check the contents of that ArrayBuffer as processing happens, or wait until
processing ends.

Tobias


Re: Spark Streaming - Collecting RDDs into array in the driver program

2015-02-25 Thread Thanigai Vellore
I didn't include the complete driver code but I do run the streaming
context from the main program which calls this function. Again, I can print
the red elements within the foreachrdd block but the array that is returned
is always empty. It appears that the function immediately returns even
before the foreachrdd stage is executed. Is that possible?
On Feb 25, 2015 5:41 PM, "Tathagata Das"  wrote:

> You are just setting up the computation here using foreacRDD. You have not
> even run the streaming context to get any data.
>
>
> On Wed, Feb 25, 2015 at 2:21 PM, Thanigai Vellore <
> thanigai.vell...@gmail.com> wrote:
>
>> I have this function in the driver program which collects the result from
>> rdds (in a stream) into an array and return. However, even though the RDDs
>> (in the dstream) have data, the function is returning an empty array...What
>> am I doing wrong?
>>
>> I can print the RDD values inside the foreachRDD call but the array is
>> always empty.
>>
>> def runTopFunction() : Array[(String, Int)] = {
>> val topSearches = some function
>> val summary = new ArrayBuffer[(String,Int)]()
>> topSearches.foreachRDD(rdd => {
>> summary = summary.++(rdd.collect())
>> })
>>
>> return summary.toArray
>> }
>>
>>
>


Re: Spark Streaming - Collecting RDDs into array in the driver program

2015-02-25 Thread Tathagata Das
You are just setting up the computation here using foreacRDD. You have not
even run the streaming context to get any data.


On Wed, Feb 25, 2015 at 2:21 PM, Thanigai Vellore <
thanigai.vell...@gmail.com> wrote:

> I have this function in the driver program which collects the result from
> rdds (in a stream) into an array and return. However, even though the RDDs
> (in the dstream) have data, the function is returning an empty array...What
> am I doing wrong?
>
> I can print the RDD values inside the foreachRDD call but the array is
> always empty.
>
> def runTopFunction() : Array[(String, Int)] = {
> val topSearches = some function
> val summary = new ArrayBuffer[(String,Int)]()
> topSearches.foreachRDD(rdd => {
> summary = summary.++(rdd.collect())
> })
>
> return summary.toArray
> }
>
>