I think you need to call collect . 

> On Mar 4, 2014, at 11:18 AM, Adrian Mocanu <amoc...@verticalscope.com> wrote:
> 
> Hi
> I’ve noticed that if in the driver of a spark app I have a foreach and add 
> stream elements to a list from the stream, the list contains no elements at 
> the end of the processing.
>  
> Take this sample code:
>   val list= new java.util.List()
>   sstream.foreachRDD (rdd => rdd.foreach( tuple => list.add(tuple) ) )
>  
> If in the add method of the list I put a print statement I see the tuples 
> added. But when I print the list it is empty. I do wait for the stream to 
> finish before I print the list so it’s not a timing/racing issue.
>  
> I think it might have something to do  with the fact that Spark sends the 
> List code to its nodes, adds the data to the list there, but never sends the 
> list back to the driver program or at least in the driver program the pointer 
> to the list does not point do the list that was over to Spark nodes and which 
> now has the tuples from the stream
>  
> Am I supposed to use RDD.collect then add the data to the List or what’s the 
> proper way to get tuples out of Spark?
>  sstream.foreachRDD (rdd => rdd.collect.foreach( tuple => list.add(tuple) ) )
>  
> Thanks
> -Adrian
>  

Reply via email to