I think you need to call collect .
> On Mar 4, 2014, at 11:18 AM, Adrian Mocanu <amoc...@verticalscope.com> wrote: > > Hi > I’ve noticed that if in the driver of a spark app I have a foreach and add > stream elements to a list from the stream, the list contains no elements at > the end of the processing. > > Take this sample code: > val list= new java.util.List() > sstream.foreachRDD (rdd => rdd.foreach( tuple => list.add(tuple) ) ) > > If in the add method of the list I put a print statement I see the tuples > added. But when I print the list it is empty. I do wait for the stream to > finish before I print the list so it’s not a timing/racing issue. > > I think it might have something to do with the fact that Spark sends the > List code to its nodes, adds the data to the list there, but never sends the > list back to the driver program or at least in the driver program the pointer > to the list does not point do the list that was over to Spark nodes and which > now has the tuples from the stream > > Am I supposed to use RDD.collect then add the data to the List or what’s the > proper way to get tuples out of Spark? > sstream.foreachRDD (rdd => rdd.collect.foreach( tuple => list.add(tuple) ) ) > > Thanks > -Adrian >