You can first union them into a single RDD and then call foreach. In Scala:

rddList.reduce(_.union(_)).foreach(myFunc)
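The same fold-over-union shape works in the Java API via JavaRDD::union. Since that needs a live SparkContext, here is a minimal sketch of the reduce pattern on plain Java lists instead, so it runs standalone; the class and method names (UnionAll, unionAll) are illustrative, not Spark API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionAll {
    // Fold a list of collections pairwise with a union-like operation,
    // mirroring rddList.reduce(_.union(_)). With Spark this would be
    // rddList.stream().reduce(JavaRDD::union).
    static <T> List<T> unionAll(List<List<T>> parts) {
        return parts.stream().reduce(new ArrayList<T>(), (a, b) -> {
            List<T> out = new ArrayList<>(a);
            out.addAll(b);
            return out;
        });
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        // One combined collection; a single foreach/action then covers all parts.
        System.out.println(unionAll(parts)); // [a, b, c]
    }
}
```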

As for the serialization issue, I can't tell what's wrong without seeing more of the code.
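One common cause, though: org.apache.hadoop.conf.Configuration does not implement java.io.Serializable, so any function object holding a direct reference to it will fail when Spark serializes the closure. A minimal stdlib-only demonstration (FakeConfiguration stands in for Hadoop's Configuration; marking the field transient is one possible workaround, provided the worker re-creates it):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {
    // Stand-in for org.apache.hadoop.conf.Configuration, which is not Serializable.
    static class FakeConfiguration { }

    // Holding a non-serializable field directly fails at serialization time.
    static class BadFunc implements Serializable {
        FakeConfiguration conf = new FakeConfiguration();
    }

    // transient fields are skipped by Java serialization; the worker side
    // must then rebuild the Configuration itself.
    static class GoodFunc implements Serializable {
        transient FakeConfiguration conf = new FakeConfiguration();
    }

    static boolean serializes(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new BadFunc()));  // false
        System.out.println(serializes(new GoodFunc())); // true
    }
}
```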

On 10/16/14 11:39 PM, /soumya/ wrote:

Hi, my programming model requires me to generate multiple RDDs for various
datasets across a single run and then run an action on each. E.g.:

MyFunc myFunc = ... // implements VoidFunction
// set some extra variables - all serializable
...
for (JavaRDD<String> rdd : rddList) {
    ...
    rdd.foreach(myFunc);
}

The problem I'm seeing is that after the first run of the loop - which
succeeds on foreach, the second one fails with
java.io.NotSerializableException for a specific object I'm setting. In my
particular case, the object contains a reference to
org.apache.hadoop.conf.Configuration. Question is:

1. Why does this succeed the first time, and fail the second?
2. Any alternatives to this programming model?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-an-action-inside-a-loop-across-multiple-RDDs-java-io-NotSerializableException-tp16580.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
