You can first union them into a single RDD and then call foreach. In
Scala:

    rddList.reduce(_.union(_)).foreach(myFunc)

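Since Spark itself can't run inside this email, here is a plain-JDK sketch of the same reduce-then-foreach shape, with ordinary Lists standing in for the RDDs (all names here are illustrative, not Spark API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionDemo {
    // Stand-in for rddList.reduce(_.union(_)).foreach(myFunc):
    // fold the per-dataset collections into one, then run the
    // action exactly once instead of once per dataset.
    static List<String> union(List<List<String>> parts) {
        List<String> all = new ArrayList<>();
        for (List<String> part : parts) {
            all.addAll(part); // the "union" step
        }
        return all;
    }

    public static void main(String[] args) {
        List<List<String>> rddList = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"));
        union(rddList).forEach(System.out::println); // the single action
    }
}
```

With real RDDs the reduce happens on the driver and is cheap: union only records lineage, so no data moves until the single foreach action runs.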
As for the serialization issue, I can't tell much without seeing more of
the code.
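One common cause worth checking: a field whose type isn't Serializable (hadoop Configuration is a frequent offender) gets captured in the closure. Marking it transient usually resolves it. A minimal plain-JDK sketch, with Thread standing in for Configuration and the class names invented for illustration:

```java
import java.io.*;

// A non-serializable field breaks Java serialization of the whole
// object unless it is marked transient; transient fields are simply
// dropped during the round trip (they come back as null).
class Holder implements Serializable {
    transient Thread conf = new Thread(); // stand-in for hadoop Configuration
    String name = "job-config";           // ordinary field, survives
}

public class TransientDemo {
    // Serialize to bytes and read the object back, as Spark would do
    // when shipping a closure to executors.
    static Holder roundTrip(Holder h) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(h);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (Holder) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Holder copy = roundTrip(new Holder());
        System.out.println(copy.name);          // job-config
        System.out.println(copy.conf == null);  // true
    }
}
```

If the function still needs the Configuration on the executors, recreate it lazily there (e.g. in a transient lazy field) rather than serializing it from the driver.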
On 10/16/14 11:39 PM, soumya wrote:
Hi, my programming model requires me to generate multiple RDDs for various
datasets across a single run and then run an action on each of them - e.g.
MyFunc myFunc = ... // implements VoidFunction<String>
// set some extra fields on myFunc - all serializable
...
for (JavaRDD<String> rdd : rddList) {
    ...
    rdd.foreach(myFunc); // foreach is called on the RDD, not the context
}
The problem I'm seeing is that after the first run of the loop - which
succeeds on foreach, the second one fails with
java.io.NotSerializableException for a specific object I'm setting. In my
particular case, the object contains a reference to
org.apache.hadoop.conf.Configuration. Question is:
1. Why does this succeed the first time, and fail the second?
2. Any alternatives to this programming model?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Running-an-action-inside-a-loop-across-multiple-RDDs-java-io-NotSerializableException-tp16580.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.