My code looks like this:

```scala
var rdd1 = ...
var rdd2 = ...
var kv = ...
for (i <- 0 until n) {
  val kvGlobal = sc.broadcast(kv)             // broadcast kv
  rdd1 = rdd2.map {
    case t => doSomething(t, kvGlobal.value)
  }
  val tmp = rdd1.reduceByKey(...).collect()   // line 9
  kv = updateKV(tmp)                          // update kv for each iteration
  rdd2 = rdd1
}
rdd2.saveAsTextFile(...)
```

In the 1st iteration, when line 9 is processed, each slave needs to read broadcast_1.
In the 2nd iteration, when line 9 is processed, each slave needs to read broadcast_1 and broadcast_2.
In the 3rd iteration, when line 9 is processed, each slave needs to read broadcast_1, broadcast_2 and broadcast_3.
...

Here broadcast_1 ... broadcast_n are the values of kvGlobal at the successive iterations. Why, in the nth iteration, does each slave need to read broadcast_1 through broadcast_n instead of just broadcast_n?
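One plausible explanation (an assumption, not something confirmed in this post): rdd1 is never persisted, so the lineage of rdd1 in iteration n contains every earlier `map`, and each of those closures holds its own kvGlobal. Running the action in line 9 then recomputes the whole chain, reading every broadcast along the way. The sketch below is a toy model in plain Scala, not Spark; `LineageDemo`, `Data`, `Source`, `Mapped` and `run` are hypothetical names, and `kv = tmp.sum % 7` is an arbitrary stand-in for `updateKV`.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model of RDD lineage (NOT Spark). Each Mapped stage stands in for
// `rdd2.map { ... kvGlobal.value ... }` and records the id of the
// "broadcast" it reads every time it is (re)computed.
object LineageDemo {
  // Broadcast ids read during the current action.
  val reads: ArrayBuffer[Int] = ArrayBuffer.empty[Int]

  sealed trait Data {
    var persisted: Boolean = false
    private var memo: Option[Vector[Int]] = None
    protected def computeFresh(): Vector[Int]

    // Reuse the stored result if this stage was persisted and already computed;
    // otherwise recompute it from its parents.
    def compute(): Vector[Int] = memo.getOrElse {
      val out = computeFresh()
      if (persisted) memo = Some(out)
      out
    }
  }

  final class Source(xs: Vector[Int]) extends Data {
    protected def computeFresh(): Vector[Int] = xs
  }

  final class Mapped(parent: Data, bcId: Int, kv: Int) extends Data {
    protected def computeFresh(): Vector[Int] = {
      val input = parent.compute() // recomputes the whole chain unless a parent is persisted
      reads += bcId                // "reads" this stage's broadcast
      input.map(_ + kv)
    }
  }

  // Mirrors the loop in the question; returns the broadcast ids read per iteration.
  def run(n: Int, cacheEachIteration: Boolean): Vector[Vector[Int]] = {
    var rdd2: Data = new Source(Vector(1, 2, 3))
    var kv = 0
    (1 to n).toVector.map { i =>
      val rdd1 = new Mapped(rdd2, i, kv) // broadcast id i captures the current kv
      rdd1.persisted = cacheEachIteration
      reads.clear()
      val tmp = rdd1.compute()           // stands in for reduceByKey(...).collect()
      kv = tmp.sum % 7                   // hypothetical updateKV
      rdd2 = rdd1
      reads.toVector
    }
  }

  def main(args: Array[String]): Unit = {
    println(run(3, cacheEachIteration = false)) // Vector(Vector(1), Vector(1, 2), Vector(1, 2, 3))
    println(run(3, cacheEachIteration = true))  // Vector(Vector(1), Vector(2), Vector(3))
  }
}
```

Without caching, iteration i reads ids 1 through i, matching the pattern described above; with caching, it reads only id i. In Spark itself, the analogous change would presumably be persisting rdd1 (for example `rdd1.cache()`) before the action in each iteration, and possibly checkpointing periodically in long loops; treat that as a suggestion to verify against the Spark version in use, not a confirmed fix.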
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479.html Sent from the Apache Spark User List mailing list archive at Nabble.com.