My code looks like this:
 1  var rdd1 = ...
 2  var rdd2 = ...
 3  var kv = ...
 4  for (i <- 0 until n) {
 5    var kvGlobal = sc.broadcast(kv)               // broadcast kv
 6    rdd1 = rdd2.map {
 7      case t => doSomething(t, kvGlobal.value)
 8    }
 9    var tmp = rdd1.reduceByKey().collect()
10    kv = updateKV(tmp)                               // update kv for each iteration
11    rdd2 = rdd1
12 }
13 rdd2.saveAsTextFile()
  In the 1st iteration, when line 9 is processed, each slave needs to read
broadcast_1.
  In the 2nd iteration, when line 9 is processed, each slave needs to read
broadcast_1 and broadcast_2.
  In the 3rd iteration, when line 9 is processed, each slave needs to read
broadcast_1, broadcast_2 and broadcast_3.
  ...
  Each broadcast_/n/ corresponds to kvGlobal at the /n/th iteration.
  Why does each slave need to read broadcast_1 through broadcast_/n/ in the
/n/th iteration, instead of reading only broadcast_/n/?
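
  To make the pattern above concrete, here is a toy model in plain Scala. It
is NOT Spark: ToyRDD, ToyBroadcast and LineageSketch are hypothetical names
made up for this sketch, and the arithmetic stands in for doSomething and
updateKV. It assumes that rdd1 is not persisted, so every action re-runs the
whole chain of earlier map steps, and each of those steps still refers to the
broadcast from its own iteration. Under that assumption the reads accumulate
in the way described.

```scala
// Toy model of lazy, uncached lineage. This is NOT Spark, only an illustration.
object LineageSketch {
  // Broadcast ids read while the most recent action was running.
  val reads = scala.collection.mutable.ArrayBuffer.empty[Int]

  // Stand-in for a broadcast variable: reading .value logs the id.
  final class ToyBroadcast[T](val id: Int, payload: T) {
    def value: T = { reads += id; payload }
  }

  // Stand-in for an uncached RDD: every action re-runs the whole parent chain.
  final class ToyRDD[A](compute: () => Seq[A]) {
    def map[B](f: A => B): ToyRDD[B] = new ToyRDD(() => compute().map(f))
    def collect(): Seq[A] = compute()
  }

  // Mirrors the loop in the question; returns the broadcast ids read per iteration.
  def run(n: Int): Seq[Seq[Int]] = {
    var rdd2 = new ToyRDD(() => Seq(1))                // one-element "dataset"
    var kv = 0
    (1 to n).map { i =>
      val kvGlobal = new ToyBroadcast(i, kv)           // like sc.broadcast(kv)
      val rdd1 = rdd2.map(t => t + kvGlobal.value)     // closure captures kvGlobal
      reads.clear()
      val tmp = rdd1.collect()                         // action: recomputes the chain
      kv = tmp.sum                                     // stand-in for updateKV(tmp)
      rdd2 = rdd1                                      // lineage grows by one map
      reads.distinct.toList
    }
  }
}
```

  In this model, LineageSketch.run(3) reports broadcast 1 for the first
iteration, 1 and 2 for the second, and 1, 2 and 3 for the third. If the same
thing is happening in the real job, persisting rdd1 (for example with
rdd1.cache()) before the action at line 9 may let later iterations reuse the
computed partitions instead of re-running the earlier map steps; that is a
guess about the cause, not something verified against this job.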


