rdd1 is cached, but the caching has no effect:

    var rdd1 = ...
    var rdd2 = ...
    var kv = ...
    for (i <- 0 until n) {
      var kvGlobal = sc.broadcast(kv)  // broadcast kv
      rdd1 = rdd2.map {
        case t => doSomething(t, kvGlobal.value)
      }.cache()
      var tmp = rdd1.reduceByKey().collect()
      kv = updateKV(tmp)  // update kv for each iteration
      rdd2 = rdd1
    }
    rdd2.saveAsTextFile()
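To check whether the cache is actually being filled on each pass, something like the sketch below may help. It is not a confirmed fix, only a runnable version of the same loop with some diagnostics added. The input data, the bodies of doSomething and updateKV, the `_ + _` reducer, n = 3 and the local[2] master are all made up for illustration, and it assumes Spark 1.3 or later (older versions also need `import org.apache.spark.SparkContext._` for reduceByKey).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterativeBroadcastSketch {
  // Hypothetical stand-in for doSomething from the post: adds the broadcast value for the key.
  def doSomething(t: (String, Int), kv: Map[String, Int]): (String, Int) =
    (t._1, t._2 + kv.getOrElse(t._1, 0))

  // Hypothetical stand-in for updateKV: turns the collected pairs into a lookup map.
  def updateKV(tmp: Array[(String, Int)]): Map[String, Int] = tmp.toMap

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick test run.
    val conf = new SparkConf().setAppName("iterative-broadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Made-up input data standing in for the original rdd2.
    var rdd2: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    var kv = Map.empty[String, Int]
    val n = 3

    for (i <- 0 until n) {
      val kvGlobal = sc.broadcast(kv)
      val rdd1 = rdd2.map(t => doSomething(t, kvGlobal.value)).cache()

      // collect() is an action, so rdd1 is computed and stored in the cache here.
      val tmp = rdd1.reduceByKey(_ + _).collect()

      // Diagnostics: the storage level of rdd1 and how many RDDs are currently marked persistent.
      println(s"iteration $i: storage level = ${rdd1.getStorageLevel.description}, " +
        s"persistent RDDs = ${sc.getPersistentRDDs.size}")

      kv = updateKV(tmp)

      // rdd1 is materialized now, so the previous iteration's cached RDD can be released.
      rdd2.unpersist()
      // unpersist() only drops executor copies; the value is re-sent if a task needs it again.
      kvGlobal.unpersist()

      rdd2 = rdd1
    }

    rdd2.collect().foreach(println)
    sc.stop()
  }
}
```

The Storage tab of the web UI should show the same information while the job is running. If rdd1 does show up as persisted there, the cache itself is probably working and the slowdown is coming from somewhere else.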