Hello, I'm using Spark streaming to process kafka message, and wants to use a prop file as the input and broadcast the properties:
val props = new Properties() props.load(new FileInputStream(args(0))) val sc = initSparkContext() val propsBC = sc.broadcast(props) println(s"propFileBC 1: " + propsBC.value) val lines = createKStream(sc) val parsedLines = lines.map (l => { println(s"propFileBC 2: " + propsBC.value) process(l, propsBC.value) }).filter(...) var goodLines = lines.window(2,2) goodLines.print() If I run it with spark-submit and master local[2], it works fine. But if I used the --master spark://master:7077 (2 nodes), the 1st propsBC.value is printed, but the 2nd print inside the map function causes null pointer exception: Caused by: java.lang.NullPointerException at test.spark.Main$$anonfun$1.apply(Main.scala:79) at test.spark.Main$$anonfun$1.apply(Main.scala:78) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:284) at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) at org.apache.spark.rdd.RDD.iterator(RDD.scala:268) at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) Appreciate any help, thanks!