I am experiencing Spark Streaming restart issues similar to those discussed in the two threads below (in which I failed to find a solution). Could anybody let me know whether anything is wrong in the way I start and stop the stream, or whether this could be a Spark bug?
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-data-checkpoint-cleaning-td14847.html
http://apache-spark-user-list.1001560.n3.nabble.com/KafkaReciever-Error-when-starting-ssc-Actor-name-not-unique-tc3978.html

My stream reads a Kafka topic, does some processing involving an updateStateByKey and saves the result to HDFS.

The context is (re)created at startup as follows:

And the start-up and shutdown of the stream are handled as follows:

When starting the stream for the first time (with spark-submit), the processing happens successfully: folders are created in the target HDFS folder and streaming stats are visible on http://sparkhost:4040/streaming.

After letting the stream run for several minutes and then stopping it (ctrl-c on the command line), the following info is visible in the checkpoint folder:

(Checkpoint clean-up seems to happen, since the stream ran for much more than 5 times 10 seconds.)

When restarting the stream, the startup fails with the error below; http://sparkhost:4040/streaming shows no statistics, no new HDFS folder is added to the target folder and no new checkpoints are created:

Now if I delete all older checkpoints and keep only the most recent one:

I end up with this (Kafka?) non-unique actor name error.

If I delete the checkpoint folder, the stream starts successfully (but I lose my ongoing stream state, obviously).

We're running Spark 1.1.0 on Mesos 0.20. Our Spark assembly is packaged with CDH 5.1.0 and Hive:

Any comment or suggestion would be greatly appreciated.
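
For reference, the checkpoint-recovery pattern described in the Spark Streaming programming guide can be sketched roughly as below. This is only an illustration under assumptions, not the code used in the job above: the object name, paths, ZooKeeper address, consumer group, topic and the counting logic are all hypothetical placeholders, and the 10-second batch interval is a guess based on the "10 seconds" mentioned earlier. The point it shows is that the whole DStream graph is defined inside the function passed to StreamingContext.getOrCreate, so that on restart the graph is rebuilt from the checkpoint rather than declared a second time.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair DStream operations on Spark 1.x
import org.apache.spark.streaming.kafka.KafkaUtils

object RestartableStream {
  // Hypothetical locations, chosen only for this sketch.
  val checkpointDir = "hdfs:///tmp/stream-checkpoints"
  val outputPrefix  = "hdfs:///tmp/stream-output/counts"

  // Running count per key: add the new values of this batch to the previous state.
  def updateCount(newValues: Seq[Long], state: Option[Long]): Option[Long] =
    Some(state.getOrElse(0L) + newValues.sum)

  // Builds a brand-new context. getOrCreate calls this only when no
  // checkpoint can be loaded from checkpointDir.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("restartable-stream")
    val ssc  = new StreamingContext(conf, Seconds(10)) // assumed batch interval
    ssc.checkpoint(checkpointDir)

    // Hypothetical ZooKeeper quorum, consumer group and topic.
    val messages = KafkaUtils.createStream(
      ssc, "zkhost:2181", "my-group", Map("my-topic" -> 1))

    // Every input, transformation and output is declared here,
    // inside the factory function, and nowhere else.
    messages
      .map { case (_, value) => (value, 1L) }
      .updateStateByKey[Long](updateCount _)
      .saveAsTextFiles(outputPrefix)

    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from the checkpoint if one exists, otherwise build a new context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)

    // On ctrl-c, ask for a graceful stop so received data is processed first.
    sys.addShutdownHook {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Whether stopping from a JVM shutdown hook behaves well on a given Spark version and cluster manager is itself an assumption of this sketch; comparing it against the actual start-up and shutdown code may still help narrow down where the two differ.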