Hi, I've read about the recent updates to the Spark Streaming integration with Kafka (I'm referring to the new, receiver-less direct approach). In the new approach, metadata is persisted in a checkpoint folder on HDFS so that the StreamingContext can be recreated after a failure. This means the streaming application restarts from where it left off and consumption continues with new messages only. The behavior should be the same if I manually stop the streaming process and recreate the context from the checkpoint (using an approach similar to https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala).
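For concreteness, here is a minimal sketch of the setup I mean, following the getOrCreate pattern from that example with a direct Kafka stream (the checkpoint path, broker address, topic name, and batch interval are just placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object CheckpointedKafkaApp {
  // Hypothetical checkpoint location on HDFS
  val checkpointDir = "hdfs:///user/nicola/checkpoints/kafka-app"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedKafkaApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Placeholder broker list and topic
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("events")

    // Direct (receiver-less) stream: offsets are tracked by Spark
    // itself and saved in the checkpoint, not in ZooKeeper
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // The processing pipeline goes here
    stream.map(_._2).print()

    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recreate the context from the checkpoint if one exists,
    // otherwise build it fresh (as in RecoverableNetworkWordCount)
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}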
Now, suppose I want to change something in the software and modify the processing pipeline. Can Spark use the previous checkpoint to recreate the new application? Will I ever be able to upgrade the software without reprocessing all the messages in Kafka? Regards, Nicola