Hi,
I've read about the recent updates to the Spark Streaming integration with
Kafka (I'm referring to the new receiver-less, direct approach).
In the new approach, metadata is persisted in checkpoint folders on HDFS so
that the StreamingContext can be recreated in case of failure. This means
the streaming application will restart from where it left off, and message
consumption will continue with new messages only.
Also, if I manually stop the streaming process and recreate the context
from the checkpoint (using an approach similar to
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala),
the behavior should be the same.
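To be concrete, I mean a setup roughly like the following sketch (the
broker address, topic name, checkpoint path and app name are just
placeholders I made up for the example):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MyStreamingApp {
  // Placeholder checkpoint location on HDFS
  val checkpointDir = "hdfs:///user/nicola/checkpoints/my-app"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("MyStreamingApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Direct (receiver-less) Kafka stream; broker and topic are placeholders
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("my-topic")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // The processing pipeline I may want to modify later
    stream.map(_._2).count().print()

    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recreate the context from the checkpoint if one exists,
    // otherwise build a fresh one
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}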

Now, suppose I want to change something in the software and modify the
processing pipeline.
Can Spark use the previous checkpoint to recreate the new application? Will
I ever be able to upgrade the software without processing all the messages
in Kafka again?

Regards,
Nicola
