Hi,

I have my application jar sitting in HDFS; it defines a long-running Spark Streaming job, and I am using a checkpoint directory that is also in HDFS. Every time I make changes to the job, I delete that jar and upload a new one.

Now, if I upload a new jar and also delete the checkpoint directory, everything works fine. But if I don't delete the checkpoint directory, I get an error like this:

    timestamp="2016-05-13T18:49:47,887+0000",level="WARN",threadName="main",logger="org.apache.spark.streaming.CheckpointReader",message="Error reading checkpoint from file hdfs://myCheckpoints/application-1/checkpoint-1463165355000",exception="java.io.InvalidClassException: some.package.defined.here.ConcreteClass; local class incompatible: stream classdesc serialVersionUID = -7808345595732501156, local class serialVersionUID = 1574855058137843618"

I changed 'ConcreteClass' since my last implementation, and that's what is causing the issue. I have 2 main questions:

1. *How do I fix this?* I know that adding private static final long serialVersionUID = 1113799434508676095L; might fix it (see the first sketch below this list), but I don't want to add this to every class, since any class can change between the current version and the next. Is there anything better?

2. *What exactly does the checkpoint directory store?* Does it store all the classes from the previous jar, or just their names and serialVersionUIDs? The programming guide (https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing) doesn't give much detail on the internals of checkpointing. The directory's size is only ~6 MB. For reference, my recovery setup looks roughly like the second sketch below.
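To be concrete about question 1, this is what I mean by pinning the UID on each Serializable class by hand. A minimal sketch; the class name and UID value here are just placeholders, not my real code:

    import java.io.Serializable;

    public class ConcreteClass implements Serializable {
        // Pinning serialVersionUID so that a recompiled version of this class
        // can still deserialize state written by the previous jar. Without an
        // explicit UID, the JVM derives one from the class structure, so any
        // change to the class makes the old checkpoint unreadable.
        private static final long serialVersionUID = 1113799434508676095L;

        // ... fields and methods of the class ...
    }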
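And for question 2, my driver sets up checkpoint recovery in the standard getOrCreate way. Again just a sketch with example paths and batch interval, in case it helps pinpoint where the deserialization fails:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class StreamingDriver {
        // Example checkpoint path, not my real one.
        private static final String CHECKPOINT_DIR = "hdfs://myCheckpoints/application-1";

        public static void main(String[] args) throws Exception {
            // getOrCreate first tries to rebuild the StreamingContext from the
            // checkpoint directory and only calls the factory below if no
            // checkpoint exists. Deserializing the saved context is where the
            // InvalidClassException above gets thrown.
            JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, () -> {
                SparkConf conf = new SparkConf().setAppName("application-1");
                JavaStreamingContext context = new JavaStreamingContext(conf, Durations.seconds(5));
                context.checkpoint(CHECKPOINT_DIR);
                // ... define the DStream transformations here ...
                return context;
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }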
Appreciate any help.

I just asked this question on Stack Overflow as well: https://stackoverflow.com/questions/37217738/spark-job-fails-when-using-checkpointing-if-a-class-change-in-the-job

Thanks,
KP