Hi All,

I am running a Spark streaming application (Spark 1.2.0) on YARN (Hadoop 2.5.2) with the Spark event log enabled. I set the checkpoint dir in the streaming context and ran the app. It started on YARN with application id 'app_id_1' and created the Spark event log dir /spark/applicationHistory/app_id_1. I then killed the app and reran it with the same checkpoint dir; this time it got a different YARN application id, 'app_id_2'. However, the rerun failed with "Log directory already exists":
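For reference, the startup path is essentially the standard getOrCreate checkpoint pattern. A minimal sketch of it (class name, checkpoint path, and batch interval below are placeholders, not my real code):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

public class CheckpointedApp {
    public static void main(String[] args) {
        // Placeholder path; the same dir is passed on both runs.
        final String checkpointDir = "hdfs://xxx:8020/spark/checkpoints/myapp";

        JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
            @Override
            public JavaStreamingContext create() {
                SparkConf conf = new SparkConf()
                        .setAppName("JavaKafkaSparkHbase")
                        .set("spark.eventLog.enabled", "true");
                JavaStreamingContext jssc =
                        new JavaStreamingContext(conf, new Duration(10000));
                jssc.checkpoint(checkpointDir);
                // ... Kafka input and HBase output wiring goes here ...
                return jssc;
            }
        };

        // First run: factory.create() builds a fresh context.
        // Rerun: the context is rebuilt from the checkpoint -- this is
        // where the IOException in the stack trace below is thrown.
        JavaStreamingContext jssc =
                JavaStreamingContext.getOrCreate(checkpointDir, factory);
        jssc.start();
        jssc.awaitTermination();
    }
}
```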
Exception in thread "Driver" java.io.IOException: Log directory hdfs://xxx:8020/spark/applicationHistory/app_id_1 already exists!
	at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
	at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
	at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
	at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
	at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
	at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
	at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:566)
	at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
	at com.xxx.spark.streaming.JavaKafkaSparkHbase.main(JavaKafkaSparkHbase.java:121)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)

Is this the expected behavior? When recovering from the checkpoint, shouldn't the event log dir be created under the new application id (in the above example, the rerun should create /spark/applicationHistory/app_id_2)?

Thanks,
Max