Hi All,
I run a Spark Streaming application (Spark 1.2.0) on YARN (Hadoop 2.5.2) with
the Spark event log enabled. I set the checkpoint dir in the streaming context
and ran the app. It started in YARN with application id 'app_id_1' and created
the Spark event log dir /spark/applicationHistory/app_id_1. I killed the app and
reran it with the same checkpoint dir; this time it had a different YARN
application id, 'app_id_2'. However, the rerun failed because the log directory
already exists:
Exception in thread "Driver" java.io.IOException: Log directory hdfs://xxx:8020/spark/applicationHistory/app_id_1 already exists!
    at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
    at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:353)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
    at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:566)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
    at com.xxx.spark.streaming.JavaKafkaSparkHbase.main(JavaKafkaSparkHbase.java:121)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)
Is this expected behavior? When recovering from the checkpoint, shouldn't
an event log dir named after the new application id be created (in the above
example, the rerun should create /spark/applicationHistory/app_id_2)?
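
To make the expectation concrete, here is a small standalone Java sketch with no Spark dependency. It is only a model of what might be happening, not Spark's actual code: the class, the method names, and the "app.id" key are hypothetical, and the idea that the application id is restored from the checkpointed configuration is an assumption based on the stack trace (the SparkContext is created inside getOrCreate).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model only: none of these names come from Spark.
class EventLogDirSketch {
    static final String BASE_DIR = "/spark/applicationHistory";

    // Event log directory derived from the application id held in a configuration.
    static String eventLogDir(Map<String, String> conf) {
        return BASE_DIR + "/" + conf.get("app.id");
    }

    // Configuration used on a rerun, given the configuration saved in the checkpoint.
    // refreshAppId = false models the observed behavior (the stale id is reused);
    // refreshAppId = true models the behavior I expected (the new id replaces it).
    static Map<String, String> recover(Map<String, String> checkpointed,
                                       String currentAppId,
                                       boolean refreshAppId) {
        Map<String, String> conf = new HashMap<>(checkpointed);
        if (refreshAppId) {
            conf.put("app.id", currentAppId);
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> checkpointed = new HashMap<>();
        checkpointed.put("app.id", "app_id_1"); // saved during the first run

        // Observed: the rerun resolves to the first run's directory, which already exists.
        System.out.println(eventLogDir(recover(checkpointed, "app_id_2", false)));
        // Expected: the rerun resolves to a fresh directory for the new application id.
        System.out.println(eventLogDir(recover(checkpointed, "app_id_2", true)));
    }
}
```

In this model the first call yields the app_id_1 directory that triggers the "already exists" error, and the second yields the app_id_2 directory I expected the rerun to create.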
Thanks,
Max