Dear fellow Spark users,

I have a streaming app that reads data from Kafka, does some computations, and stores the results in HBase. Since I am new to Spark Streaming, I feel there is still scope to make my app better.
To begin with, I was wondering what the best way is to free up resources when the app shuts down (because of an exception or some other cause). While looking for help online, I came across the Spark docs on *spark.streaming.stopGracefullyOnShutdown*, which say: *If true, Spark shuts down the StreamingContext gracefully on JVM shutdown rather than immediately*.

Or does it make more sense to add a *ShutdownHook* explicitly in my app and call JavaStreamingContext.stop() from it? One potential benefit I see with a *ShutdownHook* is that I could close any external resources inside its *run()* method before the JVM dies (roughly along the lines of the sketch at the bottom of this mail). Thoughts/suggestions?

Also, I am banking on the Kafka direct approach taking care of exactly-once data delivery, i.e., after a crash the app will resume consuming from the point where it left off. Is there any way I can restart my streaming app automatically in case of a failure?

I'm really sorry to pester you with so many questions, but I could not satisfy myself with the answers I found online. Thank you so much for your valuable time. Really appreciate it!

Tariq, Mohammad
about.me/mti
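P.S. Here is a rough, untested sketch of the shutdown hook I have in mind, just for illustration. The class name, the app name, the 10-second batch interval, and the hbaseConnection variable are all placeholders; the Kafka/HBase wiring is elided.

import java.io.IOException;

import org.apache.hadoop.hbase.client.Connection;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class GracefulShutdownSketch {

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("kafka-to-hbase");
        final JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

        // ... Kafka direct stream, computations, and HBase writes go here ...
        final Connection hbaseConnection = null; // placeholder for the real HBase connection

        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                // stop(stopSparkContext = true, stopGracefully = true):
                // also stops the underlying SparkContext, and lets the
                // batches already received finish processing first.
                jssc.stop(true, true);
                try {
                    if (hbaseConnection != null) {
                        // Close external resources before the JVM exits.
                        hbaseConnection.close();
                    }
                } catch (IOException e) {
                    // Best effort; we are shutting down anyway.
                }
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}

My thinking with the (true, true) flags is to get the same graceful behaviour the config option promises, while still getting a place of my own to release resources.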