Hello all,

I am using Spark 2.x Streaming with Kafka.
I noticed that when a micro-batch fails, Spark Streaming keeps processing 
subsequent micro-batches, because it takes a while to notify the driver about 
the error and interrupt the streaming-executor thread. This is a problem for 
us because we checkpoint the offsets ourselves.

To avoid this, we wanted to catch the exception inside the foreachRDD 
processing and stop Spark Streaming immediately:

streamRDD.foreachRDD { (rdd, microBatchTime) =>
  try {
    // business logic
  } catch {
    case ex: Exception =>
      // stop spark streaming on the first failed micro-batch
      streamingContext.stop(stopSparkContext = true, stopGracefully = false)
  }
}

But with this approach the Spark application state is set to Completed, so the 
application is not restarted automatically by Spark/YARN (via the max-attempts 
configuration).
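
For reference, this is roughly how we submit the job and rely on the attempt 
limit; a minimal sketch assuming yarn-cluster mode, with an illustrative class 
name and jar:

spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --class com.example.StreamingJob app.jar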

I looked for a way to report the error during shutdown so that the Spark 
application status is set to Failed. ContextWaiter#notifyError is scoped to the 
streaming package, and I could not find any other interface to propagate the 
error/exception while stopping the process.
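
One workaround I am considering (only a rough sketch, and the names below are 
illustrative, not a documented Spark API) is to record the exception in a 
driver-side holder, stop the context non-gracefully, and rethrow it from the 
main thread after awaitTermination(), so the driver JVM exits with a non-zero 
status:

import java.util.concurrent.atomic.AtomicReference

// driver-side holder for the first failure (illustrative name)
val firstFailure = new AtomicReference[Throwable](null)

streamRDD.foreachRDD { (rdd, microBatchTime) =>
  try {
    // business logic
  } catch {
    case ex: Exception =>
      firstFailure.set(ex)
      // stop immediately so no further micro-batches are scheduled
      streamingContext.stop(stopSparkContext = true, stopGracefully = false)
  }
}

streamingContext.start()
streamingContext.awaitTermination()

// rethrow on the main thread so the driver exits abnormally
Option(firstFailure.get()).foreach(ex => throw ex)

But I am not sure this reliably marks the application attempt as Failed, which 
is why I am asking here.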

How can I tell Spark Streaming to stop processing subsequent micro-batches if 
one micro-batch throws an exception? Is it possible to configure Spark to 
create only one micro-batch RDD at a time?
How can I stop the Spark streaming context with an error?

Any help would be appreciated. Thanks in advance.

Regards.
