Hi,

I have a Spark Kafka streaming job running in YARN cluster mode with:

- spark.task.maxFailures=4 (default)
- spark.yarn.max.executor.failures=8
- number of executors = 1
- spark.streaming.stopGracefullyOnShutdown=false
- checkpointing enabled
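For reference, a submit command with these settings would look roughly like this (the application class, jar name, and other details are placeholders, not from my actual job):

```
# Illustrative spark-submit invocation matching the settings above;
# app class and jar are hypothetical placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --conf spark.task.maxFailures=4 \
  --conf spark.yarn.max.executor.failures=8 \
  --conf spark.streaming.stopGracefullyOnShutdown=false \
  --class com.example.StreamingJob \
  streaming-job.jar
```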
- When a batch on the executor hits a RuntimeException, the same batch is retried 4 times and the job then moves on to the next batch. It keeps moving through many batches this way, and eventually the executor fails. The executor receives the shutdown a few seconds later, and both the driver and the executor are killed.
- The driver and executor are then relaunched, but they resume from a much higher offset than the last offset the failed executor used.

I expected the executor to fail as soon as a batch failed 4 times, and a new executor to be relaunched starting from that same failed batch. Instead, the driver keeps creating stages for new batch ranges after the previous batch has failed 4 times.

How can I stop new tasks from being created on the executor after a batch has exhausted its retries? How can I avoid this data loss?

Spark version: 1.6.1

--
Regards
Vasanth kumar RJ
