Hi Spark users, I'm seeing behavior where, if the master node goes down for a restart, all the worker JVMs die (in standalone cluster mode). In other cluster computing setups with master-worker relationships (namely Hadoop), if a worker can't connect to the master, or its connection drops, it retries a few times and then falls back to exponential backoff or similar.
In Spark, though, the worker just dies. Is the die-on-disconnect behavior intentional, or would people be ok with 3 retries 5 seconds apart and then exponential backoff? Something along the lines of the sketch below.
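Roughly, the policy I have in mind looks like this (a minimal Scala sketch with hypothetical names, not the actual Worker registration code; `connect` stands in for whatever re-registers the worker with the master):

    import scala.annotation.tailrec

    object ReconnectPolicy {
      val InitialRetries = 3                  // quick retries before backing off
      val InitialDelayMs = 5000L              // 5 seconds between the quick retries
      val MaxBackoffMs   = 5 * 60 * 1000L     // cap the backoff at 5 minutes

      // Keep retrying instead of exiting the JVM: a fixed number of quick
      // retries, then exponentially growing delays up to the cap.
      @tailrec
      def reconnect(connect: () => Boolean, attempt: Int = 0): Boolean = {
        if (connect()) {
          true
        } else {
          val delayMs =
            if (attempt < InitialRetries) InitialDelayMs
            else math.min(InitialDelayMs << (attempt - InitialRetries + 1), MaxBackoffMs)
          Thread.sleep(delayMs)
          reconnect(connect, attempt + 1)
        }
      }
    }

The exact retry count, initial delay, and backoff cap are just placeholders and could obviously be made configurable.

Andrew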
