httfighter edited a comment on issue #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_… URL: https://github.com/apache/spark/pull/23437#issuecomment-452553044 @srowen Not yet. I added a property, worker.isblack, to workerInfo to indicate whether the worker can be used to allocate executors. Its default value is false. Each time a worker fails to assign an executor to an application, I record the failure. When the failure count reaches “spark.deploy.executorFailedPerWorkerThreshold”, worker.isblack is set to true. When the master allocates executors, it decides whether a worker is usable based on both its available resources and worker.isblack. I also added a timeout parameter, “spark.worker.black.timeout”, which periodically resets worker.isblack to false, so the user can repair the worker dir within that time limit and make the worker available again. If this approach is acceptable, I will also add a log message reminding the user to repair the damaged worker dir. Is this solution feasible? Is there a better suggestion?
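For illustration, the logic described above could be sketched roughly as follows. This is a standalone sketch, not the actual patch and not Spark's real Master or WorkerInfo code: `WorkerState`, `WorkerTracker`, and all method names are hypothetical, the threshold and timeout are passed in directly rather than read from the two configuration keys, and clearing the failure count on reset is an assumption.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the per-worker bookkeeping (not Spark's WorkerInfo).
final class WorkerState(val id: String, var coresFree: Int) {
  var executorFailures: Int = 0
  var isBlack: Boolean = false   // default: the worker is usable
  var blackSinceMs: Long = 0L
}

// Hypothetical stand-in for the master-side logic.
final class WorkerTracker(failureThreshold: Int, blackTimeoutMs: Long) {
  private val workers = mutable.LinkedHashMap.empty[String, WorkerState]

  def register(w: WorkerState): Unit = workers.update(w.id, w)

  // Called whenever launching an executor on the given worker fails.
  def recordExecutorFailure(id: String, nowMs: Long): Unit =
    workers.get(id).foreach { w =>
      w.executorFailures += 1
      if (!w.isBlack && w.executorFailures >= failureThreshold) {
        w.isBlack = true
        w.blackSinceMs = nowMs
        // A real implementation would log a warning here asking the
        // user to repair the worker dir.
      }
    }

  // Periodic reset: a worker becomes usable again once the timeout elapses.
  // Clearing the failure count here is an assumption of this sketch.
  def resetExpired(nowMs: Long): Unit =
    workers.values.foreach { w =>
      if (w.isBlack && nowMs - w.blackSinceMs >= blackTimeoutMs) {
        w.isBlack = false
        w.executorFailures = 0
      }
    }

  // Workers eligible for a new executor: enough free cores and not black.
  def usableWorkers(coresNeeded: Int): Seq[String] =
    workers.values
      .filter(w => !w.isBlack && w.coresFree >= coresNeeded)
      .map(_.id)
      .toSeq
}
```

With a threshold of 2, a worker stays eligible after its first failure, is skipped after the second, and is offered again once `resetExpired` runs after the timeout. If the directory is still broken at that point, the worker would simply be marked unusable again after the next round of failures.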
