httfighter opened a new pull request #23437: [SPARK-26524] If the application directory fails to be created on the SPARK_WORKER_…
URL: https://github.com/apache/spark/pull/23437

…DIR on some worker nodes (for example, because of a bad disk or a disk with no free capacity), the application executor will be allocated indefinitely.

## What changes were proposed in this pull request?

When the Spark worker is started, the workerdir is created successfully. By the time the application is submitted, the disks holding the worker121 workerdir and the worker122 workerdir have been damaged.

When a worker allocates an executor, it creates a working directory and a temporary directory. If the creation fails, the executor allocation fails. Because the application directory cannot be created under SPARK_WORKER_DIR on worker121 and worker122, executors for the application are allocated indefinitely.

I added two configuration items, spark.deploy.executorFailedPerWorkerThreshold and spark.worker.black.timeout. The master records the number of times a worker fails to allocate an executor to the same application. The maximum allowed number of failures is controlled by spark.deploy.executorFailedPerWorkerThreshold, and it determines whether the worker is marked as black (blacklisted). The worker is removed from the blacklist again after the timeout configured by spark.worker.black.timeout.

The master now judges whether a worker is usable according to the resources of the worker and whether the worker is black:

    // Filter out workers that don't have enough resources to launch an executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor &&
        !worker.isBlack)
      .sortBy(_.coresFree).reverse

## How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a pull request.
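
For illustration only, the failure counting and timeout described under "What changes were proposed" could be sketched roughly as below. This is not the code from the patch: the class `WorkerBlacklist`, its methods, and its parameters are hypothetical names, and the sketch assumes that a worker becomes black once its failure count for one application reaches the threshold, and that the timeout is measured in milliseconds from the moment the worker was marked black.

```scala
import scala.collection.mutable

// Hypothetical sketch, not taken from the pull request.
// failureThreshold stands in for spark.deploy.executorFailedPerWorkerThreshold,
// blackTimeoutMs stands in for spark.worker.black.timeout (unit assumed: ms).
final class WorkerBlacklist(failureThreshold: Int, blackTimeoutMs: Long) {
  // (workerId, appId) -> number of failed executor allocations
  private val failures = mutable.Map.empty[(String, String), Int]
  // workerId -> time at which the worker was marked black
  private val blackSince = mutable.Map.empty[String, Long]

  // Record one failed executor allocation for this worker and application.
  def recordExecutorFailure(workerId: String, appId: String, nowMs: Long): Unit = {
    val key = (workerId, appId)
    val count = failures.getOrElse(key, 0) + 1
    failures(key) = count
    // Assumption: reaching the threshold marks the worker black.
    if (count >= failureThreshold && !blackSince.contains(workerId)) {
      blackSince(workerId) = nowMs
    }
  }

  // A worker stays black until the timeout has elapsed; after that it is
  // cleared and its failure counters are reset.
  def isBlack(workerId: String, nowMs: Long): Boolean =
    blackSince.get(workerId) match {
      case Some(since) if nowMs - since < blackTimeoutMs => true
      case Some(_) =>
        blackSince.remove(workerId)
        failures.keys.filter(_._1 == workerId).toList.foreach(failures.remove)
        false
      case None => false
    }
}
```

Under these assumptions, the `!worker.isBlack` condition in the master's filter would correspond to a lookup such as `!blacklist.isBlack(worker.id, System.currentTimeMillis())`.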
