ChenjunZou edited a comment on issue #28168: [SPARK-31395][CORE]reverse preferred location to make schedule more even
URL: https://github.com/apache/spark/pull/28168#issuecomment-611502399

> If that's the case, we should fix the configurations to take the locality into account, rather than reversing the hosts. @ChenjunZou, please clarify why and how reversing the hosts can resolve your problem.
>
> From what you said, reversing will just switch the hot spot nodes to happen.

@HyukjinKwon
The root cause is that the model, or the data, is written from a single node (xxx.93, for instance).
So, following the HDFS write pipeline, the client writes first to xxx.93, plus two other nodes.
The preferred locations then look like this:

[xxx.93 xxx.100 xxx.02]
[xxx.93 xxx.102 xxx.04]
[xxx.93 xxx.66 xxx.05]

When Spark schedules tasks, the executors on xxx.93 are always preferred by the Spark scheduler.
Other executors rarely get tasks, even when the executors on xxx.93 are all busy.

This single hot spot should be avoided.
Besides, I agree with adding configurations.
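
To make the skew in the replica lists above concrete, here is a minimal Python sketch. It is a toy model, not Spark's scheduler: it assumes, purely for illustration, that the first host in each list is the one a task is most likely to land on, and the helper names `first_choice_counts` and `reverse_each` are hypothetical. The host strings are copied from the comment with their elided prefixes left as they are.

```python
from collections import Counter

# Replica lists as given in the comment; the "xxx" prefixes are elided there.
preferred_locations = [
    ["xxx.93", "xxx.100", "xxx.02"],
    ["xxx.93", "xxx.102", "xxx.04"],
    ["xxx.93", "xxx.66", "xxx.05"],
]


def first_choice_counts(locations):
    """Count how often each host is the first entry of a location list.

    Assumption (toy model): the first entry stands in for the host that
    a task is most likely to be scheduled on.
    """
    return Counter(hosts[0] for hosts in locations if hosts)


def reverse_each(locations):
    """Return a copy of the location lists with each list reversed."""
    return [list(reversed(hosts)) for hosts in locations]


original_counts = first_choice_counts(preferred_locations)
reversed_counts = first_choice_counts(reverse_each(preferred_locations))

print("original:", dict(original_counts))
print("reversed:", dict(reversed_counts))
```

In this toy model the original order puts xxx.93 first for all three lists, while the reversed order spreads the first choice over three different hosts. Whether that carries over to real scheduling depends on how the scheduler actually uses the list order and on the locality configuration, which is the point raised in the quoted reply.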
