[GitHub] [spark] ChenjunZou commented on issue #28168: [SPARK-31395][CORE]reverse preferred location to make schedule more even

GitBox Thu, 09 Apr 2020 05:32:27 -0700

ChenjunZou commented on issue #28168: [SPARK-31395][CORE]reverse preferred 
location to make schedule more even
URL: https://github.com/apache/spark/pull/28168#issuecomment-611502399
 
 
   > If that's the case, we should fix the configurations to take the locality 
   > into account, rather than reversing the hosts. @ChenjunZou, please clarify why 
   > and how reversing the hosts can resolve your problem.
   > 
   > From what you said, reversing will just switch the hot spot nodes to 
   > happen.
   
   @HyukjinKwon 
   The root cause is that the model (or the data) is written from a single 
node (xxx.93, for instance).
   Because of the HDFS write pipeline, the client first writes to xxx.93 and 
then to two other nodes. 
   The preferred locations therefore look like this:
   [xxx.93 xxx.100 xxx.02]
   [xxx.93 xxx.102 xxx.04]
   [xxx.93 xxx.66 xxx.05]
   
   When Spark schedules tasks, the scheduler always prefers the executors on 
xxx.93. Other executors rarely get tasks, even when the executors on xxx.93 
are all busy. This single hot spot should be avoided.  
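
To illustrate the skew, here is a minimal sketch in Python. It is a toy model, not Spark's scheduler code: it assumes that the first host in each preferred-location list is the one that wins, and the helper names `first_choice_counts` and `reversed_locations` are hypothetical. The host names are the placeholders used above.

```python
from collections import Counter

# Preferred-location lists from the comment above (placeholder host names).
preferred_locations = [
    ["xxx.93", "xxx.100", "xxx.02"],
    ["xxx.93", "xxx.102", "xxx.04"],
    ["xxx.93", "xxx.66", "xxx.05"],
]


def first_choice_counts(locations):
    """Count how often each host appears first in a list (hypothetical helper).

    Assumption: the first-listed host is the one favored for the task.
    """
    return Counter(hosts[0] for hosts in locations if hosts)


def reversed_locations(locations):
    """Return new lists with the host order of each entry reversed."""
    return [list(reversed(hosts)) for hosts in locations]


before = first_choice_counts(preferred_locations)
after = first_choice_counts(reversed_locations(preferred_locations))

print("first choice, original order:", dict(before))
print("first choice, reversed order:", dict(after))
```

Under that assumption, every block names xxx.93 first in the original order, while the reversed order spreads the first choice across three different hosts. Whether this holds in a real cluster depends on how the scheduler actually treats list order and on the locality settings.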
   
   Besides, I agree with adding configurations.


