tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection 
while last connection failed in the last retry IO wait
URL: https://github.com/apache/spark/pull/27943#issuecomment-602605054
 
 
   So I'm really not clear on what the desired behavior is, and I think we are using different terminology here. I would like you to clearly define the desired behavior and how it compares to the current behavior.
   
   @turboFei, you say: "In fact, the current implementation in this patch would fast fail all connections."
   
   What do you mean by "all connections"? If a single task fails to fetch from an address (host/port), is the intention to fail that task, and all subsequent tasks trying to fetch from that address, immediately and without retrying? If so, I'm against this. The retry is there for a reason: we often see temporary issues with shuffle servers on node managers that work fine on retry. People can configure the number of retries and the retry wait based on how they want their job to behave.
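   
   For example, a job can tune this through the shuffle I/O retry settings (a minimal sketch, assuming the standard spark.shuffle.io.maxRetries and spark.shuffle.io.retryWait configs; the values here are purely illustrative):
   
       import org.apache.spark.SparkConf
   
       // Retry each failed shuffle fetch up to 6 times instead of the default 3,
       // waiting 10s between attempts, so a transient shuffle server issue can recover.
       val conf = new SparkConf()
         .set("spark.shuffle.io.maxRetries", "6")
         .set("spark.shuffle.io.retryWait", "10s")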
   
   If that is not what you are proposing, please describe in detail what you want to happen for the first task that fails, for subsequent tasks fetching from the same address, and then in the retrying fetcher when retries are enabled.
