tgravescs commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait URL: https://github.com/apache/spark/pull/27943#issuecomment-602605054

So I'm really not clear on what the desired behavior is, and I think we are using different terminology here. I would like you to clearly define the desired behavior compared to the current one.

@turboFei you say "In fact, the current implementation in this patch would fast fail all connections." What does "all connections" mean? If a single task fails to fetch from an address (host/port), is the intention to fail that task and all other subsequent tasks trying to fetch from that address immediately, without retrying? If so, I'm against this. The retry is there for a reason: we often see temporary issues with shuffle servers on node managers that resolve on retry. People can configure the number of retries and the wait between them based on how they want their job to act.

If that is not what you are proposing, please describe in detail what you want the first failing task to do, what should happen to subsequent tasks fetching from the same address, and then what happens in the retrying fetcher with retries.
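For context, a minimal sketch of the retry knobs being referred to above, assuming the standard `spark.shuffle.io.*` properties (this is illustrative only, not part of this PR):

```scala
import org.apache.spark.SparkConf

// Users tune how aggressively failed shuffle fetches are retried:
// maxRetries controls the number of retry attempts per failed fetch,
// retryWait controls how long to wait between attempts.
val conf = new SparkConf()
  .set("spark.shuffle.io.maxRetries", "3") // retry each failed fetch up to 3 times (default)
  .set("spark.shuffle.io.retryWait", "5s") // wait 5 seconds between retries (default)
```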
