turboFei commented on issue #27943: [SPARK-31179] Fast fail the connection while last connection failed in the last retry IO wait URL: https://github.com/apache/spark/pull/27943#issuecomment-602625438

Thanks for the reply @tgravescs, and sorry for the unclear description. The "all connections" I mentioned above are the request connections sent to the same unreachable address. My mistake was not recognizing that there may be several clients for the same address; perhaps we need to keep a lastConnectionFailedTime variable per clientPool.

The problem is that a single task may issue several request connections to the same address. In particular, for a shuffle read task, when there is only one client in the client pool, that client is always picked by every connection targeting the same ESS. If that address is unreachable, these connections block each other (inside createClient). Since the connections belong to the same task and all target the same unreachable ESS, it costs connectionNum \* connectionTimeOut \* maxRetry to exhaust the retries before the task fails. Ideally, the task should fail in connectionTimeOut \* maxRetry.
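A minimal sketch of the fast-fail idea (the class and method names here are hypothetical, not Spark's actual TransportClientFactory API): the pool records a lastConnectionFailedTime, and any connect attempt that started within the retry wait window of that failure throws immediately instead of blocking in createClient:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-address client pool that remembers when the last
// connection attempt to its address failed, so subsequent attempts can
// fast-fail instead of each blocking for the full connection timeout.
public class FastFailClientPool {
    private final long ioRetryWaitMs; // analogous to spark.shuffle.io.retryWait
    private final AtomicLong lastConnectionFailedTime = new AtomicLong(0L);

    public FastFailClientPool(long ioRetryWaitMs) {
        this.ioRetryWaitMs = ioRetryWaitMs;
    }

    /** Record a failed connect attempt to this pool's address. */
    public void markConnectionFailed() {
        lastConnectionFailedTime.set(System.currentTimeMillis());
    }

    /**
     * Fast-fail if the last connect to this address failed within the
     * retry wait window; otherwise the caller proceeds to createClient.
     */
    public void checkFastFail(long connectionStartTime) throws IOException {
        long lastFailed = lastConnectionFailedTime.get();
        if (lastFailed > 0 && connectionStartTime - lastFailed < ioRetryWaitMs) {
            throw new IOException("Fast failing: last connection to this address failed "
                + (connectionStartTime - lastFailed) + " ms ago");
        }
    }

    public static void main(String[] args) throws Exception {
        FastFailClientPool pool = new FastFailClientPool(5000);
        pool.checkFastFail(System.currentTimeMillis()); // no prior failure: allowed
        pool.markConnectionFailed();                    // first attempt fails
        boolean fastFailed = false;
        try {
            pool.checkFastFail(System.currentTimeMillis());
        } catch (IOException e) {
            fastFailed = true;                          // second attempt fast-fails
        }
        System.out.println("fastFailed=" + fastFailed);
    }
}
```

With this check, the remaining connections of the task skip their own full connectionTimeOut waits, so the task can fail in roughly connectionTimeOut \* maxRetry rather than connectionNum \* connectionTimeOut \* maxRetry.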
