Ngone51 commented on a change in pull request #27419: [SPARK-30694][SHUFFLE]If
exception occurred while fetching blocks by ExternalBlockClient, fail early when
External Shuffle Service is not alive
URL: https://github.com/apache/spark/pull/27419#discussion_r374695885
##########
File path:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##########
@@ -103,14 +103,26 @@ public void fetchBlocks(
try {
RetryingBlockFetcher.BlockFetchStarter blockFetchStarter =
(blockIds1, listener1) -> {
+
// Unless this client is closed.
if (clientFactory != null) {
- TransportClient client = clientFactory.createClient(host, port);
+ TransportClient client = null;
+ try {
+ client = clientFactory.createClient(host, port);
+ } catch (Exception e) {
+ // throw ExternalShuffleServiceLostException exception then we won't retry to connect
+ // un-connected External Shuffle Service.
+ String msg = "The relative remote external shuffle service (host: " + host + "," +
+ "port: " + port + "), which maintains the block data can't been connected.";
+ logger.info(msg);
+ throw new ExternalShuffleServiceLostException(msg);
+ }
new OneForOneBlockFetcher(client, appId, execId,
- blockIds1, listener1, conf, downloadFileManager).start();
+ blockIds1, listener1, conf, downloadFileManager).start();
} else {
logger.info("This clientFactory was closed. Skipping further block fetch retries.");
}
+
Review comment:
Revert this added blank line?