Ngone51 commented on a change in pull request #28911:
URL: https://github.com/apache/spark/pull/28911#discussion_r457896569



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1391,10 +1391,12 @@ package object config {
 
   private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED =
     ConfigBuilder("spark.shuffle.readHostLocalDisk")
-      .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled and external " +
-        s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled), shuffle " +
-        "blocks requested from those block managers which are running on the same host are read " +
-        "from the disk directly instead of being fetched as remote blocks over the network.")
+      .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled and 1) external " +
+        s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled or 2) ${DYN_ALLOCATION_ENABLED.key}" +
+        s" is disabled), shuffle blocks requested from those block managers which are running on " +

Review comment:
   Yes, it prevents the case where executors can come and go under dynamic allocation. Also, I think this is still different from an executor-loss error: executor loss is an abnormal event outside Spark's control, whereas dynamic allocation is under Spark's control. Executor shutdown under dynamic allocation also happens much more frequently than executor loss. I think we should try our best to avoid shuffle fetch failures, since their penalty is not trivial, especially when they can be avoided.
   
   Besides, when dynamic allocation is enabled, users can already use the external shuffle service. Therefore, I can't think of a strong reason to mix these two branches.
   
   P.S. We could probably allow dynamic allocation here if `spark.dynamicAllocation.shuffleTracking.enabled` is also enabled.
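   A minimal sketch of the condition the new doc string describes, plus the relaxation suggested in the P.S. The `ShuffleSettings` case class and the `HostLocalRead` object are hypothetical names used only for illustration; this is not Spark's actual implementation, which reads these values from `SparkConf` through the `ConfigEntry` constants named in the diff.

```scala
// Hypothetical, simplified view of the relevant settings (not a Spark API).
// Spark itself reads these from SparkConf via the ConfigEntry constants.
final case class ShuffleSettings(
    readHostLocalDisk: Boolean,        // spark.shuffle.readHostLocalDisk
    useOldFetchProtocol: Boolean,      // spark.shuffle.useOldFetchProtocol
    shuffleServiceEnabled: Boolean,    // spark.shuffle.service.enabled
    dynamicAllocationEnabled: Boolean, // spark.dynamicAllocation.enabled
    shuffleTrackingEnabled: Boolean    // spark.dynamicAllocation.shuffleTracking.enabled
)

object HostLocalRead {
  // Condition as worded in the new doc string: the feature flag is on, the old
  // fetch protocol is off, and either 1) the external shuffle service is
  // enabled or 2) dynamic allocation is disabled.
  def allowedByDoc(s: ShuffleSettings): Boolean =
    s.readHostLocalDisk && !s.useOldFetchProtocol &&
      (s.shuffleServiceEnabled || !s.dynamicAllocationEnabled)

  // Hypothetical relaxation from the P.S.: also allow dynamic allocation when
  // shuffle tracking is enabled, since it tries to keep alive executors that
  // still store shuffle data for active jobs.
  def allowedWithShuffleTracking(s: ShuffleSettings): Boolean =
    s.readHostLocalDisk && !s.useOldFetchProtocol &&
      (s.shuffleServiceEnabled || !s.dynamicAllocationEnabled || s.shuffleTrackingEnabled)
}
```

   For example, with dynamic allocation on and no external shuffle service, `allowedByDoc` returns false, while `allowedWithShuffleTracking` returns true once shuffle tracking is enabled.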
