attilapiros commented on a change in pull request #28911:
URL: https://github.com/apache/spark/pull/28911#discussion_r480111222
##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1415,10 +1415,11 @@ package object config {
private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED =
ConfigBuilder("spark.shuffle.readHostLocalDisk")
-      .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled and external " +
-        s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled), shuffle " +
-        "blocks requested from those block managers which are running on the same host are read " +
-        "from the disk directly instead of being fetched as remote blocks over the network.")
+      .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled, shuffle " +
+        "blocks requested from those block managers which are running on the same host are " +
+        "read from the disk directly instead of being fetched as remote blocks over the " +
+        "network. Note that for k8s workloads, this only works when nodes are using " +
+        "non-isolated container storage.")
Review comment:
@Ngone51 On a containerized resource manager, having non-isolated
container storage won't be enough: for this feature to work we also need to
detect that non-isolation. Currently this is done by matching the host in the
block manager ID, which works only for YARN and standalone mode, isn't it?
A question for the future: do you have a plan to introduce block manager
grouping based on shared storage?
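To illustrate the detection point above, here is a minimal sketch (not the actual Spark internals; `BlockManagerId` is simplified to a standalone case class, and `partitionByLocality` is a hypothetical helper) of how host-local eligibility can be decided by comparing the host recorded in each block manager's ID against the local host. This comparison is only meaningful where executors report the physical host, which is why isolated container storage defeats it:

```scala
// Simplified stand-in for Spark's BlockManagerId (illustration only).
case class BlockManagerId(executorId: String, host: String, port: Int)

// Hypothetical helper: split block managers into host-local ones (whose
// blocks could be read from local disk directly) and remote ones (whose
// blocks must be fetched over the network).
def partitionByLocality(
    localHost: String,
    blockManagers: Seq[BlockManagerId]): (Seq[BlockManagerId], Seq[BlockManagerId]) =
  blockManagers.partition(_.host == localHost)

// Usage: exec-0 shares the host, so its blocks qualify for disk reads;
// exec-1 is on another node, so its blocks are fetched remotely.
val (hostLocal, remote) = partitionByLocality(
  "node-1",
  Seq(
    BlockManagerId("exec-0", "node-1", 7337),
    BlockManagerId("exec-1", "node-2", 7337)))
```

Note the sketch assumes the `host` field actually identifies shared storage, which holds for YARN and standalone but not for isolated container filesystems on the same node.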