attilapiros commented on a change in pull request #28911:
URL: https://github.com/apache/spark/pull/28911#discussion_r480111222
##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1415,10 +1415,11 @@ package object config {
private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED =
ConfigBuilder("spark.shuffle.readHostLocalDisk")
-      .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled and external " +
-        s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled), shuffle " +
-        "blocks requested from those block managers which are running on the same host are read " +
-        "from the disk directly instead of being fetched as remote blocks over the network.")
+      .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is disabled, shuffle " +
+        "blocks requested from those block managers which are running on the same host are " +
+        "read from the disk directly instead of being fetched as remote blocks over the " +
+        "network. Note that for k8s workloads, this only works when nodes are using " +
+        "non-isolated container storage.")
Review comment:
@Ngone51 On a containerized resource manager, having non-isolated
container storage won't be enough: for this feature to work we also need to
detect that non-isolation. Currently this is done by matching the host in the
block manager ID, which works only for YARN and standalone mode, isn't it?
A question for the future: do you have a plan to introduce block manager
grouping based on shared storage?
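To illustrate the detection point above, here is a minimal sketch (not the actual Spark internals; `BlockManagerId` is simplified to a standalone case class, and `partitionByLocality` is a hypothetical helper) of how host-local eligibility can be decided by comparing the host recorded in each block manager's ID against the local host. This comparison is only meaningful where executors report the physical host, which is why isolated container storage defeats it:

```scala
// Simplified stand-in for Spark's BlockManagerId (illustration only).
case class BlockManagerId(executorId: String, host: String, port: Int)

// Hypothetical helper: split block managers into host-local ones (whose
// blocks could be read from local disk directly) and remote ones (whose
// blocks must be fetched over the network).
def partitionByLocality(
    localHost: String,
    blockManagers: Seq[BlockManagerId]): (Seq[BlockManagerId], Seq[BlockManagerId]) =
  blockManagers.partition(_.host == localHost)

// Usage: exec-0 shares the host, so its blocks qualify for disk reads;
// exec-1 is on another node, so its blocks are fetched remotely.
val (hostLocal, remote) = partitionByLocality(
  "node-1",
  Seq(
    BlockManagerId("exec-0", "node-1", 7337),
    BlockManagerId("exec-1", "node-2", 7337)))
```

Note the sketch assumes the `host` field actually identifies shared storage, which holds for YARN and standalone but not for isolated container filesystems on the same node.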