tgravescs commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r318123515
##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1020,6 +1020,22 @@ package object config {
.booleanConf
.createWithDefault(false)
+ private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED =
+ ConfigBuilder("spark.shuffle.readHostLocalDisk.enabled")
+ .doc("If enabled, shuffle blocks requested from those block managers which are running on " +
+ "the same host are read from the disk directly instead of being fetched as remote blocks " +
+ "over the network.")
+ .booleanConf
+ .createWithDefault(true)
+
+ private[spark] val STORAGE_LOCAL_DISK_BY_EXECUTORS_CACHE_SIZE =
+ ConfigBuilder("spark.storage.localDiskByExecutors.cacheSize")
+ .doc("The maximum size of the cache of the local dirs for the executors. This cache will " +
Review comment:
First time reading this, I don't really know what this means. Is this size the number of local dirs created on disk? What does this in turn mean for the end user? Is this on every executor or just the driver? Note: after reading more of the code I understand it, but we should clarify it here for the user.
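To make the question concrete, here is a minimal sketch of one possible reading of the two settings, not the PR's implementation. It assumes that `cacheSize` bounds the *number of executors* whose local dirs are remembered (evicting the least recently used one), and that a block counts as host-local only when the read is enabled, the block lives on the same host, and that executor's dirs are already cached. `BlockLocation`, `LocalDirsCache`, and `HostLocalReadSketch.split` are hypothetical names; in particular, treating an uncached executor as "remote" is a simplification.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical: where a shuffle block lives (not Spark's BlockManagerId).
final case class BlockLocation(executorId: String, host: String)

// Hypothetical LRU cache: executor ID -> that executor's local dirs.
// Assumption: the configured size counts executors, not dirs on disk.
final class LocalDirsCache(maxExecutors: Int) {
  require(maxExecutors > 0, "maxExecutors must be positive")

  // accessOrder = true: iteration starts at the least recently accessed
  // entry, so removeEldestEntry evicts the LRU executor once over capacity.
  private val underlying =
    new JLinkedHashMap[String, Array[String]](16, 0.75f, true) {
      override protected def removeEldestEntry(
          eldest: JMap.Entry[String, Array[String]]): Boolean =
        this.size() > maxExecutors
    }

  def put(executorId: String, dirs: Array[String]): Unit =
    underlying.put(executorId, dirs)

  // A hit also marks the executor as recently used.
  def get(executorId: String): Option[Array[String]] =
    Option(underlying.get(executorId))

  def numCached: Int = underlying.size()
}

object HostLocalReadSketch {
  // Hypothetical helper: splits blocks into (read from local disk, fetch over network).
  def split(
      blocks: Seq[BlockLocation],
      localHost: String,
      localExecutorId: String,
      hostLocalReadEnabled: Boolean,
      cache: LocalDirsCache): (Seq[BlockLocation], Seq[BlockLocation]) =
    blocks.partition { b =>
      hostLocalReadEnabled &&
      b.host == localHost &&
      b.executorId != localExecutorId && // own blocks are plain local blocks
      cache.get(b.executorId).isDefined  // simplification: unknown dirs => remote
    }

  def main(args: Array[String]): Unit = {
    val cache = new LocalDirsCache(maxExecutors = 2)
    cache.put("exec-1", Array("/tmp/spark-a"))
    cache.put("exec-2", Array("/tmp/spark-b"))
    cache.put("exec-3", Array("/tmp/spark-c")) // evicts exec-1, the LRU entry

    val blocks = Seq(
      BlockLocation("exec-2", "host-a"),
      BlockLocation("exec-3", "host-a"),
      BlockLocation("exec-1", "host-a"),
      BlockLocation("exec-4", "host-b"))
    val (hostLocal, remote) =
      split(blocks, "host-a", "exec-0", hostLocalReadEnabled = true, cache)
    println(s"host-local: ${hostLocal.map(_.executorId)}")
    println(s"remote: ${remote.map(_.executorId)}")
  }
}
```

Under this reading, a user-facing doc would say something like "the maximum number of executors whose local dirs are kept", and a too-small value only costs extra lookups (falling back to the network), never correctness.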