Ngone51 commented on a change in pull request #28911:
URL: https://github.com/apache/spark/pull/28911#discussion_r480157384
##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -1415,10 +1415,11 @@ package object config {
private[spark] val SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED =
ConfigBuilder("spark.shuffle.readHostLocalDisk")
- .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is
disabled and external " +
- s"shuffle `${SHUFFLE_SERVICE_ENABLED.key}` is enabled), shuffle " +
- "blocks requested from those block managers which are running on the
same host are read " +
- "from the disk directly instead of being fetched as remote blocks over
the network.")
+ .doc(s"If enabled (and `${SHUFFLE_USE_OLD_FETCH_PROTOCOL.key}` is
disabled, shuffle " +
+ "blocks requested from those block managers which are running on the
same host are " +
+ "read from the disk directly instead of being fetched as remote blocks
over the " +
+ "network. Note that for k8s workloads, this only works when nodes are
using " +
+ "non-isolated container storage.")
Review comment:
> Currently this is done by using the same host in the block manager ID, which works only for YARN and standalone mode, isn't it?

IIUC, from @holdenk's previous comment and @dongjoon-hyun's comment, it should also work for Mesos/K8s when they're using non-isolated container storage.

> A question for the future: do you have a plan to introduce block manager grouping based on shared storage?

I don't. To be honest, I'm not familiar with the containerized resource managers, and I'm not sure what plan you have in mind here. Is it only needed for containerized resource managers?
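For context, here is a minimal sketch of the scenario this config is about: host-local disk reading enabled while the external shuffle service stays off, which is what this PR makes possible. The config keys come from the diff above; the master string, app name, and job are illustrative assumptions only:
```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: host-local disk reading with the external shuffle service kept off.
// Master and app name are purely illustrative, not part of the PR.
val conf = new SparkConf()
  .setMaster("local-cluster[2, 1, 1024]")
  .setAppName("host-local-shuffle-sketch")
  .set("spark.shuffle.readHostLocalDisk", "true")
  .set("spark.shuffle.useOldFetchProtocol", "false")
  .set("spark.shuffle.service.enabled", "false")

val sc = new SparkContext(conf)
// Run a small shuffle so blocks from executors on the same host can be read from disk.
sc.parallelize(1 to 100, 4).map(i => (i % 10, i)).reduceByKey(_ + _).collect()
sc.stop()
```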
##########
File path: core/src/test/scala/org/apache/spark/shuffle/HostLocalShuffleFetchSuite.scala
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+import org.scalatest.matchers.must.Matchers
+import org.scalatest.matchers.should.Matchers._
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkEnv, SparkFunSuite, TestUtils}
+import org.apache.spark.internal.config._
+import org.apache.spark.network.netty.NettyBlockTransferService
+
+/**
+ * This test suite is used to test host local shuffle reading with external shuffle service disabled
+ */
+class HostLocalShuffleFetchSuite extends SparkFunSuite with Matchers with LocalSparkContext {
Review comment:
Sorry, which two asserts are you referring to? Do you mean these two asserts in the new test:
```scala
// Spark should read the shuffle data locally from the cached directories on the same host,
// so there's no remote fetching at all.
assert(localBytesRead.sum > 0)
assert(remoteBytesRead.sum === 0)
```
If they are, I actually think that checking `localBytesRead`/`remoteBytesRead` here is equivalent to checking `localBlocksFetched`/`remoteBlocksFetched`.
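If block-count checks are still wanted, a minimal sketch of how they could be gathered, assuming a `SparkContext` named `sc` and that per-task metrics are collected through a listener the same way the bytes-read values would be (the buffer names are hypothetical):
```scala
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

val localBlocksFetched = new ArrayBuffer[Long]
val remoteBlocksFetched = new ArrayBuffer[Long]
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // Collect the block-count counterparts of localBytesRead/remoteBytesRead.
    val metrics = taskEnd.taskMetrics.shuffleReadMetrics
    localBlocksFetched += metrics.localBlocksFetched
    remoteBlocksFetched += metrics.remoteBlocksFetched
  }
})
// ... run the shuffle job under test, then:
assert(localBlocksFetched.sum > 0)
assert(remoteBlocksFetched.sum === 0)
```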