cloud-fan commented on a change in pull request #27539: [SPARK-30786] [CORE] Fix Block replication failure propagation issue in BlockManager
URL: https://github.com/apache/spark/pull/27539#discussion_r377590597
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
 ##########
 @@ -624,6 +624,47 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(store.getRemoteBytes("list1").isEmpty)
   }
 
+  Seq(0, Int.MaxValue - 512).foreach { maxRemoteBlockSizeFetchToMem =>
+    // Trying different values of maxRemoteBlockSizeFetchToMem to test both the
+    // non-streaming and streaming flows of block replication
+    test(s"test for Block replication retry logic with " +
+      s"maxRemoteBlockSizeFetchToMem: ${maxRemoteBlockSizeFetchToMem}") {
+      val storageLevel = StorageLevel(
+        useDisk = false, useMemory = true, deserialized = true, replication = 3)
+      // Retry replication logic for 2 failures
+      conf.set("spark.storage.maxReplicationFailures", "2")
+      // Custom block replication policy which prioritizes BlockManagers by hostname
+      conf.set("spark.storage.replication.policy",
+        classOf[SortOnHostNameBlockReplicationPolicy].getName)
+      conf.set("spark.network.maxRemoteBlockSizeFetchToMem", maxRemoteBlockSizeFetchToMem.toString)
+      val bm1 = makeBlockManager(12000, "host-1", master) // BM with sufficient memory
+      val bm2 = makeBlockManager(10, "host-2", master) // BM with less memory
+      val bm3 = makeBlockManager(10, "host-3", master) // BM with less memory
+      val bm4 = makeBlockManager(12000, "host-4", master) // BM with sufficient memory
+      val bm5 = makeBlockManager(10, "host-5", master) // BM with less memory
+      val bm6 = makeBlockManager(12000, "host-6", master) // BM with sufficient memory
+
+      val data = (1 to 10).toArray
+      val serializedData = serializer.newInstance().serialize(data).array()
+      val blockId = "list"
+      bm1.putIterator(blockId, List(data).iterator, storageLevel, tellMaster = true)
+
+      // Replication will be tried by bm1 in this order: bm2, bm3, bm4, bm5, bm6.
+      // bm2, bm3 and bm5 have too little memory, so the block can't be stored on them.
+      // bm6 has sufficient memory, but maxReplicationFailures is set to 2.
+      // So retry won't happen on bm6, as 3 failures (by trying on bm2, bm3 and bm5) have
 
 Review comment:
   Does this mean the replication failed (we need 3 copies, but only get 2)?

   This is kind of unrelated to this PR, but I'd like to get some more context to better understand this patch.
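
The hunk above configures a SortOnHostNameBlockReplicationPolicy that is not shown in this diff. Below is a minimal sketch of what such a policy could look like, assuming the BlockReplicationPolicy.prioritize hook in org.apache.spark.storage; the class name comes from the test, but the body is an illustrative guess, not the PR's actual implementation.

import scala.collection.mutable

import org.apache.spark.storage.{BlockId, BlockManagerId, BlockReplicationPolicy}

// Illustrative sketch: order candidate peers deterministically by host name,
// so the test can predict the order in which replication is attempted
// (bm2, bm3, bm4, bm5, bm6 for a block that originates on host-1).
class SortOnHostNameBlockReplicationPolicy extends BlockReplicationPolicy {
  override def prioritize(
      blockManagerId: BlockManagerId,
      peers: Seq[BlockManagerId],
      peersReplicatedTo: mutable.HashSet[BlockManagerId],
      blockId: BlockId,
      numReplicas: Int): List[BlockManagerId] = {
    peers.sortBy(_.host).toList
  }
}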

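On the question above: with replication = 3, bm1 keeps one copy locally and tries to place two more on peers. The standalone sketch below is a simplified model of the failure accounting (not the actual BlockManager.replicate code) showing why only one remote copy is made when maxReplicationFailures = 2, i.e. why the block ends up with 2 copies instead of the requested 3.

object ReplicationRetrySketch {
  // Each peer that fails to store the block counts as one failure; replication
  // gives up once the failure count exceeds maxReplicationFailures, even if
  // untried peers remain.
  def replicate(
      peersInPriorityOrder: Seq[String],
      canStore: String => Boolean,
      remoteCopiesNeeded: Int,
      maxReplicationFailures: Int): Int = {
    var copies = 0
    var failures = 0
    val it = peersInPriorityOrder.iterator
    while (it.hasNext && copies < remoteCopiesNeeded && failures <= maxReplicationFailures) {
      if (canStore(it.next())) copies += 1 else failures += 1
    }
    copies
  }

  def main(args: Array[String]): Unit = {
    // host-2, host-3 and host-5 are the small BlockManagers that cannot store the block.
    val tooSmall = Set("host-2", "host-3", "host-5")
    val remoteCopies = replicate(
      Seq("host-2", "host-3", "host-4", "host-5", "host-6"),
      peer => !tooSmall(peer),
      remoteCopiesNeeded = 2,
      maxReplicationFailures = 2)
    // Prints 1: host-6 is never tried because 3 failures already occurred,
    // so the block exists on only 2 BlockManagers (host-1 and host-4).
    println(remoteCopies)
  }
}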