holdenk commented on a change in pull request #33508:
URL: https://github.com/apache/spark/pull/33508#discussion_r677076155
##########
File path:
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
##########
@@ -260,16 +267,22 @@ private[spark] class BasicExecutorFeatureStep(
.withUid(pod.getMetadata.getUid)
.build()
}
+
+  val policy = kubernetesConf.get(KUBERNETES_ALLOCATION_PODSALLOCATOR) match {
+    case "statefulset" => "Always"
Review comment:
So I did some thinking about this (still working on the PV integration
test, but the code-compile-test cycle is slow). Looking at
`TorrentBroadcast.scala` and `BroadcastManager.scala`: if an executor is
restarted, we'll come back up with an empty `cachedValues`, and then if any
broadcast variables are referenced we'll just fetch them anew.
We would then run into problems, because when we go to write the block the
disk block manager may claim it already contains it.
There are a few ways I can think of addressing this:
1) Have the reads "fall through" (e.g. check whether the block is present on
disk regardless of what the block manager thinks it knows)
2) Ignore any existing file when putting a broadcast block and just hard-write
over it
3) On startup (in the shell script for the dockerfile), clean up the contents
of the disk blocks location
Personally I'm tempted to do the cleanup approach, since we might have
bounced for an OOM or something and I don't want to read back a partially
written block. WDYT @dongjoon-hyun ?
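For concreteness, the cleanup approach could look roughly like the sketch below. This is only an illustration, not the actual Spark entrypoint: the `cleanup_block_dirs` function name is made up, and it assumes the executor's local dirs are passed in comma-separated (as with `SPARK_LOCAL_DIRS`) and that `DiskBlockManager` names its directories with the `blockmgr-` prefix.

```shell
#!/usr/bin/env bash
# Hypothetical startup cleanup: wipe leftover disk-block-manager directories
# under each local dir before the executor comes up, so a restarted pod never
# reads back a partially written block from its previous incarnation.
set -euo pipefail

cleanup_block_dirs() {
  local local_dirs="$1"            # comma-separated, SPARK_LOCAL_DIRS style
  local dir
  for dir in ${local_dirs//,/ }; do
    # Only top-level "blockmgr-*" dirs are removed; other data on the
    # persistent volume (e.g. shuffle state we want to keep) is untouched.
    find "$dir" -maxdepth 1 -type d -name 'blockmgr-*' -exec rm -rf {} +
  done
}

# Demo against a throwaway directory so the sketch is safe to run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/blockmgr-uuid-1234" "$demo/other-data"
cleanup_block_dirs "$demo"
ls "$demo"
```

The upside of doing this in the container entrypoint is that it runs before any JVM code touches the directory, so the block manager always starts from a consistent view of the disk.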
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]