[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-06-02 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r644202270



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##
@@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator(
     }
   }
 
+  private def replacePVCsIfNeeded(
+      pod: Pod,
+      resources: Seq[HasMetadata],
+      reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {
+    val replacedResources = mutable.ArrayBuffer[HasMetadata]()

Review comment:
   You are right~




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-06-02 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r644200182



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##
@@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator(
     }
   }
 
+  private def replacePVCsIfNeeded(
+      pod: Pod,
+      resources: Seq[HasMetadata],
+      reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {
+    val replacedResources = mutable.ArrayBuffer[HasMetadata]()

Review comment:
   Yes, that's the point. The ordering doesn't matter for correctness, so I used `ArrayBuffer` simply to keep the input order.
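
   For illustration, a minimal, self-contained sketch of that order-preserving pattern follows; the simplified types and the `PVCResource` wrapper are hypothetical stand-ins, not the actual Spark implementation:

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for the fabric8 model classes.
case class PersistentVolumeClaim(name: String)
sealed trait HasMetadata
case class PVCResource(pvc: PersistentVolumeClaim) extends HasMetadata
case class OtherResource(kind: String) extends HasMetadata

// Walk the requested resources in order; when a PVC resource can be satisfied
// by a reusable claim, substitute it, otherwise keep the original. An
// ArrayBuffer is used so the result preserves the input order; uniqueness is
// intentionally not enforced.
def replacePVCsIfNeeded(
    resources: Seq[HasMetadata],
    reusablePVCs: mutable.Buffer[PersistentVolumeClaim]): Seq[HasMetadata] = {
  val replacedResources = mutable.ArrayBuffer[HasMetadata]()
  resources.foreach {
    case PVCResource(_) if reusablePVCs.nonEmpty =>
      replacedResources += PVCResource(reusablePVCs.remove(0))
    case other =>
      replacedResources += other
  }
  replacedResources.toSeq
}
```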







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-06-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r643668281



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##
@@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator(
     }
   }
 
+  private def replacePVCsIfNeeded(
+      pod: Pod,
+      resources: Seq[HasMetadata],
+      reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {
+    val replacedResources = mutable.ArrayBuffer[HasMetadata]()

Review comment:
   Ya, maybe. I didn't try to add extra semantics, such as uniqueness, here.
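
   For contrast only: if order-preserving uniqueness were ever wanted, a `mutable.LinkedHashSet` would be the natural swap. This is a hypothetical sketch, not something the PR does:

```scala
import scala.collection.mutable

// LinkedHashSet keeps insertion order while deduplicating; the PR
// deliberately sticks with ArrayBuffer and adds no uniqueness semantic.
val replaced = mutable.LinkedHashSet[String]()
Seq("pvc-1", "pvc-2", "pvc-1").foreach(replaced += _)
assert(replaced.toSeq == Seq("pvc-1", "pvc-2"))
```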







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-05-17 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r633279427



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##
@@ -125,6 +125,14 @@ private[spark] class ExecutorPodsAllocator(
     newlyCreatedExecutors --= k8sKnownExecIds
     schedulerKnownNewlyCreatedExecs --= k8sKnownExecIds
 
+    // Although we are going to delete some executors due to timeout in this function,
+    // it takes undefined time before the actual deletion. Hence, we should collect all PVCs
+    // in use at the beginning. False positive is okay in this context in order to be safe.

Review comment:
   I guess this one is answered correctly.







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-05-17 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r633265397



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##
@@ -70,6 +70,15 @@ private[spark] object Config extends Logging {
       .booleanConf
       .createWithDefault(false)
 
+  val KUBERNETES_DRIVER_REUSE_PVC =
+    ConfigBuilder("spark.kubernetes.driver.reusePersistentVolumeClaim")
+      .doc("If true, driver pod tries to reuse driver-owned on-demand persistent volume claims " +
+        "of the deleted executor pods. This can be useful to reduce executor pod creation delay. " +
+        s"This requires ${KUBERNETES_DRIVER_OWN_PVC.key}=true.")

Review comment:
   I see what your concern is.
   
   When you say `the PVC of removed executors`, you are referring to the Spark executor JVM inside the pod. Here, in the description, I chose `xxx of the deleted executor pods` to avoid that confusion.
   
   For example, a pod in the `Terminating` status is not deleted yet, so it is not included in `the deleted executor pods` and there is no reusable PVC in this case. I'll make it clearer in the conf description.
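
   As a rough sketch of that distinction (assuming the fabric8 `Pod` model used elsewhere in this module), a pod that is still present in the snapshot but carries a `deletionTimestamp` is `Terminating`, while a "deleted" pod has disappeared from the snapshot entirely:

```scala
import io.fabric8.kubernetes.api.model.Pod

// Sketch only, not the actual Spark code: K8s marks a pod as Terminating by
// setting metadata.deletionTimestamp; the pod object itself still exists, so
// its PVCs must not be treated as reusable yet.
def isTerminating(pod: Pod): Boolean =
  pod.getMetadata.getDeletionTimestamp != null
```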







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-05-17 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r633258964



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##
@@ -125,6 +125,14 @@ private[spark] class ExecutorPodsAllocator(
     newlyCreatedExecutors --= k8sKnownExecIds
     schedulerKnownNewlyCreatedExecs --= k8sKnownExecIds
 
+    // Although we are going to delete some executors due to timeout in this function,
+    // it takes undefined time before the actual deletion. Hence, we should collect all PVCs
+    // in use at the beginning. False positive is okay in this context in order to be safe.

Review comment:
   Spark only gets `snapshots` here, so a snapshot cannot be identical to the K8s AS-IS status. There is always a delay, and some deleted executors are only noticed at the next snapshot, by being removed from it.
   > both in-use and not in-use (from removed executors)?
   
   In addition, we cannot reclaim the PVCs of pods in the `Terminating` status. Those PVCs are still in the snapshot and connected to the pods.
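
   A hedged sketch of "collect all PVCs in use at the beginning": scan every pod still present in the snapshot and record the claim names its volumes reference. The `snapshotPods` input and the helper name are illustrative, not the actual method:

```scala
import scala.collection.JavaConverters._   // Scala 2.12 style, as in the Spark codebase of this era
import io.fabric8.kubernetes.api.model.Pod

// Pods we are about to delete are still in the snapshot, so their claims are
// counted too -- that is the acceptable "false positive" mentioned above.
def pvcNamesInUse(snapshotPods: Seq[Pod]): Set[String] =
  snapshotPods.flatMap { pod =>
    pod.getSpec.getVolumes.asScala
      .flatMap(v => Option(v.getPersistentVolumeClaim))
      .map(_.getClaimName)
  }.toSet
```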










[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-05-17 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r633256392



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##
@@ -70,6 +70,15 @@ private[spark] object Config extends Logging {
       .booleanConf
       .createWithDefault(false)
 
+  val KUBERNETES_DRIVER_REUSE_PVC =
+    ConfigBuilder("spark.kubernetes.driver.reusePersistentVolumeClaim")
+      .doc("If true, driver pod tries to reuse driver-owned on-demand persistent volume claims " +
+        "of the deleted executor pods. This can be useful to reduce executor pod creation delay. " +
+        s"This requires ${KUBERNETES_DRIVER_OWN_PVC.key}=true.")

Review comment:
   Sure. Which content do you want to see?
   
   BTW, the main reason why the PVC is not returned immediately is that the executor pod is kept in the `Terminating` status by K8s while the executor JVM is already dead.
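
   For context, enabling this would look roughly like the following; the ownership key is written out as `spark.kubernetes.driver.ownPersistentVolumeClaim` on the assumption that this is what `KUBERNETES_DRIVER_OWN_PVC.key` resolves to:

```scala
import org.apache.spark.SparkConf

// Sketch: both flags are needed, since reuse only applies to driver-owned PVCs.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.ownPersistentVolumeClaim", "true")   // assumed prerequisite key
  .set("spark.kubernetes.driver.reusePersistentVolumeClaim", "true") // the flag added in this PR
```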







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-05-17 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r633254123



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##
@@ -70,6 +70,15 @@ private[spark] object Config extends Logging {
       .booleanConf
       .createWithDefault(false)
 
+  val KUBERNETES_DRIVER_REUSE_PVC =
+    ConfigBuilder("spark.kubernetes.driver.reusePersistentVolumeClaim")
+      .doc("If true, driver pod tries to reuse driver-owned on-demand persistent volume claims " +

Review comment:
   Technically, the *driver* is the actor that re-injects the driver-owned PVCs.



