dongjoon-hyun opened a new pull request, #55642:
URL: https://github.com/apache/spark/pull/55642

   ### What changes were proposed in this pull request?
   
   This PR aims to support a new `ExecutorPVCResizePlugin` that monitors executor PVC disk usage and grows each PVC's `spec.resources.requests.storage` when usage exceeds a threshold.
   
   The executor side reports the maximum filesystem usage ratio across `DiskBlockManager.localDirs`. When the reported ratio exceeds the threshold, the driver side patches the executor pod's PVCs to `currentSize * (1 + factor)`.
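   The two halves can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the PR's code: `PvcResizeSketch`, `usageRatio`, `maxUsageRatio` and `targetBytes` are invented names, and it assumes usage is measured as `(total - usable) / total` via `java.io.File`, which may differ from how the plugin actually measures it.

   ```scala
   import java.io.File

   // Hypothetical helpers; names and the usage formula are assumptions, not the PR's code.
   object PvcResizeSketch {

     /** Fraction of the filesystem behind `dir` that is unavailable to this JVM (0.0 if unknown). */
     def usageRatio(dir: File): Double = {
       val total = dir.getTotalSpace // returns 0L when the path does not name a partition
       if (total <= 0L) 0.0
       else (total - dir.getUsableSpace).toDouble / total
     }

     /** Executor side: the largest usage ratio across all local dirs. */
     def maxUsageRatio(localDirs: Seq[File]): Double =
       if (localDirs.isEmpty) 0.0 else localDirs.map(usageRatio).max

     /** Driver side: the new storage request in bytes, or None when no resize is needed. */
     def targetBytes(currentBytes: Long, ratio: Double, threshold: Double, factor: Double): Option[Long] =
       if (ratio > threshold) Some(math.ceil(currentBytes * (1.0 + factor)).toLong)
       else None
   }
   ```

   With the default threshold `0.5` and factor `1.0`, a reported ratio of `0.61` on a 50Gi claim yields `Some(107374182400L)` (100Gi), while a ratio of `0.30` yields `None`.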
   
   New configurations:
   
   | Key | Default | Meaning |
   |---|---|---|
   | `spark.kubernetes.executor.pvc.resizeInterval` | `0min` | Resize check interval. `0` disables resizing. |
   | `spark.kubernetes.executor.pvc.resizeThreshold` | `0.5` | Usage ratio above which a resize is triggered. |
   | `spark.kubernetes.executor.pvc.resizeFactor` | `1.0` | Growth factor; the new size is `currentSize * (1 + factor)`. |
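   As a rough illustration of how the three settings fit together, the sketch below holds them in a hypothetical `PvcResizeSettings` case class (defaults taken from the table above) and schedules a periodic check only when the interval is positive. The class, the `PvcResizeScheduler` object and the millisecond representation of the interval are assumptions for illustration, not how the plugin actually parses its configuration or schedules its checks.

   ```scala
   import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

   // Hypothetical settings holder mirroring the three keys; defaults match the table.
   final case class PvcResizeSettings(
       intervalMs: Long = 0L,   // spark.kubernetes.executor.pvc.resizeInterval (0min = disabled)
       threshold: Double = 0.5, // spark.kubernetes.executor.pvc.resizeThreshold
       factor: Double = 1.0)    // spark.kubernetes.executor.pvc.resizeFactor

   object PvcResizeScheduler {
     /** Starts a periodic check, or returns None when the interval is 0 (disabled). */
     def start(settings: PvcResizeSettings)(check: () => Unit): Option[ScheduledExecutorService] =
       if (settings.intervalMs <= 0L) None
       else {
         val pool = Executors.newSingleThreadScheduledExecutor()
         // Run `check` every intervalMs, starting one interval from now.
         pool.scheduleAtFixedRate(() => check(), settings.intervalMs, settings.intervalMs, TimeUnit.MILLISECONDS)
         Some(pool)
       }
   }
   ```

   The caller owns the returned executor and should call `shutdown()` on it when the plugin stops; with the default `0min` interval nothing is scheduled at all.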
   
   ### Why are the changes needed?
   
   PVC-backed `SPARK_LOCAL_DIRS` must be sized conservatively up front to avoid mid-job disk-full failures, which inflates storage cost. `ExecutorResizePlugin` already established the observe-and-patch pattern for memory; this PR extends it to PVC storage.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. Users must explicitly add this plugin to `spark.plugins`.
   
   **SUBMIT**
   
   ```
   bin/spark-submit \
   --master k8s://$K8S_MASTER \
   --deploy-mode cluster \
   -c spark.executor.cores=4 \
   -c spark.executor.memory=4g \
   -c spark.kubernetes.container.image=docker.apple.com/d_hyun/spark:20260430 \
   -c spark.kubernetes.authenticate.driver.serviceAccountName=spark \
   -c spark.kubernetes.driver.pod.name=pi \
   -c spark.kubernetes.executor.podNamePrefix=pi \
   -c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data \
   -c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false \
   -c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand \
   -c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=50Gi \
   -c spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=gp3 \
   -c spark.kubernetes.driver.podTemplateFile=eks-root-pod.yml \
   -c spark.kubernetes.executor.podTemplateFile=eks-root-pod.yml \
   -c spark.plugins=org.apache.spark.scheduler.cluster.k8s.ExecutorPVCResizePlugin \
   -c spark.kubernetes.executor.pvc.resizeInterval=1m \
   --class org.apache.spark.examples.SparkPi \
   local:///opt/spark/examples/jars/spark-examples.jar 400000
   ```
   
   **EXECUTOR SIZE REPORTING**
   
   ```
   $ kubectl logs -f pi-exec-1 | grep Plugin
   26/05/01 01:22:54 INFO ExecutorPVCResizeExecutorPlugin: Reporting max PVC disk usage ratio for executor 1: 0.6136656796630462
   26/05/01 01:23:54 INFO ExecutorPVCResizeExecutorPlugin: Reporting max PVC disk usage ratio for executor 1: 0.30591566408202353
   ```
   
   **RESIZED PVC**
   
   ```
   $ kubectl get pvc
   NAME              STATUS   VOLUME                                     CAPACITY       ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
   pi-exec-1-pvc-0   Bound    pvc-d279a3da-ddfb-41c2-a32b-0f2bd83941c4   107374182400   RWOP           gp3            <unset>                 2m28s
   pi-exec-2-pvc-0   Bound    pvc-79f092d3-4a8d-4981-946d-d745d4038fd6   50Gi           RWOP           gp3            <unset>                 2m28s
   ```
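   The `107374182400` in the `CAPACITY` column of `pi-exec-1-pvc-0` is a raw byte count. The short check below confirms it is exactly 100Gi, i.e. the original 50Gi grown by `1 + factor`; it assumes the default `resizeFactor` of `1.0` was in effect, since the submit command above does not override it. `CapacityCheck` is a hypothetical name used only for this arithmetic.

   ```scala
   // Arithmetic check of the kubectl output: 50Gi grown by the default factor 1.0.
   object CapacityCheck {
     val GiB: Long = 1L << 30     // 1Gi = 2^30 bytes
     val factor: Double = 1.0     // assumed default spark.kubernetes.executor.pvc.resizeFactor
     val before: Long = 50L * GiB // initial sizeLimit=50Gi
     val after: Long = math.ceil(before * (1.0 + factor)).toLong

     def main(args: Array[String]): Unit =
       println(s"$before -> $after bytes (${after / GiB}Gi)")
   }
   ```

   Running `CapacityCheck.main` prints `53687091200 -> 107374182400 bytes (100Gi)`, matching the resized claim, while `pi-exec-2-pvc-0` remains at its original 50Gi.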
   
   ### How was this patch tested?
   
   Pass the CIs with a new `ExecutorPVCResizePluginSuite`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.7 (1M context)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.