dongjoon-hyun opened a new pull request #30084:
URL: https://github.com/apache/spark/pull/30084


   ### What changes were proposed in this pull request?
   
   This PR aims to detect duplicate `mountPath` and stop the job.
   
   ### Why are the changes needed?
   
   If there is a conflict on `mountPath`, the pod is created, keeps running,
   and repeatedly logs the following error messages. A Spark job should not
   keep running and wasting cluster resources.
   ```
   $ k get pod -l 'spark-role in (driver,executor)'
   NAME    READY   STATUS    RESTARTS   AGE
   tpcds   1/1     Running   0          33m
   ```
   
   ```
   20/10/18 05:09:26 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying snapshot subscriber.
   io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: ...
   Message: Pod "tpcds-exec-1" is invalid: spec.containers[0].volumeMounts[1].mountPath:
   Invalid value: "/data1": must be unique.
   ...
   ```
   It is better to fail early on the Spark side.
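
As a rough illustration of the kind of check described here, the sketch below rejects a volume list in which two entries share a `mountPath`, before any pod spec would be sent to the API server. It is not the code from this PR: `VolumeMount`, `MountPathCheck`, `duplicateMountPaths`, and `requireUniqueMountPaths` are hypothetical names, and the case class is a simplified stand-in for Spark's own volume configuration types.

```scala
// Hypothetical, simplified stand-in for a configured volume; only the
// fields needed for the uniqueness check are modeled.
final case class VolumeMount(name: String, mountPath: String)

object MountPathCheck {
  /** Returns every mountPath used by more than one volume, sorted. */
  def duplicateMountPaths(volumes: Seq[VolumeMount]): Seq[String] =
    volumes
      .groupBy(_.mountPath)
      .collect { case (path, sharing) if sharing.size > 1 => path }
      .toSeq
      .sorted

  /** Throws IllegalArgumentException if any mountPath is duplicated. */
  def requireUniqueMountPaths(volumes: Seq[VolumeMount]): Unit = {
    val duplicates = duplicateMountPaths(volumes)
    require(
      duplicates.isEmpty,
      s"Found duplicated mountPath: ${duplicates.mkString(", ")}")
  }
}
```

With two volumes both mounted at `/data1`, `requireUniqueMountPaths` throws an `IllegalArgumentException`, so the submission would fail immediately instead of leaving a pod running while executor creation is rejected again and again.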
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, but this is a bug fix.
   
   ### How was this patch tested?
   
   Pass the CI with the newly added test case.

