Github user baluchicken commented on the issue:
https://github.com/apache/spark/pull/21067
    I ran some more tests on this. I think we can say that this change adds
resiliency to Spark batch jobs: just like on YARN, Spark will retry the job
from the beginning if an error happens.
    It can also add resiliency to Spark Streaming apps. I fully understand
your concerns, but anyone submitting a resilient Spark Streaming app will use
Spark's checkpointing feature. The checkpoint directory should be backed by
some kind of PersistentVolume, otherwise everything saved to it is lost on a
node failure. For the PVC, the accessMode should be ReadWriteOnce, because for
this amount of data it is much faster than the ReadWriteMany options.
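    To make that concrete, here is a minimal sketch (not the exact app I
tested) of a streaming driver that checkpoints to a PVC-backed directory, so a
replacement driver pod resumes from the checkpoint instead of starting over.
The mount path `/checkpoint`, the claim name, the socket source, and the
spark-submit flags in the comments are placeholders/assumptions, not part of
this PR:

```scala
// Hypothetical spark-submit flags for mounting the claim into the driver pod:
//   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.ckpt.options.claimName=streaming-ckpt-pvc
//   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.ckpt.mount.path=/checkpoint
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedStreamingApp {
  // Path where the ReadWriteOnce PVC is mounted inside the driver pod.
  val checkpointDir = "/checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedStreamingApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    // Metadata (and state, for stateful ops) is periodically written here,
    // so a restarted driver can pick up where the old one left off.
    ssc.checkpoint(checkpointDir)

    val lines = ssc.socketTextStream("some-host", 9999)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // getOrCreate recovers from the checkpoint if one exists (e.g. after the
    // Job Controller replaced a failed driver pod), otherwise builds a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```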
    My new tests used the same approach described above with one modification:
I enabled a checkpoint dir backed by a ReadWriteOnce PVC. ReadWriteOnce storage
can only be attached to one node. I thought Kubernetes would detach this volume
once the node became "NotReady", but something else happened: Kubernetes did
not detach the volume from the unknown node, so although the Job Controller
created a new driver pod to replace the Unknown one, the new pod remained in
Pending state because the required PVC was still attached to a different node.
Once the partitioned node became available again, the unknown old driver pod
was terminated, and the volume was detached and reattached to the new driver
pod, whose state then changed from Pending to Running.
    So I think there is no correctness problem here. We could maybe add a
warning to the documentation that correctness issues may arise if someone uses
a ReadWriteMany-backed checkpoint dir; otherwise, unless I am still missing
something, I think we are fine.