w4-sjcho commented on issue #27364: [SPARK-31394][K8S] Adds support for Kubernetes NFS volume mounts
URL: https://github.com/apache/spark/pull/27364#issuecomment-611846023

1) Yes, IMHO my claim is true. To use NFS through a PVC, you first need to create a new, empty PVC backed by NFS. Kubernetes' [NFS provisioner](https://github.com/kubernetes-incubator/external-storage/tree/master/nfs) then creates a new empty directory in NFS under some pre-configured directory, for example `/nfs/k8s/sjcho-my-notebook-pvc-dce84888-7a9d-11e6-b1ee-5254001e0c1b`. I would then copy the files to process into it with a separate file-copying job, run the Spark job against that populated PVC, and finally run yet another file-copying job to get the results out. It works in theory, but for data analysis tasks it is quite cumbersome. With this change, a Spark job could simply read existing files in NFS directly, say `/nfs/home/sjcho/[email protected]`, and also write its results directly to an existing directory under NFS such as `/nfs/home/sjcho/output`.

2) Did you use this patch in your cluster with your local NFS?
Yes.

3) When you try PVC NFS, is there any issue on the existing data?
Using NFS through a PVC works but, as described above, it is cumbersome.

4) If you use this in the cloud, did you try to use AWS EFS?
We're using GCP, and it works on [Cloud Filestore](https://cloud.google.com/filestore/docs), which I believe is GCP's counterpart to AWS EFS.

5) Is there any issue in that cloud NFS environment?
As far as I can tell, no. This PR doesn't use any features beyond those Kubernetes itself provides. It merely enables [`nfs`](https://kubernetes.io/docs/concepts/storage/volumes/#nfs), an existing volume type officially supported by Kubernetes, just as Spark already supports `hostPath` and `persistentVolumeClaim`; all three are described on the same Kubernetes documentation page.
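For context on what "enabling the `nfs` volume type" looks like from the user side, here is a minimal sketch that generates the `--conf` settings for mounting one NFS export into the driver and executor pods. It follows the `spark.kubernetes.[driver|executor].volumes.[VolumeType].[VolumeName].*` pattern Spark already uses for `hostPath` and `persistentVolumeClaim`, and assumes a Spark release that includes this change. The helper functions, the volume name `home`, and the server `nfs.example.com` are hypothetical.

```python
from typing import Dict, Iterable, List


def nfs_volume_conf(
    volume_name: str,
    server: str,
    export_path: str,
    mount_path: str,
    read_only: bool = False,
    roles: Iterable[str] = ("driver", "executor"),
) -> Dict[str, str]:
    """Build Spark-on-Kubernetes conf entries that mount one NFS export.

    Keys follow spark.kubernetes.<role>.volumes.nfs.<name>.*; this helper
    is a hypothetical illustration, not part of Spark itself.
    """
    conf: Dict[str, str] = {}
    for role in roles:
        prefix = f"spark.kubernetes.{role}.volumes.nfs.{volume_name}"
        conf[f"{prefix}.mount.path"] = mount_path      # path inside the pod
        conf[f"{prefix}.mount.readOnly"] = str(read_only).lower()
        conf[f"{prefix}.options.server"] = server      # NFS server host or IP
        conf[f"{prefix}.options.path"] = export_path   # directory exported by the server
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a conf dict into spark-submit style ["--conf", "key=value", ...] args."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical NFS server and export; substitute your own.
    conf = nfs_volume_conf("home", "nfs.example.com", "/nfs/home", "/nfs/home")
    print(" ".join(to_submit_args(conf)))
```

Unlike the PVC route in point 1, nothing has to be provisioned or copied beforehand: the pods mount the existing export, so input under `/nfs/home/sjcho` can be read and `/nfs/home/sjcho/output` written in place, subject to the export's NFS permissions.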
As a side note, PVCs work well when everything runs in a single k8s cluster, but if someone wants to use Spark with NFS across multiple k8s clusters and/or alongside tools outside k8s, I think Spark definitely needs to support the `nfs` volume type too.
This is an automated message from the Apache Git Service.
