w4-sjcho commented on issue #27364: [SPARK-31394][K8S] Adds support for 
Kubernetes NFS volume mounts
URL: https://github.com/apache/spark/pull/27364#issuecomment-611846023
 
 
   1) Yes, IMHO my claim is true.
   
   To use NFS through a PVC, you first need to create a new, empty PVC 
backed by NFS. Kubernetes' [NFS 
provisioner](https://github.com/kubernetes-incubator/external-storage/tree/master/nfs)
 will create a new empty directory in NFS under some pre-configured directory, 
for example 
`/nfs/k8s/sjcho-my-notebook-pvc-dce84888-7a9d-11e6-b1ee-5254001e0c1b`. I can 
then add the files to process there using some other file-copying job, and 
run the Spark job using that populated PVC. To get the results out, I run 
another file-copying job to copy the files back out.
   
   In theory this works, but for data analysis tasks it is quite cumbersome. 
With this change, one could simply use existing files in NFS, say 
`/nfs/home/sjcho/[email protected]`, directly from the Spark job, and also 
write the results directly to an existing directory under NFS, such as 
`/nfs/home/sjcho/output`.
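
   As a rough sketch of what mounting an existing NFS directory could look 
like from the submitter's side, the helper below builds the Spark conf entries 
for one `nfs` volume on both the driver and the executors. The property names 
assume this PR follows the same `spark.kubernetes.{driver,executor}.volumes.[VolumeType].[VolumeName].*` 
pattern already used for `hostPath` and `persistentVolumeClaim`. The volume 
name `home`, the server `nfs.example.com`, and the choice to export and mount 
`/nfs/home/sjcho` at the same path are hypothetical values for illustration.

```python
def nfs_volume_conf(volume_name, server, export_path, mount_path, read_only=False):
    """Build Spark conf entries that mount one NFS export on driver and executors.

    Assumption: the nfs volume type uses the same key layout as the existing
    hostPath and persistentVolumeClaim volume types.
    """
    conf = {}
    for role in ("driver", "executor"):
        prefix = f"spark.kubernetes.{role}.volumes.nfs.{volume_name}"
        conf[f"{prefix}.mount.path"] = mount_path
        conf[f"{prefix}.mount.readOnly"] = str(read_only).lower()
        conf[f"{prefix}.options.server"] = server
        conf[f"{prefix}.options.path"] = export_path
    return conf


def to_submit_args(conf):
    """Turn a conf dict into a flat list of --conf arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical values: volume name "home" and server "nfs.example.com".
conf = nfs_volume_conf(
    volume_name="home",
    server="nfs.example.com",
    export_path="/nfs/home/sjcho",
    mount_path="/nfs/home/sjcho",
)
print(" ".join(to_submit_args(conf)))
```

   With a mount like this, the job could read its input and write to 
`/nfs/home/sjcho/output` directly, without the two extra file-copying jobs.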
   
   2) Did you use this patch in your cluster with your local NFS?
   Yes
   
   3) When you try PVC NFS, is there any issue on the existing data?
   Using PVC NFS works, but as described above, it is cumbersome.
   
   4) If you use this in the cloud, did you try to use AWS EFS?
   We're using GCP, and it works on [Cloud 
Filestore](https://cloud.google.com/filestore/docs), which I guess is GCP's 
version of AWS EFS.
   
   5) Is there any issue in that cloud NFS environment?
   As far as I can tell, this PR doesn't use any features beyond those 
provided by Kubernetes itself, so I don't expect any issues. 
This PR merely enables an existing volume type, 
[`nfs`](https://kubernetes.io/docs/concepts/storage/volumes/#nfs), that is 
officially supported by Kubernetes, just as Spark currently supports `hostPath` 
and `persistentVolumeClaim`. All three are described on the same page.
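
   To illustrate how similar these volume types are at the Kubernetes level, 
here is a small sketch that builds the pod-spec `volumes` entry for each of 
them as plain Python dicts. The field names (`server`, `path`, `claimName`) 
are the ones Kubernetes defines for these volume sources. The helper itself 
and the example values are hypothetical, and this is not how Spark constructs 
its pod specs internally.

```python
import json


def pod_volume(name, volume_type, **options):
    """Return a Kubernetes pod-spec volume entry for the given volume type."""
    if volume_type == "nfs":
        source = {"server": options["server"], "path": options["path"]}
    elif volume_type == "persistentVolumeClaim":
        source = {"claimName": options["claim_name"]}
    elif volume_type == "hostPath":
        source = {"path": options["path"]}
    else:
        raise ValueError(f"unsupported volume type: {volume_type}")
    return {"name": name, volume_type: source}


# Hypothetical example values for each volume type.
volumes = [
    pod_volume("data", "nfs", server="nfs.example.com", path="/nfs/home/sjcho"),
    pod_volume("data", "persistentVolumeClaim", claim_name="my-notebook-pvc"),
    pod_volume("data", "hostPath", path="/mnt/data"),
]
print(json.dumps(volumes, indent=2))
```

   The `nfs` entry names the server and export path directly, while the PVC 
entry only refers to a claim that has to be provisioned separately.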
   
   As a side note, PVCs work well in an environment with a single k8s 
cluster, but if someone wants to use Spark with NFS across multiple k8s 
clusters and/or alongside tools outside of k8s, I think Spark definitely needs 
to support the `nfs` volume type too.

