<https://lh3.googleusercontent.com/-Wdg-X-Rr7BY/WnP5DWd6wcI/AAAAAAAABTk/47fGDLCD2c0a9qRvd4f2fBHgIEAcpVhNgCLcBGAs/s1600/Picture2.png>
Hi everyone, I want to build a Kubernetes cluster on bare-metal servers. The cluster will be dedicated to machine learning / deep learning workloads. We have 1 master and 3 worker nodes; each worker has 8 GPUs and about 1 TB of local storage. The cluster will be shared by multiple users: they will log in to the master node and create pods for their models, requesting GPUs/CPUs as needed. I plan to use Persistent Volumes to provision the local storage, so that users can claim a Persistent Volume and mount it in their pods to store ML datasets and outputs.

Because a dataset may be stored on one worker node but processed on another, we want to use RDMA over InfiniBand to speed up loading data from storage into compute, and also to leverage NVIDIA GPUDirect RDMA between nodes.

Our questions are:

1. Persistent Volumes support many types of provisioners, as listed here: <https://kubernetes.io/docs/concepts/storage/persistent-volumes/>. Which one is suitable for our case? (We considered NFS, but it might be too slow.)
2. Since the cluster will be shared by multiple users, are there any problems with using InfiniBand in such a multi-tenant setup?

Thank you.
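For the node-local storage part, one possible approach (not the only one) is the `local` PersistentVolume type, which pins a PV to a specific worker via node affinity; the scheduler then places any pod that claims it on that node. A rough sketch, where all names, paths, capacities, and the container image are illustrative assumptions, and `nvidia.com/gpu` assumes the NVIDIA device plugin is installed:

```yaml
# StorageClass with no dynamic provisioner: local PVs are created by hand,
# and binding waits until a pod actually consumes the claim.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# A local PV on worker1 (hostname and path are assumptions).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ml-data-worker1
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker1
---
# A user's claim against that storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ml-data-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 500Gi
---
# A training pod that mounts the claim and requests GPUs.
apiVersion: v1
kind: Pod
metadata:
  name: training-pod
spec:
  containers:
    - name: trainer
      image: tensorflow/tensorflow:latest-gpu  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 2   # requires the NVIDIA device plugin on workers
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ml-data-claim
```

Note the trade-off this sketch makes explicit: a `local` PV ties the pod to the node holding the data, which avoids the network entirely but defeats cross-node data access; a shared filesystem served over the InfiniBand fabric would keep scheduling flexible at the cost of network transfers.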