Hi everyone, 


We want to build a Kubernetes cluster on bare-metal servers. The cluster 
will be dedicated to machine learning / deep learning workloads.

We have 1 master and 3 worker nodes. Each worker has 8 GPUs and about 1 TB 
of local storage. The cluster will be used by multiple users.

Users will log in to the master server and create pods for their models 
(requesting the GPU/CPU they need).
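
For example, a pod requesting one GPU might look roughly like this (a 
minimal sketch; it assumes the NVIDIA device plugin is installed so that 
nvidia.com/gpu is advertised as a resource, and the pod name and image are 
only placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: training-job                         # placeholder name
spec:
  containers:
  - name: trainer
    image: nvcr.io/nvidia/tensorflow:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1                    # one of the node's 8 GPUs
        cpu: "4"
        memory: 16Gi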


We plan to use Persistent Volumes to provision the local storage; users can 
then claim a Persistent Volume and mount it in their pods to store their 
machine learning datasets and outputs.
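
A minimal sketch of what we have in mind with the "local" volume type (the 
path, size, node name, and StorageClass name are placeholders/assumptions):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: dataset-pv-worker1          # placeholder
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # assumed StorageClass name
  local:
    path: /mnt/disks/ssd1           # placeholder path on the worker
  nodeAffinity:                     # local PVs must pin to the node that owns the disk
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1                # placeholder node name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataset-pvc                 # placeholder
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 500Gi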

Because a dataset may be stored on one worker node but processed on another, 
we want to use RDMA over InfiniBand to speed up loading data from storage to 
the CPUs.

We also want to use InfiniBand for NVIDIA GPUDirect RDMA between nodes.
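
We imagine pods would then request both GPUs and the RDMA device, roughly 
like this (a sketch only; the rdma/hca resource name is an assumption and 
depends on which RDMA device plugin is deployed, and IPC_LOCK is the 
capability usually needed to pin memory for RDMA):

apiVersion: v1
kind: Pod
metadata:
  name: distributed-trainer              # placeholder
spec:
  containers:
  - name: trainer
    image: nvcr.io/nvidia/pytorch:latest # placeholder image
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]                # allow pinning memory for RDMA transfers
    resources:
      limits:
        nvidia.com/gpu: 2                # placeholder GPU count
        rdma/hca: 1                      # assumed resource name from an RDMA device plugin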


Our questions are:

1. Persistent Volumes offer many provisioning options, as listed at 
<https://kubernetes.io/docs/concepts/storage/persistent-volumes/>. Which one 
is suitable for our case? (We thought about NFS, but it might be slow. A 
sketch of the local-volume StorageClass we are considering instead is below 
these questions.)

2. The cluster will be shared by multiple users. Are there any problems with 
using InfiniBand when multiple users' workloads share the cluster?
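
For question 1, the local-volume alternative we are considering would, as 
far as we understand, need a StorageClass like the one below 
(WaitForFirstConsumer delays binding so the claim binds to a volume on the 
node where the pod is scheduled):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local volumes have no dynamic provisioner
volumeBindingMode: WaitForFirstConsumer     # bind only when a consuming pod is scheduled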


Thank you. 



