Another way to deal with it is to use KVM agent hooks: https://github.com/apache/cloudstack/blob/8f6721ed4c4e1b31081a951c62ffbe5331cf16d4/agent/conf/agent.properties#L123
You can implement logic in Groovy that modifies the VM's XML at start
time, adding extra devices that CloudStack itself does not manage.

On Fri, Feb 23, 2024 at 2:36 PM Jorge Luiz Correa
<jorge.l.cor...@embrapa.br.invalid> wrote:

> Hi Bryan!
>
> We are using it here, but in a different way, customized for our
> environment and relying on existing CloudStack features as far as
> possible. The documentation lists support for some GPU models that are
> somewhat old today.
>
> We are using PCI passthrough. All hosts with GPUs are configured to
> boot with IOMMU and vfio-pci, without loading the kernel modules for
> each GPU.
>
> Then, we create a serviceoffering to describe VMs that will have a GPU.
> In this serviceoffering we use the serviceofferingdetails[1].value
> field to insert a block of configuration related to the GPU. It is
> something like "<device> ... <hostdev> ... address type=pci" that
> describes the PCI bus of each GPU. Then, we use tags to force this
> computeoffering to run only on hosts with GPUs.
>
> We create a CloudStack cluster with a lot of hosts equipped with GPUs.
> When a user needs a VM with a GPU, he/she should use the created
> computeoffering. The VM will be instantiated on some host of the
> cluster and the GPUs are passed through to the VM.
>
> CloudStack does not exercise any control here. For example, it can try
> to instantiate a VM on a host whose GPU is already in use (this will
> fail). We manage this by having the ROOT admin always control VM
> creation. We launch all VMs using all GPUs in the infrastructure. Then
> we use a queue manager to run jobs in those VMs with GPUs. When a user
> needs a dedicated VM to develop something, we can shut down a VM that
> is already running (one that is a processor node of the queue manager)
> and then create the dedicated VM, which uses the GPUs in isolation.
>
> There are some other possibilities when using GPUs. For example, some
> models support virtualization, where a GPU can be divided. In that
> case, CloudStack would need to support that, so it would manage the
> driver, creating the virtual GPUs based on information input by the
> user, such as memory size. Then, it should manage the hypervisor to
> pass the virtual GPU through to the VM.
>
> Another possibility that would help us in our scenario is some control
> over the PCI buses in hosts. For example, it would be great if
> CloudStack could check whether a PCI device is in use on a host and
> then use this information in VM scheduling. CloudStack could launch
> VMs on a host that has a free PCI address. This would be useful not
> only for GPUs, but for any PCI device.
>
> I hope this can help in some way, to think of new scenarios etc.
>
> Thank you!
>
> On Thu, Feb 22, 2024 at 07:56, Bryan Tiang <bryantian...@hotmail.com>
> wrote:
>
> > Hi Guys,
> >
> > Anyone running CloudStack with GPU support in production? Say NVIDIA
> > H100 or AMD MI300X?
> >
> > Just want to know if there is any support for this still ongoing, or
> > anyone who is running a cloud business with GPUs.
> >
> > Regards,
> > Bryan
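
To make the XML side of this thread a bit more concrete, here is a rough sketch of the kind of transformation such a hook (or the hostdev block placed in the serviceoffering details) performs. It is written in Python only for illustration; an actual agent hook would be a Groovy script, and none of this is CloudStack's hook API. The function names, the sample VM name and the PCI address 0000:3b:00.0 are made up. The second helper shows, under the same assumptions, how one could detect that a PCI address is already passed through to a domain, which is the check Jorge would like to see used in scheduling.

```python
import xml.etree.ElementTree as ET

ADDR_KEYS = ("domain", "bus", "slot", "function")


def add_pci_hostdev(domain_xml, domain, bus, slot, function):
    """Return a copy of a libvirt domain XML with one PCI <hostdev> added.

    Hypothetical helper: it only illustrates the XML edit, not the
    signature a CloudStack Groovy hook has to implement.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")
    hostdev = ET.SubElement(
        devices, "hostdev",
        {"mode": "subsystem", "type": "pci", "managed": "yes"},
    )
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "address", {
        "domain": f"0x{domain:04x}",
        "bus": f"0x{bus:02x}",
        "slot": f"0x{slot:02x}",
        "function": f"0x{function:x}",
    })
    return ET.tostring(root, encoding="unicode")


def pci_addresses(domain_xml):
    """Return the host PCI addresses a domain XML passes through, as int tuples."""
    root = ET.fromstring(domain_xml)
    path = "./devices/hostdev[@type='pci']/source/address"
    return {
        tuple(int(addr.get(key, "0"), 16) for key in ADDR_KEYS)
        for addr in root.findall(path)
    }


# Made-up domain XML and PCI address, for illustration only.
vm = "<domain type='kvm'><name>i-2-10-VM</name><devices/></domain>"
patched = add_pci_hostdev(vm, 0x0000, 0x3b, 0x00, 0x0)

# A scheduler-side check could refuse a host where the address is taken.
gpu = (0x0000, 0x3b, 0x00, 0x0)
in_use = gpu in pci_addresses(patched)
```

With the real hook the same edit would be applied to the XML string the agent hands to the script just before the VM is started.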