Re: [RFC] Dynamic creation of VFs in a network definition containing an SRIOV device

2020-07-28 Thread Laine Stump

On 7/28/20 4:46 PM, Daniel Henrique Barboza wrote:



On 7/28/20 12:03 PM, Paulo de Rezende Pinatti wrote:

Context:

Libvirt can already detect the active VFs of an SRIOV PF device
specified in a network definition and automatically assign these VFs
to guests via an <interface> entry referring to that network in the
domain definition. This functionality, however, depends on the system
administrator having activated the desired number of VFs in advance,
outside of libvirt (either manually or through system scripts).


It would be more convenient if VF activation could also be managed
inside libvirt, so that the whole management of the VF pool is done
exclusively by libvirt and in only one place (the network definition)
rather than spread across different components of the system.


Proposal:

We can extend the existing network definition by adding a new tag
<vf> as a child of the tag <pf> in order to allow the user to specify
how many VFs they wish to have activated for the corresponding SRIOV
device when the network is started. That would look like the following:

<network>
    <name>sriov-pool</name>
    <forward mode='hostdev' managed='yes'>
        <pf dev='eth3'>
            <vf num='10'/>
        </pf>
    </forward>
</network>

At XML definition time nothing gets changed on the system, as it is
today. When the network is started with 'virsh net-start sriov-pool',
libvirt will activate the desired number of VFs as specified in the
tag <vf> of the network definition.


The operation might require resetting 'sriov_numvfs' to zero first in 
case the number of VFs currently active differs from the desired value.



You don't specifically say it here, but any time sriov_numvfs is changed
(and it must be changed by first setting it to 0, then back to the new
number), *all* existing VFs are destroyed and then recreated. And when
they are recreated, they are completely new devices; any previous use of
the old devices will be disrupted/forgotten/whatever - the exact behavior
of any user of any of the previously existing devices is undefined, but
it certainly will no longer work, and will be unrecoverable without
starting over from scratch.
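
At the sysfs level the whole operation is just two writes. A minimal
sketch in C (illustrative only, not libvirt code, and assuming the PF's
netdev name is known) makes it clear why the change is inherently
destructive:

/*
 * Minimal sketch (not libvirt code): the vendor-neutral sysfs sequence
 * for (re)creating VFs. The count can only go from 0 to N, so changing
 * it requires writing 0 first, which destroys every existing VF.
 */
#include <stdio.h>

static int set_numvfs(const char *pf, int numvfs)
{
    char path[256];
    FILE *fp;

    snprintf(path, sizeof(path),
             "/sys/class/net/%s/device/sriov_numvfs", pf);

    if (!(fp = fopen(path, "w")))
        return -1;
    fprintf(fp, "0\n");           /* all existing VFs vanish here */
    if (fclose(fp) != 0)
        return -1;

    if (!(fp = fopen(path, "w")))
        return -1;
    fprintf(fp, "%d\n", numvfs);  /* numvfs brand-new VFs appear */
    return fclose(fp);
}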



This means that any sort of API that can change sriov_numvfs has the 
potential to seriously mess up anything using the VFs, and so must take 
extra care to not do anything unless there's no possibility of that 
happening. Note that SR-IOV VFs aren't just used for assigning to guests 
with vfio. They can also be used for macvtap pass-through mode, and now 
for vdpa, and possibly/probably other things.



In order to avoid the situation where the user tries to start the
network when a VF is already assigned to a running guest, the
implementation will have to ensure that none of the existing VFs of
the target PF are in use; otherwise VFs would be inadvertently
hot-unplugged from guests upon network start. In such cases, trying
to start the network will result in an error.
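
One conceivable way to implement that check at the sysfs level (a
hypothetical sketch - the helper and the vfio-pci test are my
assumptions, and a real implementation would more likely consult
libvirt's own record of hostdev allocations):

/*
 * Hypothetical sketch: walk the PF's virtfn* links and flag any VF
 * whose bound driver is vfio-pci, i.e. one most likely assigned to a
 * guest. Not how libvirt actually tracks usage.
 */
#include <glob.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int pf_has_vf_in_use(const char *pf)
{
    char pattern[PATH_MAX], link[PATH_MAX], target[PATH_MAX];
    glob_t gl;
    size_t i;
    int in_use = 0;

    snprintf(pattern, sizeof(pattern),
             "/sys/class/net/%s/device/virtfn*", pf);
    if (glob(pattern, 0, NULL, &gl) != 0)
        return 0;  /* no VFs exist at all */

    for (i = 0; i < gl.gl_pathc; i++) {
        ssize_t n;

        snprintf(link, sizeof(link), "%s/driver", gl.gl_pathv[i]);
        n = readlink(link, target, sizeof(target) - 1);
        if (n < 0)
            continue;  /* no driver bound, VF is idle */
        target[n] = '\0';
        if (strstr(target, "vfio-pci"))
            in_use = 1;
    }
    globfree(&gl);
    return in_use;
}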


I'm not sure about the "echo 0 > sriov_numvfs" part. It works like
that for Mellanox CX-4 and CX-5 cards, but I can't say it works like
that for every other SR-IOV card out there.



It works that way for every SR-IOV card I've ever seen. If it isn't
written in a standards document somewhere, it is at least a de facto
standard.



Soon enough, we'll have to handle card-specific behavior to create
the VFs.



If you're wondering if different cards create their VFs in different
ways - at a lower level that is possibly the case. I know that in the
past (before the sriov_totalvfs / sriov_numvfs sysfs interface existed)
the way to create a certain number of VFs was to set options on the PF
driver, and the exact options were different for each vendor. The
sysfs interface was at least partly intended to remedy that
discrepancy between drivers.
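
As a side note illustrating that vendor-neutral interface: reading
sriov_totalvfs next to sriov_numvfs gives the device's upper bound on
VFs regardless of vendor (a sketch only, again assuming a known PF
netdev name):

#include <stdio.h>

/* Illustrative only: sriov_totalvfs reports the maximum number of
 * VFs the device supports, whatever the vendor or driver. */
static int get_totalvfs(const char *pf)
{
    char path[256];
    FILE *fp;
    int total = -1;

    snprintf(path, sizeof(path),
             "/sys/class/net/%s/device/sriov_totalvfs", pf);
    if ((fp = fopen(path, "r"))) {
        if (fscanf(fp, "%d", &total) != 1)
            total = -1;
        fclose(fp);
    }
    return total;  /* -1 if the PF does not support SR-IOV */
}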




Perhaps Laine can comment on this.

About the whole idea, it kind of changes the design of this network
pool. As it is today, at least from my reading of [1], Libvirt will
use any available VF from the pool and allocate it to the guest,
working with the existing host VF settings. Using this new option,
Libvirt is now setting the VFs to a specific number, which might even
be less than the current setting, disrupting the host for no apparent
reason.

I would be on board with this idea if:

1 - The attribute is changed to "minimal VFs required for this pool"
rather than "change the host to match this VF number". This means
that we wouldn't tamper with the created VFs if the host already has
more VFs than specified. In your example up there, setting 10 VFs,
what if the host has 20 VFs? Why should Libvirt care about taking
down 10 VFs that it wouldn't use in the first place?

2 - we find a universal way (or as close to universal as possible) to
handle the creation of VFs.



Writing to sriov_numvfs is, afaik, the universal interface for creating VFs.




3 - we guarantee that the process of VF creation, which will take down
all existing VFs in the case of CX-5 cards with echo 0 > numvfs for
example, wouldn't disrupt the host in any way.



Definitely this.

Re: [RFC] Dynamic creation of VFs in a network definition containing an SRIOV device

2020-07-28 Thread Daniel Henrique Barboza




On 7/28/20 12:03 PM, Paulo de Rezende Pinatti wrote:

Context:

Libvirt can already detect the active VFs of an SRIOV PF device specified in a
network definition and automatically assign these VFs to guests via an
<interface> entry referring to that network in the domain definition. This
functionality, however, depends on the system administrator having activated
the desired number of VFs in advance, outside of libvirt (either manually or
through system scripts).

It would be more convenient if VF activation could also be managed inside
libvirt, so that the whole management of the VF pool is done exclusively by
libvirt and in only one place (the network definition) rather than spread
across different components of the system.

Proposal:

We can extend the existing network definition by adding a new tag <vf> as a
child of the tag <pf> in order to allow the user to specify how many VFs they
wish to have activated for the corresponding SRIOV device when the network is
started. That would look like the following:

<network>
    <name>sriov-pool</name>
    <forward mode='hostdev' managed='yes'>
        <pf dev='eth3'>
            <vf num='10'/>
        </pf>
    </forward>
</network>


At XML definition time nothing gets changed on the system, as it is today.
When the network is started with 'virsh net-start sriov-pool', libvirt will
activate the desired number of VFs as specified in the tag <vf> of the
network definition.

The operation might require resetting 'sriov_numvfs' to zero first in case the
number of VFs currently active differs from the desired value. In order to
avoid the situation where the user tries to start the network when a VF is
already assigned to a running guest, the implementation will have to ensure
that none of the existing VFs of the target PF are in use; otherwise VFs would
be inadvertently hot-unplugged from guests upon network start. In such cases,
trying to start the network will result in an error.


I'm not sure about the "echo 0 > sriov_numvfs" part. It works like that for
Mellanox CX-4 and CX-5 cards, but I can't say it works like that for every
other SR-IOV card out there. Soon enough, we'll have to handle card-specific
behavior to create the VFs. Perhaps Laine can comment on this.

About the whole idea, it kind of changes the design of this network pool. As
it is today, at least from my reading of [1], Libvirt will use any available
VF from the pool and allocate it to the guest, working with the existing host
VF settings. Using this new option, Libvirt is now setting the VFs to a
specific number, which might even be less than the current setting,
disrupting the host for no apparent reason.

I would be on board with this idea if:

1 - The attribute is changed to "minimal VFs required for this pool" rather
than "change the host to match this VF number". This means that we wouldn't
tamper with the created VFs if the host already has more VFs than specified.
In your example up there, setting 10 VFs, what if the host has 20 VFs? Why
should Libvirt care about taking down 10 VFs that it wouldn't use in the
first place?

2 - we find a universal way (or as close to universal as possible) to handle
the creation of VFs.

3 - we guarantee that the process of VF creation, which will take down all
existing VFs in the case of CX-5 cards with echo 0 > numvfs for example,
wouldn't disrupt the host in any way.


(1) is an easier sell. Rename the attribute to "vf minimalNum" or something
like that, then refuse to net-start if the host has fewer than the specified
number of VFs, checking sriov_numvfs. Start the network if sriov_numvfs >=
minimal. This would bring immediate value to the existing design, allowing
the user to specify the minimal number of VFs they intend to consume from
the pool.
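
A sketch of what that check could amount to (hypothetical code on my
part, not anything existing in libvirt - the helper name and the idea
of reading sriov_numvfs directly are assumptions):

#include <stdio.h>

/* Sketch: refuse net-start only when the host currently exposes fewer
 * VFs than the configured minimum; never touch sriov_numvfs itself. */
static int check_min_vfs(const char *pf, int minimal)
{
    char path[256];
    FILE *fp;
    int current = 0;

    snprintf(path, sizeof(path),
             "/sys/class/net/%s/device/sriov_numvfs", pf);
    if (!(fp = fopen(path, "r")))
        return -1;  /* PF is not SR-IOV capable */
    if (fscanf(fp, "%d", &current) != 1)
        current = 0;
    fclose(fp);

    return current >= minimal ? 0 : -1;  /* -1 means refuse net-start */
}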

(2) and (3) are more complicated. Especially (2).


Thanks,


DHB




[1] 
https://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition



Stopping the network with 'virsh net-destroy' will cause all VFs to be removed.
As when starting the network, the implementation will also need to check for
running guests in order to prevent inadvertent hot-unplugging.

Is the functionality proposed above desirable?






[RFC] Dynamic creation of VFs in a network definition containing an SRIOV device

2020-07-28 Thread Paulo de Rezende Pinatti

Context:

Libvirt can already detect the active VFs of an SRIOV PF device
specified in a network definition and automatically assign these VFs to
guests via an <interface> entry referring to that network in the domain
definition. This functionality, however, depends on the system
administrator having activated the desired number of VFs in advance,
outside of libvirt (either manually or through system scripts).


It would be more convenient if VF activation could also be managed
inside libvirt, so that the whole management of the VF pool is done
exclusively by libvirt and in only one place (the network definition)
rather than spread across different components of the system.


Proposal:

We can extend the existing network definition by adding a new tag <vf>
as a child of the tag <pf> in order to allow the user to specify how
many VFs they wish to have activated for the corresponding SRIOV device
when the network is started. That would look like the following:

<network>
    <name>sriov-pool</name>
    <forward mode='hostdev' managed='yes'>
        <pf dev='eth3'>
            <vf num='10'/>
        </pf>
    </forward>
</network>


At XML definition time nothing gets changed on the system, as it is
today. When the network is started with 'virsh net-start sriov-pool',
libvirt will activate the desired number of VFs as specified in the
tag <vf> of the network definition.


The operation might require resetting 'sriov_numvfs' to zero first in
case the number of VFs currently active differs from the desired value.
In order to avoid the situation where the user tries to start the
network when a VF is already assigned to a running guest, the
implementation will have to ensure that none of the existing VFs of the
target PF are in use; otherwise VFs would be inadvertently hot-unplugged
from guests upon network start. In such cases, trying to start the
network will result in an error.


Stopping the network with 'virsh net-destroy' will cause all VFs to be
removed. As when starting the network, the implementation will also
need to check for running guests in order to prevent inadvertent
hot-unplugging.


Is the functionality proposed above desirable?


--
Thanks and best regards,

Paulo de Rezende Pinatti