Re: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV

2018-01-26 Thread Maciej Kucia
Appreciate the feedback. It seems the conclusion is that one can generally
safely enable a large number of VFs, with the exception of some hardware
configurations that may require reducing the VF count due to BIOS
limitations.
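
For anyone double-checking their own hosts, a quick way to see how many VFs
are actually enabled versus advertised is something like the sketch below
(untested; it assumes the standard Linux sriov_totalvfs / sriov_numvfs sysfs
attributes, and paths may differ on your distribution):

    import glob
    import os

    # Report enabled vs. supported VFs for every SR-IOV capable NIC.
    for path in sorted(glob.glob("/sys/class/net/*/device/sriov_totalvfs")):
        nic = path.split("/")[4]                        # e.g. "ens2f0"
        with open(path) as f:
            total = int(f.read())
        with open(os.path.join(os.path.dirname(path), "sriov_numvfs")) as f:
            enabled = int(f.read())
        print(f"{nic}: {enabled}/{total} VFs enabled")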

Thanks & Regards,
Maciej

Re: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV

2018-01-22 Thread Blair Bethwaite
This is starting to veer into magic territory for my level of
understanding, so beware... but I believe there are (or could be,
depending on your exact hardware) PCI config space considerations.
IIUC each SR-IOV VF will have its own PCI BAR. Depending on the window
size required (which may be determined by other hardware features such
as flow steering), you can potentially hit compatibility issues with
your server BIOS not supporting the mapping of addresses above 4GB.
This can then result in the device hanging on initialisation (at
server boot) and effectively bricking the box until the device is
removed.

We have seen this first-hand on a Dell R730 with a Mellanox ConnectX-4
card (there are several other Dell 13G platforms with the same BIOS
chipsets). We were explicitly increasing the PCI BAR size for the
device (not upping the number of VFs) in relation to a memory-exhaustion
issue when running MPI collective communications on hosts with 28+
cores; we only had 16 (or maybe 32, I forget) VFs configured in the
firmware.

At the end of that support case (which resulted in a replacement NIC),
the support engineer's summary included:
"""
-When a BIOS limits the BAR to be contained in the 4GB address space -
it is a BIOS limitation.
Unfortunately, there is no way to tell - Some BIOS implementations use
proprietary heuristics to decide when to map a specific BAR below 4GB.

-When SR-IOV is enabled, and num-vfs is high, the corresponding VF BAR
can be huge.
In this case, the BIOS may exhaust the ~2GB address space that it has
available below 4GB.
In this case, the BIOS may hang – and the server won’t boot.
"""

At the very least you should ask your hardware vendors some very
specific questions before doing anything that might change your PCI
BAR sizes.
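
As a rough self-check before changing anything, a sketch like the one below
adds up the BAR space the kernel reports for one PF's VFs (the PCI address is
a placeholder, and this only estimates address-space pressure; it says nothing
about what a particular BIOS will or won't map below 4GB):

    import glob
    import os

    PF_ADDR = "0000:81:00.0"   # placeholder: PCI address of the PF (see lspci -D)

    total = 0
    for vf in sorted(glob.glob(f"/sys/bus/pci/devices/{PF_ADDR}/virtfn*")):
        with open(os.path.join(vf, "resource")) as f:
            for line in f:
                start, end, _flags = (int(x, 16) for x in line.split())
                if end > start:            # skip unused BAR slots (all zeroes)
                    total += end - start + 1

    print(f"VFs of {PF_ADDR} occupy roughly {total / 2**20:.1f} MiB of BAR space")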

Cheers,


Re: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV

2018-01-22 Thread Pedro Sousa
Hi,

I have SR-IOV in production at some customers with the maximum number of VFs
and didn't notice any performance issues.

My understanding is that of course you will see a performance penalty if you
consume all those VFs, because you're dividing the bandwidth across them; but
other than that, if they're just sitting there doing nothing you won't notice
anything.

But I'm just talking from my experience :)

Regards,
Pedro Sousa



Re: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV

2018-01-22 Thread Maciej Kucia
Thank you for the reply. I am interested in SR-IOV, and PCI whitelisting is
certainly involved.
I suspect that OpenStack itself can handle those numbers of devices,
especially in telco applications where not much scheduling is being done.
The feedback I am getting is from sysadmins who work on network
virtualization, but I think this is just a rumor without any proof.

The question is whether the performance penalty from the SR-IOV drivers or PCI
itself is negligible. Should a cloud admin configure the maximum number of VFs
for flexibility, or should the count be managed manually and balanced depending
on the application?

Regards,
Maciej




Re: [Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV

2018-01-22 Thread Jay Pipes

On 01/22/2018 11:36 AM, Maciej Kucia wrote:

> Hi!
>
> Is there any noticeable performance penalty when using multiple virtual
> functions?
>
> For simplicity I am enabling all available virtual functions in my NICs.


I presume by the above you are referring to setting your 
pci_passthrough_whitelist on your compute nodes to whitelist all VFs on 
a particular PF's PCI address domain/bus?
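
For context, a whitelist entry that matches every VF under one PF usually
looks something like the sketch below; the PCI address pattern, the
physical_network name, and the exact option name/section are placeholders
that depend on your release and deployment:

    import json

    # Placeholder spec: every function on domain 0000, bus 81, slot 00.
    entry = {
        "address": "0000:81:00.*",
        "physical_network": "physnet2",
    }

    # Something along these lines would then go into nova.conf on the compute node:
    print("passthrough_whitelist = " + json.dumps(entry))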


> Sometimes the application is using only a few of them. I am using Intel and
> Mellanox.
>
> I do not see any performance drop, but I am getting feedback that this
> might not be the best approach.


Who is giving you this feedback?

The only issue with enabling (potentially 254 or more) VFs for each PF 
is that each VF will end up as a record in the pci_devices table in the 
Nova cell database. Multiply 254 or more times the number of PFs times 
the number of compute nodes in your deployment and you can get a large 
number of records that need to be stored. That said, the pci_devices 
table is well indexed and even if you had 1M or more records in the 
table, the access of a few hundred of those records when the resource 
tracker does a PciDeviceList.get_by_compute_node() [1] will still be 
quite fast.
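
To put rough numbers on that multiplication (placeholder values only):

    vfs_per_pf = 254       # VFs exposed per physical function
    pfs_per_node = 4       # SR-IOV ports per compute node
    compute_nodes = 1000   # size of the deployment

    rows = vfs_per_pf * pfs_per_node * compute_nodes
    print(f"~{rows:,} rows in pci_devices")   # ~1,016,000 rows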


Best,
-jay

[1] 
https://github.com/openstack/nova/blob/stable/pike/nova/compute/resource_tracker.py#L572 
and then

https://github.com/openstack/nova/blob/stable/pike/nova/pci/manager.py#L71


> Any recommendations?
>
> Thanks,
> Maciej




[Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV

2018-01-22 Thread Maciej Kucia
Hi!

Is there any noticeable performance penalty when using multiple virtual
functions?

For simplicity I am enabling all available virtual functions in my NICs.
Sometimes the application is using only a few of them. I am using Intel and
Mellanox.
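
For illustration, "enabling all available virtual functions" can be done
roughly as in the sketch below (the interface name is a placeholder, the write
needs root, some drivers want the count reset to 0 before it can be changed,
and Mellanox NICs may also need the VF limit raised in firmware first):

    IFACE = "ens2f0"   # placeholder interface name
    base = f"/sys/class/net/{IFACE}/device"

    # Read how many VFs the device advertises...
    with open(f"{base}/sriov_totalvfs") as f:
        total = int(f.read())

    # ...and ask the driver to create all of them.
    with open(f"{base}/sriov_numvfs", "w") as f:
        f.write(str(total))

    print(f"Requested {total} VFs on {IFACE}")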

I do not see any performance drop, but I am getting feedback that this
might not be the best approach.

Any recommendations?

Thanks,
Maciej