Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Martin Polednik


- Original Message -
 From: Lennart Poettering lenn...@poettering.net
 To: Martin Polednik mpoled...@redhat.com
 Cc: Andrei Borzenkov arvidj...@gmail.com, 
 systemd-devel@lists.freedesktop.org, ibar...@redhat.com
 Sent: Tuesday, January 27, 2015 2:21:21 PM
 Subject: Re: [systemd-devel] persisting sriov_numvfs
 
 On Tue, 27.01.15 07:35, Martin Polednik (mpoled...@redhat.com) wrote:
 
 Hmm, I see. In many ways this feels like VLAN setup from a
 configuration PoV, right? i.e. you have one hw device the driver
 creates, and then you configure a couple of additional interfaces on
 top of it.
 
 This of course then raises the question: shouldn't this functionality
 be exposed by the kernel the same way as VLANs? i.e. with a
 rtnetlink-based API to create additional interfaces, instead of /sys?
 
 In systemd I figure the right way to expose this to the user would be
 via
 .netdev files, the same way as we expose VLAN devices. Note however
 that that would be networkd territory,

No, this is not limited to NICs. It is a generic feature that can in
principle be used with any hardware, and there are e.g. FC or FCoE HBAs
with SR-IOV support. It is true that today it mostly comes with NICs,
though.

Any general framework for setting it up should not be tied to a specific
card type.
   
   Well, I doubt that there will be graphics cards that support this,
   right? I mean, it's really only network connectivity that can support
   a concept like this easily, since you can easily merge packet streams
   from multiple VMs on one connection. However, I am not sure how you
   want to physically merge VGA streams onto a single VGA connector...
   
   If this is about Ethernet, FC, or FCoE, then I still think that the
   network management solution should consider this as something you can
   configure on physical links like VLANs. Hence networkd or
   NetworkManager and so on should cover it.
   
   Lennart
  
  Afaik some storage cards support this; for GPUs it is possible for
  GPGPU applications and such, where you don't care about the physical
  output, only about the processing cores of the GPU itself (I'm not aware
  of such an implementation yet; NVIDIA seems to be doing something, but
  the details are nowhere to be found).
 
 Hmm, so there are three options I think.
 
 a) Expose this in networkd .netdev files, as I suggested
originally. This would be appropriate if we can add and remove VFs
freely any time, without the other VFs being affected. Can you
clarify whether going from let's say 4 to 5 VFs requires removing
all VFs and recreating them? This would be the nicest exposure I
think, but would be specific to networkd.

Removing and recreating the VFs is unfortunately required when changing
their number (both ways: increasing and decreasing the count).

https://www.kernel.org/doc/Documentation/PCI/pci-iov-howto.txt
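
For illustration, a minimal sketch of that sequence (the PCI address and the
counts below are hypothetical examples, not values from this thread):

    # going from 4 to 5 VFs: the count must first be reset to 0
    echo 0 > /sys/bus/pci/devices/0000:02:00.0/sriov_numvfs
    echo 5 > /sys/bus/pci/devices/0000:02:00.0/sriov_numvfs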

 b) Expose this via udev .link files. This would be appropriate if
adding/removing VFs is a one-time thing, when a device pops
up. This would be networking-specific, and would not cover anything else
like GPU or storage or so. Would still be quite nice. Would probably be
the best option, after a), if VFs cannot be added/removed dynamically
all the time without affecting the other VFs.
 
 c) Expose this via udev rules files. This would be generic, would work
for networking as well as GPUs or storage. This would entail
writing out rules files when you want to configure the number of
VFs. Care needs to be taken to use the right way to identify
devices as they come and go, so that you can apply configuration to
them in a stable way. This is somewhat uglier, as we don't really
think that udev rules should be used that much for configuration,
especially not for configuration written out by programs, rather
than manually. However, logind already does this, to assign seat
identifiers to udev devices to enable multi-seat support.
 
 A combination of b) for networking and c) for the rest might be an
 option too.

I myself would vote for b) + c), since we want to cover most of the
possible use cases for SR-IOV and MR-IOV, which hopefully share
the interface; adding Dan back to CC as he is the one to speak for networking.
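
As a purely hypothetical sketch of what option c) could look like from an
administrator's point of view (the match keys, the PCI address and the VF
count are made-up examples, not an existing or proposed udev policy), a
rules file could simply write the sysfs attribute when the PF appears:

    # /etc/udev/rules.d/80-sriov.rules (hypothetical)
    # create 4 VFs on one specific physical function as soon as it shows up
    ACTION=="add", SUBSYSTEM=="pci", KERNEL=="0000:02:00.0", ATTR{sriov_totalvfs}=="?*", ATTR{sriov_numvfs}="4"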

 Lennart
 
 --
 Lennart Poettering, Red Hat
 


Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Martin Polednik


- Original Message -
 From: Lennart Poettering lenn...@poettering.net
 To: Andrei Borzenkov arvidj...@gmail.com
 Cc: Martin Polednik mpoled...@redhat.com, 
 systemd-devel@lists.freedesktop.org, ibar...@redhat.com
 Sent: Tuesday, January 27, 2015 1:21:32 PM
 Subject: Re: [systemd-devel] persisting sriov_numvfs
 
 On Tue, 27.01.15 06:47, Andrei Borzenkov (arvidj...@gmail.com) wrote:
 
   Hmm, I see. In many ways this feels like VLAN setup from a
   configuration PoV, right? i.e. you have one hw device the driver
   creates, and then you configure a couple of additional interfaces on
   top of it.
   
   This of course then raises the question: shouldn't this functionality
   be exposed by the kernel the same way as VLANs? i.e. with a
   rtnetlink-based API to create additional interfaces, instead of /sys?
   
   In systemd I figure the right way to expose this to the user would be via
   .netdev files, the same way as we expose VLAN devices. Note however
   that that would be networkd territory,
  
  No, this is not limited to NICs. It is a generic feature that can in
  principle be used with any hardware, and there are e.g. FC or FCoE HBAs
  with SR-IOV support. It is true that today it mostly comes with NICs,
  though.
  
  Any general framework for setting it up should not be tied to a specific
  card type.
 
 Well, I doubt that there will be graphics cards that support this,
 right? I mean, it's really only network connectivity that can support
 a concept like this easily, since you can easily merge packet streams
 from multiple VMs on one connection. However, I am not sure how you
 want to physically merge VGA streams onto a single VGA connector...
 
 If this is about Ethernet, FC, or FCoE, then I still think that the
 network management solution should consider this as something you can
 configure on physical links like VLANs. Hence networkd or
 NetworkManager and so on should cover it.
 
 Lennart

Afaik some storage cards support this; for GPUs it is possible for
GPGPU applications and such, where you don't care about the physical
output, only about the processing cores of the GPU itself (I'm not aware
of such an implementation yet; NVIDIA seems to be doing something, but
the details are nowhere to be found).
 
 --
 Lennart Poettering, Red Hat
 


Re: [systemd-devel] persisting sriov_numvfs

2015-01-23 Thread Martin Polednik


- Original Message -
 From: Lennart Poettering lenn...@poettering.net
 To: Dan Kenigsberg dan...@redhat.com
 Cc: systemd-devel@lists.freedesktop.org, mpole...@redhat.com, 
 ibar...@redhat.com
 Sent: Friday, January 23, 2015 3:49:59 AM
 Subject: Re: [systemd-devel] persisting sriov_numvfs
 
 On Mon, 19.01.15 14:18, Dan Kenigsberg (dan...@redhat.com) wrote:
 
  Hello, list.
  
  I'm an oVirt.org developer, and we plan to (finally) support
  SR-IOV cards natively. Working on this feature, we've noticed that
  something is missing in the platform OS.
  
  If I maintain a host with SR-IOV cards, I'd like to use the new kernel
  method of defining how many virtual functions (VFs) are to be exposed by
  each physical function:
 
 Quite frankly, I cannot make sense of these sentences. I have no clue
 what "SR-IOV", "virtual function", or "physical function" are supposed
 to be.
 
 Please explain what this all is, before we can think of adding any
 friendlier config option to udev/networkd/systemd for this.

Hello,

I'm an oVirt developer responsible for VFIO/SR-IOV passthrough on the host
side.

SR-IOV is a specification from the PCI SIG where a single hardware device
(we're using NICs, for example) can act as multiple devices. This device
is then considered the PF (physical function) and the spawned devices are
so-called VFs (virtual functions). This functionality allows system
administrators to assign these devices to virtual machines to get near
bare-metal performance from the device and possibly share it amongst
multiple VMs.
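
Concretely, the kernel exposes this relationship in sysfs: once VFs are
enabled, the PF's device directory gains virtfn0, virtfn1, ... symlinks to
the VF devices, and each VF links back to its parent via physfn (the PCI
address below is just an example):

    # list the VFs spawned by a physical function
    ls -d /sys/bus/pci/devices/0000:02:00.0/virtfn*
    # and find the parent PF of a given VF
    readlink /sys/bus/pci/devices/0000:02:00.0/virtfn0/physfn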

Spawning of the VFs was previously done via the device driver, using the
max_vfs attribute. This means that if you wanted to persist these VFs, you
had to add this to modules-load.d. Since some device driver authors used
different names, spawning of VFs was moved to sysfs and can be operated via

    echo ${number} > /sys/bus/pci/devices/${device_name}/sriov_numvfs

where ${number} <= /sys/bus/pci/devices/${device_name}/sriov_totalvfs, and
if changing the number of VFs from a nonzero value, it first needs to be
set to 0.
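
A minimal sketch of that procedure as a shell snippet (the device address
and the target count are hypothetical examples):

    dev=0000:02:00.0   # PCI address of the physical function (example)
    want=4             # desired number of VFs (example)
    total=$(cat /sys/bus/pci/devices/$dev/sriov_totalvfs)
    if [ "$want" -le "$total" ]; then
        # a nonzero VF count cannot be changed directly; reset to 0 first
        echo 0 > /sys/bus/pci/devices/$dev/sriov_numvfs
        echo "$want" > /sys/bus/pci/devices/$dev/sriov_numvfs
    fi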

We've encountered the need to persist this configuration and load it before
network scripts (and possibly, in the future, other scripts) so that the
hardware can be referenced in those scripts. There is currently no such
option. We are seeking help in creating a standardized way of handling this
persistence.

mpolednik
 
 Lennart
 
 --
 Lennart Poettering, Red Hat
 