Re: [systemd-devel] persisting sriov_numvfs
On Sat, Feb 14, 2015 at 12:47:54AM +0100, Tom Gundersen wrote:
On Tue, Jan 27, 2015 at 5:49 PM, Lennart Poettering lenn...@poettering.net wrote:
On Tue, 27.01.15 08:41, Martin Polednik (mpoled...@redhat.com) wrote:

b) Expose this via udev .link files. This would be appropriate if adding/removing VFs is a one-time thing, when a device pops up. This would be networking specific, not cover anything else like GPU or storage or so. Would still be quite nice. Would probably be the best option, after a), if VFs cannot be added/removed dynamically all the time without affecting the other VFs.

c) Expose this via udev rules files. This would be generic, would work for networking as well as GPUs or storage. This would entail writing out rules files when you want to configure the number of VFs. Care needs to be taken to use the right way to identify devices as they come and go, so that you can apply configuration to them in a stable way. This is somewhat uglier, as we don't really think that udev rules should be used that much for configuration, especially not for configuration written out by programs rather than manually. However, logind already does this, to assign seat identifiers to udev devices to enable multi-seat support.

A combination of b) for networking and c) for the rest might be an option too.

I myself would vote for b) + c), since we want to cover most of the possible use cases for SR-IOV and MR-IOV, which hopefully share the interface; adding Dan back to CC as he is the one to speak for network.

I have added b) to our TODO list for networkd/udev .link files.

I discussed this with Michal Sekletar, who has been looking at this. It appears that the sysfs attribute can only be set after the underlying netdev is IFF_UP. Is that expected? If so, I don't think it is appropriate for udev to deal with this.
If anything it should be networkd (which is responsible for bringing the links up), but I must say I don't think this kernel API makes much sense, so hopefully we can come up with something better...

I tried this only with hardware using the bnx2x driver, but I don't assume that other hardware will behave any differently. Anyway, so far it *seems* like udev is not the right place to implement this.

Michal

c) should probably be done outside of systemd/udev. Just write a tool (or even documenting this might suffice) that creates udev rules in /etc/udev/rules.d, matches against ID_PATH and then sets the right attribute.

Lennart

--
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] persisting sriov_numvfs
On Tue, Jan 27, 2015 at 5:49 PM, Lennart Poettering lenn...@poettering.net wrote:
On Tue, 27.01.15 08:41, Martin Polednik (mpoled...@redhat.com) wrote:

b) Expose this via udev .link files. This would be appropriate if adding/removing VFs is a one-time thing, when a device pops up. This would be networking specific, not cover anything else like GPU or storage or so. Would still be quite nice. Would probably be the best option, after a), if VFs cannot be added/removed dynamically all the time without affecting the other VFs.

c) Expose this via udev rules files. This would be generic, would work for networking as well as GPUs or storage. This would entail writing out rules files when you want to configure the number of VFs. Care needs to be taken to use the right way to identify devices as they come and go, so that you can apply configuration to them in a stable way. This is somewhat uglier, as we don't really think that udev rules should be used that much for configuration, especially not for configuration written out by programs rather than manually. However, logind already does this, to assign seat identifiers to udev devices to enable multi-seat support.

A combination of b) for networking and c) for the rest might be an option too.

I myself would vote for b) + c), since we want to cover most of the possible use cases for SR-IOV and MR-IOV, which hopefully share the interface; adding Dan back to CC as he is the one to speak for network.

I have added b) to our TODO list for networkd/udev .link files.

I discussed this with Michal Sekletar, who has been looking at this. It appears that the sysfs attribute can only be set after the underlying netdev is IFF_UP. Is that expected? If so, I don't think it is appropriate for udev to deal with this. If anything it should be networkd (which is responsible for bringing the links up), but I must say I don't think this kernel API makes much sense, so hopefully we can come up with something better...
c) should probably be done outside of systemd/udev. Just write a tool (or even documenting this might suffice) that creates udev rules in /etc/udev/rules.d, matches against ID_PATH and then sets the right attribute.

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] persisting sriov_numvfs
- Original Message -
From: Lennart Poettering lenn...@poettering.net
To: Martin Polednik mpoled...@redhat.com
Cc: Andrei Borzenkov arvidj...@gmail.com, systemd-devel@lists.freedesktop.org, ibar...@redhat.com
Sent: Tuesday, January 27, 2015 2:21:21 PM
Subject: Re: [systemd-devel] persisting sriov_numvfs

On Tue, 27.01.15 07:35, Martin Polednik (mpoled...@redhat.com) wrote:

Hmm, I see. In many ways this feels like VLAN setup from a configuration PoV, right? I.e. you have one hw device the driver creates, and then you configure a couple of additional interfaces on top of it. This of course then raises the question: shouldn't this functionality be exposed by the kernel the same way as VLANs? I.e. with a rtnetlink-based API to create additional interfaces, instead of /sys? In systemd I figure the right way to expose this to the user would be via .netdev files, the same way as we expose VLAN devices. Note however that that would be networkd territory,

No, this is not limited to NICs. It is a generic feature that can in principle be used with any hardware, and there are e.g. FC or FCoE HBAs with SR-IOV support. It is true that today it mostly comes with NICs though. Any general framework for setting it up should not be tied to a specific card type.

Well, I doubt that there will be graphics cards that support this, right? I mean, it's really only network connectivity that can support a concept like this easily, since you can easily merge packet streams from multiple VMs on one connection. However, I am not sure how you want to physically merge VGA streams onto a single VGA connector... If this is about Ethernet, FC, FCoE, then I still think that the network management solution should consider this as something you can configure on physical links, like VLANs. Hence networkd or NetworkManager and so on should cover it.
Lennart

Afaik some storage cards support this; for GPUs it's possibly for GPGPU applications and such, where you don't care about the physical output but about the processing cores of the GPU itself (but I'm not aware of such an implementation yet; nvidia seems to be doing something but the details are nowhere to be found).

Hmm, so there are three options I think.

a) Expose this in networkd .netdev files, as I suggested originally. This would be appropriate if we can add and remove VFs freely any time, without the other VFs being affected. Can you clarify whether going from, let's say, 4 to 5 VFs requires removing all VFs and recreating them? This would be the nicest exposure I think, but be specific to networkd.

Removing and recreating the VFs is unfortunately required when changing the number of them (both ways - increasing and decreasing their count).
https://www.kernel.org/doc/Documentation/PCI/pci-iov-howto.txt

b) Expose this via udev .link files. This would be appropriate if adding/removing VFs is a one-time thing, when a device pops up. This would be networking specific, not cover anything else like GPU or storage or so. Would still be quite nice. Would probably be the best option, after a), if VFs cannot be added/removed dynamically all the time without affecting the other VFs.

c) Expose this via udev rules files. This would be generic, would work for networking as well as GPUs or storage. This would entail writing out rules files when you want to configure the number of VFs. Care needs to be taken to use the right way to identify devices as they come and go, so that you can apply configuration to them in a stable way. This is somewhat uglier, as we don't really think that udev rules should be used that much for configuration, especially not for configuration written out by programs rather than manually. However, logind already does this, to assign seat identifiers to udev devices to enable multi-seat support.
A combination of b) for networking and c) for the rest might be an option too.

I myself would vote for b) + c), since we want to cover most of the possible use cases for SR-IOV and MR-IOV, which hopefully share the interface; adding Dan back to CC as he is the one to speak for network.

Lennart

--
Lennart Poettering, Red Hat
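Given the recreate-on-change behaviour described above, any tooling has to reset the count to 0 before setting a new nonzero value. A minimal sketch of such a helper (the `set_numvfs` function name and argument convention are mine, not anything from the thread):

```shell
#!/bin/sh
# Sketch: change sriov_numvfs for a PF, honouring the kernel rule that a
# nonzero count cannot be changed directly to another nonzero count.
# $1 = sysfs device dir (e.g. /sys/bus/pci/devices/0000:01:00.0), $2 = new count
set_numvfs() {
    dev=$1
    want=$2
    cur=$(cat "$dev/sriov_numvfs")
    if [ "$cur" != "0" ] && [ "$cur" != "$want" ]; then
        # nonzero -> nonzero transitions are rejected; drop to 0 first,
        # which destroys all existing VFs (see pci-iov-howto.txt)
        echo 0 > "$dev/sriov_numvfs"
    fi
    # only write if a change is actually needed
    [ "$cur" = "$want" ] || echo "$want" > "$dev/sriov_numvfs"
}
```

Note that dropping to 0 tears down every existing VF, so this is only safe while no VF is attached to a guest or otherwise in use.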
Re: [systemd-devel] persisting sriov_numvfs
On 01/27/2015 12:40 PM, Tom Gundersen wrote:

Hi Dan,

On Mon, Jan 19, 2015 at 3:18 PM, Dan Kenigsberg dan...@redhat.com wrote:

I'm an http://oVirt.org developer, and we plan to (finally) support SR-IOV cards natively. Working on this feature, we've noticed that something is missing in the platform OS. If I maintain a host with SR-IOV cards, I'd like to use the new kernel method of defining how many virtual functions (VFs) are to be exposed by each physical function:

# echo 3 > /sys/class/net/enp2s0f0/device/sriov_numvfs

This spawns 3 new devices, for which udev allocated (on my host) the names enp2s16, enp2s16f2 and enp2s16f4. I can attach these VFs to virtual machines, but I can also use them as yet another host NIC. Let's assume that I did the latter, and persisted its IP address using initscripts in /etc/sysconfig/network-scripts/ifcfg-enp2s16f4. However, on the next boot, sriov_numvfs is reset to 0, there's no device named enp2s16f4, and certainly no IP address assigned to it. The admin can solve his own private issue by writing a service to start after udev allocates device names but before network services kick in, and re-apply his echo there. But it feels like something that should be solved in a more generic fashion. It is also not limited to network devices. A similar issue would affect anything that attempts to refer to a VF by its name and survive reboot. How should this be implemented in the realm of systemd?

Sorry for the delay in getting back to you. My understanding is that the number of VFs must basically be set once and not changed after that?

It seems that it is possible to change it, but only at the cost of removing all of them first, which I guess is not really an option in case they are in use. Enabling this stuff via module parameter, manually or via a .conf file, has been deprecated and users are encouraged to use the PCI sysfs interface instead.
If that is the case, and what you essentially want is to just override the kernel default (0 VFs), then I think we can add a feature to udev's .link files to handle this. This means the VFs will be allocated very early during boot, as soon as the PF appears. On the downside, there is no mechanism to nicely update this setting during run-time (which may not be a problem if that is not really supported anyway); you would have to reinsert the PF or reboot the machine for the .link file to be applied.

You can create up to the card's maximum number of VFs per PF via:

# echo number > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
# echo number > /sys/bus/pci/devices/0000:01:00.1/sriov_numvfs
... etc.

(These should be able to be matched in .link files via Path, as in Path=pci-0000:01:00.0-* for the above sample, right?)

Then you can tweak the VF settings.

To set the vNIC MAC address on the Virtual Function:
# ip link set <pf> vf <vf_index> mac <vnic_mac>
# ip link set em1 vf 0 mac 00:52:44:11:22:33

It's common to set fixed MAC addresses instead of randomly generated ones via a bash script at startup.

To turn HW packet source MAC spoof checking on or off for the specified VF:
# ip link set <pf> vf <vf_index> spoofchk on|off
# ip link set em1 vf 0 spoofchk on

Change the link state as seen by the VF:
# ip link set <pf> vf <vf_index> state auto|enable|disable
# ip link set em1 vf 0 state disabled

To set a VLAN and priority on a Virtual Function:
# ip link set <dev> down
# ip link set <pf> vf <vf_index> vlan <vlan_id> qos <priority>
# ip link set <dev> up

Here, for example, em1 is the PF (physical function) and em2 is the interface assigned to VF 0.
# ip link set em2 down
# ip link set em1 vf 0 vlan 2 qos 2
# ip link set em2 up

If someone ships you those cards, you can verify the configuration using the ip link show command, like so:

# ip link show dev em1

Its output will be something like this:

7: em1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:02:c9:e6:01:12 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC <mac>, vlan <id>, spoof checking off, link-state auto
    vf 1 MAC <mac>, vlan <id>, spoof checking on, link-state enable
    vf 2 MAC <mac>, vlan <id>, spoof checking off, link-state disable
etc...

Moreover, .link files are specific to network devices, so this will not help you with other kinds of PFs. I think that may be ok, depending on how common it is to use this for non-network hardware. If that is a niche usecase, it will always be possible to write a udev rule to achieve the same result as the .link file (for any kind of hardware), it is just a bit more cumbersome.

If I'm not mistaken, some of those cards can support for example InfiniBand, FC and Ethernet at the same time (which used to be configured when the module was loaded).

But what's missing from .link files here? Setting the number of VFs? (Note the maximum number of VFs that you can create and the maximum number of VFs that you can use for passthrough can be different.)

That said, it's probably best to get the Intel guys on board on this since a) Intel
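Until something like the .link feature discussed here exists, the one-off workaround Dan alludes to (a unit ordered after udev but before networking) might look roughly like this; the unit name, device path and VF count are illustrative only, not anything specified in the thread:

```
# /etc/systemd/system/sriov-numvfs.service (illustrative sketch)
[Unit]
Description=Persist sriov_numvfs for enp2s0f0
# run before any network configuration is applied
Wants=network-pre.target
Before=network-pre.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 3 > /sys/class/net/enp2s0f0/device/sriov_numvfs'

[Install]
WantedBy=multi-user.target
```

This covers only the "re-apply the echo" half; the admin still has to make sure the VF-based ifcfg files reference names that udev will actually assign on the next boot.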
Re: [systemd-devel] persisting sriov_numvfs
On Tue, 27.01.15 07:35, Martin Polednik (mpoled...@redhat.com) wrote:

Hmm, I see. In many ways this feels like VLAN setup from a configuration PoV, right? I.e. you have one hw device the driver creates, and then you configure a couple of additional interfaces on top of it. This of course then raises the question: shouldn't this functionality be exposed by the kernel the same way as VLANs? I.e. with a rtnetlink-based API to create additional interfaces, instead of /sys? In systemd I figure the right way to expose this to the user would be via .netdev files, the same way as we expose VLAN devices. Note however that that would be networkd territory,

No, this is not limited to NICs. It is a generic feature that can in principle be used with any hardware, and there are e.g. FC or FCoE HBAs with SR-IOV support. It is true that today it mostly comes with NICs though. Any general framework for setting it up should not be tied to a specific card type.

Well, I doubt that there will be graphics cards that support this, right? I mean, it's really only network connectivity that can support a concept like this easily, since you can easily merge packet streams from multiple VMs on one connection. However, I am not sure how you want to physically merge VGA streams onto a single VGA connector... If this is about Ethernet, FC, FCoE, then I still think that the network management solution should consider this as something you can configure on physical links, like VLANs. Hence networkd or NetworkManager and so on should cover it.

Lennart

Afaik some storage cards support this; for GPUs it's possibly for GPGPU applications and such, where you don't care about the physical output but about the processing cores of the GPU itself (but I'm not aware of such an implementation yet; nvidia seems to be doing something but the details are nowhere to be found).

Hmm, so there are three options I think.

a) Expose this in networkd .netdev files, as I suggested originally.
This would be appropriate if we can add and remove VFs freely any time, without the other VFs being affected. Can you clarify whether going from, let's say, 4 to 5 VFs requires removing all VFs and recreating them? This would be the nicest exposure I think, but be specific to networkd.

b) Expose this via udev .link files. This would be appropriate if adding/removing VFs is a one-time thing, when a device pops up. This would be networking specific, not cover anything else like GPU or storage or so. Would still be quite nice. Would probably be the best option, after a), if VFs cannot be added/removed dynamically all the time without affecting the other VFs.

c) Expose this via udev rules files. This would be generic, would work for networking as well as GPUs or storage. This would entail writing out rules files when you want to configure the number of VFs. Care needs to be taken to use the right way to identify devices as they come and go, so that you can apply configuration to them in a stable way. This is somewhat uglier, as we don't really think that udev rules should be used that much for configuration, especially not for configuration written out by programs rather than manually. However, logind already does this, to assign seat identifiers to udev devices to enable multi-seat support.

A combination of b) for networking and c) for the rest might be an option too.

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] persisting sriov_numvfs
- Original Message -
From: Lennart Poettering lenn...@poettering.net
To: Andrei Borzenkov arvidj...@gmail.com
Cc: Martin Polednik mpoled...@redhat.com, systemd-devel@lists.freedesktop.org, ibar...@redhat.com
Sent: Tuesday, January 27, 2015 1:21:32 PM
Subject: Re: [systemd-devel] persisting sriov_numvfs

On Tue, 27.01.15 06:47, Andrei Borzenkov (arvidj...@gmail.com) wrote:

Hmm, I see. In many ways this feels like VLAN setup from a configuration PoV, right? I.e. you have one hw device the driver creates, and then you configure a couple of additional interfaces on top of it. This of course then raises the question: shouldn't this functionality be exposed by the kernel the same way as VLANs? I.e. with a rtnetlink-based API to create additional interfaces, instead of /sys? In systemd I figure the right way to expose this to the user would be via .netdev files, the same way as we expose VLAN devices. Note however that that would be networkd territory,

No, this is not limited to NICs. It is a generic feature that can in principle be used with any hardware, and there are e.g. FC or FCoE HBAs with SR-IOV support. It is true that today it mostly comes with NICs though. Any general framework for setting it up should not be tied to a specific card type.

Well, I doubt that there will be graphics cards that support this, right? I mean, it's really only network connectivity that can support a concept like this easily, since you can easily merge packet streams from multiple VMs on one connection. However, I am not sure how you want to physically merge VGA streams onto a single VGA connector... If this is about Ethernet, FC, FCoE, then I still think that the network management solution should consider this as something you can configure on physical links, like VLANs. Hence networkd or NetworkManager and so on should cover it.
Lennart

Afaik some storage cards support this; for GPUs it's possibly for GPGPU applications and such, where you don't care about the physical output but about the processing cores of the GPU itself (but I'm not aware of such an implementation yet; nvidia seems to be doing something but the details are nowhere to be found).

--
Lennart Poettering, Red Hat
Re: [systemd-devel] persisting sriov_numvfs
On Tue, 27.01.15 06:47, Andrei Borzenkov (arvidj...@gmail.com) wrote:

Hmm, I see. In many ways this feels like VLAN setup from a configuration PoV, right? I.e. you have one hw device the driver creates, and then you configure a couple of additional interfaces on top of it. This of course then raises the question: shouldn't this functionality be exposed by the kernel the same way as VLANs? I.e. with a rtnetlink-based API to create additional interfaces, instead of /sys? In systemd I figure the right way to expose this to the user would be via .netdev files, the same way as we expose VLAN devices. Note however that that would be networkd territory,

No, this is not limited to NICs. It is a generic feature that can in principle be used with any hardware, and there are e.g. FC or FCoE HBAs with SR-IOV support. It is true that today it mostly comes with NICs though. Any general framework for setting it up should not be tied to a specific card type.

Well, I doubt that there will be graphics cards that support this, right? I mean, it's really only network connectivity that can support a concept like this easily, since you can easily merge packet streams from multiple VMs on one connection. However, I am not sure how you want to physically merge VGA streams onto a single VGA connector... If this is about Ethernet, FC, FCoE, then I still think that the network management solution should consider this as something you can configure on physical links, like VLANs. Hence networkd or NetworkManager and so on should cover it.

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] persisting sriov_numvfs
On Tue, 27.01.15 08:41, Martin Polednik (mpoled...@redhat.com) wrote:

b) Expose this via udev .link files. This would be appropriate if adding/removing VFs is a one-time thing, when a device pops up. This would be networking specific, not cover anything else like GPU or storage or so. Would still be quite nice. Would probably be the best option, after a), if VFs cannot be added/removed dynamically all the time without affecting the other VFs.

c) Expose this via udev rules files. This would be generic, would work for networking as well as GPUs or storage. This would entail writing out rules files when you want to configure the number of VFs. Care needs to be taken to use the right way to identify devices as they come and go, so that you can apply configuration to them in a stable way. This is somewhat uglier, as we don't really think that udev rules should be used that much for configuration, especially not for configuration written out by programs rather than manually. However, logind already does this, to assign seat identifiers to udev devices to enable multi-seat support.

A combination of b) for networking and c) for the rest might be an option too.

I myself would vote for b) + c), since we want to cover most of the possible use cases for SR-IOV and MR-IOV, which hopefully share the interface; adding Dan back to CC as he is the one to speak for network.

I have added b) to our TODO list for networkd/udev .link files.

c) should probably be done outside of systemd/udev. Just write a tool (or even documenting this might suffice) that creates udev rules in /etc/udev/rules.d, matches against ID_PATH and then sets the right attribute.

Lennart

--
Lennart Poettering, Red Hat
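As a concrete illustration of option c), a generated rule of the kind described here might look like the sketch below. The filename, ID_PATH value and VF count are hypothetical, and the exact match keys are my guess at what such a tool would emit; also note the caveat raised elsewhere in the thread that some drivers (e.g. bnx2x) appear to accept the write only once the netdev is up, in which case a rule like this would not work:

```
# /etc/udev/rules.d/70-persistent-sriov.rules (hypothetical example)
# Match the PF by its stable ID_PATH and set the VF count when it appears.
ACTION=="add", SUBSYSTEM=="net", ENV{ID_PATH}=="pci-0000:02:00.0", \
    ATTR{device/sriov_numvfs}="3"
```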
Re: [systemd-devel] persisting sriov_numvfs
On 01/27/2015 01:41 PM, Martin Polednik wrote:

- Original Message -
From: Lennart Poettering lenn...@poettering.net
To: Martin Polednik mpoled...@redhat.com
Cc: Andrei Borzenkov arvidj...@gmail.com, systemd-devel@lists.freedesktop.org, ibar...@redhat.com
Sent: Tuesday, January 27, 2015 2:21:21 PM
Subject: Re: [systemd-devel] persisting sriov_numvfs

On Tue, 27.01.15 07:35, Martin Polednik (mpoled...@redhat.com) wrote:

snip

Hmm, so there are three options I think.

a) Expose this in networkd .netdev files, as I suggested originally. This would be appropriate if we can add and remove VFs freely any time, without the other VFs being affected. Can you clarify whether going from, let's say, 4 to 5 VFs requires removing all VFs and recreating them? This would be the nicest exposure I think, but be specific to networkd.

Removing and recreating the VFs is unfortunately required when changing the number of them (both ways - increasing and decreasing their count).

How common is it in practice to change the number of VFs, as opposed to admins simply taking down the vNIC interface (link set vf down), reconfiguring it and starting it again when it is put back into use? And what would be considered a sane default to do here in .link file(s)?

Always enable SR-IOV and always create numvfs equal to totalvfs, so for Intel-based cards it might look something like this (administrators would have to override it to disable SR-IOV and/or reduce the number of VFs):

.link
[Match]
Driver=igb
Path=pci-0000:01:00.0-*
...

[Link]
Name=enp1s0
...

Or never configure this out of the box, i.e. the admin has to manually do it himself, which would make enabling SR-IOV and setting the numvfs (and/or overriding the default) look something like this:

.link
[Match]
Driver=igb
Path=pci-0000:01:00.0-*
Sriov=yes
Numvfs=7
...

[Link]
Name=enp1s0
...
Shouldn't udev-builtin-net be updated to assign a (persistent) name to VF devices based upon the name of the PF (something like enp1s0vf1, enp1s0vf2 etc. based on the sample above)? As well as provide persistent MAC addresses for those VFs out of the box as a default, since it seems to be a common thing to hack up via a bootup script, and/or at least be configurable to do so (PersistentMac=yes, which is generated based on the MAC address of the PF device, maybe?).

JBG
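The "persistent MACs via bootup script" practice mentioned here can be sketched as follows. The helper emits the `ip link` commands instead of running them, so the sketch can be dry-run; the function name, the locally administered MAC prefix and the derivation of the last byte from the VF index are my assumptions, not anything the thread specifies:

```shell
#!/bin/sh
# Sketch: derive one deterministic MAC per VF from an admin-chosen 5-byte
# prefix plus the VF index, and print the matching `ip link` commands.
gen_vf_mac_cmds() {
    pf=$1          # PF interface name, e.g. em1
    prefix=$2      # first five bytes, e.g. 02:52:44:11:22 (locally administered)
    numvfs=$3      # how many VFs to configure
    i=0
    while [ "$i" -lt "$numvfs" ]; do
        printf 'ip link set %s vf %d mac %s:%02x\n' "$pf" "$i" "$prefix" "$i"
        i=$((i + 1))
    done
}

# Review the output, then pipe it into sh at boot, e.g.:
#   gen_vf_mac_cmds em1 02:52:44:11:22 \
#       "$(cat /sys/class/net/em1/device/sriov_numvfs)"
```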
Re: [systemd-devel] persisting sriov_numvfs
On Fri, 23.01.15 08:51, Martin Polednik (mpoled...@redhat.com) wrote:

Quite frankly, I cannot make sense of these sentences. I have no clue what a SR-IOV, virtual function, physical function is supposed to be. Please explain what this all is, before we can think of adding any friendlier config option to udev/networkd/systemd for this.

Hello, I'm the oVirt developer responsible for VFIO/SR-IOV passthrough on the host side. SR-IOV is a specification from the PCI SIG, where a single hardware device (we're using NICs for example) can actually act as multiple devices. This device is then considered a PF (physical function) and the spawned devices are so-called VFs (virtual functions). This functionality allows system administrators to assign these devices to virtual machines to get near bare metal performance of the device and possibly share it amongst multiple VMs.

Spawning of the VFs was previously done via the device driver, using a max_vfs attribute. This means that if you wanted to persist these VFs, you had to add this to modules-load.d. Since some of the device driver creators used different names, spawning of VFs was moved to sysfs and can be operated via

echo ${number} > /sys/bus/pci/devices/${device_name}/sriov_numvfs

where ${number} <= /sys/bus/pci/devices/${device_name}/sriov_totalvfs, and if changing the number of VFs from a nonzero value, it first needs to be set to 0.

We've encountered the need to persist this configuration and load it before network scripts (and possibly in future other scripts) so that the hardware can be referenced in those scripts. There is currently no such option. We are seeking help in creating a standardized way of handling this persistence.

Hmm, I see. In many ways this feels like VLAN setup from a configuration PoV, right? I.e. you have one hw device the driver creates, and then you configure a couple of additional interfaces on top of it.
This of course then raises the question: shouldn't this functionality be exposed by the kernel the same way as VLANs? I.e. with a rtnetlink-based API to create additional interfaces, instead of /sys? In systemd I figure the right way to expose this to the user would be via .netdev files, the same way as we expose VLAN devices. Note however that that would be networkd territory, and RHEL does not use that. This means you'd have to talk to the NetworkManager folks about this...

Anyway, Tom, I think you should say something about this!

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] persisting sriov_numvfs
On Tue, 27 Jan 2015 03:30:22 +0100, Lennart Poettering lenn...@poettering.net wrote:

On Fri, 23.01.15 08:51, Martin Polednik (mpoled...@redhat.com) wrote:

Quite frankly, I cannot make sense of these sentences. I have no clue what a SR-IOV, virtual function, physical function is supposed to be. Please explain what this all is, before we can think of adding any friendlier config option to udev/networkd/systemd for this.

Hello, I'm the oVirt developer responsible for VFIO/SR-IOV passthrough on the host side. SR-IOV is a specification from the PCI SIG, where a single hardware device (we're using NICs for example) can actually act as multiple devices. This device is then considered a PF (physical function) and the spawned devices are so-called VFs (virtual functions). This functionality allows system administrators to assign these devices to virtual machines to get near bare metal performance of the device and possibly share it amongst multiple VMs.

Spawning of the VFs was previously done via the device driver, using a max_vfs attribute. This means that if you wanted to persist these VFs, you had to add this to modules-load.d. Since some of the device driver creators used different names, spawning of VFs was moved to sysfs and can be operated via

echo ${number} > /sys/bus/pci/devices/${device_name}/sriov_numvfs

where ${number} <= /sys/bus/pci/devices/${device_name}/sriov_totalvfs, and if changing the number of VFs from a nonzero value, it first needs to be set to 0.

We've encountered the need to persist this configuration and load it before network scripts (and possibly in future other scripts) so that the hardware can be referenced in those scripts. There is currently no such option. We are seeking help in creating a standardized way of handling this persistence.

Hmm, I see. In many ways this feels like VLAN setup from a configuration PoV, right? I.e. you have one hw device the driver creates, and then you configure a couple of additional interfaces on top of it.
This of course then raises the question: shouldn't this functionality be exposed by the kernel the same way as VLANs? I.e. with a rtnetlink-based API to create additional interfaces, instead of /sys? In systemd I figure the right way to expose this to the user would be via .netdev files, the same way as we expose VLAN devices. Note however that that would be networkd territory,

No, this is not limited to NICs. It is a generic feature that can in principle be used with any hardware, and there are e.g. FC or FCoE HBAs with SR-IOV support. It is true that today it mostly comes with NICs though. Any general framework for setting it up should not be tied to a specific card type.

and RHEL does not use that. This means you'd have to talk to the NetworkManager folks about this... Anyway, Tom, I think you should say something about this!

Lennart
Re: [systemd-devel] persisting sriov_numvfs
- Original Message - From: Lennart Poettering lenn...@poettering.net To: Dan Kenigsberg dan...@redhat.com Cc: systemd-devel@lists.freedesktop.org, mpole...@redhat.com, ibar...@redhat.com Sent: Friday, January 23, 2015 3:49:59 AM Subject: Re: [systemd-devel] persisting sriov_numvfs On Mon, 19.01.15 14:18, Dan Kenigsberg (dan...@redhat.com) wrote: Hello, list. I'm an http://oVirt.org developer, and we plan to (finally) support SR-IOV cards natively. Working on this feature, we've noticed that something is missing in the platform OS. If I maintain a host with sr-iov cards, I'd like to use the new kernel method of defining how many virtual functions (VFs) are to be exposed by each physical function: Quite frankly, I cannot make sense of these sentences. I have no clue what an SR-IOV, virtual function, or physical function is supposed to be. Please explain what this all is before we can think of adding any friendlier config option to udev/networkd/systemd for this. Hello, I'm the oVirt developer responsible for VFIO/SR-IOV passthrough on the host side. SR-IOV is a specification from the PCI SIG, where a single hardware device (we're using NICs, for example) can act as multiple devices. This device is then considered the PF (physical function) and the spawned devices are so-called VFs (virtual functions). This functionality allows system administrators to assign these devices to virtual machines to get near bare-metal performance from the device and possibly share it amongst multiple VMs. Spawning of the VFs was previously done via the device driver, using the max_vfs attribute. This means that if you wanted to persist these VFs, you had to add this to modules-load.d.
Since some device driver authors used different names, spawning of VFs was moved to sysfs and can be operated via echo ${number} > /sys/bus/pci/devices/${device_name}/sriov_numvfs, where ${number} must not exceed the value in /sys/bus/pci/devices/${device_name}/sriov_totalvfs, and when changing the number of VFs from a nonzero value, it first needs to be set to 0. We've encountered the need to persist this configuration and load it before network scripts (and possibly, in the future, other scripts) so that the hardware can be referenced in those scripts. There is currently no such option. We are seeking help in creating a standardized way of handling this persistence. mpolednik Lennart -- Lennart Poettering, Red Hat
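The sysfs procedure described above (stay within sriov_totalvfs, and pass through 0 before changing a nonzero count) can be sketched as a small shell helper. This is illustrative only: set_numvfs is a made-up name, not an existing tool, and on a real system the first argument would be a PF directory such as /sys/bus/pci/devices/0000:02:00.0, written to as root.

```shell
#!/bin/sh
# Sketch of the sriov_numvfs procedure from the message above.
# set_numvfs is a hypothetical helper, not an existing tool.
set_numvfs() {
    dev="$1"    # sysfs directory of the physical function
    want="$2"   # desired number of VFs

    # The device advertises an upper bound in sriov_totalvfs.
    total=$(cat "$dev/sriov_totalvfs")
    if [ "$want" -gt "$total" ]; then
        echo "error: device supports at most $total VFs" >&2
        return 1
    fi

    # Changing directly from one nonzero count to another is rejected;
    # the value must pass through 0 first.
    cur=$(cat "$dev/sriov_numvfs")
    if [ "$cur" -ne 0 ] && [ "$cur" -ne "$want" ]; then
        echo 0 > "$dev/sriov_numvfs"
    fi
    echo "$want" > "$dev/sriov_numvfs"
}

# Example (hypothetical PCI address, run as root):
# set_numvfs /sys/bus/pci/devices/0000:02:00.0 3
```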
Re: [systemd-devel] persisting sriov_numvfs
On Mon, 19.01.15 14:18, Dan Kenigsberg (dan...@redhat.com) wrote: Hello, list. I'm an http://oVirt.org developer, and we plan to (finally) support SR-IOV cards natively. Working on this feature, we've noticed that something is missing in the platform OS. If I maintain a host with sr-iov cards, I'd like to use the new kernel method of defining how many virtual functions (VFs) are to be exposed by each physical function: Quite frankly, I cannot make sense of these sentences. I have no clue what a SR-IOV, virtual function, physical function is supposed to be. Please explain what this all is, before we can think of adding any friendlier config option to udev/networkd/systemd for this. Lennart -- Lennart Poettering, Red Hat
Re: [systemd-devel] persisting sriov_numvfs
On 01/19/2015 09:57 PM, Dan Kenigsberg wrote: On Mon, Jan 19, 2015 at 04:51:48PM +, Jóhann B. Guðmundsson wrote: On 01/19/2015 02:18 PM, Dan Kenigsberg wrote: How should this be implemented in the realm of systemd? I would think via udev rule + systemd-networkd Could you elaborate your idea? Do you suggest adding a udev rule to take effect when enp2s0f0 is brought to life? Right, I was thinking something along the lines of Li Dongyang's patch [1]. Bottom line: udev handles the low-level settings of network interfaces, while systemd-networkd replaces the legacy network initscript and handles the setup of basic or more complex network settings (static IP/DHCP, bridge, vlan, veth, ...) for containers/virtualization. Any tweaks beyond that are just conf snippets in modprobe.d/sysctl.d. I suggest, if you guys are not up to speed, to take a trip, grab a beer, and attend Tom's talk and the network track at FOSDEM [2]. JBG 1. http://www.spinics.net/lists/hotplug/msg05082.html 2. https://fosdem.org/2015/schedule/track/network_management_and_sdn/
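The udev-rule approach floated here (and Lennart's later suggestion of a tool that writes rules matching ID_PATH into /etc/udev/rules.d) could look roughly like the generator below. Everything in it is an assumption for illustration: the helper name, the rule file naming scheme, and the match keys are mine, and the thread itself notes a caveat that some drivers (e.g. bnx2x) only accept the attribute after the netdev is up, so a plain rule may not suffice everywhere.

```shell
#!/bin/sh
# Hypothetical sketch: generate a udev rule that sets sriov_numvfs when
# the physical function identified by a stable ID_PATH appears.
# Helper name, file naming, and match keys are illustrative assumptions.
write_sriov_rule() {
    id_path="$1"                     # e.g. pci-0000:02:00.0 (from udevadm info)
    numvfs="$2"                      # VF count to persist
    rulesdir="${3:-/etc/udev/rules.d}"
    rulefile="$rulesdir/70-sriov-$id_path.rules"

    # udev requires one rule per line: match the PF, write the VF count.
    printf 'ACTION=="add", SUBSYSTEM=="pci", ENV{ID_PATH}=="%s", ATTR{sriov_numvfs}="%s"\n' \
        "$id_path" "$numvfs" > "$rulefile"
}

# Example (hypothetical device):
# write_sriov_rule pci-0000:02:00.0 3
```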
Re: [systemd-devel] persisting sriov_numvfs
On Mon, Jan 19, 2015 at 04:51:48PM +, Jóhann B. Guðmundsson wrote: On 01/19/2015 02:18 PM, Dan Kenigsberg wrote: How should this be implemented in the realm of systemd? I would think via udev rule + systemd-networkd Could you elaborate your idea? Do you suggest adding a udev rule to take effect when enp2s0f0 is brought to life? But where does networkd come into play? Please also note that my own trouble stems from networking, but the need to persist sriov_numvfs may come up on storage or graphics devices, too. What are oVirt's plans regarding systemd-networkd support/integration? There are no immediate plans to use networkd to configure networking on the host. We are still using legacy ifcfg by default. However, we now have a modular design that lets us use something more modern when the need arises.
[systemd-devel] persisting sriov_numvfs
Hello, list. I'm an http://oVirt.org developer, and we plan to (finally) support SR-IOV cards natively. Working on this feature, we've noticed that something is missing in the platform OS. If I maintain a host with sr-iov cards, I'd like to use the new kernel method of defining how many virtual functions (VFs) are to be exposed by each physical function: # echo 3 > /sys/class/net/enp2s0f0/device/sriov_numvfs This spawns 3 new devices, for which udev allocated (on my host) the names enp2s16, enp2s16f2 and enp2s16f4. I can attach these VFs to virtual machines, but I can also use them as yet another host NIC. Let's assume that I did the latter, and persisted its IP address using initscripts in /etc/sysconfig/network-scripts/ifcfg-enp2s16f4. However, on the next boot, sriov_numvfs is reset to 0, there's no device named enp2s16f4, and certainly no IP address assigned to it. The admin can solve his own private issue by writing a service to start after udev allocates device names but before network services kick in, and re-apply his echo there. But it feels like something that should be solved in a more generic fashion. It is also not limited to network devices, as a similar issue would affect anything that attempts to refer to a VF by its name and survive reboot. How should this be implemented in the realm of systemd? Regards, Dan.
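The admin workaround Dan describes (a service ordered after udev device naming but before networking, re-applying the echo) might be sketched as a oneshot unit like the one below. This is a hypothetical fragment, not an endorsed solution: the unit name, interface name, VF count, and ordering targets are all assumptions to be adjusted per host.

```ini
# /etc/systemd/system/sriov-numvfs.service -- hypothetical sketch of the
# per-host workaround; names and ordering are illustrative assumptions.
[Unit]
Description=Persist sriov_numvfs for enp2s0f0
After=systemd-udev-settle.service
Wants=systemd-udev-settle.service
Before=network-pre.target
Wants=network-pre.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Re-apply the echo from the message above on every boot.
ExecStart=/bin/sh -c 'echo 3 > /sys/class/net/enp2s0f0/device/sriov_numvfs'

[Install]
WantedBy=multi-user.target
```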