Re: [systemd-devel] persisting sriov_numvfs

2015-02-16 Thread Michal Sekletar
On Sat, Feb 14, 2015 at 12:47:54AM +0100, Tom Gundersen wrote:
 On Tue, Jan 27, 2015 at 5:49 PM, Lennart Poettering
 lenn...@poettering.net wrote:
  On Tue, 27.01.15 08:41, Martin Polednik (mpoled...@redhat.com) wrote:
 
   b) Expose this via udev .link files. This would be appropriate if
  adding/removing VFs is a one-time thing, when a device pops
  up. This would be networking specific, not cover anything else like
  GPU or storage or so. Would still be quite nice. Would probably be the
  best option, after a), if VFs cannot be added/removed dynamically
  all the time without affecting the other VFs.
  
   c) Expose this via udev rules files. This would be generic, would work
  for networking as well as GPUs or storage. This would entail
  writing out rules files when you want to configure the number of
  VFs. Care needs to be taken to use the right way to identify
  devices as they come and go, so that you can apply configuration to
  them in a stable way. This is somewhat uglier, as we don't really
  think that udev rules should be used that much for configuration,
  especially not for configuration written out by programs, rather
  than manually. However, logind already does this, to assign seat
  identifiers to udev devices to enable multi-seat support.
  
   A combination of b) for networking and c) for the rest might be an
   option too.
 
  I myself would vote for b) + c) since we want to cover most of the
  possible use cases for SR-IOV and MR-IOV, which hopefully share
  the interface; adding Dan back to CC as he is the one to speak for network.
 
  I have added b) to our TODO list for networkd/udev .link files.
 
 I discussed this with Michal Sekletar who has been looking at this. It
 appears that the sysfs attribute can only be set after the underlying
 netdev is IFF_UP. Is that expected? If so, I don't think it is
 appropriate for udev to deal with this. If anything it should be
 networkd (who is responsible for bringing the links up), but I must
 say I don't think this kernel API makes much sense, so hopefully we
 can come up with something better...

I tried this only with hardware using the bnx2x driver, but I don't assume that other
hardware will behave any differently. Anyway, so far it *seems* like udev is not
the right place to implement this.

Michal

 
  c) should probably be done outside of systemd/udev. Just write a tool
  (or even documenting this might suffice), that creates udev rules in
  /etc/udev/rules.d, matches against ID_PATH and then sets the right
  attribute.
 
  Lennart
 
  --
  Lennart Poettering, Red Hat


Re: [systemd-devel] persisting sriov_numvfs

2015-02-13 Thread Tom Gundersen
On Tue, Jan 27, 2015 at 5:49 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Tue, 27.01.15 08:41, Martin Polednik (mpoled...@redhat.com) wrote:

  b) Expose this via udev .link files. This would be appropriate if
 adding/removing VFs is a one-time thing, when a device pops
 up. This would be networking specific, not cover anything else like
 GPU or storage or so. Would still be quite nice. Would probably be the
 best option, after a), if VFs cannot be added/removed dynamically
 all the time without affecting the other VFs.
 
  c) Expose this via udev rules files. This would be generic, would work
 for networking as well as GPUs or storage. This would entail
 writing out rules files when you want to configure the number of
 VFs. Care needs to be taken to use the right way to identify
 devices as they come and go, so that you can apply configuration to
 them in a stable way. This is somewhat uglier, as we don't really
 think that udev rules should be used that much for configuration,
 especially not for configuration written out by programs, rather
 than manually. However, logind already does this, to assign seat
 identifiers to udev devices to enable multi-seat support.
 
  A combination of b) for networking and c) for the rest might be an
  option too.

 I myself would vote for b) + c) since we want to cover most of the
 possible use cases for SR-IOV and MR-IOV, which hopefully share
 the interface; adding Dan back to CC as he is the one to speak for network.

 I have added b) to our TODO list for networkd/udev .link files.

I discussed this with Michal Sekletar who has been looking at this. It
appears that the sysfs attribute can only be set after the underlying
netdev is IFF_UP. Is that expected? If so, I don't think it is
appropriate for udev to deal with this. If anything it should be
networkd (who is responsible for bringing the links up), but I must
say I don't think this kernel API makes much sense, so hopefully we
can come up with something better...

 c) should probably be done outside of systemd/udev. Just write a tool
 (or even documenting this might suffice), that creates udev rules in
 /etc/udev/rules.d, matches against ID_PATH and then sets the right
 attribute.

 Lennart

 --
 Lennart Poettering, Red Hat


Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Martin Polednik


- Original Message -
 From: Lennart Poettering lenn...@poettering.net
 To: Martin Polednik mpoled...@redhat.com
 Cc: Andrei Borzenkov arvidj...@gmail.com, 
 systemd-devel@lists.freedesktop.org, ibar...@redhat.com
 Sent: Tuesday, January 27, 2015 2:21:21 PM
 Subject: Re: [systemd-devel] persisting sriov_numvfs
 
 On Tue, 27.01.15 07:35, Martin Polednik (mpoled...@redhat.com) wrote:
 
 Hmm, I see. In many ways this feels like VLAN setup from a
 configuration PoV, right? i.e. you have one hw device the driver
 creates, and then you configure a couple of additional interfaces on
 top of it.
 
 This of course then raises the question: shouldn't this functionality
 be exposed by the kernel the same way as VLANs? i.e. with a
 rtnetlink-based API to create additional interfaces, instead of /sys?
 
 In systemd I figure the right way to expose this to the user would be
 via
 .netdev files, the same way as we expose VLAN devices. Note however
 that that would be networkd territory,

No, this is not limited to NICs. It is a generic feature that can in
principle be used with any hardware and there are e.g. FC or FCoE HBAs
with SR-IOV support. It is true that today it mostly comes with NICs,
though.

Any general framework for setting it up should not be tied to a specific
card type.
   
   Well, I doubt that there will be graphics cards that support this
   right? I mean, it's really only network connectivity that can support
   a concept like this easily, since you can easily merge packet streams
   from multiple VMs on one connection. However, I am not sure how you
   want to physically merge VGA streams onto a single VGA connector...
   
   If this is about ethernet, FC, FCOE, then I still think that the
   network management solution should consider this as something you can
   configure on physical links like VLANs. Hence networkd or
   NetworkManager and so on should cover it.
   
   Lennart
  
  Afaik some storage cards support this; for GPUs it's possibly for
  GPGPU applications and such - where you don't care about the physical
  output, but the processing core of the GPU itself (but I'm not aware of such
  an implementation yet, NVIDIA seems to be doing something but the details
  are nowhere to be found).
 
 Hmm, so there are three options I think.
 
 a) Expose this in networkd .netdev files, as I suggested
originally. This would be appropriate if we can add and remove VFs
freely any time, without the other VFs being affected. Can you
clarify whether going from let's say 4 to 5 VFs requires removing
    all VFs and recreating them? This would be the nicest exposure I
think, but be specific to networkd.

Removing and recreating the VFs is unfortunately required when changing the
number of them (both ways - increasing and decreasing their count).

https://www.kernel.org/doc/Documentation/PCI/pci-iov-howto.txt
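
For illustration, the sequence then looks roughly like this (a sketch only,
with a hypothetical PF at PCI address 0000:02:00.0 that currently exposes 4 VFs):

  # reset to zero first; changing one non-zero count directly to another is rejected
  echo 0 > /sys/bus/pci/devices/0000:02:00.0/sriov_numvfs
  # then create the new number of VFs
  echo 5 > /sys/bus/pci/devices/0000:02:00.0/sriov_numvfs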

 b) Expose this via udev .link files. This would be appropriate if
adding/removing VFs is a one-time thing, when a device pops
up. This would be networking specific, not cover anything else like
    GPU or storage or so. Would still be quite nice. Would probably be the
best option, after a), if VFs cannot be added/removed dynamically
all the time without affecting the other VFs.
 
 c) Expose this via udev rules files. This would be generic, would work
for networking as well as GPUs or storage. This would entail
    writing out rules files when you want to configure the number of
VFs. Care needs to be taken to use the right way to identify
devices as they come and go, so that you can apply configuration to
them in a stable way. This is somewhat uglier, as we don't really
think that udev rules should be used that much for configuration,
especially not for configuration written out by programs, rather
than manually. However, logind already does this, to assign seat
identifiers to udev devices to enable multi-seat support.
 
 A combination of b) for networking and c) for the rest might be an
 option too.

I myself would vote for b) + c) since we want to cover most of the
 possible use cases for SR-IOV and MR-IOV, which hopefully share
the interface; adding Dan back to CC as he is the one to speak for network. 

 Lennart
 
 --
 Lennart Poettering, Red Hat
 


Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Jóhann B. Guðmundsson


On 01/27/2015 12:40 PM, Tom Gundersen wrote:

Hi Dan,

On Mon, Jan 19, 2015 at 3:18 PM, Dan Kenigsberg dan...@redhat.com wrote:

I'm an http://oVirt.org developer, and we plan to (finally) support
SR-IOV cards natively. Working on this feature, we've noticed that
something is missing in the platform OS.

If I maintain a host with sr-iov cards, I'd like to use the new kernel
method of defining how many virtual functions (VFs) are to be exposed by
each physical function:

 # echo 3 > /sys/class/net/enp2s0f0/device/sriov_numvfs

This spawns 3 new devices, for which udev allocated (on my host) the names
enp2s16, enp2s16f2 and enp2s16f4.

I can attach these VFs to virtual machines, but I can also use them as
yet another host NIC. Let's assume that I did the latter, and persisted
its IP address using initscripts in
/etc/sysconfig/network-scripts/ifcfg-enp2s16f4.

However, on the next boot, sriov_numvfs is reset to 0, there's no
device named enp2s16f4, and certainly no IP address assigned to it.

The admin can solve his own private issue by writing a service to start
after udev allocates device names but before network services kick in,
and re-apply his echo there. But it feels like something that should
be solved in a more generic fashion. It is also not limited to network
devices, as a similar issue would affect anything that attempts to refer to
a VF by its name and survive reboot.

How should this be implemented in the realm of systemd?

Sorry for the delay in getting back to you.

My understanding is that the number of vfs must be basically set once
and not changed after that? It seems that it is possible to change it,
but only at the cost of removing all of them first, which I guess is
not really an option in case they are in use.


Enabling this stuff via a module parameter manually or via a .conf file has
been deprecated and users are encouraged to use the PCI sysfs interface
instead.




If that is the case, and what you essentially want is to just override
the kernel default (0 VFs), then I think we can add a feature to
udev's .link files to handle this.

This means the VFs will be allocated very early during boot, as soon
as the PF appears.

On the downside, there is no mechanism to nicely update this setting
during run-time (which may not be a problem if that is not really
supported anyway), you would have to reinsert the PF or reboot the
machine for the .link file to be applied.


You can create VFs up to the card's maximum per PF via

# echo number > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
# echo number > /sys/bus/pci/devices/0000:01:00.1/sriov_numvfs
...
etc.

( these should be able to be matched in .link files via Path, as in
Path=pci-0000:01:00.0-* for the above sample, right? )


Then you can tweak the VF settings

To set the vNIC MAC address on the Virtual Function

# ip link set <pf> vf <vf_index> mac <vnic_mac>

# ip link set em1 vf 0 mac 00:52:44:11:22:33

It's common to set a fixed MAC address instead of a randomly generated one
via a bash script at startup.


To turn HW packet source MAC spoof checking on or off for the specified VF

# ip link set <pf> vf <vf_index> spoofchk on|off

# ip link set em1 vf 0 spoofchk on

Change the link state as seen by the VF

# ip link set <pf> vf <vf_index> state auto|enable|disable

# ip link set em1 vf 0 state disable

To set a VLAN and priority on a Virtual Function

# ip link set <dev> down
# ip link set <pf> vf <vf_index> vlan <vlan_id> qos <priority>
# ip link set <dev> up

Here, for example, em1 is the PF (physical function) and em2 is the
interface assigned to VF 0.


# ip link set em2 down
# ip link set em1 vf 0 vlan 2 qos 2
# ip link set em2 up

If someone ships you those cards you can verify the configuration using the
ip link show command like so


# ip link show dev em1

And its output will be something like this

7: em1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000

    link/ether 00:02:c9:e6:01:12 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC <mac>, vlan <id>, spoof checking off, link-state auto
    vf 1 MAC <mac>, vlan <id>, spoof checking on, link-state enable
    vf 2 MAC <mac>, vlan <id>, spoof checking off, link-state disable

etc...


  Moreover, .link files are
specific to network devices, so this will not help you with other
kinds of PFs. I think that may be ok, depending on how common it is to
use this for non-network hardware. If that is a niche usecase, it will
always be possible to write an udev rule to achieve the same result as
the .link file (for any kind of hardware), it is just a bit more
cumbersome.


If I'm not mistaken some of those cards can support for example
InfiniBand, FC and Ethernet at the same time ( which used to be
configured when the module was loaded )


But what's missing from .link files here?
Setting the number of VFs?
( Note the maximum number of VFs that you can create and the maximum
number of VFs that you can use for passthrough can be different.)


That said it's probably best to get the Intel guys on board on this 
since a) Intel 

Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Lennart Poettering
On Tue, 27.01.15 07:35, Martin Polednik (mpoled...@redhat.com) wrote:

Hmm, I see. In many ways this feels like VLAN setup from a
configuration PoV, right? i.e. you have one hw device the driver
creates, and then you configure a couple of additional interfaces on
top of it.

This of course then raises the question: shouldn't this functionality
be exposed by the kernel the same way as VLANs? i.e. with a
rtnetlink-based API to create additional interfaces, instead of /sys?

In systemd I figure the right way to expose this to the user would be 
via
.netdev files, the same way as we expose VLAN devices. Note however
that that would be networkd territory,
   
   No, this is not limited to NICs. It is a generic feature that can in
   principle be used with any hardware and there are e.g. FC or FCoE HBAs
   with SR-IOV support. It is true that today it mostly comes with NICs,
   though.
   
   Any general framework for setting it up should not be tied to a specific
   card type.
  
  Well, I doubt that there will be graphics cards that support this
  right? I mean, it's really only network connectivity that can support
  a concept like this easily, since you can easily merge packet streams
  from multiple VMs on one connection. However, I am not sure how you
  want to physically merge VGA streams onto a single VGA connector...
  
  If this is about ethernet, FC, FCOE, then I still think that the
  network management solution should consider this as something you can
  configure on physical links like VLANs. Hence networkd or
  NetworkManager and so on should cover it.
  
  Lennart
 
 Afaik some storage cards support this; for GPUs it's possibly for
 GPGPU applications and such - where you don't care about the physical
 output, but the processing core of the GPU itself (but I'm not aware of such
 an implementation yet, NVIDIA seems to be doing something but the details
 are nowhere to be found).

Hmm, so there are three options I think.

a) Expose this in networkd .netdev files, as I suggested
   originally. This would be appropriate if we can add and remove VFs
   freely any time, without the other VFs being affected. Can you
   clarify whether going from let's say 4 to 5 VFs requires removing
   all VFs and recreating them? This would be the nicest exposure I
   think, but be specific to networkd.

b) Expose this via udev .link files. This would be appropriate if
   adding/removing VFs is a one-time thing, when a device pops
   up. This would be networking specific, not cover anything else like
   GPU or storage or so. Would still be quite nice. Would probably be the
   best option, after a), if VFs cannot be added/removed dynamically
   all the time without affecting the other VFs.

c) Expose this via udev rules files. This would be generic, would work
   for networking as well as GPUs or storage. This would entail
   writing out rules files when you want to configure the number of
   VFs. Care needs to be taken to use the right way to identify
   devices as they come and go, so that you can apply configuration to
   them in a stable way. This is somewhat uglier, as we don't really
   think that udev rules should be used that much for configuration,
   especially not for configuration written out by programs, rather
   than manually. However, logind already does this, to assign seat
   identifiers to udev devices to enable multi-seat support. 

A combination of b) for networking and c) for the rest might be an
option too.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Martin Polednik


- Original Message -
 From: Lennart Poettering lenn...@poettering.net
 To: Andrei Borzenkov arvidj...@gmail.com
 Cc: Martin Polednik mpoled...@redhat.com, 
 systemd-devel@lists.freedesktop.org, ibar...@redhat.com
 Sent: Tuesday, January 27, 2015 1:21:32 PM
 Subject: Re: [systemd-devel] persisting sriov_numvfs
 
 On Tue, 27.01.15 06:47, Andrei Borzenkov (arvidj...@gmail.com) wrote:
 
   Hmm, I see. In many ways this feels like VLAN setup from a
   configuration PoV, right? i.e. you have one hw device the driver
   creates, and then you configure a couple of additional interfaces on
   top of it.
   
   This of course then raises the question: shouldn't this functionality
   be exposed by the kernel the same way as VLANs? i.e. with a
   rtnetlink-based API to create additional interfaces, instead of /sys?
   
   In systemd I figure the right way to expose this to the user would be via
   .netdev files, the same way as we expose VLAN devices. Note however
   that that would be networkd territory,
  
  No, this is not limited to NICs. It is a generic feature that can in
  principle be used with any hardware and there are e.g. FC or FCoE HBAs
  with SR-IOV support. It is true that today it mostly comes with NICs,
  though.
  
  Any general framework for setting it up should not be tied to a specific
  card type.
 
 Well, I doubt that there will be graphics cards that support this
 right? I mean, it's really only network connectivity that can support
 a concept like this easily, since you can easily merge packet streams
 from multiple VMs on one connection. However, I am not sure how you
 want to physically merge VGA streams onto a single VGA connector...
 
 If this is about ethernet, FC, FCOE, then I still think that the
 network management solution should consider this as something you can
 configure on physical links like VLANs. Hence networkd or
 NetworkManager and so on should cover it.
 
 Lennart

Afaik some storage cards support this; for GPUs it's possibly for
GPGPU applications and such - where you don't care about the physical
output, but the processing core of the GPU itself (but I'm not aware of such
an implementation yet, NVIDIA seems to be doing something but the details
are nowhere to be found).
 
 --
 Lennart Poettering, Red Hat
 


Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Lennart Poettering
On Tue, 27.01.15 06:47, Andrei Borzenkov (arvidj...@gmail.com) wrote:

  Hmm, I see. In many ways this feels like VLAN setup from a
  configuration PoV, right? i.e. you have one hw device the driver
  creates, and then you configure a couple of additional interfaces on
  top of it.
  
  This of course then raises the question: shouldn't this functionality
  be exposed by the kernel the same way as VLANs? i.e. with a
  rtnetlink-based API to create additional interfaces, instead of /sys?
  
  In systemd I figure the right way to expose this to the user would be via
  .netdev files, the same way as we expose VLAN devices. Note however
  that that would be networkd territory,
 
 No, this is not limited to NICs. It is a generic feature that can in
 principle be used with any hardware and there are e.g. FC or FCoE HBAs
 with SR-IOV support. It is true that today it mostly comes with NICs,
 though.
 
 Any general framework for setting it up should not be tied to a specific
 card type.

Well, I doubt that there will be graphics cards that support this
right? I mean, it's really only network connectivity that can support
a concept like this easily, since you can easily merge packet streams
from multiple VMs on one connection. However, I am not sure how you
want to physically merge VGA streams onto a single VGA connector...

If this is about ethernet, FC, FCOE, then I still think that the
network management solution should consider this as something you can
configure on physical links like VLANs. Hence networkd or
NetworkManager and so on should cover it.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Lennart Poettering
On Tue, 27.01.15 08:41, Martin Polednik (mpoled...@redhat.com) wrote:

  b) Expose this via udev .link files. This would be appropriate if
 adding/removing VFs is a one-time thing, when a device pops
 up. This would be networking specific, not cover anything else like
   GPU or storage or so. Would still be quite nice. Would probably be the
 best option, after a), if VFs cannot be added/removed dynamically
 all the time without affecting the other VFs.
  
  c) Expose this via udev rules files. This would be generic, would work
 for networking as well as GPUs or storage. This would entail
   writing out rules files when you want to configure the number of
 VFs. Care needs to be taken to use the right way to identify
 devices as they come and go, so that you can apply configuration to
 them in a stable way. This is somewhat uglier, as we don't really
 think that udev rules should be used that much for configuration,
 especially not for configuration written out by programs, rather
 than manually. However, logind already does this, to assign seat
 identifiers to udev devices to enable multi-seat support.
  
  A combination of b) for networking and c) for the rest might be an
  option too.
 
 I myself would vote for b) + c) since we want to cover most of the
 possible use cases for SR-IOV and MR-IOV, which hopefully share
 the interface; adding Dan back to CC as he is the one to speak for network. 

I have added b) to our TODO list for networkd/udev .link files.

c) should probably be done outside of systemd/udev. Just write a tool
(or even documenting this might suffice), that creates udev rules in
/etc/udev/rules.d, matches against ID_PATH and then sets the right
attribute.  
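
For illustration, a generated rule might look roughly like this (a sketch only;
the PCI path, file name and VF count are hypothetical, and as noted elsewhere in
this thread some drivers may not accept the write this early):

  # /etc/udev/rules.d/90-sriov-numvfs.rules
  ACTION=="add", SUBSYSTEM=="net", ENV{ID_PATH}=="pci-0000:02:00.0", ATTR{device/sriov_numvfs}="3"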

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] persisting sriov_numvfs

2015-01-27 Thread Jóhann B. Guðmundsson


On 01/27/2015 01:41 PM, Martin Polednik wrote:


- Original Message -

From: Lennart Poettering lenn...@poettering.net
To: Martin Polednik mpoled...@redhat.com
Cc: Andrei Borzenkov arvidj...@gmail.com, 
systemd-devel@lists.freedesktop.org, ibar...@redhat.com
Sent: Tuesday, January 27, 2015 2:21:21 PM
Subject: Re: [systemd-devel] persisting sriov_numvfs

On Tue, 27.01.15 07:35, Martin Polednik (mpoled...@redhat.com) wrote:

snip

Hmm, so there are three options I think.

a) Expose this in networkd .netdev files, as I suggested
originally. This would be appropriate if we can add and remove VFs
freely any time, without the other VFs being affected. Can you
clarify whether going from let's say 4 to 5 VFs requires removing
all VFs and recreating them? This would be the nicest exposure I
think, but be specific to networkd.

Removing and recreating the VFs is unfortunately required when changing the
number of them (both ways - increasing and decreasing their count).




How common is it in practice to change the number of VFs, as opposed to
admins simply taking down the vNIC interface ( link set vf down ),
reconfiguring it and starting it again when it would be put back into use?


And what would be considered a sane default to do here in .link file(s)?

Always enable SR-IOV and always create numvfs equal to totalvfs, so
for Intel-based cards it might look something like this ( administrators
would have to override it to disable SR-IOV and/or reduce the number of VFs )


.link

[Match]
Driver=igb
Path=pci-0000:01:00.0-*
...

[Link]
Name=enp1s0
...

Or never configure this out of the box, as in the admin has to manually
do it himself, which would make it look something like this to
enable SR-IOV and set the numvfs ( and/or override the default ) ..

.link

[Match]
Driver=igb
Path=pci-0000:01:00.0-*
Sriov=yes
Numvfs=7
...

[Link]
Name=enp1s0
...

Shouldn't udev-builtin-net be updated to assign a ( persistent ) name to
VF devices based upon the name of the PF ( something like
enp1s0vf1, enp1s0vf2 etc. based on the sample above )?
As well as provide persistent MAC addresses for those VFs out of the
box as a default, since it seems to be a common thing to hack up via a
bootup script, or at least be configurable to do so ( PersistentMac=yes,
which is generated based on the MAC address of the PF device, maybe? ).



JBG



Re: [systemd-devel] persisting sriov_numvfs

2015-01-26 Thread Lennart Poettering
On Fri, 23.01.15 08:51, Martin Polednik (mpoled...@redhat.com) wrote:

  Quite frankly, I cannot make sense of these sentences. I have no clue
  what an "SR-IOV", "virtual function", "physical function" is supposed
  to be.
  
  Please explain what this all is, before we can think of adding any
  friendlier config option to udev/networkd/systemd for this.
 
 Hello,
 
 I'm oVirt developer responsible for VFIO/SR-IOV passthrough on the host
 side.
 
 SR-IOV is a specification from the PCI SIG, where a single hardware device
 (we're using NICs for example) can actually act as multiple devices.
 This device is then considered a PF (physical function) and the spawned devices
 are so-called VFs (virtual functions). This functionality allows system
 administrators to assign these devices to virtual machines to get near
 bare metal performance of the device and possibly share it amongst multiple
 VMs.
 
 Spawning of the VFs was previously done via the device driver, using the max_vfs
 attribute. This means that if you wanted to persist these VFs, you had to
 add this to modules-load.d. Since some of the device driver creators used
 different names, spawning of VFs was moved to sysfs and can be operated via
 echo ${number} > /sys/bus/pci/devices/${device_name}/sriov_numvfs, where
 ${number} <= /sys/bus/pci/devices/${device_name}/sriov_totalvfs, and if
 changing the number of VFs from a nonzero value, it first needs to be set to 0.
 
 We've encountered the need to persist this configuration and load it before
 network scripts (and possibly in future other scripts) so that the hardware
 can be referenced in those scripts. There is currently no such option. We
 are seeking help in creating a standardized way of handling this persistence.

Hmm, I see. In many ways this feels like VLAN setup from a
configuration PoV, right? i.e. you have one hw device the driver
creates, and then you configure a couple of additional interfaces on
top of it.

This of course then raises the question: shouldn't this functionality
be exposed by the kernel the same way as VLANs? i.e. with a
rtnetlink-based API to create additional interfaces, instead of /sys?

In systemd I figure the right way to expose this to the user would be via
.netdev files, the same way as we expose VLAN devices. Note however
that that would be networkd territory, and RHEL does not use
that. This means you'd have to talk to the NetworkManager folks about
this...

Anyway, Tom, I think you should say something about this!

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] persisting sriov_numvfs

2015-01-26 Thread Andrei Borzenkov
On Tue, 27 Jan 2015 03:30:22 +0100,
Lennart Poettering lenn...@poettering.net wrote:

 On Fri, 23.01.15 08:51, Martin Polednik (mpoled...@redhat.com) wrote:
 
   Quite frankly, I cannot make sense of these sentences. I have no clue
    what an "SR-IOV", "virtual function", "physical function" is supposed
   to be.
   
   Please explain what this all is, before we can think of adding any
   friendlier config option to udev/networkd/systemd for this.
  
  Hello,
  
  I'm oVirt developer responsible for VFIO/SR-IOV passthrough on the host
  side.
  
  SR-IOV is a specification from the PCI SIG, where a single hardware device
  (we're using NICs for example) can actually act as multiple devices.
  This device is then considered a PF (physical function) and the spawned devices
  are so-called VFs (virtual functions). This functionality allows system
  administrators to assign these devices to virtual machines to get near
  bare metal performance of the device and possibly share it amongst multiple
  VMs.
  
  Spawning of the VFs was previously done via the device driver, using the max_vfs
  attribute. This means that if you wanted to persist these VFs, you had to
  add this to modules-load.d. Since some of the device driver creators used
  different names, spawning of VFs was moved to sysfs and can be operated via
  echo ${number} > /sys/bus/pci/devices/${device_name}/sriov_numvfs, where
  ${number} <= /sys/bus/pci/devices/${device_name}/sriov_totalvfs, and if
  changing the number of VFs from a nonzero value, it first needs to be set to
  0.
  
  We've encountered the need to persist this configuration and load it before
  network scripts (and possibly in future other scripts) so that the hardware
  can be referenced in those scripts. There is currently no such option. We
  are seeking help in creating a standardized way of handling this 
  persistence.
 
 Hmm, I see. In many ways this feels like VLAN setup from a
 configuration PoV, right? i.e. you have one hw device the driver
 creates, and then you configure a couple of additional interfaces on
 top of it.
 
 This of course then raises the question: shouldn't this functionality
 be exposed by the kernel the same way as VLANs? i.e. with a
 rtnetlink-based API to create additional interfaces, instead of /sys?
 
 In systemd I figure the right way to expose this to the user would be via
 .netdev files, the same way as we expose VLAN devices. Note however
 that that would be networkd territory,

No, this is not limited to NICs. It is a generic feature that can in
principle be used with any hardware and there are e.g. FC or FCoE HBAs
with SR-IOV support. It is true that today it mostly comes with NICs,
though.

Any general framework for setting it up should not be tied to a specific
card type.
 
   and RHEL does not use
 that. This means you'd have to talk to the NetworkManager folks about
 this...
 
 Anyway, Tom, I think you should say something about this!
 
 Lennart
 



Re: [systemd-devel] persisting sriov_numvfs

2015-01-23 Thread Martin Polednik


- Original Message -
 From: Lennart Poettering lenn...@poettering.net
 To: Dan Kenigsberg dan...@redhat.com
 Cc: systemd-devel@lists.freedesktop.org, mpole...@redhat.com, 
 ibar...@redhat.com
 Sent: Friday, January 23, 2015 3:49:59 AM
 Subject: Re: [systemd-devel] persisting sriov_numvfs
 
 On Mon, 19.01.15 14:18, Dan Kenigsberg (dan...@redhat.com) wrote:
 
  Hello, list.
  
  I'm an http://oVirt.org developer, and we plan to (finally) support
  SR-IOV cards natively. Working on this feature, we've noticed that
  something is missing in the platform OS.
  
  If I maintain a host with sr-iov cards, I'd like to use the new kernel
  method of defining how many virtual functions (VFs) are to be exposed by
  each physical function:
 
 Quite frankly, I cannot make sense of these sentences. I have no clue
  what an "SR-IOV", "virtual function", "physical function" is supposed
 to be.
 
 Please explain what this all is, before we can think of adding any
 friendlier config option to udev/networkd/systemd for this.

Hello,

I'm oVirt developer responsible for VFIO/SR-IOV passthrough on the host
side.

SR-IOV is a specification from the PCI SIG, where a single hardware device
(we're using NICs for example) can actually act as multiple devices.
This device is then considered a PF (physical function) and the spawned devices
are so-called VFs (virtual functions). This functionality allows system
administrators to assign these devices to virtual machines to get near
bare metal performance of the device and possibly share it amongst multiple
VMs.

Spawning of the VFs was previously done via the device driver, using the max_vfs
attribute. This means that if you wanted to persist these VFs, you had to
add this to modules-load.d. Since some of the device driver creators used
different names, spawning of VFs was moved to sysfs and can be operated via
echo ${number} > /sys/bus/pci/devices/${device_name}/sriov_numvfs, where
${number} <= /sys/bus/pci/devices/${device_name}/sriov_totalvfs, and if
changing the number of VFs from a nonzero value, it first needs to be set to 0.
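
As a rough illustration (the device address is hypothetical):

  cat /sys/bus/pci/devices/0000:02:00.0/sriov_totalvfs    # upper bound for ${number}
  echo 4 > /sys/bus/pci/devices/0000:02:00.0/sriov_numvfs # create 4 VFs
  # to change a non-zero count, reset to 0 first, then write the new value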

We've encountered the need to persist this configuration and load it before
network scripts (and possibly in future other scripts) so that the hardware
can be referenced in those scripts. There is currently no such option. We
are seeking help in creating a standardized way of handling this persistence.

mpolednik
 
 Lennart
 
 --
 Lennart Poettering, Red Hat
 


Re: [systemd-devel] persisting sriov_numvfs

2015-01-22 Thread Lennart Poettering
On Mon, 19.01.15 14:18, Dan Kenigsberg (dan...@redhat.com) wrote:

 Hello, list.
 
 I'm an http://oVirt.org developer, and we plan to (finally) support
 SR-IOV cards natively. Working on this feature, we've noticed that
 something is missing in the platform OS.
 
 If I maintain a host with sr-iov cards, I'd like to use the new kernel
 method of defining how many virtual functions (VFs) are to be exposed by
 each physical function:

Quite frankly, I cannot make sense of these sentences. I have no clue
what an "SR-IOV", "virtual function", "physical function" is supposed
to be. 

Please explain what this all is, before we can think of adding any
friendlier config option to udev/networkd/systemd for this.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] persisting sriov_numvfs

2015-01-19 Thread Jóhann B. Guðmundsson


On 01/19/2015 09:57 PM, Dan Kenigsberg wrote:

On Mon, Jan 19, 2015 at 04:51:48PM +, Jóhann B. Guðmundsson wrote:

On 01/19/2015 02:18 PM, Dan Kenigsberg wrote:

How should this be implemented in the realm of systemd?

I would think via udev rule + systemd-networkd

Could you elaborate your idea? Do you suggest adding a udev rule to take
effect when enp2s0f0 is brought to life?


Right, I was thinking of something along the lines of Li Dongyang's patch [1]

Bottom line, udev handles the low-level settings of network interfaces
while systemd-networkd replaces the legacy network initscript and
handles the setup of basic or more complex network settings (static
IP/DHCP, bridge, vlan, veth...) for containers/virtualization.
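
For the networkd side, persisting an address on one of the VFs would then just be
a .network snippet along these lines (a sketch, reusing the VF name from Dan's
mail and a placeholder address):

  # /etc/systemd/network/50-enp2s16f4.network
  [Match]
  Name=enp2s16f4

  [Network]
  Address=192.0.2.10/24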


Any tweaks beyond that are just conf snippets in modprobe.d/sysctl.d

I suggest, if you guys are not up to speed, to take a trip, grab a beer
and attend Tom's talk and the network track at FOSDEM [2]


JBG

1. http://www.spinics.net/lists/hotplug/msg05082.html
2. https://fosdem.org/2015/schedule/track/network_management_and_sdn/


Re: [systemd-devel] persisting sriov_numvfs

2015-01-19 Thread Dan Kenigsberg
On Mon, Jan 19, 2015 at 04:51:48PM +, Jóhann B. Guðmundsson wrote:
 
 On 01/19/2015 02:18 PM, Dan Kenigsberg wrote:
 How should this be implemented in the realm of systemd?
 
 I would think via udev rule + systemd-networkd

Could you elaborate your idea? Do you suggest adding a udev rule to take
effect when enp2s0f0 is brought to life? But where does networkd come
into play?

Please also note that my own trouble stems from networking, but the need
to persist sriov_numvfs may come up on storage or graphics devices, too.


 What are ovirt's plan regarding systemd-networkd support/integration?

There are no immediate plans to use networkd to configure networking on
the host. We are still using legacy ifcfg by default.  However we now
have a modular design that lets us use something more modern when the
need arises.


[systemd-devel] persisting sriov_numvfs

2015-01-19 Thread Dan Kenigsberg
Hello, list.

I'm an http://oVirt.org developer, and we plan to (finally) support
SR-IOV cards natively. Working on this feature, we've noticed that
something is missing in the platform OS.

If I maintain a host with sr-iov cards, I'd like to use the new kernel
method of defining how many virtual functions (VFs) are to be exposed by
each physical function:

# echo 3 > /sys/class/net/enp2s0f0/device/sriov_numvfs

This spawns 3 new devices, for which udev allocated (on my host) the names
enp2s16, enp2s16f2 and enp2s16f4.

I can attach these VFs to virtual machines, but I can also use them as
yet another host NIC. Let's assume that I did the latter, and persisted
its IP address using initscripts in
/etc/sysconfig/network-scripts/ifcfg-enp2s16f4.

However, on the next boot, sriov_numvfs is reset to 0, there's no
device named enp2s16f4, and certainly no IP address assigned to it.

The admin can solve his own private issue by writing a service to start
after udev allocates device names but before network services kick in,
and re-apply his echo there. But it feels like something that should
be solved in a more generic fashion. It is also not limited to network
devices, as a similar issue would affect anything that attempts to refer to
a VF by its name and survive reboot.
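
(For reference, a rough sketch of what such a workaround service could look like;
the unit name and ordering here are hypothetical, and this is exactly the kind of
glue I'd like to avoid:)

  # /etc/systemd/system/sriov-numvfs.service
  [Unit]
  Description=Create SR-IOV VFs on enp2s0f0
  After=systemd-udev-settle.service
  Wants=systemd-udev-settle.service
  Before=network-pre.target
  Wants=network-pre.target

  [Service]
  Type=oneshot
  ExecStart=/bin/sh -c 'echo 3 > /sys/class/net/enp2s0f0/device/sriov_numvfs'

  [Install]
  WantedBy=multi-user.target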

How should this be implemented in the realm of systemd?

Regards,
Dan.