Re: [systemd-devel] Network Interface Names: solution for a desktop OS

2016-04-12 Thread Jordan Hargrave
On Tue, Apr 12, 2016 at 5:39 PM, Xen  wrote:
> Just want to summarize here very shortly.
>
>
> If you turn the hotplug naming scheme into something more attractive.
>
> If you turn the USB naming scheme into something more attractive.
>
> If you accept like a 99.9% confidence interval for waiting until
> hardware has shown itself, then taking the (embedded + pci bus) numbers
> and condensing that into a sequential list for ethernet and wireless
>
> And if you deal with any other naming scheme there might be.
>
> Then it would solve the utmost total majority of instability issues in
> the kernel loading the driver for multiple NICs in quick succession
> (this whole issue was mostly about very short intervals I believe) even
> if you don't have a 100.0% guarantee (but you would have a rounded
> 100% guarantee) particularly if you take into account that
> networking+firewall+routing should perhaps not start for a production
> system that is very important if not all networking devices for it are
> present.
>
> If you are going to start routing and firewalling on non-present
> networking devices, you have a problem anyway and the current
> "PredictableNetworkInterfaceNames" is not going to solve that.
>
> Then you will have ethernet0 and wireless0 names for the majority (vast
> majority) of consumer devices out there.
>
> You will have an almost 100.000% guarantee that an ordinary user with 2
> ethernet NICs (like a motherboard with 2 ports) will never ever
> experience NIC swapping (eth0 becoming eth1 and vice versa).
>
> You will not see a difference for USB and hotplug because you only
> prettify the names, compared to the current system.
>
> The statistical likelihood of this ever going wrong for those 2 NICs is
> just very very very very small. I don't care what you say about NICs
> showing up 2 hours after the system is booted. If you have a system that
> has to wait 2 hours for a relevant or essential NIC, it is going to be
> nonfunctional anyway.
>
> If you feel I'm being thick, please say so. I feel I am (but do explain
> ;-)).
>
> I just don't see how this is going to turn into any problem ever for
> anyone. If you do the renaming prior to starting networking, it is
> nearly impossible EVER that this will impact real people in a real way.
>
> Maybe that is not acceptable. From my point of view currently with the
> knowledge I have, it would work out fine and "waiting for devices to be
> present before you act on it" seems like a very nice thing to do anyway.
> It feels nice to me.
>
> It is only relevant for networking setups and if both devices you need
> are not present (or even more) you should not act on it anyway and the
> system should fail. Or you should have a provision that you are alerted
> of networking hardware not having come online.
>
> There is not really any scenario where this condensing of enp3s0 names
> is going to cause a problem.
>
> And if it does, reboot you know ;-). But it is not going to happen.
> Consumer systems usually have one NIC (eth). Routers running systemd
> need to guarantee that both needed devices (or more) are present before
> starting networking. I bet it is not a problem for them to depend on
> fixed bus names, especially if they are embedded systems. But hardware
> failure would usually disrupt functioning anyway.
>
> And you could turn that system off if you wanted. It doesn't have to be
> the same for everyone, as long as it is convenient and usable for the
> majority. Right?
>
> All you need to do is wait a few seconds before you start renaming or
> wait on some defined trigger.
> ___
> systemd-devel mailing list
> systemd-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/systemd-devel

I am the primary developer of biosdevname.  I've been wanting this
naming functionality built into systemd or even the OS itself.
Primarily I am interested in servers with multiple physical and
virtual NICs but getting it working on desktops would be a bonus as
well.

The problem lies in the mapping itself.  Network devices can be on
single-, dual-, or even quad-port cards.  Each of these ports can be
'virtualized' through SR-IOV or NIC partitioning, so one physical card
can potentially have hundreds of virtual NICs.  Other cards implement
multiple network interfaces for a single PCI bus:dev:func address.
The SMBIOS table has a mapping from slot number to PCI device, which
can be used to determine the physical slot number of a network card
(and its ports).

So there are at least 4 variables that you must keep track of for add-in cards:

PCI slot #
NIC physical port # (for multi-port cards)
NIC device ID (each physical port can implement multiple network
devices); see mlx4 driver or i.
NIC partition number (each device can then have multiple
partitions/virtual devices); see SR-IOV or Dell NPAR (network
partitioning)

For embedded devices (onboard), PCI slot # is replaced by instance

Re: [systemd-devel] Resolving systemd naming problems on multi-port PCI cards

2016-04-07 Thread Jordan Hargrave
On Thu, Apr 7, 2016 at 11:48 AM, Kay Sievers <k...@vrfy.org> wrote:
> On Thu, Apr 7, 2016 at 6:08 PM, Jordan Hargrave <jhar...@gmail.com> wrote:
>> The current systemd naming scheme for Network cards has a problem
>> correctly naming multi-port NIC devices in a PCI slot.
>>
>> Systemd currently generates names of the form:
>>
>> enpAsBfCdD
>>
>> pA = PCI bus number
>> sB = PCI device number (confusingly called 'SLOT')
>
> Geographical addressing uses sometimes slot sometimes device. The
> kernel uses "slot"
> https://github.com/torvalds/linux/blob/master/arch/x86/pci/early.c
>
>> fC = PCI function number
>> [dD = NIC device port (sysfs dev_port)]
>>
>> eg. enp5s0f0 for a NIC at 05:00.0, dev_port = 0
>>
>> These names already aren't necessarily persistent if PCI bus topology
>> changes (Bus number changes due to adding cards across reboot, etc).
>
> Sure, geographical addressing is not expected to cover hardware
> reconfiguration or firmwares which just do "random" renumbering at
> reboot time.
>
>> --or--
>> ensBfCdD
>>
>> sB = _SUN slot
>> fC = PCI function number
>> [dD = NIC device port (sysfs dev_port)]
>>
>> eg. ens2f0d1 for a single-port NIC at 0?:00.0 in PCI slot 2, dev_port = 1
>>
>> The problem is the 2nd naming scheme cannot handle multi-port NICs.
>> Multi-port NICs often have one or more bridges before the PCI slot
>> number itself.
>>
>> eg. for my quad-port Intel NIC in PCI slot 2 the devices are actually:
>> 44:00.0
>> 44:00.1
>> 45:00.0
>> 45:00.1
>>
>> Using the 2nd naming scheme, the names generated are:
>> ens2f0
>> ens2f1
>> ens2f0
>> ens2f1
>>
>> Oops. Problem. There is a name collision.
>> So depending on who gets
>> initialized first I'll see either:
>>
>> ens2f0
>> ens2f1
>> enp69s0f0
>> enp69s0f1
>>
>> or
>> enp68s0f0
>> enp68s0f1
>> ens2f0
>> ens2f1
>
> How does /sys/bus/pci/slots/ look in that case?
>

There are three entries:
/sys/bus/pci/slots/PCI1 : address = :41:00.0
/sys/bus/pci/slots/PCI2 : address = :42:00.0
/sys/bus/pci/slots/PCI3 : address = :04:00.0

Normally systemd won't discover "PCI2" on my multi-port card, as it only
looks for a device matching a /sys/bus/pci/slots/*/address entry.  So it
checks :44:00.0, :44:00.1, etc., and those don't match.  On a
single-port NIC in a PCI slot, it would match.

Here's the device tree of the devices that all live under :42:00.0:
/sys/devices/pci:40/:40:03.0/:42:00.0/:43:02.0,PCI2
/sys/devices/pci:40/:40:03.0/:42:00.0/:43:04.0,PCI2
/sys/devices/pci:40/:40:03.0/:42:00.0/:43:02.0/:44:00.0,PCI2  NIC Port 1
/sys/devices/pci:40/:40:03.0/:42:00.0/:43:02.0/:44:00.1,PCI2  NIC Port 2
/sys/devices/pci:40/:40:03.0/:42:00.0/:43:04.0/:45:00.0,PCI2  NIC Port 3
/sys/devices/pci:40/:40:03.0/:42:00.0/:43:04.0/:45:00.1,PCI2  NIC Port 4

I changed systemd to also search the parent devices for a match, but
that causes the naming conflict, as now 4 devices match with the same
device and function numbers.
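
To illustrate the parent search, here is a minimal sketch (not the actual
systemd change).  path_has_component() is a hypothetical helper, and the
example paths in the note below assume PCI domain 0000.  It checks whether a
slot address shows up as any component of a device's sysfs path, which is
why all four ports end up matching the same slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if 'addr' (for example "0000:42:00.0")
 * is one of the '/'-separated components of 'syspath', i.e. the device
 * itself or one of its parents.
 */
static bool path_has_component(const char *syspath, const char *addr)
{
        const char *p = syspath;
        size_t alen;

        assert(syspath != NULL && addr != NULL);
        alen = strlen(addr);
        if (alen == 0)
                return false;

        while (*p != '\0') {
                const char *end;

                /* Skip separators, then find the end of this component. */
                while (*p == '/')
                        p++;
                end = strchr(p, '/');
                if (end == NULL)
                        end = p + strlen(p);
                if ((size_t)(end - p) == alen && strncmp(p, addr, alen) == 0)
                        return true;
                p = end;
        }
        return false;
}
```

With a path such as
/sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0/0000:44:00.0
the helper matches "0000:42:00.0", and it does so equally for the other
three ports, so the slot alone cannot tell them apart.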

> When is the PCI hotplug driver loaded? Before or after the network card 
> driver?
>

Slot files are created at PCI device enumeration, so before the network
driver loads.

>> There is a way to fix this by combining the two naming schemes, with a
>> bit of a hack.
>>
>> enpAsBfCdD
>>
>> pA = PCI bus # (no change)
>> sB = _SUN slot # (no change)
>> fC = This is what changes. Instead of C = function number (0..7) it is
>> Device:Function (0..31)
>> dD = Device port (no change)
>>
>> On my system this generates new names:
>> enp4s0 at /sys/devices/pci:00/:00:03.0 SLOT 3  => enp3s4f0
>> enp4s0d1 at /sys/devices/pci:00/:00:03.0 1 SLOT 3  => enp3s4f0d1
>> enp4s0f1 at /sys/devices/pci:00/:00:03.0 SLOT 3=> enp3s4f1
>> enp4s0f1d1 at /sys/devices/pci:00/:00:03.0 SLOT 3  => enp3s4f1d1
>> enp4s0f2 at /sys/devices/pci:00/:00:03.0 SLOT 3=> enp3s4f2
>> enp4s0f2d1 at /sys/devices/pci:00/:00:03.0 SLOT 3  => enp3s4f2d1
>> enp4s0f3 at /sys/devices/pci:00/:00:03.0 SLOT 3=> enp3s4f3
>> enp4s0f3d1 at /sys/devices/pci:00/:00:03.0 SLOT 3  => enp3s4f3d1
>> enp4s0f4 at /sys/devices/pci:00/:00:03.0 SLOT 3=> enp3s4f4
>> enp4s0f4d1 at /sys/devices/pci:00/:00:03.0 SLOT 3  => enp3s4f4d1
>> enp4s0f5 at /sys/devices/pci:00/:00:03.0 SLOT  => enp3s4f5
>> enp4s0f5d1 at /sys/dev

[systemd-devel] Resolving systemd naming problems on multi-port PCI cards

2016-04-07 Thread Jordan Hargrave
The current systemd naming scheme for Network cards has a problem
correctly naming multi-port NIC devices in a PCI slot.

Systemd currently generates names of the form:

enpAsBfCdD

pA = PCI bus number
sB = PCI device number (confusingly called 'SLOT')
fC = PCI function number
[dD = NIC device port (sysfs dev_port)]

eg. enp5s0f0 for a NIC at 05:00.0, dev_port = 0

These names already aren't necessarily persistent if PCI bus topology
changes (Bus number changes due to adding cards across reboot, etc).
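
As a rough illustration of the enpAsBfCdD form described above, here is a
sketch in C.  format_path_name() is a hypothetical helper, not systemd
code.  It assumes the fC part is only emitted for multifunction devices
and the dD part only for a non-zero dev_port; the bus number is printed in
decimal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build an enpAsBfCdD-style name into 'out'.
 * Returns 0 on success, -1 if the name does not fit in 'outlen' bytes.
 */
static int format_path_name(char *out, size_t outlen, unsigned bus,
                            unsigned dev, unsigned func, bool multifunction,
                            unsigned dev_port)
{
        int n;

        assert(out != NULL && outlen > 0);

        /* pA and sB are always present; the bus number is decimal. */
        n = snprintf(out, outlen, "enp%us%u", bus, dev);

        /* fC: assumed to appear only for multifunction devices. */
        if (n > 0 && (size_t)n < outlen && (multifunction || func > 0))
                n += snprintf(out + n, outlen - n, "f%u", func);

        /* dD: assumed to appear only for a non-zero dev_port. */
        if (n > 0 && (size_t)n < outlen && dev_port > 0)
                n += snprintf(out + n, outlen - n, "d%u", dev_port);

        return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For a multifunction NIC at 05:00.0 with dev_port = 0 this produces
enp5s0f0, and for 44:00.1 it produces enp68s0f1 (0x44 is 68 in decimal).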

--or--
ensBfCdD

sB = _SUN slot
fC = PCI function number
[dD = NIC device port (sysfs dev_port)]

eg. ens2f0d1 for a single-port NIC at 0?:00.0 in PCI slot 2, dev_port = 1

The problem is the 2nd naming scheme cannot handle multi-port NICs.
Multi-port NICs often have one or more bridges before the PCI slot
number itself.

eg. for my quad-port Intel NIC in PCI slot 2 the devices are actually:
44:00.0
44:00.1
45:00.0
45:00.1

Using the 2nd naming scheme, the names generated are:
ens2f0
ens2f1
ens2f0
ens2f1

Oops. Problem. There is a name collision.  So depending on who gets
initialized first I'll see either:

ens2f0
ens2f1
enp69s0f0
enp69s0f1

or
enp68s0f0
enp68s0f1
ens2f0
ens2f1
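
The collision above can be reproduced with a small sketch.  The helper
names, the pci_addr struct and NAME_LEN are made up for illustration; the
point is only that an ensBfC name ignores the bus number, so ports on
different buses behind the same slot get the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16 /* same size as the kernel's IFNAMSIZ */

struct pci_addr {
        unsigned bus, dev, func;
};

/* Build an ensBfC name: only the slot and the function number go in. */
static int format_slot_name(char out[NAME_LEN], unsigned slot,
                            const struct pci_addr *a)
{
        int n;

        assert(out != NULL && a != NULL);
        n = snprintf(out, NAME_LEN, "ens%uf%u", slot, a->func);
        return (n > 0 && n < NAME_LEN) ? 0 : -1;
}

/* Count the entries whose name repeats an earlier entry. */
static size_t count_collisions(char names[][NAME_LEN], size_t count)
{
        size_t i, j, dups = 0;

        for (i = 1; i < count; i++) {
                for (j = 0; j < i; j++) {
                        if (strcmp(names[i], names[j]) == 0) {
                                dups++;
                                break;
                        }
                }
        }
        return dups;
}
```

Feeding in 44:00.0, 44:00.1, 45:00.0 and 45:00.1 with slot 2 gives two
distinct names for four devices, i.e. two collisions.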

There is a way to fix this by combining the two naming schemes, with a
bit of a hack.

enpAsBfCdD

pA = PCI bus # (no change)
sB = _SUN slot # (no change)
fC = This is what changes. Instead of C = function number (0..7) it is
Device:Function (0..31)
dD = Device port (no change)

On my system this generates new names:
enp4s0     at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f0
enp4s0d1   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f0d1
enp4s0f1   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f1
enp4s0f1d1 at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f1d1
enp4s0f2   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f2
enp4s0f2d1 at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f2d1
enp4s0f3   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f3
enp4s0f3d1 at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f3d1
enp4s0f4   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f4
enp4s0f4d1 at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f4d1
enp4s0f5   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f5
enp4s0f5d1 at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f5d1
enp4s0f6   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f6
enp4s0f6d1 at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f6d1
enp4s0f7   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f7
enp4s0f7d1 at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f7d1
enp4s1     at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f8    (Device 1:0 => Function 8)
enp4s1d1   at /sys/devices/pci:00/:00:03.0 SLOT 3 => enp3s4f8d1  (Device 1:0 => Function 8)

enp68s0f0 at /sys/devices/pci:40/:40:03.0 SLOT 2   => enp68s2f0
enp68s0f1 at /sys/devices/pci:40/:40:03.0 SLOT 2   => enp68s2f1
enp69s0f0 at /sys/devices/pci:40/:40:03.0 SLOT 2   => enp69s2f0
enp69s0f1 at /sys/devices/pci:40/:40:03.0 SLOT 2   => enp69s2f1

This way it is always able to determine the physical PCI slot the device is in.
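
A sketch of the combined form, assuming "Device:Function" means the usual
devfn packing (device in the upper bits, function in the lower three),
which is consistent with "Device 1:0 => Function 8" above.  Under that
assumption the value can go up to 255.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Pack device (0..31) and function (0..7) into one number. */
static unsigned combined_function(unsigned dev, unsigned func)
{
        assert(dev < 32 && func < 8);
        return (dev << 3) | func;
}

/*
 * Hypothetical helper: bus number, _SUN slot, combined device:function,
 * and dev_port (only when non-zero, as in the listing above).
 * Returns 0 on success, -1 if the name does not fit.
 */
static int format_combined_name(char *out, size_t outlen, unsigned bus,
                                unsigned slot, unsigned dev, unsigned func,
                                unsigned dev_port)
{
        int n;

        assert(out != NULL && outlen > 0);
        n = snprintf(out, outlen, "enp%us%uf%u", bus, slot,
                     combined_function(dev, func));
        if (n > 0 && (size_t)n < outlen && dev_port > 0)
                n += snprintf(out + n, outlen - n, "d%u", dev_port);
        return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For the quad-port card, 45:00.1 in slot 2 becomes enp69s2f1, and 44:00.0
becomes enp68s2f0, so the four ports no longer collide.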

This scheme still does have a limitation... the names may not be
persistent if PCI topology changes due to the PCI bus number still
being part of the name.


Re: [systemd-devel] Proposal: Add biosdevname naming scheme to systemd

2016-02-04 Thread Jordan Hargrave
On Tue, Oct 20, 2015 at 10:04 PM, Jordan Hargrave <jhar...@gmail.com> wrote:

> On Tue, Oct 20, 2015 at 3:02 PM, Andrei Borzenkov <arvidj...@gmail.com>
> wrote:
> > 20.10.2015 17:30, Jordan Hargrave пишет:
> >
> >> On Tue, Oct 20, 2015 at 1:15 AM, Andrei Borzenkov <arvidj...@gmail.com>
> >> wrote:
> >>>
> >>> On Tue, Oct 20, 2015 at 7:46 AM, Jordan Hargrave <jhar...@gmail.com>
> >>> wrote:
> >>>>
> >>>> On Mon, Mar 2, 2015 at 1:17 PM, Tom Gundersen <t...@jklm.no> wrote:
> >>>>>
> >>>>> Hi Jordan,
> >>>>>
> >>>>> On Mon, Mar 2, 2015 at 4:45 PM, Jordan Hargrave <jhar...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> There are currently two competing naming mechanisms for network
> cards,
> >>>>>> biosdevname and systemd.  Systemd currently has some limitations on
> >>>>>> naming
> >>>>>> cards that use network partitioning or support SR-IOV.
> >>>>>
> >>>>>
> >>>>> Could you point to an example so we can fix it? I thought all bug
> >>>>> reports had been handled, but maybe I lost track of something.
> >>>>>
> >>>>
> >>>> I have a quad-port NIC:
> >>>> :40:00.0 = PCIE bridge (SMBIOS Slot 2)
> >>>> :41:00.0 = Ethernet Device (port1)
> >>>> :41:00.1 = Ethernet Device (port2)
> >>>> :42:00.0 = Ethernet Device (port3)
> >>>> :42:00.1 = Ethernet Device (port4)
> >>>>
> >>>> biosdevname would name these p2p1, p2p2, p2p3, p2p4 respectively.
> >>>>
> >>>
> >>> How does it determine that 41 and 42 are the same device? I.e. how
> >>> does it differ from real bridge with two independent two-port cards
> >>> behind? Could you explain what information it is using? Is it exported
> >>> in sysfs?
> >>>
> >>> ...
> >>
> >>
> >> It knows they are on the same slot as the parent device has SMBIOS
> >> Slot#2 (Type 9).  So all child devices of a physical slot are on the
> >> same card.  I'm currently using a patch to systemd that reads SMBIOS
> >> type 9.  There isn't a kernel sysfs variable that displays this.
> >>
> >
> > This gives us slot ID, but how do we know which of function 0 on this
> slot
> > ID is port 0 and which is port 2? There is nothing in SMBIOS description
> of
> > Type 9 that answers it.
>
> Looking for a resolution for this.. adding port numbers to systemd
enumeration of devices in PCI slots.

Systemd still doesn't have the concept of a 'Port' number, so multi-port
NICs get named with enpsf instead of ensp.

There needs to be a way to generate systemd names that have the following
variables for add-in cards:
  Slot Number
  Port Number
  Instance Number (for SR-IOV or Network Partitioned devices)

It is possible to calculate the port number without any knowledge of
sibling devices.  This requires knowledge of the device PCI ID, the parent
tree, and the 'dev_port' attribute (for Mellanox cards).  The devices could
then be named ensSLOTpPORT.

The network cards I've looked at have the following PCI bus topologies:

dual port cards:
bus:00.0 = port 1
bus:00.1 = port 2

quad port cards have two different layouts:  top level bus (upstream port)
with two downstream ports, or all devices as multifunction

Broadcom cards seem to follow this style
bus:00.0 = port 1
bus:00.1 = port 2
bus:00.2 = port 3
bus:00.3 = port 4

Intel cards seem to follow this style
bus:00.0 = upstream port.  sbus is secondary bus
sbus:xx.y = bridge 1
sbus:xx.z = bridge 2
sbus+1:00.0 = port 1
sbus+1:00.1 = port 2
sbus+2:00.0 = port 3
sbus+2:00.1 = port 4
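
The two layouts above suggest the port number can be derived from the
device's own address.  A rough sketch with hypothetical helper names:
'sbus' is the secondary bus of the card's upstream port, and the
two-functions-per-bridge rule is an assumption that only fits the Intel
layout listed here.

```c
#include <assert.h>

/* Broadcom-style layout: every port is a function of one device. */
static unsigned port_multifunction(unsigned func)
{
        assert(func < 8);
        return func + 1;
}

/*
 * Intel-style layout: ports sit on buses sbus+1, sbus+2, ... with two
 * functions per bus.  Returns the 1-based port number, or -1 if the
 * address does not fit that layout.
 */
static int port_bridged(unsigned sbus, unsigned bus, unsigned func)
{
        if (bus <= sbus || func > 1)
                return -1;
        return (int)((bus - sbus - 1) * 2 + func + 1);
}
```

With sbus = 0x43, the devices 44:00.0, 44:00.1, 45:00.0 and 45:00.1 map to
ports 1 through 4 without looking at any sibling device.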

Walk up the /sys/bus/pci/devices hierarchy from the network 'device'
link until the 'acpi_index' field is found, then read the sysfs 'label'
field.  On Dell systems this is of the form "SLOT x", so the slot ID can be
determined.


Re: [systemd-devel] network interface renaming via PCI ID w/ systemd-udevd

2015-11-13 Thread Jordan Hargrave
On Fri, Nov 13, 2015 at 12:44 PM, Filipe Brandenburger
 wrote:
> On Thu, Nov 12, 2015 at 10:13 AM, Matthew Hall  wrote:
>> On Thu, Nov 12, 2015 at 10:37:56AM +0100, Lennart Poettering wrote:
>>> Since time began eth* is where the kernel automatically picked iface
>>> names from. If you want to assign your own names go for some other
>>> namespace, or be prepared to race against the kernel, and deal with
>>> it.
>>
>> Again, this logic worked well when the level of dynamism was lower.
>
> I think I see where you're coming from... Some distributions (in my
> recollection, RHEL) would use weird tricks to keep interface ordering
> stable while still keeping the eth0, eth1, ethX names.
>
> If I recall correctly, RHEL's /etc/init.d/network would try to match
> interface names from /etc/sysconfig/networking-scripts/ifcfg-ethX to
> the MAC address listed inside that configuration file. If it had to
> switch it from, say, eth0 to eth1, it would do weird tricks such as
> looking whether eth1 existed already, then rename it to tmp98765 with
> a random number, then rename eth0 to eth1. In many cases, something
> would go awry and you would end up with an interface named tmp98765.
>
> As you can imagine, this was fraught with problems and race
> conditions. It doesn't really work when you're trying to boot with as
> much parallelism (which is something we aim for these days) or even
> hot plug new interfaces...
>
>> But now the level of dynamism is higher and different principles should 
>> apply.
>
> Yes. I'd say that's a good thing.
>
>> You aren't thinking very much about how it will work for newer users.
>
> New users mostly don't care... I really think retraining your fingers
> from eth -> enp or whatever you pick, net, lan, wired, etc. is
> probably much easier than trying to preserve a relic of a name that
> mostly serves no purpose these days... As mentioned, keeping it is not
> simple since it's still the dumping ground for the kernel (and that's
> unlikely to change), avoiding the race with the kernel is much better
> than trying to deal with it, the complexity is just not worth it... I
> hope you get to see the light!
>
> Cheers!
> Filipe

I'm the primary developer of biosdevname, used by Red Hat/Fedora for
consistent naming.

Servers using SR-IOV can have potentially hundreds of NIC devices, all
loaded in parallel.
eth* is still the internal name used by the kernel, and the number
gets assigned first-come first-served.   Trying to keep consistent
eth* names without collision here is impossible.  The eth* device name
then gets changed by systemd or biosdevname.


Re: [systemd-devel] Fwd: [PATCH] Add support for detecting NIC partitions on Dell Servers

2015-11-10 Thread Jordan Hargrave
On Tue, Nov 10, 2015 at 4:53 AM, Kay Sievers <k...@vrfy.org> wrote:
> On Tue, Nov 10, 2015 at 5:49 AM, Jordan Hargrave <jhar...@gmail.com> wrote:
>> Cleaned up linux coding style
>>
>> This patch will integrate some of the features of biosdevname into systemd.
>> The code detects the port and index for detecting NIC partitions. This 
>> creates
>> a new environment variable, ID_NET_NAME_PARTITION of the format
>> _
>>
>> The patch will also decode SMBIOS slot number for NIC, and store in the 
>> variable
>> ID_NET_NAME_SMBIOS_SLOT.  Systemd does not have a method for naming
>> ports on a multi-port card plugged into a slot.
>
> Again, I don't think systemd should carry an SMBIOS parser.
>
> Sorry,
> Kay

From a customer usability standpoint, having the slot numbers as part
of systemd would be a very useful feature.  The current method only
works for single-port NICs in a slot.  Multi-port NICs, especially
ones with SR-IOV or multiple partitions, get garbled names like:

enp4s0
enp4s1
enp4s0d1
enp4s0f1
enp4s0f2
enp4s0f3
enp4s0f4
enp4s0f5
enp4s0f6
enp4s0f7
enp4s0f1d1
enp4s0f2d1
enp4s0f3d1
enp4s0f4d1
enp4s0f5d1
enp4s0f6d1
enp4s0f7d1
enp4s1d1
enp68s0f0
enp68s0f1
enp69s0f0
enp69s0f1

That's another annoying thing with systemd names, the bus number is
*decimal*.  lspci is in hex, so the customer has to do a conversion to
figure out even what PCI device that is.
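
For what it's worth, the conversion the customer has to do by hand looks
like this.  name_to_pci_addr() is a hypothetical helper that only handles
the enpAsBfC form and ignores any dD suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn an enpAsB[fC] name into an lspci-style
 * bus:dev.func string (hex).  Returns 0 on success, -1 otherwise.
 */
static int name_to_pci_addr(const char *name, char *out, size_t outlen)
{
        unsigned bus, dev, func = 0;
        int n;

        assert(name != NULL && out != NULL);

        /* sscanf returns 2 when there is no fC part, leaving func at 0. */
        if (sscanf(name, "enp%us%uf%u", &bus, &dev, &func) < 2)
                return -1;

        /* systemd prints the bus in decimal; lspci prints it in hex. */
        n = snprintf(out, outlen, "%02x:%02x.%x", bus, dev, func);
        return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

So enp68s0f1 maps back to 44:00.1 and enp69s0f0 to 45:00.0.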

All enp4* names are a dual-port NIC in slot 3 with 8 SR-IOV devices.

All enp68* and enp69* names are a single quad-port NIC in slot 2.
Systemd breaks here if trying to name using slot numbers with the
existing method.  As there are 4 devices under the slot with the same
device numbers, systemd would name them:
ens2f0
ens2f1
ens2f0
ens2f1

Which causes a name collision.  I was able to verify this, as they
got named either:
ens2f0
ens2f1
enp69s0f0
enp69s0f1

or
enp68s0f0
enp68s0f1
ens2f0
ens2f1

at startup.

That's the best feature of biosdevname, being able to tell which slot
the NIC is located just from the name.  Systemd still has some
limitations and/or bugs in this regard.


Re: [systemd-devel] Fwd: [PATCH] Add support for detecting NIC partitions on Dell Servers

2015-11-09 Thread Jordan Hargrave
On Mon, Nov 9, 2015 at 4:43 PM, Lennart Poettering
<lenn...@poettering.net> wrote:
> On Mon, 09.11.15 12:42, Jordan Hargrave (jhar...@gmail.com) wrote:
>
>> From: Jordan Hargrave <jhar...@gmail.com>
>>
>> This patch will integrate some of the features of biosdevname into systemd.
>> The code detects the port and index for detecting NIC partitions. This 
>> creates
>> a new environment variable, ID_NET_NAME_PARTITION of the format
>> _
>
> "partitions"? What's that supposed to be? SR-IOV?
>

Similar to SR-IOV but on Dell servers partitions actually show up as
PCI devices in the initial enumeration.  The PCI device vital product
data area has a map of which Bus:Dev:Func is which port and which port
instance.

So say we have a PCI device tree for a quad-port NIC, 4 physical ports
but 8 NICs.

04:00.0 port 1, instance 1
04:00.1 port 2, instance 1
04:00.2 port 3, instance 1
04:00.3 port 4, instance 1
04:00.4 port 1, instance 2
04:00.5 port 2, instance 2
04:00.6 port 3, instance 2
04:00.7 port 4, instance 2

We use this partition info to display a warning if a bond is created
that contains the same physical NIC port more than once.
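
As a sketch of that bond check: the table below simply hard-codes the
example listing above, whereas the real mapping comes from the VPD data.
The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct npar_entry {
        unsigned func;     /* PCI function number under 04:00 */
        unsigned port;     /* physical port, 1-based */
        unsigned instance; /* partition instance, 1-based */
};

/* Hard-coded copy of the quad-port example; normally read from VPD. */
static const struct npar_entry npar_map[] = {
        { 0, 1, 1 }, { 1, 2, 1 }, { 2, 3, 1 }, { 3, 4, 1 },
        { 4, 1, 2 }, { 5, 2, 2 }, { 6, 3, 2 }, { 7, 4, 2 },
};

/* Return the physical port for a function, or -1 if it is not listed. */
static int port_of(unsigned func)
{
        size_t i;

        for (i = 0; i < sizeof(npar_map) / sizeof(npar_map[0]); i++)
                if (npar_map[i].func == func)
                        return (int)npar_map[i].port;
        return -1;
}

/* True if both functions are partitions of the same physical port. */
static bool same_physical_port(unsigned func_a, unsigned func_b)
{
        int a = port_of(func_a), b = port_of(func_b);

        assert(func_a != func_b);
        return a > 0 && a == b;
}
```

A bond of 04:00.0 and 04:00.4 would be flagged, since both are port 1,
while a bond of 04:00.0 and 04:00.1 would not.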

>> The patch will also decode SMBIOS slot number for NIC, and store in
>> the variable ID_NET_NAME_SMBIOS_SLOT.  Systemd does not have a method
>> for naming ports on a multi-port card plugged into a slot.
>
> Hmm, isn't this stuff the same as exported by the kernel as the
> "index" field, i.e. SMBIOS Type 41? Can you elaborate on the relation
> of that field and the stuff this patch adds?
>

The index is exported for embedded devices, but not for add-in cards.
acpi_index may be exported for add-in cards, but it is generally not very
useful as it has a number like 17 or 24, with no relation to the actual
physical slot where a NIC is located.

>> Signed-off-by: Jordan Hargrave <jordan_hargr...@dell.com>
>
> I didn't look too closely in the actual sources, but I did notice that
> it is line-broken, and doesn't follow CODING_STYLE in quite a few
> cases, regarding placement of brackets, or error handling for
> example.
>
> In order to avoid any confusion with line-broken patches, and to make
> review easier, please submit the patch as github PR, which is the way
> we generally prefer receiving patches these days!

Where do I submit this?

>
> Thanks,
>
> Lennart
>
> --
> Lennart Poettering, Red Hat


Re: [systemd-devel] Fwd: [PATCH] Add support for detecting NIC partitions on Dell Servers

2015-11-09 Thread Jordan Hargrave
Cleaned up linux coding style

This patch will integrate some of the features of biosdevname into systemd.
The code detects the port and index for detecting NIC partitions. This creates
a new environment variable, ID_NET_NAME_PARTITION of the format
_

The patch will also decode SMBIOS slot number for NIC, and store in the variable
ID_NET_NAME_SMBIOS_SLOT.  Systemd does not have a method for naming
ports on a multi-port card plugged into a slot.

Signed-off-by: Jordan Hargrave <jordan_hargr...@dell.com>
---
 src/udev/udev-builtin-net_id.c | 206 +
 1 file changed, 206 insertions(+)

diff --git a/src/udev/udev-builtin-net_id.c b/src/udev/udev-builtin-net_id.c
index bf5c9c6..04b08dd 100644
--- a/src/udev/udev-builtin-net_id.c
+++ b/src/udev/udev-builtin-net_id.c
@@ -119,16 +119,130 @@ struct netnames {
 bool mac_valid;

 struct udev_device *pcidev;
+struct udev_device *physdev;
 char pci_slot[IFNAMSIZ];
 char pci_path[IFNAMSIZ];
 char pci_onboard[IFNAMSIZ];
 const char *pci_onboard_label;
+int  npar_port;
+int  npar_pfi;
+int  smbios_slot;

 char usb_ports[IFNAMSIZ];
 char bcma_core[IFNAMSIZ];
 char ccw_group[IFNAMSIZ];
 };

+#define FLAG_IOV 0x80
+#define FLAG_NPAR 0x1000
+
+#define VPDI_TAG 0x82
+#define VPDR_TAG 0x90
+
+struct vpd_tag {
+char  cc[2];
+unsigned char len;
+char  data[1];
+};
+
+/* Read VPD tag ID */
+static int vpd_readtag(int fd, int *len)
+{
+unsigned char tag, tlen[2];
+
+if (read(fd, &tag, 1) != 1)
+return -1;
+if (tag == 0x00 || tag == 0xFF || tag == 0x7F)
+return -1;
+if (tag & 0x80) {
+if (read(fd, tlen, 2) != 2)
+return -1;
+*len = tlen[0] + (tlen[1] << 8);
+return tag;
+}
+*len = (tag & 0x7);
+return (tag & ~0x7);
+}
+
+static void *vpd_findtag(void *buf, int len, const char *sig)
+{
+int off, siglen;
+struct vpd_tag *t;
+
+off = 0;
+siglen = strlen(sig);
+while (off < len) {
+t = (struct vpd_tag *)((unsigned char *)buf + off);
+if (!memcmp(t->data, sig, siglen))
+return t;
+off += (t->len + 3);
+}
+return NULL;
+}
+
+static void dev_pci_npar_dcm(struct udev_device *dev, struct netnames *names,
+ int len, const char *dcm,
+ const char *fmt, int step)
+{
+int domain, bus, slot, func, off, mydf;
+int port, df, pfi, flag;
+
+if (sscanf(udev_device_get_sysname(names->physdev), "%x:%x:%x.%u",
+   &domain, &bus, &slot, &func) != 4)
+return;
+mydf = (slot << 3) + func;
+for (off = 3; off < len; off += step) {
+if (sscanf(dcm+off, fmt, &port, &pfi, &df, &flag) != 4)
+continue;
+if ((flag & FLAG_NPAR) && mydf == df) {
+names->npar_port = port;
+names->npar_pfi = pfi;
+}
+}
+}
+
+static void dev_pci_npar(struct udev_device *dev, struct netnames *names)
+{
+const char *filename;
+int len, fd;
+struct vpd_tag *dcm;
+void *buf;
+
+/* Search for VPD or IOV VPD */
+filename = strjoina(udev_device_get_syspath(names->physdev), "/vpd");
+fd = open(filename, O_RDONLY);
+if (fd < 0)
+return;
+if (vpd_readtag(fd, &len) != VPDI_TAG)
+goto done;
+lseek(fd, len, SEEK_CUR);
+
+/* Check VPD-R */
+if (vpd_readtag(fd, &len) != VPDR_TAG)
+goto done;
+buf = alloca(len);
+if (read(fd, buf, len) != len)
+goto done;
+
+/* Check for DELL VPD tag */
+if (!vpd_findtag(buf, len, "DSV1028VPDR.VER"))
+goto done;
+
+/* Find DCM/DC2 tag */
+dcm = vpd_findtag(buf, len, "DCM");
+if (dcm != NULL)
+dev_pci_npar_dcm(dev, names, dcm->len, dcm->data,
+ "%1x%1x%2x%6x", 10);
+else {
+dcm = vpd_findtag(buf, len, "DC2");
+if (dcm != NULL)
+dev_pci_npar_dcm(dev, names, dcm->len, dcm->data,
+ "%1x%2x%2x%6x", 11);
+}
+done:
+close(fd);
+}
+
 /* retrieve on-board index number and label from firmware */
 static int dev_pci_onboard(struct udev_device *dev, struct netnames *names) {
 unsigned dev_port = 0;
@@ -187,6 +301,78 @@ static bool is_pci_multifunction(struct udev_device *dev) {
 return false;
 }

+

Re: [systemd-devel] Proposal: Add biosdevname naming scheme to systemd

2015-10-20 Thread Jordan Hargrave
On Tue, Oct 20, 2015 at 3:14 AM, Kay Sievers <k...@vrfy.org> wrote:
> On Tue, Oct 20, 2015 at 6:46 AM, Jordan Hargrave <jhar...@gmail.com> wrote:
>> On Mon, Mar 2, 2015 at 1:17 PM, Tom Gundersen <t...@jklm.no> wrote:
>>> Hi Jordan,
>>>
>>> On Mon, Mar 2, 2015 at 4:45 PM, Jordan Hargrave <jhar...@gmail.com> wrote:
>>>> There are currently two competing naming mechanisms for network cards,
>>>> biosdevname and systemd.  Systemd currently has some limitations on naming
>>>> cards that use network partitioning or support SR-IOV.
>>>
>>> Could you point to an example so we can fix it? I thought all bug
>>> reports had been handled, but maybe I lost track of something.
>>>
>>
>> I have a quad-port NIC:
>> :40:00.0 = PCIE bridge (SMBIOS Slot 2)
>> :41:00.0 = Ethernet Device (port1)
>> :41:00.1 = Ethernet Device (port2)
>> :42:00.0 = Ethernet Device (port3)
>> :42:00.1 = Ethernet Device (port4)
>>
>> biosdevname would name these p2p1, p2p2, p2p3, p2p4 respectively.
>>
>> With systemd, it's ugly. I added the patch to get SMBIOS slot numbers
>> and I see systemd get RANDOM names depending on boot.
>>
>> Either:
>> s2f0 (p1)
>> s2f1 (p2)
>> p66s0f0 (p3)
>> p66s0f1 (p4)
>>
>> I also saw the opposite:
>> p65s0f0 (p1)
>> p65s0f1 (p2)
>> s2f0 (p3)
>> s2f1 (p4)
>
> That looks like an issue with the PCI hotplug drivers. You either need
> to load them early enough, or not at all. Or just disable the slot
> naming policy in a networkd link file.
>
>> Since systemd doesn't have a concept of a 'port', whichever devices
>> get named first (they are named in parallel, race conditions), the
>> other devices have name collision (function 0,1 are duplicate, but on
>> different bus).
>
> Systemd cannot have a concept of a port across otherwise independent
> devices. It would mean maintaining a counter across devices, which
> again would depend on, and introduce names based on, enumeration order.
>

Dell systems export a string as part of the PCI VPD data that maps
each PCI B:D:F to a port.  This is mainly used for mapping
virtual/partition devices to the parent partition.  These devices show
up as physical PCI devices in the PCI scan.  There are also cards that
support virtual SR-IOV devices.  The quad-port example above was a
special case, but here is another.  We have a Mellanox card that
implements two network devices under a single B:D:F.  It also supports
SR-IOV.  So a single PCI B:D:F maps to 16 network devices.  Systemd
uses the sysfs dev_port/dev_id to identify which actual device it is.

Systemd names these as:
p66s0f0
p66s0f0d1
p66s0f1
p66s0f1d1
p66s0f2
p66s0f2d1
p66s0f3
p66s0f3d1
p66s0f4
p66s0f4d1
etc.

Again, p66 doesn't tell the user anything about where the device is in
the system or which port the network cable is plugged into.
biosdevname looks up the 'physical' sr-iov device and SMBIOS slot
number and names them:

p2p1  (original device)
p2p1_0 (virtual)
p2p1_1
p2p1_2
p2p1_3
...
p2p2
p2p2_0 (virtual device)
p2p2_1
...
etc.

This feature is really what we would like to see implemented in
systemd.  First, name devices properly based on SMBIOS slot number.
Second, have physical name of NIC be the base name, along with virtual
index.  We use this when enabling bonding to warn if a bond is enabled
using the same physical cable.  The name can be stored in a separate
environment variable (ID_NET_NAME_BIOSDEVNAME or similar).
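
A minimal sketch of that naming, based only on the examples above
(p2p1, p2p1_0, ...).  format_biosdev_name() is a hypothetical helper, not
biosdevname's actual code; a negative vf_index stands for the physical
device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: p<slot>p<port> for the physical device,
 * p<slot>p<port>_<index> for a virtual device on that port.
 * Returns 0 on success, -1 if the name does not fit.
 */
static int format_biosdev_name(char *out, size_t outlen, unsigned slot,
                               unsigned port, int vf_index)
{
        int n;

        assert(out != NULL && outlen > 0);
        if (vf_index < 0)
                n = snprintf(out, outlen, "p%up%u", slot, port);
        else
                n = snprintf(out, outlen, "p%up%u_%d", slot, port, vf_index);
        return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Because the physical name is the prefix of every virtual name, two bond
members sharing a cable can be spotted by comparing the part before the
underscore.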

>>>> Proposal is to add
>>>> support for biosdevname-like names as part of systemd.  The names would be
>>>> created as a new environment variable ID_NET_NAME_BIOSDEVNAME.  This could
>>>> then be used in the udev rules scripts to replace the external biosdevname
>>>> handler.
>
> This is unlikely to happen. Biosdevname "invents" counters which
> are unreliable and introduce inter-device probe-order dependencies.
> It causes the same problem as the kernel's ethX, just less likely.
> Systemd cannot do that.
>
It doesn't invent them if they are part of the DCM string in the PCI VPD.

>>> I don't think this makes much sense. If biosdevname had been
>>> acceptable, the udev naming scheme would not have been introduced in
>>> the first place.
>
> Right, the udev naming would not have been there or used the same
> names if biosdevname was reliable, which it unfortunately isn't for
> the above mentioned reasons,
>
>> biosdevname is going away in new version of RHEL, so we will lose the
>> capability to detect if two 'virtual' NICs are actually the same
>> physical NIC.  The naming in systemd

Re: [systemd-devel] Proposal: Add biosdevname naming scheme to systemd

2015-10-20 Thread Jordan Hargrave
On Tue, Oct 20, 2015 at 3:02 PM, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> 20.10.2015 17:30, Jordan Hargrave wrote:
>
>> On Tue, Oct 20, 2015 at 1:15 AM, Andrei Borzenkov <arvidj...@gmail.com>
>> wrote:
>>>
>>> On Tue, Oct 20, 2015 at 7:46 AM, Jordan Hargrave <jhar...@gmail.com>
>>> wrote:
>>>>
>>>> On Mon, Mar 2, 2015 at 1:17 PM, Tom Gundersen <t...@jklm.no> wrote:
>>>>>
>>>>> Hi Jordan,
>>>>>
>>>>> On Mon, Mar 2, 2015 at 4:45 PM, Jordan Hargrave <jhar...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> There are currently two competing naming mechanisms for network cards,
>>>>>> biosdevname and systemd.  Systemd currently has some limitations on
>>>>>> naming
>>>>>> cards that use network partitioning or support SR-IOV.
>>>>>
>>>>>
>>>>> Could you point to an example so we can fix it? I thought all bug
>>>>> reports had been handled, but maybe I lost track of something.
>>>>>
>>>>
>>>> I have a quad-port NIC:
>>>> :40:00.0 = PCIE bridge (SMBIOS Slot 2)
>>>> :41:00.0 = Ethernet Device (port1)
>>>> :41:00.1 = Ethernet Device (port2)
>>>> :42:00.0 = Ethernet Device (port3)
>>>> :42:00.1 = Ethernet Device (port4)
>>>>
>>>> biosdevname would name these p2p1, p2p2, p2p3, p2p4 respectively.
>>>>
>>>
>>> How does it determine that 41 and 42 are the same device? I.e. how
>>> does it differ from real bridge with two independent two-port cards
>>> behind? Could you explain what information it is using? Is it exported
>>> in sysfs?
>>>
>>> ...
>>
>>
>> It knows they are on the same slot as the parent device has SMBIOS
>> Slot#2 (Type 9).  So all child devices of a physical slot are on the
>> same card.  I'm currently using a patch to systemd that reads SMBIOS
>> type 9.  There isn't a kernel sysfs variable that displays this.
>>
>
> This gives us slot ID, but how do we know which of function 0 on this slot
> ID is port 0 and which is port 2? There is nothing in SMBIOS description of
> Type 9 that answers it.

Dell cards have an encoding in PCI VPD (DCM) that maps a PCI DevFn to
port and instance.

Here's an example system loaded with QLogic, Broadcom, and Intel cards.
The Intel card supports SR-IOV.

 DCM1001009d7F1402009d7F2101009d7F2502009d7F32010089154301008915
  :01:00.0 port=1 instance=1
  :01:00.4 port=1 instance=2
  :01:00.1 port=2 instance=1
  :01:00.5 port=2 instance=2
  :01:00.2 port=3 instance=1
  :01:00.3 port=4 instance=1
 
 DCM100100015F120200015F140300015F160400015F210100015F230200015F250300015F270400015F
  :41:00.0 port=1 instance=1
  :41:00.2 port=1 instance=2
  :41:00.4 port=1 instance=3
  :41:00.6 port=1 instance=4
  :41:00.1 port=2 instance=1
  :41:00.3 port=2 instance=2
  :41:00.5 port=2 instance=3
  :41:00.7 port=2 instance=4
 DCM10010081D521010081D5
  :43:00.0 port=1 instance=1
  :43:00.1 port=2 instance=1
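
To make the encoding easier to follow, here is a small Python sketch that
decodes the strings above.  The record layout is inferred purely from these
three examples and is an assumption, not a statement of the actual DCM
specification: after the "DCM" prefix, each record is taken to be 10
characters, with one hex digit for the port, one for the PCI function, two
for the instance, and six further characters that this sketch ignores.

```python
from typing import List, NamedTuple


class DcmEntry(NamedTuple):
    port: int
    function: int
    instance: int


RECORD_LEN = 10  # assumed record size, inferred from the examples above


def parse_dcm(vpd_value: str) -> List[DcmEntry]:
    """Split a DCM string into (port, function, instance) records."""
    if not vpd_value.startswith("DCM"):
        raise ValueError("not a DCM string")
    payload = vpd_value[3:]
    if len(payload) % RECORD_LEN != 0:
        raise ValueError("payload is not a whole number of records")
    entries = []
    for offset in range(0, len(payload), RECORD_LEN):
        record = payload[offset:offset + RECORD_LEN]
        entries.append(DcmEntry(
            port=int(record[0], 16),
            function=int(record[1], 16),
            instance=int(record[2:4], 16),
            # record[4:10] is left uninterpreted in this sketch
        ))
    return entries


for entry in parse_dcm("DCM10010081D521010081D5"):
    print(f"function {entry.function}: port={entry.port} instance={entry.instance}")
```

Decoded this way, all three strings give the port/instance assignments
listed under them, which is the point: the numbers come from the card's
VPD, not from a counter kept by the naming tool.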

lspci shows the following Ethernet devices:
   :01:00.0 = Broadcom, port 1, instance 1
   :01:00.1 = port 2, instance 1
   :01:00.2 = port 3
   :01:00.3 = port 4
   :01:00.4 = port 1, instance 2
   :01:00.5 = port 2, instance 2

   :41:00.0 = QLogic (PCIE slot 5), port 1, instance 1
   :41:00.1 = port 2, instance 1
   :41:00.2 = port 1, instance 2
   :41:00.3 = port 2, instance 2
   :41:00.4 = port 1, instance 3
   :41:00.5 = port 2, instance 3
   :41:00.6 = port 1, instance 4
   :41:00.7 = port 2, instance 4

   :43:00.0 = Intel (PCIE slot 2), port 1
   :43:00.1 port 2
   :43:10.0 vf port 1, instance 0
   :43:10.1 vf port 2, instance 0
   :43:10.2 vf port 1, instance 1
   :43:10.3 vf port 2, instance 1
   :43:10.4 vf port 1, instance 2
   :43:10.5 vf port 2, instance 2
   :43:10.6 vf port 1, instance 3
   :43:10.7 vf port 2, instance 3
   :43:11.0 vf port 1, instance 4
   :43:11.1 etc..
   :43:11.2
   :43:11.3
   :43:11.4
   :43:11.5
   :43:11.6
   :43:11.7
   :43:12.0
   :43:12.1
   :43:12.2
   :43:12.3
   :43:12.4
   :43:12.5
   :43:12.6
   :43:12.7
   :43:13.0
   :43:13.1
   :43:13.2
   :43:13.3
   :43:13.4
   :43:13.5
   :43:13.6
   :43:13.7
   :43:14.0
   :43:14.1
   :43:14.2
   :43:14.3
   :43:14.4
   :43:14.5
   :43:14.6
   :43:14.7
   :43:15.0
   :43:15.1
   :43:15.2
   :43:15.3
   :43:15.4
   :43:15.5
   :43:15.6
   :43:15.7
   :43:16.0
   :43:16.1
   :43:16.2
   :43:16.3
   :43:16.4
   :43:16.5
   :43:16.6
   :43:16.7
   :43:17.0
   :43:17.1
   :43:17.2
   :43:17.3
   :43:17.4
   :43:17.5
   :43:17.6
   :43:17.7
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Proposal: Add biosdevname naming scheme to systemd

2015-10-19 Thread Jordan Hargrave
On Mon, Mar 2, 2015 at 1:17 PM, Tom Gundersen <t...@jklm.no> wrote:
> Hi Jordan,
>
> On Mon, Mar 2, 2015 at 4:45 PM, Jordan Hargrave <jhar...@gmail.com> wrote:
>> There are currently two competing naming mechanisms for network cards,
>> biosdevname and systemd.  Systemd currently has some limitations on naming
>> cards that use network partitioning or support SR-IOV.
>
> Could you point to an example so we can fix it? I thought all bug
> reports had been handled, but maybe I lost track of something.
>

I have a quad-port NIC:
:40:00.0 = PCIE bridge (SMBIOS Slot 2)
:41:00.0 = Ethernet Device (port1)
:41:00.1 = Ethernet Device (port2)
:42:00.0 = Ethernet Device (port3)
:42:00.1 = Ethernet Device (port4)

biosdevname would name these p2p1, p2p2, p2p3, p2p4 respectively.

With systemd, it's ugly. I added a patch to get SMBIOS slot numbers,
and the names systemd assigns vary from boot to boot.

Either:
s2f0 (p1)
s2f1 (p2)
p66s0f0 (p3)
p66s0f1 (p4)

I also saw the opposite:
p65s0f0 (p1)
p65s0f1 (p2)
s2f0 (p3)
s2f1 (p4)

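Since systemd doesn't have a concept of a 'port', both pairs of
functions map to the same slot-based names (functions 0 and 1 are
duplicated, just on different buses).  The devices are named in
parallel, so whichever pair wins the race gets the s2f* names, and the
other pair hits a name collision and ends up with the bus-based names.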
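
The collision can be reproduced on paper with a few lines of Python.  This
is only a sketch of the naming logic as described here, not of systemd
itself: the bus-to-slot table stands in for the SMBIOS lookup done by my
patch, and numbering ports by sorted B:D:F is an assumption made for the
example.

```python
from collections import Counter

# The quad-port card: both buses sit behind the bridge in SMBIOS slot 2.
SLOT_OF_BUS = {0x41: 2, 0x42: 2}
DEVICES = [(0x41, 0, 0), (0x41, 0, 1), (0x42, 0, 0), (0x42, 0, 1)]  # (bus, dev, fn)


def slot_name(bus: int, function: int) -> str:
    """Slot-based name without any notion of a port: s<slot>f<function>."""
    return f"s{SLOT_OF_BUS[bus]}f{function}"


names = [slot_name(bus, fn) for bus, _dev, fn in DEVICES]
collisions = sorted(n for n, count in Counter(names).items() if count > 1)
print("slot-based names:", names)
print("collisions:", collisions)

# With a port number in the name, every function gets a distinct name.
port_names = [f"p2p{index}" for index, _ in enumerate(sorted(DEVICES), start=1)]
print("port-based names:", port_names)
```

Each of s2f0 and s2f1 is claimed by two devices, so the final assignment
depends on which device udev happens to process first.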


>> Proposal is to add
>> support for biosdevname-like names as part of systemd.  The names would be
>> created as a new environment variable ID_NET_NAME_BIOSDEVNAME.  This could
>> then be used in the udev rules scripts to replace the external biosdevname
>> handler.
>
> I don't think this makes much sense. If biosdevname had been
> acceptable, the udev naming scheme would not have been introduced in
> the first place.
>

biosdevname is going away in the new version of RHEL, so we will lose
the capability to detect if two 'virtual' NICs are actually the same
physical NIC.  The naming in systemd doesn't have the capability of
showing the relationship between a physical NIC and its virtual
(SR-IOV) NICs in the location name.

>> At least on Dell systems, systemd generates unusable names (PCI B:D:F vs
>> Slot#) for add-in cards as our PCIe slots do not have the ACPI _SUN method,
>> but they do have a SMBIOS slot number.
>
> Wouldn't the better approach be to simply add SMBIOS support to udev
> then? I must admit I don't know what challenges that entails, but
> seems like a natural first step.
>

That could be possible. I've tried submitting a patch upstream for the
kernel, but it hasn't been accepted yet.  So SMBIOS parsing would have
to be part of systemd.
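
For illustration only, this is roughly what the parsing amounts to,
sketched in Python rather than C.  It assumes the SMBIOS 2.6+ layout of a
Type 9 (System Slots) structure, where the slot ID is a 16-bit word at
offset 0x09 and the segment, bus and device/function fields start at offset
0x0D; the function name and the sample bytes are made up.  How the raw
bytes are obtained is left open (on Linux the dmi-sysfs entries are one
possibility, if available).

```python
import struct
from typing import NamedTuple


class SlotInfo(NamedTuple):
    slot_id: int
    segment: int
    bus: int
    device: int
    function: int


def parse_smbios_type9(raw: bytes) -> SlotInfo:
    """Extract slot ID and PCI address from the formatted area of a Type 9 structure."""
    # Byte 0 is the structure type, byte 1 the length of the formatted area.
    if len(raw) < 0x11 or raw[0] != 9 or raw[1] < 0x11:
        raise ValueError("not a SMBIOS 2.6+ Type 9 structure")
    (slot_id,) = struct.unpack_from("<H", raw, 0x09)
    (segment,) = struct.unpack_from("<H", raw, 0x0D)
    bus = raw[0x0F]
    devfn = raw[0x10]  # bits 7:3 = device, bits 2:0 = function
    return SlotInfo(slot_id, segment, bus, devfn >> 3, devfn & 0x07)


# Hypothetical structure describing slot 2 at PCI address 0000:40:00.0.
sample = bytes([9, 0x11, 0x00, 0x09, 1, 0xA5, 0x0B, 0x04, 0x04,
                0x02, 0x00, 0x0C, 0x01, 0x00, 0x00, 0x40, 0x00])
print(parse_smbios_type9(sample))
```

The naming code would then walk up from the network device to the parent
PCI device whose address matches one of these entries and use that slot ID.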

> Cheers,
>
> Tom


[systemd-devel] Proposal: Add Drive Enclosure/Slot mapping to systemd

2015-03-02 Thread Jordan Hargrave
It would be nice if systemd could discover and display enclosure/bay slot
mappings for drives in the system.  The /dev/disk/by-path method doesn't
quite work: for SAS drives the ID can change on hotplug.  The slot mapping
also doesn't handle PCIe SSD devices, as they are bare block devices and
don't use the SCSI midlayer.  I propose adding support for something like a
/dev/disk/by-enclosure/encl-XXX-slot-YYY symlink for block devices.


[systemd-devel] Proposal: Add biosdevname naming scheme to systemd

2015-03-02 Thread Jordan Hargrave
There are currently two competing naming mechanisms for network cards,
biosdevname and systemd.  Systemd currently has some limitations on naming
cards that use network partitioning or support SR-IOV.  Proposal is to add
support for biosdevname-like names as part of systemd.  The names would be
created as a new environment variable ID_NET_NAME_BIOSDEVNAME.  This could
then be used in the udev rules scripts to replace the external biosdevname
handler.

At least on Dell systems, systemd generates unusable names (PCI B:D:F vs
Slot#) for add-in cards as our PCIe slots do not have the ACPI _SUN method,
but they do have a SMBIOS slot number.


Re: [systemd-devel] Proposal: Add Drive Enclosure/Slot mapping to systemd

2015-03-02 Thread Jordan Hargrave
On Mon, Mar 2, 2015 at 10:24 AM, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> On Mon, 2 Mar 2015 09:48:51 -0600
> Jordan Hargrave <jhar...@gmail.com> wrote:
>
>> It would be nice if systemd could discover and display enclosure/bay slot
>> mappings for drives in the system.  The /dev/disk/by-path method doesn't
>> quite work: for SAS drives the ID can change on hotplug.  The slot mapping
>> also doesn't handle PCIe SSD devices, as they are bare block devices and
>> don't use the SCSI midlayer.  I propose adding support for something like a
>> /dev/disk/by-enclosure/encl-XXX-slot-YYY symlink for block devices.
>
> How should it be discovered? Is there a universal method that can be used
> on the majority of systems?


For PCIe SSDs on Dell systems, it requires using an ipmitool OEM command to
get the correct Bus:Device:Function to Enclosure:Slot mapping.  For devices
using SAS controllers it's a bit more tricky.  There is a sysfs variable for
enclosure/slot in sysfs for SAS devices with 8 drives on our servers.  The
driver for some reason doesn't export this info for Dell systems with  8
drives.

Ideally there should be an enclosure.c device defined for every device
type.  Currently only ses.c seems to use this, but SCSI enclosure services
aren't available on SSDs.