Re: [systemd-devel] Better systemd naming for Azure/MANA nic

2024-04-19 Thread Haiyang Zhang
+ systemd-devel@lists.freedesktop.org
+ dimitri.led...@surgut.co.uk

From: Haiyang Zhang
Sent: Tuesday, April 16, 2024 5:59 PM
To: dimitri.led...@canonical.com
Cc: Jack Aboutboul ; Sharath George John 
; Luca Boccassi ; 
Partha Sarangam ; Paul Rosswurm 

Subject: Better systemd naming for Azure/MANA nic
Importance: High

Hi Dimitri,

During the meeting a few months ago, you mentioned that we cannot set
"net.ifnames=0" because of its impact on the naming of other devices. We have
recently fixed the Physical Slot number of MANA NICs; could you change the
naming scheme as discussed last time?

Currently the domain number is part of the name (along with Physical Slot +
dev_port), e.g. enP30832s1, enP30832s1d1, enP30832s1d2... But the domain number
is long, and may not be the same on different VMs.

As discussed previously, we prefer a short name based on the VF "Physical Slot" 
+ dev_port.
For a VF, the Physical Slot starts from 1 and increments by 1 for each
additional VF device. (The PF NIC doesn't have a Physical Slot number, so you
can continue to use "0" there.)
The dev_port starts from 0 and increments by 1 for each additional dev_port
(NIC).

Here is the logic we hope to have in systemd: if a NIC's driver is "mana",
use this naming scheme:
<prefix><Physical Slot>d<dev_port>

// During the meeting, we briefly discussed that the prefix could be "enm"
(ethernet, mana), so the names of two VF devices with 3 dev_ports (NICs) each
can be:

enm1  // omits the dev_port number if it's 0.
enm1d1
enm1d2

enm2
enm2d1
enm2d2
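
The scheme above can be sketched in a few lines of Python. This is only an
illustration of the proposed format, not systemd code: the function name
`mana_ifname` and its arguments are assumptions made for the example, and the
"enm" prefix is the one tentatively discussed above.

```python
def mana_ifname(physical_slot: int, dev_port: int, prefix: str = "enm") -> str:
    """Build a name of the form <prefix><slot>[d<dev_port>] (illustrative only)."""
    if physical_slot < 0 or dev_port < 0:
        raise ValueError("physical_slot and dev_port must be non-negative")
    name = f"{prefix}{physical_slot}"
    # The dev_port suffix is omitted when dev_port is 0, as in the examples above.
    if dev_port != 0:
        name += f"d{dev_port}"
    # Interface names must fit in IFNAMSIZ (16 bytes including the trailing NUL).
    if len(name) > 15:
        raise ValueError(f"{name!r} does not fit in IFNAMSIZ")
    return name


# Two VF devices (slots 1 and 2) with three dev_ports each.
names = [mana_ifname(slot, port) for slot in (1, 2) for port in range(3)]
print(names)
```

With the values from this mail, slot 1 and dev_port 0 gives "enm1", and slot 2
and dev_port 2 gives "enm2d2".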



Here is the Physical Slot, dev_port info from a running VM:
root@lisa--500-e0-n1:/sys/class/net# lspci -v -s7870:00:00.0
7870:00:00.0 Ethernet controller: Microsoft Corporation Device 00ba
Subsystem: Microsoft Corporation Device 00b9
Physical Slot: 1
Flags: bus master, fast devsel, latency 0, NUMA node 0
Memory at fc200 (64-bit, prefetchable) [size=32M]
Memory at fc400 (64-bit, prefetchable) [size=32K]
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=1024 Masked-
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Kernel driver in use: mana

root@lisa--500-e0-n1:/sys/class/net# grep "" en*/dev_port
enP30832s1/dev_port:0
enP30832s1d1/dev_port:1
enP30832s1d2/dev_port:2
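
For reference, the same dev_port values can be collected programmatically. The
following is a minimal Python sketch that reads the `dev_port` attribute from
sysfs; the helper name `read_dev_ports` and its parameters are hypothetical, and
the root directory is a parameter so the function can be pointed at a test tree.

```python
from pathlib import Path


def read_dev_ports(net_root="/sys/class/net", pattern="en*"):
    """Return {interface name: dev_port} for interfaces matching pattern."""
    ports = {}
    for iface in sorted(Path(net_root).glob(pattern)):
        attr = iface / "dev_port"
        try:
            ports[iface.name] = int(attr.read_text().strip())
        except (OSError, ValueError):
            # Attribute missing, unreadable, or not a number: skip this entry.
            continue
    return ports
```

Calling `read_dev_ports()` on the VM above would be expected to return the
three interfaces with dev_port values 0, 1 and 2.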

Thanks,

- Haiyang



Re: [systemd-devel] Better network naming on Hyper-V/Azure?

2020-01-16 Thread Haiyang Zhang



> -----Original Message-----
> From: Lennart Poettering 
> Sent: Monday, January 13, 2020 4:18 AM
> To: Haiyang Zhang 
> Cc: Stephen Hemminger ; systemd-
> de...@lists.freedesktop.org; Michael Kelley 
> Subject: Re: [systemd-devel] Better network naming on Hyper-V/Azure?
> 
> On Fr, 10.01.20 16:17, Haiyang Zhang (haiya...@microsoft.com) wrote:
> 
> > > My guess is that this is a lot like SR-IOV slot number that we can
> > > already use to name interfaces, right? If so, supporting things the
> > > same way sounds totally OK.
> >
> > Thanks for your explanation. We do want to use the ethN format, and want the
> > results to be the same between Async and sync probing
> 
> Then deal with it in the kernel. Allocating from the same ethN
> namespace is always going to be racy if both kernel and userspace do
> it.
> 
> That's why the names userspace generally picks for stable Ethernet
> interfaces start with "en" followed by some stable suffix of a kind,
> under the assumption the kernel will not allocate from that namespace.
> 
> > @Stephen Hemminger Since systemd needs to avoid stepping into the kernel
> > ethN formatting, should we do the synthetic NIC naming inside kernel (netvsc
> > driver)?
> 
> If you have any other driver register network interfaces on your
> kernel then your whole enumeration will go wrong though. If you
> tightly control which drivers exist in your environment you might get
> away with taking ownership of the ethN namespace entirely from your
> own driver and manage it fully.

Thanks for your suggestions!
So my implementation will keep the naming in kernel driver (netvsc). 

1) The netvsc's probe_type will be set to PROBE_DEFAULT_STRATEGY, 
so user can either continue with the current sync-probing by default, 
or use module/kernel cmdline option to enable Async-probing if other 
devices, such as DDA or SRIOV/VF NICs are configured to be named 
in different space (enP*, etc.) by systemd.

2) If Async-probing option is in use, netvsc driver will use the dev_num 
based on VMBus offer sequence. It will be the smallest available ethN 
format, which is the same result as the current sync-probing result.

3) My proposal is that Async probing has the same naming as sync 
probing. In case of hot add/remove, the names may be reused. The 
names may change after hot add/remove then reboot once. But the 
names will be stable in further reboots. It is the same behavior as 
current code (sync probing).
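
One possible reading of points 2) and 3) can be sketched in Python as below.
This is not the netvsc driver code: `pick_name` and `probe` are hypothetical
helpers, and the assumption is that each device asks for eth<dev_num> and falls
back to the smallest free ethN only if that name is already taken.

```python
import itertools


def pick_name(dev_num, in_use):
    """Prefer eth<dev_num>; otherwise fall back to the smallest free ethN."""
    preferred = f"eth{dev_num}"
    if preferred not in in_use:
        return preferred
    n = 0
    while f"eth{n}" in in_use:
        n += 1
    return f"eth{n}"


def probe(order, taken=()):
    """Simulate probing devices (identified by dev_num) in the given order."""
    in_use = set(taken)
    names = {}
    for dev_num in order:
        name = pick_name(dev_num, in_use)
        in_use.add(name)
        names[dev_num] = name
    return names


# Every probe order of three devices yields the same dev_num -> name mapping.
outcomes = {
    tuple(sorted(probe(order).items()))
    for order in itertools.permutations(range(3))
}
print(outcomes)
```

When no other driver has claimed an ethN name, the mapping is independent of
probe order. If a name is already taken (passed via `taken`), the fallback
makes the result depend on order again, which is the race discussed in this
thread.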

Thanks,
- Haiyang


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Better network naming on Hyper-V/Azure?

2020-01-10 Thread Haiyang Zhang



> -----Original Message-----
> From: Lennart Poettering 
> Sent: Friday, January 10, 2020 10:55 AM
> To: Haiyang Zhang 
> Cc: Stephen Hemminger ; systemd-
> de...@lists.freedesktop.org; Michael Kelley 
> Subject: Re: [systemd-devel] Better network naming on Hyper-V/Azure?
> 
> On Fr, 10.01.20 15:37, Haiyang Zhang (haiya...@microsoft.com) wrote:
> 
> > > Hyper-V offers netvsc devices (synthetic NICs) in the same sequence across
> > > reboots, so eth0 ... ethN names will associate to the same vNIC every time
> > > with Sync-probing currently.
> > >
> > > But if in the future, we enable Async-probing, the naming may not be
> > > persistent across reboots. In my patch set (not yet upstream), I added a
> > > new attribute (dev_num) in sysfs to keep track of the device channel offer
> > > sequence. So a user mode program can have the option to use this attribute
> > > to name NICs, and generate the same results for Async-probing as
> > > Sync-probing does.
> >
> > Lennart and other systemd developers:
> >
> > Could you also comment on my proposal above? It's to keep the naming
> > results of Async-probing the same as those of sync-probing.
> 
> I am not sure I follow fully, but if you intend to assign an index to
> each interface that the VM supervisor sets and that we should use to
> name the interface, then that sounds great to me.
> 
> However do note that we generally avoid stepping into the naming
> namespace of the kernel. i.e. if your intention to stabilize eth0,
> eth1, eth2 with that, we can't help you, that's generally racy since
> the kernel allocates other interfaces from that namespace too.
> 
> My guess is that this is a lot like SR-IOV slot number that we can
> already use to name interfaces, right? If so, supporting things the
> same way sounds totally OK.

Thanks for your explanation. We do want to use the ethN format, and want the 
results to be the same between Async and sync probing.

@Stephen Hemminger Since systemd needs to avoid stepping into the kernel 
ethN formatting, should we do the synthetic NIC naming inside kernel (netvsc 
driver)? 

In case of a conflict with other names, like DDA or VF NICs, it will fall back
to the first available ethN name. I know it's racy, but it is no worse than the
current situation -- even with sync-probing, the name may still race with DDA
or VF NICs. And this is already solvable by systemd, which uses PCI slot naming
and puts them into different naming formats.

Thanks,
- Haiyang



Re: [systemd-devel] Better network naming on Hyper-V/Azure?

2020-01-10 Thread Haiyang Zhang



> -----Original Message-----
> From: Haiyang Zhang
> Sent: Tuesday, January 7, 2020 11:01 AM
> To: Lennart Poettering ; Stephen Hemminger
> 
> Cc: systemd-devel@lists.freedesktop.org; Michael Kelley
> 
> Subject: RE: [systemd-devel] Better network naming on Hyper-V/Azure?

> Hyper-V offers netvsc devices (synthetic NICs) in the same sequence across
> reboots, so eth0 ... ethN names will associate to the same vNIC every time
> with Sync-probing currently.
> 
> But if in the future, we enable Async-probing, the naming may not be persistent
> across reboots. In my patch set (not yet upstream), I added a new attribute
> (dev_num) in sysfs to keep track of the device channel offer sequence. So a user
> mode program can have the option to use this attribute to name NICs, and
> generate the same results for Async-probing as Sync-probing does.

Lennart and other systemd developers:

Could you also comment on my proposal above? It's to keep the naming results
of Async-probing the same as those of sync-probing.

Thanks,
- Haiyang



Re: [systemd-devel] Better network naming on Hyper-V/Azure?

2020-01-07 Thread Haiyang Zhang
(resending after subscribing to systemd-devel)

> -----Original Message-----
> From: Lennart Poettering 
> Sent: Tuesday, January 7, 2020 9:01 AM
> To: Stephen Hemminger 
> Cc: systemd-devel@lists.freedesktop.org; Haiyang Zhang
> 
> Subject: Re: [systemd-devel] Better network naming on Hyper-V/Azure?
> 
> On Mo, 06.01.20 15:36, Stephen Hemminger (step...@networkplumber.org)
> wrote:
> 
> > About a year ago there was some discussion on having persistent network
> > names on Hyper-V/Azure. Haiyang did some patches to add an attribute which
> > could be used by udev to do this. But there is some reluctance because
> > of how the channel id works.
> >
> > The motivation to provide network naming is to allow vmbus to change to
> > parallel probing.
> > Right now probing is serialized so naming is always in same order.
> >
> > My question is what exactly does systemd/udev need to provide persistent
> > naming. The obvious ones are:
> >   1. Must be unique (although PCI slot isn't)
> 
> It's not unique per bus? huh?
> 
> >   2. Must be persistent across reboot.
> >   3. Must be stable if device is removed.
> 
> Yes, these three are the idea.
> 
> > There are more questions.
> >   1. Is there a particular ordering and non-reuse requirement.
> >  Obviously, names have to be 15 characters or less but what
> >   else.
> 
> No ordering or non-reuse requirements are made. I mean, the device
> path names are in particular defined so that they are stable even if you
> replace your PCI network card by a different one, hence in that case
> absolutely are reused by a different device, and that's intentional.
> 
> Yeah must fit in IFNAMSIZ. And probably shouldn't include "/", control
> chars, whitespace and some other weird chars that apps don't like. But
> that's the same for all network interfaces, whether managed by udev
> predictable naming or not...
> 
> >   2. How to handle the device associated with Accelerated Networking?
> >  Do you want to hide or rename the VF that is associated with the
> >  virtual device?
> 
> I have no idea what that means, what is Accelerated Networking?

On Azure, "Accelerated Networking" means SRIOV / VF NICs.

> 
> > There are a couple of other quirks:
> >   1. The current cloudinit and other startup applications require eth0 as
> >  the administrative and always there interface, hard wired into the
> >  code. How to handle that?
> 
> if you have multiple devices and want a specific one to be named eth0
> then this is inherently racy since we can't sensibly rename the device
> like that in userspace because we'd always race against the kernel's
> own naming regime.
> 
> Naming an interface "eth0" only really works if you have only one
> interface or if you don't care about the names at all. If you have
> multiple then pick different names outside of the ethX namespace the
> kernel allocates from.
> 
> In the case where you only have a single interface or don't care about
> the name then drop in a .link file that matches the interface
> generically and sets NamePolicy=kernel so that the kernel name is used
> as it is.
> 
> >   2. Hyper-V has the ability for host administrator to assign a name, but
> >  it is more of a free form string so it is used as default
> >  network description.
> 
> Current systemd git has support for assigning "alternative" ifnames to
> devices, using that new kernel feature. On kernels that support that
> we'll initialize the alternative ifnames to all names we could
> possibly come up with (i.e. so that an interface can always be referred
> to by its by-path, by-slot, by-mac name equally). Since the
> alternative ifnames are not IFNAMSIZ long (but 128 chars long) maybe
> they are suitable to use for these hyperv "free form string" if that
> makes sense given the charset restrictions.
> 
> >   3. Azure has names as part of the CLI for manipulating VM's but these
> >  are not currently exposed to guest. If this could happen would it help
> >  or hurt.
> 
> I mean, we are happy to make use of any names that make sense. Not
> sure why hyperv needs three different symbolic names for each
> interface, but if it is how it is, then we can totally expose them all
> ;-).

Hyper-V offers netvsc devices (synthetic NICs) in the same sequence across 
reboots, so eth0 ... ethN names will associate to the same vNIC every time 
with Sync-probing currently. 

But if in the future, we enable Async-probing, the naming may not be persistent
across reboots. In my patch set (not yet upstream), I added a new attribute
(dev_num) in sysfs to keep track of the device channel offer sequence. So a user
mode program can have the option to use this attribute to name NICs, and
generate the same results for Async-probing as Sync-probing does.

Thanks,
- Haiyang
