Re: Why NM seems to behave differently in initrd from in real root?

2021-10-15 Thread Coiby Xu via networkmanager-list

Hi Thomas,

On Thu, Oct 14, 2021 at 05:47:37PM +0200, Thomas Haller wrote:

On Thu, 2021-10-07 at 22:12 +0800, Coiby Xu via networkmanager-list
wrote:

Hi NM developers,

This is Coiby from the Red Hat Kernel Debug team, which is responsible
for Fedora/RHEL's kexec-tools. Currently, kexec-tools parses ifcfg-* or
.nmconnection files to build dracut cmdline parameters like ip= to set
up the kdump initrd network, which is tedious and error-prone. Recently,
I've been implementing a different approach, which is to set up the
kdump initrd network by copying connection profiles from the real root
to the initrd directly. However, one unexpected thing is that NM seems
to behave differently in the initrd than in the real root, and the same
connection profiles copied from the real root lead to different results
in the kdump initrd. So is there a general reason why NM behaves
differently in the initrd and the real root? Is it a better approach
for kexec-tools to set up the kdump initrd network by copying
connection profiles from the real root to the kdump initrd? I would
appreciate it if NM developers could provide answers or comments on
these questions, since you are experts on this type of problem.


NetworkManager should behave very similar in real-root and initrd.
Which probably is the point of NetworkManager in initrd in the first
place: to do the same everywhere.

The points you brought up are special cases and configuration issues,
or even missing features, and we can find ways to make those use cases
work better. I replied to the rhbz and the upstream issue, if that helps.


Copying connection profiles seems like a good idea. 


Thanks for confirming it's a good idea and finding ways to help me 
implement it!



But the real
problem is that you are writing a non-interactive tool, which is
confronted with some profiles on disk, and then automatically needs to
do the right thing. That is not possible in all cases, even though we
have an API to parse the profile files more conveniently.

For example, if there are two profiles on disk that are both set to
autoconnect on the same device, then a generic, non-interactive tool
cannot understand which one to prefer or what even to do about that.
That is regardless of whether you copy profiles or whether you parse and
synthesize new ones. The real solution is: the user must have a
configuration that works for your tool in the first place.


Yes, I understand this example. However, it doesn't explain why
a different connection profile is activated in initrd. But thanks to
your reply [4] for [2], now it makes sense to me. This is because NM
would select a candidate based on the timestamp in
/var/lib/NetworkManager/timestamps. So a connection profile alone
doesn't completely determine NM's behaviour.
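
To illustrate the timestamp point, here is a minimal Python sketch of
how a tool might check which of several candidate profiles was used most
recently. It assumes the timestamps file is a keyfile with a single
[timestamps] group mapping connection UUIDs to epoch seconds. The
function name most_recent_profile is hypothetical, and NM's real
selection logic may weigh other factors as well (for example
connection.autoconnect-priority), so this is only an approximation.

```python
import configparser


def most_recent_profile(timestamps_text, candidate_uuids):
    """Return the candidate UUID with the newest timestamp, or None.

    Assumption: timestamps_text has a [timestamps] group whose keys are
    connection UUIDs and whose values are epoch seconds.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(timestamps_text)
    if not parser.has_section("timestamps"):
        return None
    seen = {
        uuid: int(value)
        for uuid, value in parser.items("timestamps")
        if uuid in candidate_uuids
    }
    # Candidates without a recorded timestamp are simply ignored here.
    return max(seen, key=seen.get) if seen else None
```

A tool copying profiles into the kdump initrd could use something like
this to warn when the profile it expects is not the most recently used
one in the real root.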

[4] https://bugzilla.redhat.com/show_bug.cgi?id=2007563#c6





For the details of how NM behaves differently in kdump initrd, I've
reported some of the inconsistent behaviours as bugs [1] [2].
connection.wait-device-timeout=6000 and connection.autoconnect=false
could be used to bypass [1] and [2] respectively so the same
connections could be brought up in initrd.


replied to both. Hope that helps. Let's discuss there.



A third issue, for which I haven't found a workaround, is the case of a
bridging network over a VLAN network over a teaming network, where I
create a teaming network interface which is used as the parent
interface of a VLAN interface, which is in turn a slave interface of a
network bridge. The problem is that the network bridge sometimes gets
the IP address belonging to the VLAN subnet but sometimes not. Btw, the
third issue was found on a physical machine and can't be reproduced on
a VM.


Hard to say. Seems like a bug? Please report it, so we can track it and
discuss in detail.


Sure, thanks!





I've tested the modified kexec-tools [3] by setting up different
networks, including the aforementioned bridging network over VLAN
network over teaming network. Other tests include bridging network over
physical interface/bonding network/teaming network/VLAN network, VLAN
network over physical interface/bonding network/teaming network, etc.
All tests have passed on a VM. And except for the bridging network over
VLAN network over teaming network, the tests have also passed on one
physical machine.


That sounds promising.


But I'm not sure if they are sufficient, considering there are
machine-specific issues like the znet network device. Any suggestion is
welcome.


There are countless combinations. It's great that you invest time in a
considerable amount of testing!

znet seems difficult. I commented about that on [2].


Thanks for encouraging me to do more testing. Do you know if there are
other machine-specific issues like znet that I should pay attention to?









[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/803
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2007563
[3] https://src.fedoraproject.org/fork/coiby/rpms/kexec-tools/commits/direct_nm




best,
Thomas




--
Best regards,
Coiby


Re: Why NM seems to behave differently in initrd from in real root?

2021-10-14 Thread Thomas Haller via networkmanager-list
On Thu, 2021-10-07 at 22:12 +0800, Coiby Xu via networkmanager-list
wrote:
> Hi NM developers,
> 
> This is Coiby from the Red Hat Kernel Debug team, which is responsible
> for Fedora/RHEL's kexec-tools. Currently, kexec-tools parses ifcfg-* or
> .nmconnection files to build dracut cmdline parameters like ip= to set
> up the kdump initrd network, which is tedious and error-prone. Recently,
> I've been implementing a different approach, which is to set up the
> kdump initrd network by copying connection profiles from the real root
> to the initrd directly. However, one unexpected thing is that NM seems
> to behave differently in the initrd than in the real root, and the same
> connection profiles copied from the real root lead to different results
> in the kdump initrd. So is there a general reason why NM behaves
> differently in the initrd and the real root? Is it a better approach
> for kexec-tools to set up the kdump initrd network by copying
> connection profiles from the real root to the kdump initrd? I would
> appreciate it if NM developers could provide answers or comments on
> these questions, since you are experts on this type of problem.

NetworkManager should behave very similar in real-root and initrd.
Which probably is the point of NetworkManager in initrd in the first
place: to do the same everywhere.

The points you brought up are special cases and configuration issues,
or even missing features, and we can find ways to make those use cases
work better. I replied to the rhbz and the upstream issue, if that helps.


Copying connection profiles seems like a good idea. But the real
problem is that you are writing a non-interactive tool, which is
confronted with some profiles on disk, and then automatically needs to
do the right thing. That is not possible in all cases, even though we
have an API to parse the profile files more conveniently.

For example, if there are two profiles on disk that are both set to
autoconnect on the same device, then a generic, non-interactive tool
cannot understand which one to prefer or what even to do about that.
That is regardless of whether you copy profiles or whether you parse and
synthesize new ones. The real solution is: the user must have a
configuration that works for your tool in the first place.
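
A tool cannot resolve such a conflict, but it can at least detect it
and ask the user to fix the configuration. The following Python sketch
shows one way to do that, assuming profiles are keyfiles whose
[connection] group may carry interface-name and autoconnect keys, and
that autoconnect defaults to true when absent. The function name
autoconnect_conflicts and its input format are hypothetical; profiles
that match devices by other means (for example by MAC address) are not
handled.

```python
import configparser
from collections import defaultdict


def autoconnect_conflicts(profiles):
    """Map each interface to its autoconnect profiles, if more than one.

    profiles: dict of profile name -> keyfile text (hypothetical input).
    """
    by_iface = defaultdict(list)
    for name, text in profiles.items():
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read_string(text)
        if not parser.has_section("connection"):
            continue
        iface = parser.get("connection", "interface-name", fallback=None)
        # Assumption: a missing autoconnect key means autoconnect is on.
        auto = parser.get("connection", "autoconnect", fallback="true")
        if iface and auto.strip().lower() != "false":
            by_iface[iface].append(name)
    return {
        iface: sorted(names)
        for iface, names in by_iface.items()
        if len(names) > 1
    }
```

An empty result means no interface has more than one autoconnect
profile under these assumptions; a non-empty result is something only
the user can resolve.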

> 
> For the details of how NM behaves differently in kdump initrd, I've
> reported some of the inconsistent behaviours as bugs [1] [2].
> connection.wait-device-timeout=6000 and connection.autoconnect=false
> could be used to bypass [1] and [2] respectively so the same
> connections could be brought up in initrd.

replied to both. Hope that helps. Let's discuss there.
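
For reference, the two workaround settings mentioned in the quoted
paragraph could be applied to a copied profile roughly as in the Python
sketch below. It assumes the profile is a keyfile and that both
properties live in its [connection] group under their property names.
The helper set_connection_keys is hypothetical, and which profile should
get which setting depends on the setup. Note that configparser drops
comments when rewriting the file.

```python
import configparser
import io


def set_connection_keys(keyfile_text, settings):
    """Return keyfile text with the given [connection] keys set.

    settings: dict of key -> value, e.g. {"wait-device-timeout": 6000}.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # preserve key case as written
    parser.read_string(keyfile_text)
    if not parser.has_section("connection"):
        parser.add_section("connection")
    for key, value in settings.items():
        parser.set("connection", key, str(value))
    out = io.StringIO()
    # Write key=value without spaces around "=", as keyfiles usually do.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

For example, set_connection_keys(text, {"wait-device-timeout": 6000})
would add the timeout to a profile, and {"autoconnect": "false"} would
switch off autoconnect for a profile that should stay down in initrd.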


> A third issue, for which I haven't found a workaround, is the case of a
> bridging network over a VLAN network over a teaming network, where I
> create a teaming network interface which is used as the parent
> interface of a VLAN interface, which is in turn a slave interface of a
> network bridge. The problem is that the network bridge sometimes gets
> the IP address belonging to the VLAN subnet but sometimes not. Btw, the
> third issue was found on a physical machine and can't be reproduced on
> a VM.

Hard to say. Seems like a bug? Please report it, so we can track it and
discuss in detail.

> 
> I've tested the modified kexec-tools [3] by setting up different
> networks, including the aforementioned bridging network over VLAN
> network over teaming network. Other tests include bridging network over
> physical interface/bonding network/teaming network/VLAN network, VLAN
> network over physical interface/bonding network/teaming network, etc.
> All tests have passed on a VM. And except for the bridging network over
> VLAN network over teaming network, the tests have also passed on one
> physical machine.

That sounds promising.

> But I'm not sure if they are sufficient, considering there are
> machine-specific issues like the znet network device. Any suggestion is
> welcome.

There are countless combinations. It's great that you invest time in a
considerable amount of testing!

znet seems difficult. I commented about that on [2].

> 


> [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/803
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2007563
> [3] https://src.fedoraproject.org/fork/coiby/rpms/kexec-tools/commits/direct_nm
> 
> 

best,
Thomas

