Re: [systemd-devel] soft-reboot and surviving it

2024-04-19 Thread Cristian Rodríguez
On Fri, Apr 19, 2024 at 6:17 AM Thorsten Kukuk  wrote:
>
> On Fri, Apr 19, 2024 at 11:48 AM Luca Boccassi  
> wrote:
>
> > However, logging should work out of the box as long as the journal is
> > used; what problem are you seeing exactly?
>
> Starting around the shutdown and restart of systemd-journald during
> soft-reboot, all writes to stderr result in an error (or raise
> SIGPIPE if this is not disabled).

Ugh, that's bad. Did you identify which component does not set MSG_NOSIGNAL?
AFAIK nothing should ever close a running program's stderr, and anything
that happens after that is undefined.
POSIX says one may only close it after the last atexit() handler has run.


Re: [systemd-devel] Better systemd naming for Azure/MANA nic

2024-04-19 Thread Haiyang Zhang
+ 
systemd-devel@lists.freedesktop.org
+ dimitri.led...@surgut.co.uk

From: Haiyang Zhang
Sent: Tuesday, April 16, 2024 5:59 PM
To: dimitri.led...@canonical.com
Cc: Jack Aboutboul; Sharath George John; Luca Boccassi; Partha Sarangam; Paul Rosswurm

Subject: Better systemd naming for Azure/MANA nic
Importance: High

Hi Dimitri,

During the meeting a few months ago, you mentioned we cannot set 
"net.ifnames=0" due to the impact on other device naming... We have recently 
fixed the Physical Slot number of MANA NICs; could you change the naming 
scheme as discussed last time?

Currently the domain number is part of the name (together with Physical Slot + 
dev_port), e.g. enP30832s1, enP30832s1d1, enP30832s1d2... But the domain 
number is long, and may not be the same on different VMs.

As discussed previously, we prefer a short name based on the VF "Physical Slot" 
+ dev_port.
For a VF, the Physical Slot starts at 1 and increments by 1 for each additional 
VF device. (A PF NIC doesn't have a Physical Slot number, so you can continue 
to use "0" there.)
The dev_port starts at 0 and increments by 1 for each additional dev_port 
(NIC).

Here is the logic we hope to have in systemd: if a NIC's driver is "mana", 
use this naming scheme:
<prefix><slot>d<dev_port>

During the meeting, we briefly talked about the prefix being "enm" 
(ethernet, mana), so the names of two VF devices with 3 dev_ports (NICs) each 
would be:

enm1  // omits the dev_port number if it's 0.
enm1d1
enm1d2

enm2
enm2d1
enm2d2



Here is the Physical Slot, dev_port info from a running VM:
root@lisa--500-e0-n1:/sys/class/net# lspci -v -s7870:00:00.0
7870:00:00.0 Ethernet controller: Microsoft Corporation Device 00ba
Subsystem: Microsoft Corporation Device 00b9
Physical Slot: 1
Flags: bus master, fast devsel, latency 0, NUMA node 0
Memory at fc200 (64-bit, prefetchable) [size=32M]
Memory at fc400 (64-bit, prefetchable) [size=32K]
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=1024 Masked-
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Kernel driver in use: mana

root@lisa--500-e0-n1:/sys/class/net# grep "" en*/dev_port
enP30832s1/dev_port:0
enP30832s1d1/dev_port:1
enP30832s1d2/dev_port:2

Thanks,

Haiyang



Re: [systemd-devel] soft-reboot and surviving it

2024-04-19 Thread Thorsten Kukuk
On Fri, Apr 19, 2024 at 11:48 AM Luca Boccassi  wrote:
> On Fri, 19 Apr 2024 at 10:30, Thorsten Kukuk  wrote:

> > And now I started looking into how services can survive the
> > soft-reboot. I know the FOSDEM talk from Luca about this topic, but I
> > don't want to move the application into another image, as this would
> > only move the update problem to a different level, not solve it. So
> > I'm currently playing around to find out whether there isn't a better
> > option, especially with btrfs.

> It really needs to be a separate filesystem from a separate image; with any
> ties back to the host OS, the service will hopefully be correctly
> stopped, or worse it will not be detected and will leak the old
> filesystem, which means you'll silently leak memory, mounts, etc. I
> would strongly recommend avoiding fighting against this, and instead
> spending the time solving the root cause.

I agree that you have a problem if you use something like a partition
A/B setup, where in the worst case you start from A, soft-reboot to B,
update A, and at that very moment overwrite the code of the running
application.
I'm not sure this is really a problem with btrfs and subvolumes, as
they stay mounted anyway in our setup.
I hope I find the time to discuss this next week with our btrfs developers.

> The best solution really is to figure out why there's an executable
> from the host OS permanently running in the podman container cgroup
> (what does it do, why is it necessary, why does it need to always run,
> etc.), and try to refactor that away. Make it start on demand, for
> example.

That's conmon; no idea why it needs to keep running. But this would
only solve the problem for podman, and I have several other use cases in
mind where containers are not involved.

  Thanks,
Thorsten

-- 
Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461
Nuernberg, Germany
Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich (HRB
36809, AG Nürnberg)


Re: [systemd-devel] soft-reboot and surviving it

2024-04-19 Thread Thorsten Kukuk
On Fri, Apr 19, 2024 at 11:48 AM Luca Boccassi  wrote:

> However, logging should work out of the box as long as the journal is
> used; what problem are you seeing exactly?

Starting around the shutdown and restart of systemd-journald during
soft-reboot, all writes to stderr result in an error (or raise
SIGPIPE if this is not disabled).
I'm currently using systemd 255.4 from openSUSE Tumbleweed, and the
service file is based on the example from the manual page, except that
I use "notify" and start my own applications instead of sleep. I see
this with applications written in C and in Go.
And this happens not only in the short time frame between shutdown and
start, but also long afterwards.

  Thorsten



Re: [systemd-devel] soft-reboot and surviving it

2024-04-19 Thread Luca Boccassi
On Fri, 19 Apr 2024 at 10:30, Thorsten Kukuk  wrote:
>
> Hi,
>
> we finished the integration of soft-reboot into openSUSE Tumbleweed
> and MicroOS (transactional-update), and the major problems except
> firewalld+podman are solved. Now we only need to do all the "fine
> tuning".
> Is there by now any reliable/official way to detect that this was a
> soft-reboot? This would be very helpful in some cases for post-mortem
> analysis and support.
> I'm aware of the SoftRebootsCount property in systemd v256, so
> applications could query that; I assume if the count is >0 it was a
> soft-reboot? Couldn't test that yet.

Yes, that's the purpose of the counter; you can use it for that.

> And now I started looking into how services can survive the
> soft-reboot. I know the FOSDEM talk from Luca about this topic, but I
> don't want to move the application into another image, as this would
> only move the update problem to a different level, not solve it. So
> I'm currently playing around to find out whether there isn't a better
> option, especially with btrfs.
> Is there already some documentation somewhere on what the
> limitations or best practices are for an application to survive a
> soft-reboot?

It really needs to be a separate filesystem from a separate image; with any
ties back to the host OS, the service will hopefully be correctly
stopped, or worse it will not be detected and will leak the old
filesystem, which means you'll silently leak memory, mounts, etc. I
would strongly recommend avoiding fighting against this, and instead
spending the time solving the root cause.

The best solution really is to figure out why there's an executable
from the host OS permanently running in the podman container cgroup
(what does it do, why is it necessary, why does it need to always run,
etc.), and try to refactor that away. Make it start on demand, for
example.

> The main task for me currently is to find out what such an
> application can do, what will not work, and what it should do in
> case of a reboot. I saw there is the PrepareForShutdownWithMetadata
> signal (I didn't get that working, but since it seems to work with
> busctl, the problem is most likely between chair and keyboard ;) ),
> but I'm more interested in file descriptors and pipes. Currently
> stderr is redirected to journald, but this will of course no
> longer work after a soft-reboot. While I can adjust my application to
> use sd_journal_print() instead, errors written by libraries or
> anything else to stderr will be lost or trigger SIGPIPE. Any ideas
> on how to solve that?

The soft-reboot manpage is the best we have for now, and the
recordings of my talks might be of some help too. The main gotcha so
far is D-Bus: if you publish a service, you need to be resilient
against D-Bus going away and coming back, which normally never
happens, so applications usually aren't coded for that; but it can be
done, and the soft-reboot manpage has a self-contained example showing
how.

However, logging should work out of the box as long as the journal is
used; what problem are you seeing exactly?


[systemd-devel] soft-reboot and surviving it

2024-04-19 Thread Thorsten Kukuk
Hi,

we finished the integration of soft-reboot into openSUSE Tumbleweed
and MicroOS (transactional-update), and the major problems except
firewalld+podman are solved. Now we only need to do all the "fine
tuning".
Is there by now any reliable/official way to detect that this was a
soft-reboot? This would be very helpful in some cases for post-mortem
analysis and support.
I'm aware of the SoftRebootsCount property in systemd v256, so
applications could query that; I assume if the count is >0 it was a
soft-reboot? Couldn't test that yet.

And now I started looking into how services can survive the
soft-reboot. I know the FOSDEM talk from Luca about this topic, but I
don't want to move the application into another image, as this would
only move the update problem to a different level, not solve it. So
I'm currently playing around to find out whether there isn't a better
option, especially with btrfs.
Is there already some documentation somewhere on what the
limitations or best practices are for an application to survive a
soft-reboot?

The main task for me currently is to find out what such an
application can do, what will not work, and what it should do in
case of a reboot. I saw there is the PrepareForShutdownWithMetadata
signal (I didn't get that working, but since it seems to work with
busctl, the problem is most likely between chair and keyboard ;) ),
but I'm more interested in file descriptors and pipes. Currently
stderr is redirected to journald, but this will of course no
longer work after a soft-reboot. While I can adjust my application to
use sd_journal_print() instead, errors written by libraries or
anything else to stderr will be lost or trigger SIGPIPE. Any ideas
on how to solve that?

Thanks,
Thorsten
