Re: [systemd-devel] Questions about systemd's "root storage daemon" concept

2021-01-25 Thread Martin Wilck
On Mon, 2021-01-25 at 18:33 +0100, Lennart Poettering wrote:
> 
> Consider using IgnoreOnIsolate=.
> 

I can't make this work. I installed this unit in the initrd (note the
ExecStop "command"):

[Unit]
Description=NVMe Event Monitor for Automatic Subsystem Connection
Documentation=man:nvme-monitor(1)
DefaultDependencies=false
Conflicts=shutdown.target
Requires=systemd-udevd-kernel.socket
After=systemd-udevd-kernel.socket
Before=sysinit.target systemd-udev-trigger.service nvmefc-boot-connections.service
RequiresMountsFor=/sys
IgnoreOnIsolate=true

[Service]
Type=simple
ExecStart=/usr/sbin/nvme monitor $NVME_MONITOR_OPTIONS
ExecStop=-/usr/bin/systemctl show -p IgnoreOnIsolate %N
KillMode=mixed

[Install]
WantedBy=sysinit.target

I verified (in a pre-pivot shell) that systemd had seen the
IgnoreOnIsolate property. But when initrd-switch-root.target is
isolated, the unit is cleanly stopped nonetheless.

[  192.832127] dolin systemd[1]: initrd-switch-root.target: Trying to enqueue job initrd-switch-root.target/start/isolate
[  192.836697] dolin systemd[1]: nvme-monitor.service: Installed new job nvme-monitor.service/stop as 98
[  193.027182] dolin systemctl[3751]: IgnoreOnIsolate=yes
[  193.029124] dolin systemd[1]: nvme-monitor.service: Changed running -> stop-sigterm
[  193.029353] dolin nvme[768]: monitor_main_loop: monitor: exit signal received
[  193.029535] dolin systemd[1]: Stopping NVMe Event Monitor for Automatic Subsystem Connection...
[  193.065746] dolin systemd[1]: Child 768 (nvme) died (code=exited, status=0/SUCCESS)
[  193.065905] dolin systemd[1]: nvme-monitor.service: Child 768 belongs to nvme-monitor.service
[  193.066073] dolin systemd[1]: nvme-monitor.service: Main process exited, code=exited, status=0/SUCCESS
[  193.066241] dolin systemd[1]: nvme-monitor.service: Changed stop-sigterm -> dead
[  193.066403] dolin systemd[1]: nvme-monitor.service: Job nvme-monitor.service/stop finished, result=done
[  193.500010] dolin systemd[1]: initrd-switch-root.target: Job initrd-switch-root.target/start finished, result=done
[  193.500188] dolin systemd[1]: Reached target Switch Root.

After boot, the service actually remains running when isolating e.g.
"rescue.target". But when switching root, it doesn't work.

dolin:~/:[141]# systemctl show -p IgnoreOnIsolate nvme-monitor.service
IgnoreOnIsolate=yes

Tested only with systemd-234 so far. Any ideas what I'm getting wrong?

Martin


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Questions about systemd's "root storage daemon" concept

2021-01-25 Thread Martin Wilck
On Mon, 2021-01-25 at 18:33 +0100, Lennart Poettering wrote:
> On Sa, 23.01.21 02:44, Martin Wilck (mwi...@suse.com) wrote:
> 
> > Hi
> > 
> > I'm experimenting with systemd's root storage daemon concept
> > (https://systemd.io/ROOT_STORAGE_DAEMONS/).
> > 
> > I'm starting my daemon from a service unit in the initrd, and
> > I set argv[0][0] to '@', as suggested in the text.
> > 
> > So far so good, the daemon isn't killed. 
> > 
> > But a lot more is necessary to make this actually *work*. Here's a
> > list
> > of issues I found, and what ideas I've had so far how to deal with
> > them. I'd appreciate some guidance.
> > 
> > 1) Even if a daemon is exempted from being killed by killall(), the
> > unit it belongs to will be stopped when initrd-switch-root.target
> > is
> > isolated, and that will normally cause the daemon to be stopped,
> > too.
> > AFAICS, the only way to ensure the daemon is not killed is by
> > setting
> > "KillMode=none" in the unit file. Right? Any other mode would send
> > SIGKILL sooner or later even if my daemon was smart enough to
> > ignore
> > > SIGTERM when running in the initrd.
> 
> Consider using IgnoreOnIsolate=.

Ah, thanks a lot. IIUC that would actually make systemd realize that
the unit continues to run after switching root, which is good.

Like I remarked for KillMode=none, IgnoreOnIsolate=true would be
suitable only for the "root storage daemon" instance, not for a
possible other instance serving data volumes only.
I suppose there's no way to make this directive conditional on being
run from the initrd, so I'd need two different unit files,
or use a drop-in in the initrd.
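
Such a drop-in could look like the following (a sketch; the directory name assumes the unit is called nvme-monitor.service, and the file would be installed only into the initrd image):

```ini
# /etc/systemd/system/nvme-monitor.service.d/initrd.conf
# (hypothetical drop-in, shipped only in the initrd)
[Unit]
# Keep the unit running across "systemctl isolate", including the
# isolate of initrd-switch-root.target during switch-root:
IgnoreOnIsolate=true
```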

Is there any way for the daemon to get notified if root is switched?

> 
> > 3) The daemon that has been started in the initrd's root file
> > system
> > is unable to access e.g. the /dev file system after switching
> > root. I haven't yet systematically analyzed which file systems are
> > available.   I suppose this must be handled by creating bind
> > mounts,
> > but I need guidance how to do this. Or would it be
> > possible/advisable for the daemon to also re-execute itself under
> > the real root, like systemd itself? I thought the root storage
> > daemon idea was developed to prevent exactly that.
> 
> Not sure why it wouldn't be able to access /dev after switching. We
> do
> not allocate any new instance of that, it's always the same devtmpfs
> instance.

I haven't dug deeper yet; I just saw "No such file or directory"
error messages when trying to access device nodes that I knew existed,
so I concluded there were issues with /dev.
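
One quick way to narrow this down is to check whether the daemon is still in the same mount namespace as PID 1 (and thus sees the same devtmpfs on /dev). A small diagnostic sketch, not from the thread:

```python
import os

def shares_mount_ns_with_pid1() -> bool:
    """Return True if this process is in the same mount namespace as PID 1.

    If this returns True after switching root, the daemon sees the same
    devtmpfs on /dev as the host, and "No such file or directory" errors
    would point at stale cached paths rather than a namespace problem.
    Reading /proc/1/ns/mnt requires appropriate privileges (the root
    storage daemon runs as root anyway).
    """
    return os.readlink("/proc/self/ns/mnt") == os.readlink("/proc/1/ns/mnt")
```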

> Do not reexec onto the host fs, that's really not how this should be
> done.

Would there be a potential security issue because the daemon keeps a
reference to the initrd root FS?

> 
> > 4) Most daemons that might qualify as "root storage daemon" also
> > have
> > a "normal" mode, when the storage they serve is _not_ used as root
> > FS,
> > just for data storage. In that case, it's probably preferable to
> > run
> > them from inside the root FS rather than as root storage daemon.
> > That
> > has various advantages, e.g. the possibility to update the software
> > without rebooting. It's not clear to me yet how to handle the two
> > options (root and non-root) cleanly with unit files.
> 
> option one: have two unit files? i.e. two instances of the subsystem,
> one managing the root storage, and one the rest.

Hm, that looks clumsy to me. It could be done, e.g. for multipath, by
using separate configuration files and setting up appropriate
blacklists, but it would cause a lot of work to be done twice; e.g.
uevents would be received by both daemons and acted upon
simultaneously. Generally, ruling out race conditions wouldn't be easy.

Imagine two parallel instances of systemd-udevd (IMO there are reasons
to handle it like a "root storage daemon" in some distant future).

> option two: if you cannot have multiple instances of your subsystem,
> then the only option is to make the initrd version manage
> everything. But of course, that sucks, but there's little one can do
> about that.

Why would it be so bad? I would actually prefer a single instance for
most subsystems. But maybe I'm missing something.

Thanks,
Martin



Re: [systemd-devel] Questions about systemd's "root storage daemon" concept

2021-01-25 Thread Lennart Poettering
On Sa, 23.01.21 02:44, Martin Wilck (mwi...@suse.com) wrote:

> Hi
>
> I'm experimenting with systemd's root storage daemon concept
> (https://systemd.io/ROOT_STORAGE_DAEMONS/).
>
> I'm starting my daemon from a service unit in the initrd, and
> I set argv[0][0] to '@', as suggested in the text.
>
> So far so good, the daemon isn't killed. 
>
> But a lot more is necessary to make this actually *work*. Here's a list
> of issues I found, and what ideas I've had so far how to deal with
> them. I'd appreciate some guidance.
>
> 1) Even if a daemon is exempted from being killed by killall(), the
> unit it belongs to will be stopped when initrd-switch-root.target is
> isolated, and that will normally cause the daemon to be stopped, too.
> AFAICS, the only way to ensure the daemon is not killed is by setting
> "KillMode=none" in the unit file. Right? Any other mode would send
> SIGKILL sooner or later even if my daemon was smart enough to ignore
> SIGTERM when running in the initrd.

Consider using IgnoreOnIsolate=.

> 3) The daemon that has been started in the initrd's root file system
> is unable to access e.g. the /dev file system after switching
> root. I haven't yet systematically analyzed which file systems are
> available.   I suppose this must be handled by creating bind mounts,
> but I need guidance how to do this. Or would it be
> possible/advisable for the daemon to also re-execute itself under
> the real root, like systemd itself? I thought the root storage
> daemon idea was developed to prevent exactly that.

Not sure why it wouldn't be able to access /dev after switching. We do
not allocate any new instance of that, it's always the same devtmpfs
instance.

Do not reexec onto the host fs, that's really not how this should be
done.

> 4) Most daemons that might qualify as "root storage daemon" also have
> a "normal" mode, when the storage they serve is _not_ used as root FS,
> just for data storage. In that case, it's probably preferable to run
> them from inside the root FS rather than as root storage daemon. That
> has various advantages, e.g. the possibility to update the software
> without rebooting. It's not clear to me yet how to handle the two
> options (root and non-root) cleanly with unit files.

option one: have two unit files? i.e. two instances of the subsystem,
one managing the root storage, and one the rest.

option two: if you cannot have multiple instances of your subsystem,
then the only option is to make the initrd version manage
everything. But of course, that sucks, but there's little one can do
about that.

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Why systemd-nspawn is slower than docker, podman and qemu?! how to Improve nspawn performance?

2021-01-25 Thread Greg KH
On Mon, Jan 25, 2021 at 11:56:09AM +0100, Badr Elmers wrote:
> Hi,
> Why is nspawn slow compared to docker, podman and even qemu?!
> CPU tasks take twice the time they take in docker, podman or qemu
> 
> here I filed a request to improve nspawn performance which contains the
> steps and the full test result:
> https://github.com/systemd/systemd/issues/18370
> 
> Do you know why systemd-nspawn is slower? how can I improve it?

As I pointed out in the above "issue", the benchmark isn't measuring
what anyone thinks it is measuring and should not be treated as a
reliable indication of anything.

thanks,

greg k-h


Re: [systemd-devel] Why systemd-nspawn is slower than docker, podman and qemu?! how to Improve nspawn performance?

2021-01-25 Thread Reindl Harald
There is a difference between theoretical academic benchmarks and
real-world load: if your workload isn't affected, it's pointless.


On 25.01.21 at 14:00, Badr Elmers wrote:

> Tomasz Torcz,
> In fact I'm just comparing containers; I have no need yet for context
> switching, but I hope to understand why nspawn is slower and whether
> there is something I can do to improve it. For example, disabling
> spectre/meltdown mitigations improved nspawn a lot, so I was wondering
> if there is something else I can do to make nspawn as quick as
> podman/docker/qemu.
> 
> Mantas Mikulėnas,
> I tested with export SYSTEMD_SECCOMP=0
> no improvement, I still get the same result
> thank you,
> badr




Re: [systemd-devel] Why systemd-nspawn is slower than docker, podman and qemu?! how to Improve nspawn performance?

2021-01-25 Thread Badr Elmers
Tomasz Torcz,
In fact I'm just comparing containers; I have no need yet for context
switching, but I hope to understand why nspawn is slower and whether there
is something I can do to improve it. For example, disabling spectre/meltdown
mitigations improved nspawn a lot, so I was wondering if there is something
else I can do to make nspawn as quick as podman/docker/qemu.

Mantas Mikulėnas,
I tested with export SYSTEMD_SECCOMP=0
no improvement, I still get the same result
thank you,
badr

On Mon, Jan 25, 2021 at 1:40 PM Badr Elmers  wrote:

> I tested with export SYSTEMD_SECCOMP=0
> no improvement, I still get the same result
> thank you,
> badr
>
> On Mon, Jan 25, 2021 at 1:14 PM Mantas Mikulėnas 
> wrote:
>
>> On Mon, Jan 25, 2021, 12:56 Badr Elmers  wrote:
>>
>>> Hi,
>>> Why is nspawn slow compared to docker, podman and even qemu?!
>>> CPU tasks take twice the time they take in docker, podman or qemu
>>>
>>> here I filed a request to improve nspawn performance which contains the
>>> steps and the full test result:
>>> https://github.com/systemd/systemd/issues/18370
>>>
>>> Do you know why systemd-nspawn is slower? how can I improve it?
>>>
>>> thank you
>>>
>>>
>>>
>> Have you tried completely *disabling* the syscall filtering and all other
>> seccomp-based features? export SYSTEMD_SECCOMP=0 before running nspawn and
>> check if it makes any difference...
>>
>


Re: [systemd-devel] Why systemd-nspawn is slower than docker, podman and qemu?! how to Improve nspawn performance?

2021-01-25 Thread Mantas Mikulėnas
On Mon, Jan 25, 2021, 12:56 Badr Elmers  wrote:

> Hi,
> Why is nspawn slow compared to docker, podman and even qemu?!
> CPU tasks take twice the time they take in docker, podman or qemu
>
> here I filed a request to improve nspawn performance which contains the
> steps and the full test result:
> https://github.com/systemd/systemd/issues/18370
>
> Do you know why systemd-nspawn is slower? how can I improve it?
>
> thank you
>
>
>
Have you tried completely *disabling* the syscall filtering and all other
seccomp-based features? export SYSTEMD_SECCOMP=0 before running nspawn and
check if it makes any difference...
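
For anyone wanting to try this, the suggestion boils down to something like the following (a sketch; the container path and benchmark command are placeholders):

```shell
# Disable systemd-nspawn's seccomp-based features for this shell session:
export SYSTEMD_SECCOMP=0
echo "SYSTEMD_SECCOMP=$SYSTEMD_SECCOMP"

# Then re-run the CPU benchmark inside the container, e.g.:
# systemd-nspawn -D /var/lib/machines/test sysbench cpu run
```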


Re: [systemd-devel] Why systemd-nspawn is slower than docker, podman and qemu?! how to Improve nspawn performance?

2021-01-25 Thread Tomasz Torcz
On Mon, Jan 25, 2021 at 11:56:09AM +0100, Badr Elmers wrote:
> Hi,
> Why is nspawn slow compared to docker, podman and even qemu?!
> CPU tasks take twice the time they take in docker, podman or qemu
> 
> here I filed a request to improve nspawn performance which contains the
> steps and the full test result:
> https://github.com/systemd/systemd/issues/18370
> 
> Do you know why systemd-nspawn is slower? how can I improve it?

  Your benchmark measures context switch speed. Is it really important
in your workload?  I somehow doubt this is worth improving.


-- 
Tomasz Torcz            “Funeral in the morning, IDE hacking
to...@pipebreaker.pl     in the afternoon and evening.” - Alan Cox



[systemd-devel] Why systemd-nspawn is slower than docker, podman and qemu?! how to Improve nspawn performance?

2021-01-25 Thread Badr Elmers
Hi,
Why is nspawn slow compared to docker, podman and even qemu?!
CPU tasks take twice the time they take in docker, podman or qemu

here I filed a request to improve nspawn performance which contains the
steps and the full test result:
https://github.com/systemd/systemd/issues/18370

Do you know why systemd-nspawn is slower? how can I improve it?

thank you
badr