Re: [systemd-devel] "primary" Condition for drbd?

2024-06-14 Thread Harald Dunkel

On 2024-06-13 16:08:03, Lennart Poettering wrote:


You are assuming that people know what drbd is and does, and what
"primary" or "secondary" means in this context. I certainly have no
clue.



You are right, sorry about that.

drbd is a scheme to mirror a block device over the network, providing
a virtual block device (a "resource"). Each host providing the
resource has a local replica, holding the data and some metadata.
Usually all replicas of a resource have the same size. It is fine
to have several resources in parallel on the involved hosts.

There are 2 modes of a drbd replica: In primary mode the host
performs all read and write operations, similar to other local
block devices. It forwards all write operations on block level
to all secondary replicas (write through).

In secondary mode a replica must not be used. It just receives
the changes from the primary and updates the local block device
accordingly.

If the replicas of a resource are in sync and if the primary
replica is not mounted, activated or "used somehow", then the
primary replica can be set to secondary mode, and another replica
can take over, becoming the new primary for the resource. This
can be done to shut down a host, for example. On the next reboot
the replicas are synced again, even if the secondary stays
secondary.

Usually there are just 2 replicas forming a resource: 1 primary
and 1 secondary. (Newer drbd implementations support n>2. This
new mode is not included in the kernel yet.)

I would like to run several systemd services that use the
resource only if the local replica is in primary mode. This
could be mounting the storage, running a database in an LXC
container, and configuring a dedicated IP address, for example.
For the transition of the replica to secondary mode these
services have to be stopped again in the right order. Usually
this is done with pacemaker, as Andrei mentioned, but it
should be possible to describe the dependencies between the
units without pacemaker. They are all local, and running
services on remote systems is not systemd's business.

The problem is that systemd provides several conditions to check
whether a block device exists, whether a power supply is attached,
whether a block device has been mounted, etc., but there is no
condition to check whether a replica stored on a real block device
is in primary mode.


Does systemd allow running a program to evaluate a custom
condition in a unit file? Maybe I am too blind to see, but
I haven't found it mentioned on
https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html.


There's ExecCondition=. But what would you use it for?

ExecCondition= is intended for quick checks.



It does not seem to be documented on
https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html.
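ExecCondition= is a [Service] setting, so it is described in systemd.service(5) rather than systemd.unit(5). According to that page, exit status 0 lets the unit proceed, 1 to 254 skip it without marking it failed, and 255 marks it failed. Below is a minimal sketch of such a quick check. It assumes "drbdadm role RESOURCE" prints the local role first, as in the "Primary/Secondary" output shown elsewhere in this thread. The script name /usr/local/bin/drbd-is-primary is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical ExecCondition= helper: exit 0 only if the given DRBD
resource is Primary on this host."""
import subprocess
import sys


def local_role(role_output: str) -> str:
    """Return the local role from "drbdadm role" output.

    DRBD 8.4 prints "Local/Peer" (e.g. "Primary/Secondary"); if only one
    role is printed, that one is taken as the local role.
    """
    return role_output.strip().split("/")[0]


def main(argv: list) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} RESOURCE", file=sys.stderr)
        return 255  # 255: systemd marks the unit as failed (usage error)
    try:
        out = subprocess.run(
            ["drbdadm", "role", argv[1]],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"cannot query DRBD role: {exc}", file=sys.stderr)
        return 1  # unknown state: skip the unit quietly, as for "not primary"
    # 0: condition met, unit starts; 1: unit is skipped but not marked failed
    return 0 if local_role(out) == "Primary" else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A service would then use, for example, ExecCondition=/usr/local/bin/drbd-is-primary space in its [Service] section. This calls drbdadm by resource name, so no mapping to /dev/drbdN is needed. When the state is unknown, the helper returns 1 rather than 255. That is a design choice: the unit is skipped quietly instead of being marked failed.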



I have no idea what this all means, but I have the suspicion you
actually want a generator, i.e. a plugin to systemd that adds some
deps depending on external configuration files.

i.e. implement this stuff:

https://www.freedesktop.org/software/systemd/man/latest/systemd.generator.html
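Below is a minimal sketch of the generator approach, under stated assumptions. A hypothetical config file, /etc/drbd-primary-units.conf, lists "RESOURCE UNIT" pairs. For each pair, the generator writes a drop-in into the first directory systemd passes it (see systemd.generator(5)). The drop-in adds an ExecCondition= that calls the hypothetical drbd-is-primary helper. Generators run before DRBD is brought up, so the generator cannot check the role itself. It only wires the runtime check into the configured service units.

```python
#!/usr/bin/env python3
"""Hypothetical systemd generator: for every "RESOURCE UNIT" line in a
config file, give the service UNIT an ExecCondition= on RESOURCE."""
import sys
from pathlib import Path

CONFIG = Path("/etc/drbd-primary-units.conf")  # hypothetical config file
HELPER = "/usr/local/bin/drbd-is-primary"      # hypothetical helper script


def parse_config(text: str) -> list:
    """Return (resource, unit) pairs. '#' comments, blank lines, malformed
    lines and non-.service units are ignored (a generator should not fail
    the boot over them; ExecCondition= only exists for services)."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2 and fields[1].endswith(".service"):
            pairs.append((fields[0], fields[1]))
    return pairs


def write_dropins(pairs: list, normal_dir: Path) -> None:
    """Write UNIT.d/50-drbd-primary.conf below the generator directory."""
    for resource, unit in pairs:
        dropin_dir = normal_dir / f"{unit}.d"
        dropin_dir.mkdir(parents=True, exist_ok=True)
        (dropin_dir / "50-drbd-primary.conf").write_text(
            f"[Service]\nExecCondition={HELPER} {resource}\n"
        )


def main(argv: list) -> int:
    # systemd passes normal-dir, early-dir and late-dir; only the first is used
    if len(argv) < 2:
        print("usage: generator NORMAL_DIR [EARLY_DIR LATE_DIR]", file=sys.stderr)
        return 1
    if CONFIG.exists():
        write_dropins(parse_config(CONFIG.read_text()), Path(argv[1]))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Installed as an executable in /etc/systemd/system-generators/, it would run at boot and on every systemctl daemon-reload.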



I will check.


Regards
Harri


Re: [systemd-devel] "primary" Condition for drbd?

2024-06-13 Thread Harald Dunkel

On 2024-06-13 11:32:47, Andrei Borzenkov wrote:


What you are looking for is pacemaker.



Not exactly. I am looking for pacemaker-like functionality
provided by systemd. A manual fail-over is fine with me, for
example.

I tried pacemaker before (years ago), but I wasn't happy with
it.

Regards

Harri


Re: [systemd-devel] "primary" Condition for drbd?

2024-06-13 Thread Harald Dunkel

I forgot to mention that drbdadm does know:

# drbdadm role space
Primary/Secondary

meaning "space" is primary on this host. You can also look
at /proc/drbd:

# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: 19D914EA50F713FCCE48607

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
    ns:4044664 nr:0 dw:4044664 dr:726481 al:188 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-
    ns:0 nr:2228224 dw:2228224 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
    ns:699492 nr:0 dw:699492 dr:266197 al:75 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
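Below is a sketch that answers the same question from /proc/drbd. It is keyed by minor number (the N in /dev/drbdN), which avoids the resource-name lookup. It assumes the DRBD 8.4 layout shown above, where "ro:" holds "local/peer" roles. Later DRBD versions may report status differently, so treat the format as version-specific.

```python
#!/usr/bin/env python3
"""Read local DRBD roles from /proc/drbd (DRBD 8.4 format)."""
import re
import sys

# Device lines look like " 1: cs:Connected ro:Primary/Secondary ds:..."
# where "ro:" is "local role/peer role"; statistics lines start with "ns:".
DEVICE_LINE = re.compile(r"^\s*(\d+):\s.*?\bro:(\w+)/(\w+)")


def local_roles(proc_drbd: str) -> dict:
    """Map each DRBD minor number (N in /dev/drbdN) to its local role."""
    roles = {}
    for line in proc_drbd.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            roles[int(m.group(1))] = m.group(2)
    return roles


def is_primary(minor: int, path: str = "/proc/drbd") -> bool:
    with open(path) as f:
        return local_roles(f.read()).get(minor) == "Primary"


if __name__ == "__main__":
    # e.g. "script.py 1" exits 0 if /dev/drbd1 is Primary on this host
    sys.exit(0 if is_primary(int(sys.argv[1])) else 1)
```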

The problem is that neither output says "/dev/drbd1". The resource
name "space" and the minor number "1" are defined in drbd's config
files, which map them to "/dev/drbd1". Evaluating the condition
therefore requires a fairly complex lookup.

Does systemd allow running a program to evaluate a custom
condition in a unit file? Maybe I am too blind to see, but
I haven't found it mentioned on
https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html.

The context is: I want to mount/activate /dev/drbd1 if it is
primary. (If it is not, then /dev/drbd1 should be silently
ignored.) Next I want to start the LXC containers provided by
the virtual block device, and set up networking and storage
accordingly. The same should happen for /dev/drbd{2..n}.

Maybe this is beyond the scope of systemd.


Regards

Harri


[systemd-devel] "primary" Condition for drbd?

2024-06-11 Thread Harald Dunkel

Hi folks,

would it be possible to add a Condition to check whether a drbd
resource (a virtual block device with replication via network,
see https://linbit.com/drbd/) is primary? I checked

/sys/devices/virtual/block/drbd1

for example, but I haven't found a way to distinguish primary
from secondary block devices yet. Apparently there is no sysfs
entry that shows up only in primary mode.


Regards

Harri



[systemd-devel] show container limits?

2022-07-04 Thread Harald Dunkel

Hi folks,

systemctl status does a nice job showing LXC containers and their
process trees, but I wonder if it could show memory and CPU limits,
memory utilization, swap, etc. as well, even if the LXC or docker or
other container wasn't started by systemd? Both cgroup v1 and unified,
if possible.

I would especially like to identify the CPU and memory hogs.
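For spotting CPU and memory hogs, systemd-cgtop already ranks control groups by resource usage. For limits, below is a minimal sketch that reads the relevant cgroup v2 interface files directly from a cgroup directory: memory.current, memory.max, memory.swap.current, memory.swap.max and cpu.max. It does not cover cgroup v1. On a hybrid layout, such as the Debian 10 hosts elsewhere in this thread, the unified hierarchy lives under /sys/fs/cgroup/unified.

```python
#!/usr/bin/env python3
"""Show selected cgroup v2 limits and usage for given cgroup directories."""
from __future__ import annotations

import sys
from pathlib import Path

# cgroup v2 interface files of interest (names as in the kernel's cgroup-v2 docs)
FILES = ("memory.current", "memory.max", "memory.swap.current",
         "memory.swap.max", "cpu.max")


def read_limits(cgroup_dir: str) -> dict[str, str | None]:
    """Raw contents of selected cgroup v2 files; None if a file is absent
    (e.g. the controller is not enabled for this cgroup) or unreadable."""
    result: dict[str, str | None] = {}
    for name in FILES:
        try:
            result[name] = (Path(cgroup_dir) / name).read_text().strip()
        except OSError:
            result[name] = None
    return result


def cpu_limit(cpu_max: str | None) -> float | None:
    """Convert cpu.max ("QUOTA PERIOD", or "max PERIOD" for no limit)
    into a number of CPUs; None means unlimited or unknown."""
    if not cpu_max:
        return None
    quota, period = cpu_max.split()
    return None if quota == "max" else int(quota) / int(period)


if __name__ == "__main__":
    for d in sys.argv[1:]:
        limits = read_limits(d)
        print(d, limits, "cpus:", cpu_limit(limits["cpu.max"]))
```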


Regards
Harri


Re: [systemd-devel] eth2: Failed to rename network interface 6 from 'eth2' to 'eno1': File exists

2022-01-06 Thread Harald Dunkel

On 2022-01-06 13:23:37, Michael Biebl wrote:

On Thu, 6 Jan 2022 at 10:00, Mantas Mikulėnas wrote:


Grep your entire /etc for those interface names (starting with /etc/udev), find
out where they're defined, and remove them.


Please also make sure to rebuild your initramfs after doing that.
Files from /etc/udev are usually embedded in the initramfs.



The initrd has been rebuilt and eth4 and eth5 are gone.

# ls -l /sys/class/net/*
lrwxrwxrwx 1 root root 0 Jan  6 15:06 /sys/class/net/eno1 -> ../../devices/pci:00/:00:01.1/:02:00.0/net/eno1
lrwxrwxrwx 1 root root 0 Jan  6 15:07 /sys/class/net/enp2s0f1 -> ../../devices/pci:00/:00:01.1/:02:00.1/net/enp2s0f1
lrwxrwxrwx 1 root root 0 Jan  6 15:07 /sys/class/net/enp2s0f2 -> ../../devices/pci:00/:00:01.1/:02:00.2/net/enp2s0f2
lrwxrwxrwx 1 root root 0 Jan  6 15:07 /sys/class/net/enp2s0f3 -> ../../devices/pci:00/:00:01.1/:02:00.3/net/enp2s0f3
lrwxrwxrwx 1 root root 0 Jan  6 15:07 /sys/class/net/enp5s0 -> ../../devices/pci:00/:00:1c.4/:05:00.0/net/enp5s0
lrwxrwxrwx 1 root root 0 Jan  6 15:05 /sys/class/net/eth3 -> ../../devices/pci:00/:00:19.0/net/eth3
lrwxrwxrwx 1 root root 0 Jan  6 15:07 /sys/class/net/lo -> ../../devices/virtual/net/lo


eno1 and eth3 are still present, i.e. the original problem (two
onboard interfaces that both want to be renamed to eno1) is still
unresolved. The workaround is to remove "onboard" from
NamePolicy=, e.g.:


# cat /etc/systemd/network/98-ignore-onboard.link
[Match]
OriginalName=*

[Link]
NamePolicy=keep kernel database slot path
AlternativeNamesPolicy=database onboard slot path
MACAddressPolicy=persistent


(https://www.freedesktop.org/software/systemd/man/systemd.link.html)


Do you think it would be reasonable for udev to use the next option
from the name policy list instead of falling back to the "kernel"
policy, if there is a conflict?
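Below is a toy model of the proposed fallback. It is not udev's actual code. Each interface takes the first candidate name, in NamePolicy order, that no earlier interface has claimed. It keeps its kernel name only if every candidate is taken. The candidates come from this thread: 02:00.0 and 00:19.0 both claim eno1. The name enp0s25 is what the path policy would presumably derive for 00:19.0.

```python
#!/usr/bin/env python3
"""Toy model of "use the next NamePolicy entry on conflict"."""


def assign_names(interfaces, policy_order):
    """interfaces:   list of (kernel_name, {policy: candidate}) in probe order
    policy_order: e.g. ["onboard", "slot", "path"], as in NamePolicy=
    Returns {kernel_name: final_name}."""
    taken = set()
    result = {}
    for kernel_name, candidates in interfaces:
        final = kernel_name  # fallback: keep the kernel-assigned name
        for policy in policy_order:
            name = candidates.get(policy)
            if name and name not in taken:
                final = name
                break
        taken.add(final)
        result[kernel_name] = final
    return result


if __name__ == "__main__":
    nics = [("eth0", {"onboard": "eno1", "path": "enp2s0f0"}),  # igb 02:00.0
            ("eth2", {"onboard": "eno1", "path": "enp0s25"})]   # e1000e 00:19.0
    print(assign_names(nics, ["onboard", "slot", "path"]))
    # prints {'eth0': 'eno1', 'eth2': 'enp0s25'}
```

The model shows one limitation of the idea. The result still depends on probe order, because whichever interface is processed first gets eno1.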


Thank you very much for your support and your patience

Regards
Harri


Re: [systemd-devel] eth2: Failed to rename network interface 6 from 'eth2' to 'eno1': File exists

2022-01-06 Thread Harald Dunkel

On 2022-01-05 21:48:11, Michael Biebl wrote:

On Wed, 5 Jan 2022 at 13:50, Mantas Mikulėnas wrote:

It does, yes, but note this part:

Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.2 eth4: renamed from eth2
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.3 eth5: renamed from eth3

Here the kernel-assigned names (eth2, eth3) are being renamed to custom names (eth4, eth5). That's 
not something systemd or udev does by default. It suggests that you likely have old 
"70-persistent-net" udev rules (or something similar) that assign custom eth* names 
separately from the slot-based "predictable" naming – perhaps a leftover from Debian 7.

These interfaces aren't being skipped due to an earlier conflict – they are intentionally 
skipped by 80-net-setup-link.rules because they already have a custom 'NAME=' assigned by 
an earlier rule, so the "predictable" name is not applied to avoid breaking 
existing configuration.


Yes, please check if you have a leftover file
/etc/udev/rules.d/70-persistent-net.rules
See also the relevant NEWS entry in /usr/share/doc/udev/NEWS.Debian.gz
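Below is a small sketch for the suggested check. It scans a rules directory for udev NAME= assignments, like the ones in a leftover 70-persistent-net.rules. It assumes the usual one-rule-per-line form without backslash continuations. It only reports NAME="..." and NAME:="..." assignments, not NAME== comparisons.

```python
#!/usr/bin/env python3
"""Find udev rules that assign fixed interface names via NAME=."""
import re
from pathlib import Path

# NAME="..." or NAME:="..." assignments, but not NAME=="..." comparisons
NAME_ASSIGN = re.compile(r'\bNAME\s*:?=(?!=)\s*"([^"]*)"')


def find_name_assignments(rules_dir: str = "/etc/udev/rules.d"):
    """Yield (file, line_number, assigned_name) for each NAME= assignment."""
    for path in sorted(Path(rules_dir).glob("*.rules")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if line.lstrip().startswith("#"):
                continue  # comment line
            for m in NAME_ASSIGN.finditer(line):
                yield str(path), lineno, m.group(1)


if __name__ == "__main__":
    for path, lineno, name in find_name_assignments():
        print(f"{path}:{lineno}: NAME={name}")
```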



You are right about Debian 7 regarding 70-persistent-net.rules, but
on Debian 10 I had used net.ifnames=0. It was changed to 1 after(!)
the upgrade to Debian 11:


root@nasl002b:/etc/default# git diff 0b18eef098908adf5a5478c2938c0228f971494d 0bcaa7872b54f31bfe0f1ed8cf403a7990844746 -- grub
diff --git a/default/grub b/default/grub
index c723e1f..5b5e60b 100644
--- a/default/grub
+++ b/default/grub
@@ -6,8 +6,8 @@
 GRUB_DEFAULT=0
 GRUB_TIMEOUT=5
 GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
-GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0 vsyscall=emulate quiet"
-GRUB_CMDLINE_LINUX="net.ifnames=0 vsyscall=emulate nfs.nfs4_unique_id=43705d58-c986-4b4b-9205-50040cbdd9c6"
+GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=1 vsyscall=emulate quiet"
+GRUB_CMDLINE_LINUX="net.ifnames=1 vsyscall=emulate nfs.nfs4_unique_id=43705d58-c986-4b4b-9205-50040cbdd9c6"

 # Uncomment to enable BadRAM filtering, modify to suit your needs
 # This works with Linux (no patch required) and with any kernel that obtains



Regards

Harri


Re: [systemd-devel] eth2: Failed to rename network interface 6 from 'eth2' to 'eno1': File exists

2022-01-05 Thread Harald Dunkel

On 2022-01-05 13:50:29, Mantas Mikulėnas wrote:

On Wed, Jan 5, 2022 at 9:46 AM Harald Dunkel 


AFAICS the kernel of today still assigns the "legacy" interface names,
which are renamed by udev later. I would suggest improving conflict



It does, yes, but note this part:

Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.2 eth4: renamed from eth2
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.3 eth5: renamed from eth3




BTW:

# lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 Processor Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (Lewisville) (rev 05)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation C204 Chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.0 RAID bus controller: 3ware Inc 9750 SAS2/SATA-II RAID PCIe (rev 05)
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
06:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 04)

Sorry, I should have included this list right from the start.


Regards
Harri


Re: [systemd-devel] eth2: Failed to rename network interface 6 from 'eth2' to 'eno1': File exists

2022-01-05 Thread Harald Dunkel

On 2022-01-05 11:17:20, Martin Wilck wrote:


This is default behavior. To disable it, you need to use
"net.ifnames=0". If you see the same value multiple times for either
"acpi_index" or "index", it'd be a firmware problem. I suppose it can
happen that one device has acpi_index==1 and another one has index==1
(IIRC the first is derived from ACPI _DSM, the second from SMBIOS / DMI
type 41).



Where can I find these acpi_index and index attributes?

Regards
Harri


Re: [systemd-devel] eth2: Failed to rename network interface 6 from 'eth2' to 'eno1': File exists

2022-01-04 Thread Harald Dunkel

On 2022-01-04 16:14:16, Andrei Borzenkov wrote:


You have two interfaces which export the same onboard interface index.
There is not much udev can do here; the only option is to disable
onboard interface name policy. The attributes that are used by udev
are "acpi_index" and "index". Check values of these attributes for
all interfaces.
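The attributes belong to the PCI device behind each interface. They can be found at /sys/class/net/IFACE/device/acpi_index and /sys/class/net/IFACE/device/index. Below is a sketch that checks them for all interfaces. Following Martin's description, it assumes acpi_index is used when present and index otherwise. It reports every index value claimed by more than one interface, since those interfaces would compete for the same eno<N> name.

```python
#!/usr/bin/env python3
"""List interfaces that report the same firmware onboard index."""
from collections import defaultdict
from pathlib import Path


def onboard_index(iface_dir: Path):
    """Return the onboard index of one interface as a string, or None.

    Assumption: acpi_index (ACPI _DSM) wins and index (SMBIOS type 41) is
    the fallback. Virtual interfaces such as lo have neither.
    """
    for attr in ("acpi_index", "index"):
        try:
            return (iface_dir / "device" / attr).read_text().strip()
        except OSError:
            continue
    return None


def conflicting_onboard(net_root: str = "/sys/class/net") -> dict:
    """Map every onboard index claimed by more than one interface to the
    interfaces that would all ask for the same eno<N> name."""
    claims = defaultdict(list)
    for iface in sorted(Path(net_root).iterdir()):
        idx = onboard_index(iface)
        if idx is not None:
            claims[idx].append(iface.name)
    return {idx: names for idx, names in claims.items() if len(names) > 1}


if __name__ == "__main__":
    print(conflicting_onboard() or "no duplicate onboard indices")
```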



I will check, but please note that I didn't enable this. AFAIU Debian
follows upstream's default settings.



As is obvious from the log you provided, they did not "keep" their
names but were renamed. Whether this is correct depends on rules your
distribution is using.



AFAICS the kernel of today still assigns the "legacy" interface names,
which are renamed by udev later. I would suggest improving conflict
handling in udev, e.g. by falling back to the "non-onboard" interface
names for renaming, instead of an exit(1). This makes sure that only
the onboard interfaces are affected, not a random set of interfaces
depending on the discovery sequence.


Regards

Harri


[systemd-devel] eth2: Failed to rename network interface 6 from 'eth2' to 'eno1': File exists

2022-01-04 Thread Harald Dunkel

Hi folks,

after the upgrade from Buster to Bullseye (including the migration from
sysv init to systemd) the network interface names were messed up on
several hosts. Apparently udev stumbles over a naming conflict:

# journalctl -b | egrep -i e1000e\|igb\|rename\|eth\enp\|eno
Jan 03 11:30:14 nasl002b.example.com kernel: ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
Jan 03 11:30:14 nasl002b.example.com kernel: igb: Intel(R) Gigabit Ethernet Network Driver
Jan 03 11:30:14 nasl002b.example.com kernel: igb: Copyright (c) 2007-2014 Intel Corporation.
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e: Intel(R) PRO/1000 Network Driver
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.0: added PHC on eth0
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.0: Intel(R) Gigabit Ethernet Network Connection
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.0: eth0: (PCIe:5.0Gb/s:Width x4) a0:36:9f:00:06:1c
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.0: eth0: PBA No: G15139-001
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.1: added PHC on eth1
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.1: Intel(R) Gigabit Ethernet Network Connection
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.1: eth1: (PCIe:5.0Gb/s:Width x4) a0:36:9f:00:06:1d
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.1: eth1: PBA No: G15139-001
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :00:19.0 :00:19.0 (uninitialized): registered PHC clock
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.2: added PHC on eth2
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.2: Intel(R) Gigabit Ethernet Network Connection
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.2: eth2: (PCIe:5.0Gb/s:Width x4) a0:36:9f:00:06:1e
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.2: eth2: PBA No: G15139-001
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.2: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.3: added PHC on eth3
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.3: Intel(R) Gigabit Ethernet Network Connection
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.3: eth3: (PCIe:5.0Gb/s:Width x4) a0:36:9f:00:06:1f
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.3: eth3: PBA No: G15139-001
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.3: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.2 eth4: renamed from eth2
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :00:19.0 eth2: (PCI Express:2.5GT/s:Width x1) 00:1e:67:19:34:6d
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :00:19.0 eth2: Intel(R) PRO/1000 Network Connection
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :00:19.0 eth2: MAC: 10, PHY: 11, PBA No: 0100FF-0FF
Jan 03 11:30:14 nasl002b.example.com kernel: igb :02:00.3 eth5: renamed from eth3
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :05:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :05:00.0 :05:00.0 (uninitialized): registered PHC clock
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :05:00.0 eth3: (PCI Express:2.5GT/s:Width x1) 00:1e:67:19:34:6c
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :05:00.0 eth3: Intel(R) PRO/1000 Network Connection
Jan 03 11:30:14 nasl002b.example.com kernel: e1000e :05:00.0 eth3: MAC: 3, PHY: 8, PBA No: 1000FF-0FF
Jan 03 11:30:15 nasl002b.example.com kernel: igb :02:00.1 enp2s0f1: renamed from eth1
Jan 03 11:30:15 nasl002b.example.com kernel: igb :02:00.0 eno1: renamed from eth0
Jan 03 11:30:15 nasl002b.example.com kernel: e1000e :05:00.0 enp5s0: renamed from eth3
Jan 03 11:30:15 nasl002b.example.com systemd-udevd[416]: eth2: Failed to rename network interface 6 from 'eth2' to 'eno1': File exists
Jan 03 11:30:20 nasl002b.example.com kernel: e1000e :05:00.0 enp5s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None


Apparently udev stops renaming interfaces after the first conflict. eth4 and
eth5 have kept their names, too, even though there wouldn't be any further
conflict:

# ip l
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1:  mtu 15

Re: [systemd-devel] Antw: [EXT] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-14 Thread Harald Dunkel

On 8/13/20 11:03 AM, Lennart Poettering wrote:


Is it possible the container and the host run in the very same cgroup
hierarchy?

If that's the case (and it looks like it): this is not
supported. Please file a bug against LXC, it's very clearly broken.



FYI: https://github.com/lxc/lxc/issues/3520

Regards
Harri
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Antw: [EXT] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-13 Thread Harald Dunkel

On 8/13/20 11:07 AM, Lennart Poettering wrote:


No! It's a bug. Not in systemd, but LXC. But generating errors in such
a borked setup is *good*, not bad, and certainly nothing to hide.



Surely it's not a bug in systemd, but ignoring unreasonable data (maybe with
a warning, if necessary) has a proud tradition in computing. Not to mention
that systemd ignores duplicate PIDs in the same context as well, AFAICS.

Ignoring PID == 0 wouldn't be unreasonable, regardless of whose bug this is.


Just my $0.02. Regards
Harri


Re: [systemd-devel] Antw: [EXT] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-13 Thread Harald Dunkel

On 8/13/20 9:05 AM, Andrei Borzenkov wrote:


systemd should really log this clearly (the invalid PID and the cgroup
it was found in). Returning a generic error message without any indication
of what caused the error is not useful at all.


Do you think it would be reasonable to silently ignore the PID = 0
in cg_read_pid() and maybe others?
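Below is a Python illustration of the proposal. systemd's real cg_read_pid() is written in C and is not reproduced here. The sketch parses cgroup.procs. Instead of failing with EIO on PID 0, it skips that entry and logs which cgroup it came from, which also addresses Andrei's point about unhelpful error messages. The strict flag mimics the behaviour reported in this thread.

```python
#!/usr/bin/env python3
"""Parse cgroup.procs, either failing on PID 0 or skipping it with a log."""
import errno
import logging

log = logging.getLogger("cgroup-pids")


def read_pids(procs_path: str, strict: bool = False) -> list:
    """Return the PIDs listed in a cgroup.procs file.

    strict=True:  treat PID 0 as an I/O error, like the behaviour in this thread.
    strict=False: the proposal; skip PID 0 but log the cgroup it came from.
    """
    pids = []
    with open(procs_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            pid = int(line)
            if pid <= 0:
                if strict:
                    raise OSError(errno.EIO, f"invalid PID {pid} in cgroup.procs", procs_path)
                log.warning("ignoring PID %d in %s", pid, procs_path)
                continue
            pids.append(pid)
    return pids
```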


Regards
Harri


Re: [systemd-devel] Antw: [EXT] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-12 Thread Harald Dunkel

On 8/12/20 2:16 PM, Andrei Borzenkov wrote:

On 12.08.2020 14:03, Harald Dunkel wrote:

See attachment. Hope this helps
Harri




1 openat(AT_FDCWD, "/sys/fs/cgroup/unified/system.slice/rsyslog.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = 24
1 read(24, "0\n1544456\n", 4096)= 10



kernel returns "0" as process number in this cgroup which results in EIO
returned by systemd.


Indeed. This is kernel 4.19.132-1, as provided by Debian 10. Upgrading
to kernel 5.6.14-2~bpo10+1 and lxc 4.0.4 doesn't help, same problem.

And now it's getting weird: I found a few ghost services in some LXC
containers with *just* zeros, e.g. the zabbix-agent:

# cat /sys/fs/cgroup/unified/system.slice/zabbix-agent.service/cgroup.procs
0
0
0
0
0
0

Zabbix-agent isn't even installed in the container. It's installed on
the host system only.
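Below is a sketch for finding such ghost cgroups systematically. It walks a cgroup tree and reports every cgroup.procs that contains PID 0. Presumably the kernel reports 0 for processes outside the reader's PID namespace. That would fit Lennart's shared-hierarchy explanation in this thread: a cgroup where every entry is 0 would then only hold processes the reader cannot see. The default root /sys/fs/cgroup/unified matches the hybrid layout used here.

```python
#!/usr/bin/env python3
"""Report cgroups whose cgroup.procs lists PID 0."""
import os


def zero_pid_cgroups(root: str = "/sys/fs/cgroup/unified"):
    """Yield (cgroup_dir, zeros, total) for each cgroup whose cgroup.procs
    lists PID 0; zeros == total marks a 'ghost' cgroup."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cgroup.procs" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "cgroup.procs")) as f:
                pids = [int(tok) for tok in f.read().split()]
        except OSError:
            continue  # cgroup vanished while walking, or permission denied
        zeros = pids.count(0)
        if zeros:
            yield dirpath, zeros, len(pids)


if __name__ == "__main__":
    for path, zeros, total in zero_pid_cgroups():
        tag = "ghost" if zeros == total else "mixed"
        print(f"{tag}: {path} ({zeros} of {total} PIDs are 0)")
```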

I will check on the LXC mailing list. Maybe somebody is able to
reproduce this problem.


I highly appreciate your support on this
Harri


Re: [systemd-devel] Antw: [EXT] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-12 Thread Harald Dunkel

On 8/12/20 1:03 PM, Harald Dunkel wrote:

See attachment. Hope this helps
Harri


PS:

# ls -al /sys/fs/cgroup/unified/system.slice/rsyslog.service
total 0
drwxr-xr-x  2 root root 0 Jun 20 17:40 .
drwxr-xr-x 53 root root 0 Aug 12 13:30 ..
-r--r--r--  1 root root 0 Aug 12 13:05 cgroup.controllers
-r--r--r--  1 root root 0 Aug 12 13:05 cgroup.events
-rw-r--r--  1 root root 0 Aug 12 13:05 cgroup.max.depth
-rw-r--r--  1 root root 0 Aug 12 13:05 cgroup.max.descendants
-rw-r--r--  1 root root 0 Aug 12 13:05 cgroup.procs
-r--r--r--  1 root root 0 Aug 12 13:05 cgroup.stat
-rw-r--r--  1 root root 0 Aug 12 13:05 cgroup.subtree_control
-rw-r--r--  1 root root 0 Aug 12 13:05 cgroup.threads
-rw-r--r--  1 root root 0 Aug 12 13:05 cgroup.type
-r--r--r--  1 root root 0 Aug 12 13:05 cpu.stat


Re: [systemd-devel] Antw: [EXT] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-12 Thread Harald Dunkel

See attachment. Hope this helps
Harri
1 epoll_wait(4, [{EPOLLIN, {u32=3589379376, u64=94720503158064}}], 36, -1) = 1
1 clock_gettime(CLOCK_BOOTTIME, {tv_sec=4562316, tv_nsec=425983895}) = 0
1 recvmsg(29, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="WATCHDOG=1\n", iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=75, uid=0, gid=0}}], msg_controllen=32, msg_flags=MSG_CMSG_CLOEXEC}, MSG_TRUNC|MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 11
1 openat(AT_FDCWD, "/proc/75/cgroup", O_RDONLY|O_CLOEXEC) = 23
1 fstat(23, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
1 read(23, "13:name=systemd:/system.slice/in"..., 1024) = 290
1 read(23, "", 1024)= 0
1 close(23) = 0
1 timerfd_settime(25, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=4562321, tv_nsec=67759}}, NULL) = 0
1 epoll_wait(4, [{EPOLLIN, {u32=3588589904, u64=94720502368592}}], 36, -1) = 1
1 clock_gettime(CLOCK_BOOTTIME, {tv_sec=4562316, tv_nsec=795336025}) = 0
1 accept4(17, NULL, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 23
1 getsockopt(23, SOL_SOCKET, SO_PEERCRED, {pid=1613131, uid=0, gid=0}, [12]) = 0
1 fcntl(23, F_GETFL)= 0x802 (flags O_RDWR|O_NONBLOCK)
1 fcntl(23, F_GETFD)= 0x1 (flags FD_CLOEXEC)
1 fstat(23, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
1 getsockopt(23, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
1 setsockopt(23, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0
1 getsockopt(23, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
1 setsockopt(23, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
1 getsockopt(23, SOL_SOCKET, SO_PEERCRED, {pid=1613131, uid=0, gid=0}, [12]) = 0
1 getsockopt(23, SOL_SOCKET, SO_PEERSEC, 0x5625d5f8beb0, [64]) = -1 ENOPROTOOPT (Protocol not available)
1 getsockopt(23, SOL_SOCKET, SO_PEERGROUPS, [0], [256->4]) = 0
1 fstat(23, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
1 getsockopt(23, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
1 getsockname(23, {sa_family=AF_UNIX, sun_path="/run/systemd/private"}, [128->23]) = 0
1 recvmsg(23, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0AUTH EXTERNAL 30\r\nNEGOTIATE_UNI"..., iov_len=256}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 45
1 epoll_ctl(4, EPOLL_CTL_ADD, 23, {0, {u32=3589061440, u64=94720502840128}}) = 0
1 epoll_ctl(4, EPOLL_CTL_MOD, 23, {EPOLLIN|EPOLLOUT, {u32=3589061440, u64=94720502840128}}) = 0
1 epoll_wait(4, [{EPOLLOUT, {u32=3589061440, u64=94720502840128}}], 39, -1) = 1
1 clock_gettime(CLOCK_BOOTTIME, {tv_sec=4562316, tv_nsec=801992152}) = 0
1 sendmsg(23, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="OK 7afb570fdfeb4f7584b0d7b5c98ae"..., iov_len=52}, {iov_base=NULL, iov_len=0}, {iov_base=NULL, iov_len=0}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52
1 epoll_ctl(4, EPOLL_CTL_MOD, 23, {EPOLLIN, {u32=3589061440, u64=94720502840128}}) = 0
1 epoll_wait(4, [{EPOLLIN, {u32=3589061440, u64=94720502840128}}], 39, -1) = 1
1 clock_gettime(CLOCK_BOOTTIME, {tv_sec=4562316, tv_nsec=802457504}) = 0
1 epoll_wait(4, [{EPOLLIN, {u32=3589061440, u64=94720502840128}}], 39, -1) = 1
1 clock_gettime(CLOCK_BOOTTIME, {tv_sec=4562316, tv_nsec=802669666}) = 0
1 recvmsg(23, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\4\1 \0\0\0\1\0\0\0\241\0\0\0\1\1o\0\31\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
1 recvmsg(23, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1\0\0\0\0\0\0\0"..., iov_len=192}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 192
1 getuid()  = 0
1 kill(1544456, SIGHUP) = 0
1 openat(AT_FDCWD, "/sys/fs/cgroup/unified/system.slice/rsyslog.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = 24
1 fstat(24, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
1 read(24, "0\n1544456\n", 4096)= 10
1 close(24) = 0
1 openat(AT_FDCWD, "/sys/fs/cgroup/unified/system.slice/rsyslog.service", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 24
1 fstat(24, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
1 getdents64(24, /* 12 entries */, 32768) = 432
1 getdents64(24, /* 0 entries */, 32768) = 0
1 close(24) = 0
1 sendmsg(23, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1\27\0\0\0\1\0\0\0g\0\0\0\5\1u\0\1\0\0\0\4\1s\0\"\0\0\0"..., iov_len=120}, {iov_base="\22\0\0\0Input/output error\0", iov_len=23}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 143
1 epoll_wait(4, [{EPOLLIN|EPOLLHUP, {u32=3589061440, u64=94720502840128}}], 39, -1) = 1
1 clock_gettime(CLOCK_BOOTTIME, {

Re: [systemd-devel] Antw: [EXT] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-12 Thread Harald Dunkel

On 8/12/20 10:32 AM, Ulrich Windl wrote:


As you found out the details already, maybe you could have added some strace
output, especially after the kill() is returning...



See attachment. Hope this helps
Harri
44504 execve("/bin/systemctl", ["systemctl", "kill", "-s", "HUP", "rsyslog.service"], 0x7ffcae899d08 /* 25 vars */) = 0
44504 brk(NULL) = 0x55dbbdeff000
44504 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/tls/haswell/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/tls/haswell/avx512_1/x86_64", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/tls/haswell/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/tls/haswell/avx512_1", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/tls/haswell/x86_64", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/tls/haswell", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/tls/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/tls/avx512_1/x86_64", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/tls/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/tls/avx512_1", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/tls/x86_64", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/tls", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/haswell/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/haswell/avx512_1/x86_64", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/haswell/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/haswell/avx512_1", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/haswell/x86_64", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/haswell", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/avx512_1/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/avx512_1/x86_64", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/avx512_1/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/avx512_1", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd/x86_64", 0x7ffde718d280) = -1 ENOENT (No such file or directory)
44504 openat(AT_FDCWD, "/lib/systemd/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
44504 stat("/lib/systemd", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
44504 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
44504 fstat(3, {st_mode=S_IFREG|0644, st_size=43597, ...}) = 0
44504 mmap(NULL, 43597, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa75e41b000
44504 close(3)  = 0
44504 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa75e419000
44504 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
44504 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260A\2\0\0\0\0\0"..., 832) = 832
44504 fstat(3, {st_mode=S_IFREG|0755, st_size=1824496, ...}) = 0
44504 mmap(NULL, 1837056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa75e258000
44504 mprotect(0x7fa75e27a000, 1658880, PROT_NONE) = 0
44504 mmap(0x7fa75e27a000, 1343488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7fa75e27a000
44504 mmap(0x7fa75e3c2000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7fa75e3c2000
44504 mmap(0x7fa

Re: [systemd-devel] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-11 Thread Harald Dunkel

On 8/11/20 2:27 PM, Lennart Poettering wrote:


Can you run systemctl with SYSTEMD_LOG_LEVEL debug? Anything
interesting in the debug output it generates then? I wonder where the
I/O error comes from...



Sure:

# export SYSTEMD_LOG_LEVEL=debug
# systemctl kill -s HUP rsyslog.service
Bus n/a: changing state UNSET → OPENING
Bus n/a: changing state OPENING → AUTHENTICATING
Bus n/a: changing state AUTHENTICATING → RUNNING
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=KillUnit cookie=1 reply_cookie=0 signature=ssi error-name=n/a error-message=n/a
Got message type=error sender=org.freedesktop.systemd1 destination=n/a path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=s error-name=org.freedesktop.DBus.Error.IOError error-message=Input/output error
Failed to kill unit rsyslog.service: Input/output error
Bus n/a: changing state RUNNING → CLOSED


i.e. question is if the error is client side. If it's coming from the
server side, then the next thing to try would be to turn on debug
logging with "systemd-analyze log-level debug" and reproduce the
issue, then check if there's anything interesting in the logs.

Please provide the relevant log excerpts here then.
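A minimal shell sketch of the server-side procedure described above, assuming a root shell and systemd 239 or later (where "systemd-analyze log-level" without an argument prints the current level; Debian 10's 241 should qualify). The function name reproduce_with_pid1_debug is hypothetical. It raises PID 1's log level, repeats the failing call, restores the previous level and shows PID 1's most recent journal entries.

```sh
# Hypothetical helper: reproduce a failing "systemctl kill" while PID 1
# logs at debug level, then restore the previous level.
# Assumes root and systemd >= 239 ("systemd-analyze log-level" with no
# argument prints the current level).
reproduce_with_pid1_debug() {
    unit=$1
    sig=${2:-HUP}

    # Remember the current manager log level so it can be restored.
    old=$(systemd-analyze log-level) || return 1
    systemd-analyze log-level debug || return 1

    # Repeat the call that fails and keep its exit status.
    systemctl kill -s "$sig" "$unit"
    kill_rc=$?

    # Restore the previous level and show what PID 1 logged.
    systemd-analyze log-level "$old"
    journalctl -b _PID=1 -n 50 --no-pager
    return "$kill_rc"
}
```

Usage, in the container: "reproduce_with_pid1_debug rsyslog.service HUP". The function returns systemctl's exit status, so the I/O error still shows up as a non-zero result.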



/var/log/debug:

Aug 12 08:30:40 srvvm01 systemd[1]: Sent message type=method_return sender=org.freedesktop.systemd1 destination=n/a path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=n/a error-name=n/a error-message=n/a
Aug 12 08:30:40 srvvm01 systemd[1]: Bus private-bus-connection: changing state RUNNING → CLOSING
Aug 12 08:30:40 srvvm01 systemd[1]: Bus private-bus-connection: changing state CLOSING → CLOSED
Aug 12 08:30:40 srvvm01 systemd[1]: Got disconnect on private connection.
Aug 12 08:30:42 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:30:44 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:30:47 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:30:49 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:30:52 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:30:54 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:30:57 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:30:59 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:02 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:04 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:05 srvvm01 systemd[1]: Bus private-bus-connection: changing state UNSET → OPENING
Aug 12 08:31:05 srvvm01 systemd[1]: Bus private-bus-connection: changing state OPENING → AUTHENTICATING
Aug 12 08:31:05 srvvm01 systemd[1]: Accepted new private connection.
Aug 12 08:31:05 srvvm01 systemd[1]: Bus private-bus-connection: changing state AUTHENTICATING → RUNNING
Aug 12 08:31:05 srvvm01 systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=KillUnit cookie=1 reply_cookie=0 signature=ssi error-name=n/a error-message=n/a
Aug 12 08:31:05 srvvm01 systemd[1]: Sent message type=error sender=org.freedesktop.systemd1 destination=n/a path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=s error-name=org.freedesktop.DBus.Error.IOError error-message=Input/output error
Aug 12 08:31:05 srvvm01 systemd[1]: Failed to process message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=KillUnit cookie=1 reply_cookie=0 signature=ssi error-name=n/a error-message=n/a: Input/output error
Aug 12 08:31:05 srvvm01 systemd[1]: Bus private-bus-connection: changing state RUNNING → CLOSING
Aug 12 08:31:05 srvvm01 systemd[1]: Bus private-bus-connection: changing state CLOSING → CLOSED
Aug 12 08:31:05 srvvm01 systemd[1]: Got disconnect on private connection.
Aug 12 08:31:07 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:09 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:12 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:14 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:17 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:19 srvvm01 systemd[1]: inetd.service: Got notification message from PID 292 (WATCHDOG=1)
Aug 12 08:31:22 sr

[systemd-devel] I/O error on "systemctl kill -s HUP rsyslog.service"

2020-08-11 Thread Harald Dunkel

Hi folks,

sending a HUP to rsyslog using the "systemd way" gives me an error:

# systemctl kill -s HUP rsyslog.service
Failed to kill unit rsyslog.service: Input/output error

rsyslog does receive the signal, but systemctl exits with a non-zero
status, and that error propagates to the logrotate service.

This is an LXC container (lxc 4.0.2). Host and container are both running
Debian 10 with systemd 241-7~deb10u4. See https://bugs.debian.org/968049


Any helpful hint is highly appreciated.

Harri

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Harald Dunkel

On 4/5/19 12:21 PM, Lennart Poettering wrote:

On Fr, 05.04.19 11:53, Harald Dunkel (harald.dun...@aixigo.de) wrote:



This is a VNC session, started via crontab @reboot.


IIRC debian/ubuntu do not have pam-systemd in their PAM configuration
for cron, which means these services are not tracked by
logind/systemd, and hence only killed when crond likes to do that.

It's a configuration bug in debian/ubuntu.
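If one wanted to act on that, adding pam_systemd to cron's PAM stack could be sketched as below. It assumes Debian's /etc/pam.d/cron and the line "session optional pam_systemd.so" (the form Debian's common-session uses); the helper name ensure_pam_systemd is hypothetical, and whether cron jobs should get logind sessions at all is a policy decision for the admin.

```sh
# Hypothetical helper: append "session optional pam_systemd.so" to a PAM
# service file unless an active (uncommented) pam_systemd session line
# is already present. Default target is Debian's /etc/pam.d/cron.
ensure_pam_systemd() {
    pamfile=${1:-/etc/pam.d/cron}
    if ! grep -Eq '^[[:space:]]*session[[:space:]]+[^#]*pam_systemd\.so' "$pamfile"; then
        printf '%s\n' 'session optional pam_systemd.so' >> "$pamfile"
    fi
}
```

Running it twice is harmless, because the grep check finds the line added by the first run.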



No, that was just an example. Surely systemd is robust enough
to recover from a few stray processes?

The point is that rpcbind (and maybe other services) is stopped
before the NFS unmounts are attempted. Hopefully you agree that
this is unrelated to any cron jobs?


Regards
Harri

Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Harald Dunkel

Hi Lennart,

On 4/5/19 10:28 AM, Lennart Poettering wrote:


For some reason a number of X session processes stick around to the
very end and thus keep your /home busy.

[82021.052357] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[82021.101976] systemd-shutdown[1]: Sending SIGKILL to PID 2513 (gpg-agent).
[82021.130507] systemd-shutdown[1]: Sending SIGKILL to PID 2886 (xstartup).
[82021.158510] systemd-shutdown[1]: Sending SIGKILL to PID 2896 (xstartup).
[82021.186052] systemd-shutdown[1]: Sending SIGKILL to PID 2959 (xterm).
[82021.213129] systemd-shutdown[1]: Sending SIGKILL to PID 2960 (xterm).
[82021.239971] systemd-shutdown[1]: Sending SIGKILL to PID 2961 (xterm).
[82021.266285] systemd-shutdown[1]: Sending SIGKILL to PID 2966 (bash).
[82021.292234] systemd-shutdown[1]: Sending SIGKILL to PID 2967 (bash).
[82021.318061] systemd-shutdown[1]: Sending SIGKILL to PID 9146 (utempter).
[82021.343331] systemd-shutdown[1]: Sending SIGKILL to PID 9147 (utempter).

The question is how though. How do you start your X session? gdm?
startx from the console?



This is a VNC session, started via crontab @reboot.

Regards
Harri

Re: [systemd-devel] umount NFS problem

2019-04-05 Thread Harald Dunkel

On 4/5/19 8:45 AM, Mantas Mikulėnas wrote:


Normally I'd expect user sessions (user-*.slice, session-*.scope, 
user@*.service) to be killed before mount units are stopped; I wonder how 
random gpg-agent processes have managed to escape that. (Actually, doesn't 
Debian now manage gpg-agent via user@.service? That *really* should be cleaning 
up everything properly...)



Probably a remote login.

@Michael, libpam-systemd is installed. UsePAM is enabled, too.


You might also try to enable [Mount] LazyUnmount= for home.mount so that 
umounts appear to succeed immediately and the kernel cleans them up when it 
can. It mostly just hides the problem though.
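That suggestion could be sketched as a drop-in for the fstab-generated home.mount, assuming the installed systemd supports LazyUnmount= (check systemd.mount(5) on the host). The helper name write_lazy_unmount_dropin and the file name lazy-unmount.conf are hypothetical; the optional argument is a root prefix, so the function can be tried on a scratch directory first.

```sh
# Hypothetical helper: create a drop-in that sets LazyUnmount=yes for
# home.mount (the unit systemd generates for /home from /etc/fstab).
# Optional $1 is a root prefix; empty means the real /etc.
write_lazy_unmount_dropin() {
    dir="${1:-}/etc/systemd/system/home.mount.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/lazy-unmount.conf" <<'EOF'
[Mount]
LazyUnmount=yes
EOF
}
```

On the real host this would be followed by "systemctl daemon-reload". As noted above, it mostly hides the busy mount rather than fixing the ordering.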



Looking at the log file, I have the impression that rpcbind was
stopped even before the first attempt to unmount /home. Can
you confirm this?
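One way to check this on a running host is to look at home.mount's After= list. Stop order is the reverse of start order, so a mount ordered After=rpcbind.service is unmounted before rpcbind is stopped. A sketch, assuming "systemctl show" supports --value (it should on systemd 232); the helper name is_ordered_after is hypothetical.

```sh
# Hypothetical helper: succeed if DEP appears in UNIT's After= list.
# systemd stops units in reverse start order, so "home.mount After=
# rpcbind.service" means /home is unmounted before rpcbind stops.
is_ordered_after() {
    unit=$1
    dep=$2
    systemctl show -p After --value "$unit" | tr ' ' '\n' | grep -qxF "$dep"
}
```

For example: "is_ordered_after home.mount rpcbind.service || echo 'home.mount is not ordered after rpcbind.service'".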


Regards
Harri

[systemd-devel] umount NFS problem

2019-04-04 Thread Harald Dunkel

Hi folks,

I've got a device-busy problem with /home, which is mounted via NFS.
Shutting down the host takes more than 180 seconds. See the attached
log file.

Apparently the umount of /home at 81925.154995 failed (device
busy; in my case it was a leftover gpg-agent). This error was
ignored: the NFS framework was shut down, the network was
stopped, and by then it was too late to handle the /home
mount point properly.
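To see which unit or session a process such as that gpg-agent belongs to (and so whether it escaped the user's session scope), one could list the processes using /home before shutting down and print their systemd cgroup. A sketch, assuming fuser from psmisc is installed; describe_pids is a hypothetical helper.

```sh
# Hypothetical helper: print PID, command name and systemd cgroup path
# (tab-separated) for each given PID, skipping PIDs that no longer exist.
describe_pids() {
    for pid in "$@"; do
        [ -r "/proc/$pid/cgroup" ] || continue
        comm=$(cat "/proc/$pid/comm" 2>/dev/null)
        # cgroup v1/hybrid hosts list the systemd hierarchy as
        # "N:name=systemd:/path", unified hosts as "0::/path"; keep the path.
        cg=$(grep -m1 -E '^[0-9]+:name=systemd:|^0::' "/proc/$pid/cgroup" | cut -d: -f3-)
        printf '%s\t%s\t%s\n' "$pid" "$comm" "$cg"
    done
}
```

Usage while /home is still busy: "describe_pids $(fuser -m /home 2>/dev/null)" (fuser writes only the PIDs to stdout). A process whose path ends in system.slice/cron.service rather than a session-*.scope would fit the missing pam_systemd explanation.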

AFAIK the mount units are generated from /etc/fstab, so I wonder
whether this could be improved.
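Since the mount units come from /etc/fstab, one option that could be tried is an x-systemd.requires= entry on the /home line. Per systemd.mount(5) it adds both Requires= and After= on the named unit, so the mount would be stopped before that unit. A sketch that appends it to the options column, assuming the installed systemd-fstab-generator understands x-systemd.requires= and that rpcbind.service is the right unit to order against; add_home_requires is a hypothetical helper.

```sh
# Hypothetical helper: append x-systemd.requires=UNIT to the options
# (4th) field of the /home entry in an fstab file, once.
# Note: awk rejoins the edited line with single spaces between fields.
add_home_requires() {
    fstab=${1:-/etc/fstab}
    unit=${2:-rpcbind.service}
    tmp=$(mktemp) || return 1
    awk -v unit="$unit" '
        # Skip comments and touch only the entry whose mount point is /home.
        $1 !~ /^#/ && $2 == "/home" && index($4, "x-systemd.requires=" unit) == 0 {
            $4 = $4 ",x-systemd.requires=" unit
        }
        { print }
    ' "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

After editing, "systemctl daemon-reload" regenerates home.mount. Keep in mind that Requires= also means /home would be unmounted if rpcbind.service is explicitly stopped or restarted.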

The hosts (about 50 developer PCs) are running Debian 9 with systemd
232-25+deb9u9. Unfortunately, we are bound to this platform for at
least another year.


Any helpful hint is highly appreciated.

Harri
--
aixigo AG, Karl-Friedrich-Strasse 68, 52072 Aachen, Germany
phone: +49 241 559709-79, fax: +49 241 559709-99
eMail: harald.dun...@aixigo.de, web: http://www.aixigo.de
Amtsgericht Aachen - HRB 8057, Vorstand: Erich Borsch, Christian Friedrich, 
Tobias Haustein, Vors. des Aufsichtsrates: Prof. Dr. Ruediger von Nitzsch


shutdown-log.txt.gz
Description: application/gzip

[systemd-devel] looking for help to resolve shutdown problem

2019-04-04 Thread Harald Dunkel

Hi folks,

I am looking for help tracking down a 90-second delay at shutdown.
I suspect a problem with unmounting the /home directory tree
(mounted via NFS).

Apparently the delay occurs after the journal has been stopped, so I
tried the procedure described at

https://freedesktop.org/wiki/Software/systemd/Debugging/

The promised /shutdown-log.txt file was not created (or I was too
blind to see it).
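For reference, a sketch of the kind of hook that page describes (reconstructed here, so compare with the page). Boot with systemd logging debug output to the kernel log, for example "systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on" on the kernel command line (printk.devkmsg=on keeps newer kernels from rate-limiting those messages), and put an executable script into systemd-shutdown's hook directory that saves dmesg to /shutdown-log.txt. One possible reason for a missing file on Debian 9 is the directory itself: without merged /usr it is /lib/systemd/system-shutdown rather than /usr/lib/systemd/system-shutdown. The helper name install_shutdown_debug_hook is hypothetical.

```sh
# Hypothetical helper: install a systemd-shutdown hook that saves the
# kernel log (which then contains systemd's debug output) to
# /shutdown-log.txt. Default directory assumes Debian 9 without merged /usr.
install_shutdown_debug_hook() {
    dir=${1:-/lib/systemd/system-shutdown}
    mkdir -p "$dir" || return 1
    cat > "$dir/debug.sh" <<'EOF'
#!/bin/sh
# Runs very late, after the journal is gone and file systems are read-only.
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /
EOF
    # The hook has to be executable to be run.
    chmod +x "$dir/debug.sh"
}
```

After the next shutdown, /shutdown-log.txt on the root file system should contain the kernel log, including the unmount attempts for /home.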


Any helpful hint is highly appreciated.

Harri