Re: [systemd-devel] systemd user instance not working in only one account, XDG_RUNTIME_DIR not being set

2023-04-10 Thread Mantas Mikulėnas
On Tue, Apr 11, 2023, 03:41 Chandler wrote:

> systemd has been working great here, system-wide as well as in all user
> instances except one.  I'm not exactly sure what all the steps are in
> the process to get a systemd user instance running.  The directory
> /run/user/$UID was not being created, though.
>
> I made some progress by running `systemctl start
> user@.service` and the /run/user/$UID was created.
>
> `systemctl --user status` returns `Failed to connect to bus: No such
> file or directory`.  XDG_RUNTIME_DIR is not being set, but a command
> like `XDG_RUNTIME_DIR=/run/user/$UID systemctl --user status` runs
> successfully, so I think it's down to this last piece.
>

The same pam_systemd module registers a "session" with logind (which
triggers the creation of the runtime directory as well as the startup of
user@.service; note: *not* user@) and sets XDG_RUNTIME_DIR after the
session has been registered. Check whether your tty or display is shown in
the `loginctl` session list.

Note that logind does not allow registering sessions from within another
session, so tools like `su` won't be able to do that (except for some
situations where they can but you wouldn't want them to) – only a fresh
login gets you a session. So usually step 1 is to not use `su` or `sudo`
here – run `machinectl shell foo@` if you need a shell for a local user.
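
As a rough aid for telling these states apart, here is a minimal Python
sketch. It only inspects the environment and the filesystem; it does not
talk to logind. The function names and the wording of the messages are
hypothetical, and the interpretation attached to each state is an assumption
based on the description above, not a definitive diagnosis.

```python
import os


def expected_runtime_dir(uid: int) -> str:
    """Path that pam_systemd normally exports as XDG_RUNTIME_DIR."""
    return f"/run/user/{uid}"


def diagnose(env: dict, uid: int, dir_exists: bool) -> str:
    """Classify the states discussed in this thread (heuristic only)."""
    if env.get("XDG_RUNTIME_DIR"):
        return "ok: XDG_RUNTIME_DIR is set"
    if dir_exists:
        # Directory present but variable missing: this login was probably
        # not registered as a logind session (for example a `su` shell).
        return "unset: runtime dir exists, login likely has no session"
    return "unset: no runtime dir, user manager likely not started"


uid = os.getuid()
print(diagnose(dict(os.environ), uid, os.path.isdir(expected_runtime_dir(uid))))
```

Running it once from a fresh login and once from the failing shell shows
whether the two differ only in the missing variable.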


[systemd-devel] systemd user instance not working in only one account, XDG_RUNTIME_DIR not being set

2023-04-10 Thread Chandler
systemd has been working great here, system-wide as well as in all user
instances except one.  I'm not exactly sure what all the steps are in
the process to get a systemd user instance running.  The directory
/run/user/$UID was not being created, though.

I made some progress by running `systemctl start
user@.service` and the /run/user/$UID was created.

`systemctl --user status` returns `Failed to connect to bus: No such
file or directory`.  XDG_RUNTIME_DIR is not being set, but a command
like `XDG_RUNTIME_DIR=/run/user/$UID systemctl --user status` runs
successfully, so I think it's down to this last piece.

Apparently XDG_RUNTIME_DIR is related to PAM?  I don't think I should
have to change PAM settings or anything like that since systemd is
working fine already in the other user accounts.  I couldn't find any
info on how to fix this part... What do you experts think?

The computer is stuck with CentOS 8 unfortunately and old v239 of
systemd, but I don't think that's the issue either since systemd is
working fine in all other cases.  I just wanted you to know in case
there are some deprecated commands or config needed to fix the problem.

Thanks


[systemd-devel] sharing of D-Bus connection between systemd PAM modules causes problems

2023-04-10 Thread Norbert Braun

Hi all,

I recently ran into a problem on Arch Linux ARM (32 bit) where logging 
in as root on the console would often, but not always, fail (much like 
in https://github.com/systemd/systemd/issues/17266). While investigating 
the problem, I found the following:


Systemd ships with two PAM modules, pam_systemd.so and 
pam_systemd_home.so. Both of these use pam_acquire_bus_connection to 
open a connection to the system bus. pam_acquire_bus_connection opens a 
connection on the first call, then uses pam_set_data and pam_get_data to 
cache the connection object for subsequent calls. Since the namespace 
for pam_set_data/pam_get_data is shared between all PAM modules, it can 
happen that one PAM module opens the connection and another one uses it. 
In my case, pam_systemd_home.so opens the connection and sends the Hello 
message. If the root user attempts to log in, pam_systemd_home.so exits 
early and leaves the connection open, to be re-used by pam_systemd.so.
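
To make the sharing concrete, here is a minimal Python model of the caching
pattern described above. It is a sketch only: the dictionary stands in for
the per-handle PAM data store, and the key name, the `Connection` class and
the function name are hypothetical, not taken from the systemd sources.

```python
from dataclasses import dataclass

# Hypothetical key name; the real cache key used by systemd may differ.
CACHE_KEY = "systemd-bus"


@dataclass
class Connection:
    """Stand-in for an sd_bus object; records which module opened it."""
    opened_by: str


def acquire_bus_connection(pam_data: dict, module: str) -> Connection:
    """Return the cached connection, opening one only on the first call.

    pam_data models the pam_set_data/pam_get_data store, which is shared
    by every module loaded into the same PAM handle.
    """
    conn = pam_data.get(CACHE_KEY)        # models pam_get_data
    if conn is None:
        conn = Connection(opened_by=module)
        pam_data[CACHE_KEY] = conn        # models pam_set_data
    return conn


pam_data = {}
first = acquire_bus_connection(pam_data, "pam_systemd_home.so")
second = acquire_bus_connection(pam_data, "pam_systemd.so")
print(second.opened_by)  # pam_systemd_home.so
```

The second caller receives the very object the first one created, which is
the situation that matters for the hashmap problem below.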


This is problematic because struct sd_bus contains OrderedHashmap 
*reply_callbacks, and OrderedHashmap internally uses a global variable 
shared_hash_key. The PAM modules are statically linked with libsystemd,
so this variable effectively exists twice, once in each of the two PAM
modules. Since it is initialized to a random value, the value differs
between the PAM modules. In the scenario above, it therefore differs 
between the sending of the Hello message and the processing of the 
reply. Thus, when the reply to the Hello message arrives, process_reply 
effectively looks for the reply cookie in a random hash bucket, and may 
or may not find it. In the latter case, this eventually leads to the 
somewhat cryptic error message: "pam_systemd(login:session): Failed to 
create session: Input/output error".


The problem is hidden on 64 bit systems, because the sizes of struct 
ordered_hashmap_entry and struct indirect_storage are such that an 
OrderedHashmap with direct storage only has a single bucket, and the 
value of shared_hash_key is therefore irrelevant. On a 32 bit system, 
however, the sizes are such that there are two buckets, and root login 
fails, with 50% probability, with the error message mentioned above. As 
expected from the above, it is possible to cause the problem to appear 
on a 64 bit system by changing "uint8_t _pad[3];" to "uint8_t _pad[19];" 
in struct indirect_storage (in src/basic/hashmap.c).
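
The effect of the bucket count can be reproduced with a small simulation.
This is a sketch under stated assumptions, not systemd's implementation:
keyed BLAKE2b stands in for the real hash function, the table is a plain
list of sets, and all function names are made up for illustration. One key
is used when the entry is inserted and a different random key when it is
looked up, as happens when the two PAM modules share the connection.

```python
import hashlib
import os


def bucket_of(cookie: int, hash_key: bytes, n_buckets: int) -> int:
    """Map a cookie to a bucket using a keyed hash (BLAKE2b as a stand-in)."""
    digest = hashlib.blake2b(cookie.to_bytes(8, "little"), key=hash_key,
                             digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_buckets


def reply_found(cookie: int, send_key: bytes, recv_key: bytes,
                n_buckets: int) -> bool:
    """Insert under send_key, then look up under recv_key.

    The lookup only searches the bucket that recv_key selects, so it
    succeeds only if both keys map the cookie to the same bucket.
    """
    buckets = [set() for _ in range(n_buckets)]
    buckets[bucket_of(cookie, send_key, n_buckets)].add(cookie)
    return cookie in buckets[bucket_of(cookie, recv_key, n_buckets)]


def success_rate(n_buckets: int, trials: int = 2000) -> float:
    """Fraction of trials in which the reply is found with two random keys."""
    hits = sum(
        reply_found(1, os.urandom(16), os.urandom(16), n_buckets)
        for _ in range(trials)
    )
    return hits / trials


print(success_rate(1))  # 1.0: a single bucket makes the key irrelevant
print(success_rate(2))  # close to 0.5, varies from run to run
```

With one bucket the mismatch never matters; with two, roughly half of the
lookups land in the wrong bucket, which matches the behaviour observed on
the 32 bit system.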


After the above, another problem surfaces during cleanup: bus_free calls 
ordered_hashmap_free_free(b->reply_callbacks), which calls free on each 
value in the hashmap. However, the struct reply_callback that 
sd_bus_call_async puts into the hashmap was not individually allocated, 
but is part of a larger struct sd_bus_slot. free is unhappy about that, and
the login process finally dies with a segmentation fault, aborting the 
login attempt entirely. This problem is normally hidden by the fact that 
reply_callbacks is empty by the time that bus_free is called.


Best regards,

Norbert


Re: [systemd-devel] systemctl daemon-reexec forgets running services and starts everything new

2023-04-10 Thread Michael Biebl
On Mon, Apr 10, 2023 at 09:46, Mantas Mikulėnas wrote:
>
> On Tue, Apr 4, 2023 at 11:33 AM Wasser, Erik wrote:
>>
>> # Some details to the hardware #
>>
>> Our metal runs OpenVZ/Virtuozzo with this kernel (without any problems):
>>
>> > Linux FQDN_REDACTED 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20 21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux
>>
>> The container with the `systemctl daemon-reexec` problem reports the
>> following kernel:
>>
>> Linux FQDN_REDACTED 5.4.0 #1 SMP Thu Apr 22 16:18:59 MSK 2021 x86_64
>> x86_64 x86_64 GNU/Linux
>
>
> Hold on a moment – if it is actually an *OpenVZ container*, not a VM, how/why 
> is it even reporting a different kernel than the host OS? Isn't the entire 
> point of OpenVZ to share a single kernel with the guest containers? Is it 
> actually 3.10 *pretending* to be 5.4 just to make it pass systemd's kernel 
> version checks?


Re: [systemd-devel] systemctl daemon-reexec forgets running services and starts everything new

2023-04-10 Thread Mantas Mikulėnas
On Tue, Apr 4, 2023 at 11:33 AM Wasser, Erik wrote:

> # Some details to the hardware #
>
> Our metal runs OpenVZ/Virtuozzo with this kernel (without any problems):
>
> > Linux FQDN_REDACTED 3.10.0-1127.18.2.vz7.163.46 #1 SMP Fri Nov 20 21:47:55 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> The container with the `systemctl daemon-reexec` problem reports the
> following kernel:
>
> Linux FQDN_REDACTED 5.4.0 #1 SMP Thu Apr 22 16:18:59 MSK 2021 x86_64
> x86_64 x86_64 GNU/Linux
>

Hold on a moment – if it is actually an *OpenVZ container*, not a VM,
how/why is it even reporting a different kernel than the host OS? Isn't the
entire point of OpenVZ to share a single kernel with the guest containers?
Is it actually 3.10 *pretending* to be 5.4 just to make it pass systemd's
kernel version checks?

-- 
Mantas Mikulėnas