Re: [systemd-devel] How to figure out what's causing systemd to start printing messages partway through boot?

2020-04-30 Thread McKay, Sean
Dug a little deeper and found the "(Enabling | Disabling) showing of status." 
debug log, which shows up exactly once in my journalctl output... but only on 
one type of system. I don't see it on the other (still trying to figure out if 
I could be hitting some sort of rate limiting).

Nothing around it seems like a failure, but I'm still digging.

Any suggestions are still welcome

Thanks!
-Sean

From: systemd-devel  On Behalf Of 
McKay, Sean
Sent: Wednesday, April 29, 2020 4:26 PM
To: systemd-devel@lists.freedesktop.org
Subject: Re: [systemd-devel] How to figure out what's causing systemd to start 
printing messages partway through boot?

I apparently failed to finish a sentence. Fixed. My apologies!

From: McKay, Sean
Sent: Wednesday, April 29, 2020 4:15 PM
To: 
systemd-devel@lists.freedesktop.org
Subject: How to figure out what's causing systemd to start printing messages 
partway through boot?

Hi all,

Hopefully quick question:
Do you have any guidance on what I should look at to determine what is causing 
the show_status variable in systemd to get flipped to true without any apparent 
unit failures? Is there an easy way to find that information?

We have a few systems that start printing the manager status messages halfway 
through bootup (that aren't supposed to). They have 'quiet' set on the kernel 
command line and they appear to be following that for a good chunk of the boot, 
but partway through I start getting '[  OK ] Started...' messages without any 
failure showing up first (which is what I expect and have seen in other 
scenarios). Running 'systemctl show' after the system has booted, shows that 
the variable has indeed been flipped.  From looking through the code, it 
appears the only way this is possible should be for a unit to have hit some 
status other than "OK" (JOB_DONE internally, it would appear) but I don't see 
any of those in the output. And I'm not sure if there's an equivalent in the 
journal to search for.

I've poked through the journal near where things start showing up and there 
doesn't appear to be anything in particular that sticks out... so I turned on 
debug logging and there's a ton of it. And I haven't yet had any luck figuring 
out what I'm looking for from digging through the source code. So I figured I'd 
ask.

Thanks!
-Sean McKay

P.S. We're still waaay back on 229. Frantically trying to get things 
upgraded, but not there yet. I don't think that's relevant (I doubt this is a 
systemd bug), but it might come up if there's more tooling/verbosity in newer 
versions.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Requested transaction contradicts existing jobs: start is destructive

2020-04-30 Thread Kumar Kartikeya Dwivedi
On Thu, 30 Apr 2020 at 23:43, Uoti Urpala  wrote:
>
> On Thu, 2020-04-30 at 22:18 +0530, Kumar Kartikeya Dwivedi wrote:
> > waiting for the stop request initiated previously to finish). Even if
> > you use fail as the job mode, the error you get back is "Transaction
> > is destructive" or so, not this.
>
> IIRC the patch he mentioned in his mail was one that changed the error
> message from "Transaction is destructive" to this.
>

I see, it seems I overlooked that part. Thanks for correcting me.

Then this is indeed the stage where it tries to apply the transaction,
rather than conflicting jobs in the same transaction, so what I said
above probably does not apply at all.

I used a test case locally to try and reproduce the problem, and it
only manifests when the JOB_FAIL job mode is used, otherwise the stop
job on the unit is cancelled and replaced with the start job.

Something like this:

# anchor.service
[Unit]
Requires=that.slice
After=that.slice
[Service]
ExecStart=/bin/true

# other.service
[Service]
ExecStart=/bin/sleep infinity
/* delay the stop a bit */
ExecStop=/bin/sleep 10

Now, for
$ systemctl start other.service --no-block; systemctl --user stop
that.slice --no-block;

$ systemctl start anchor.service --job-mode fail
fails, while
$ systemctl start anchor.service --job-mode replace

Either that or the stop job installed is irreversible (can emulate
that by using --job-mode replace-irreversibly when stopping
that.slice). So these are the only possibilities. The code hasn't
changed at all since v219 too.

My educated guess is that the session slice is taking time to clean up
because of the default stop timeout due to processes in the session
not responding to SIGTERM (it is waiting for dependent units to stop
before stopping itself by virtue of being ordered after them), and in
the meantime a new login is initiated. Now, the stop job on the user
slice is still waiting, and looking at the code in logind,
manager_start_scope uses the job mode fail, which means the start job
for the user slice won't be able to cancel the sitting stop job, hence
the transaction cannot be applied.

--
Kartikeya
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Requested transaction contradicts existing jobs: start is destructive

2020-04-30 Thread Uoti Urpala
On Thu, 2020-04-30 at 22:18 +0530, Kumar Kartikeya Dwivedi wrote:
> waiting for the stop request initiated previously to finish). Even if
> you use fail as the job mode, the error you get back is "Transaction
> is destructive" or so, not this.

IIRC the patch he mentioned in his mail was one that changed the error
message from "Transaction is destructive" to this.


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Requested transaction contradicts existing jobs: start is destructive

2020-04-30 Thread Kumar Kartikeya Dwivedi
On Thu, 30 Apr 2020 at 16:56, Mark Bannister  wrote:
>
> The documentation says that the default option for --job-mode is "replace", 
> i.e.
> any conflicting pending job will be replaced, as necessary.
>
> How does it make sense for systemd to allow stop and start requests to
> contradict anyway?  This is the bit I'm finding very hard to understand.
> I don't think your explanation can be right.  Surely, if a start command is
> issued while a previous stop is still being processed, that is not an error.
> The start request should be followed as soon as the stop is finished.

Yes, this error is not raised by external jobs (when you use the
replace job mode, the job sitting in the unit's job slot is always
unseated if possible, and the other job either becomes 'runnable'
instantly, or in some cases waits for the other job to finish before
becoming runnable, emulating a queue (e.g. service unit's start job
waiting for the stop request initiated previously to finish). Even if
you use fail as the job mode, the error you get back is "Transaction
is destructive" or so, not this.

The problem you're seeing is caused when the same transaction pulls
two conflicting/unmergeable jobs, for example a unit being started
adds a start job and some unit in the dependency chain has Conflicts
on it, which would pull in a stop job for the same unit. These are
contradictory state change requests, and there's no easy way to
resolve this automatically (unlike merging where jobs that make the
unit reach a compatible state are merged into the same job).

This is has been reported before:
* The first one explains in detail when and why this happens:
https://github.com/systemd/systemd/issues/11440
* This one is more of a real life example:
https://github.com/systemd/systemd/issues/2830

The easiest way to debug problems like these is to enable debug
logging for the service manager, examine the transaction in the logs,
and inspect your units in the dependency chain to pin down the culprit.

--
Kartikeya
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Requested transaction contradicts existing jobs: start is destructive

2020-04-30 Thread Mark Bannister
> On Mon, 2020-04-27 at 11:40 +0100, Mark Bannister wrote:

> > One of the error messages I've been trying to explain is reported like this:
> > 2020-04-22T00:45:47.047710-04:00 jupiter systemd[1]: Requested transaction
> > contradicts existing jobs: Transaction for session-752473.scope/start is
> > destructive (user-16027.slice has 'stop' job queued, but 'start' is
> > included in transaction).

> > This also coincides with a message from sshd:
> > 2020-04-22T00:45:47.049949-04:00 jupiter sshd[20984]: 
> > pam_unix(sshd:session): session opened for user userlp by (uid=0)
> > 2020-04-22T00:45:51.181665-04:00 jupiter sshd[20984]: 
> > pam_unix(sshd:session): session closed for user userlp

> > I'm not sure where to begin to troubleshoot the problem.  The systemd
> > error is occurring sporadically and I haven't yet found a way to
> > reproduce it.  I also can't say I actually understand the error
> > message or what I'm supposed to do about it.  How is it even possible
> > for stop and start jobs to clash like this?

On Mon Apr 27 19:59:11 UTC 2020, Uoti Urpala wrote;
> See the systemctl documentation for the --job-mode option. Basically
> this should be due to an external request from somewhere to start the
> unit while systemd is in the middle of processing a previously started
> stop command for it.

The documentation says that the default option for --job-mode is "replace", i.e.
any conflicting pending job will be replaced, as necessary.

How does it make sense for systemd to allow stop and start requests to
contradict anyway?  This is the bit I'm finding very hard to understand.
I don't think your explanation can be right.  Surely, if a start command is
issued while a previous stop is still being processed, that is not an error.
The start request should be followed as soon as the stop is finished.

Also I'm unclear on what stop/start job is involved here, it's not sshd.service
that's going to stop and start when a session is opened for a user.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-hostnamed/hostnamectl and transient hostname change

2020-04-30 Thread Stef Walter
On 30.04.20 10:43, Zbigniew Jędrzejewski-Szmek wrote:
> On Mon, Apr 27, 2020 at 12:00:36PM +0200, Thomas HUMMEL wrote:
>> On 4/27/20 11:51 AM, Mantas Mikulėnas wrote:
>>
>> Hello, thanks for your answer.
>>
>>> On Mon, Apr 20, 2020 at 6:17 PM Thomas HUMMEL
>>> mailto:thomas.hum...@pasteur.fr>>
>>> wrote:
>>>
>>>1. why does the transient hostname change while I stated --static only
>>>while running hostnamectl ?
>>>
>>>2. why does the change take some time to appear on dbus ?
>>>
>>>
>>
>>> Hostnamed does not implement receiving hostname change
>>> notifications from the kernel, so it always reports you the same
>>> hostname that it has seen on startup.
>>
>> That was my understanding as well.
> 
> Lennart opened a PR to remove the caching:
> https://github.com/systemd/systemd/pull/15624.
> 
>>> You're only seeing changes because hostnamed /exits when idle/ --
>>> the next time you're actually talking to a brand new instance of
>>> hostnamed, which has seen the new hostname.
>>
>> But this does not explain why the transient hostname is changed as I
>> only changed the static one, does it ? Unless this new instance sets
>> it from the static one when it starts ? I mean something has to call
>> sethostname(2) to set the transient to the new static one, right ?
> 
> The documentation is wrong. The code in hostnamed sets the kernel
> hostname when setting the static one. This was changed in
> https://github.com/systemd/systemd/commit/c779a44222:
> 
> commit c779a44222161155c039a7fd2fd304c006590ac7
> Author: Stef Walter 
> Date:   Wed Feb 12 09:46:31 2014 +0100
> 
>hostnamed: Fix the way that static and transient host names interact
>
>It is almost always incorrect to allow DHCP or other sources of
>transient host names to override an explicitly configured static host
>name.
>
>This commit changes things so that if a static host name is set, this
>will override the transient host name (eg: provided via DHCP). Transient
>host names can still be used to provide host names for machines that have
>not been explicitly configured with a static host name.
>
>The exception to this rule is if the static host name is set to
>"localhost". In those cases we act as if no
>static host name has been explicitly set.
> 
> We need to reconcile the code and the docs. I'd go for updating the docs
> to match the code, because this is a long-standing behaviour and people
> haven't been complaining about it. (I'm assuming you're not unhappy, just
> confused by the unexpected results...). Opinions?

I'd support that. Obviously I should have included a docs patch in my
pull request :S

Stef

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-hostnamed/hostnamectl and transient hostname change

2020-04-30 Thread Zbigniew Jędrzejewski-Szmek
On Mon, Apr 27, 2020 at 12:00:36PM +0200, Thomas HUMMEL wrote:
> On 4/27/20 11:51 AM, Mantas Mikulėnas wrote:
> 
> Hello, thanks for your answer.
> 
> >On Mon, Apr 20, 2020 at 6:17 PM Thomas HUMMEL
> >mailto:thomas.hum...@pasteur.fr>>
> >wrote:
> >
> >1. why does the transient hostname change while I stated --static only
> >while running hostnamectl ?
> >
> >2. why does the change take some time to appear on dbus ?
> >
> >
> 
> >Hostnamed does not implement receiving hostname change
> >notifications from the kernel, so it always reports you the same
> >hostname that it has seen on startup.
> 
> That was my understanding as well.

Lennart opened a PR to remove the caching:
https://github.com/systemd/systemd/pull/15624.

> >You're only seeing changes because hostnamed /exits when idle/ --
> >the next time you're actually talking to a brand new instance of
> >hostnamed, which has seen the new hostname.
> 
> But this does not explain why the transient hostname is changed as I
> only changed the static one, does it ? Unless this new instance sets
> it from the static one when it starts ? I mean something has to call
> sethostname(2) to set the transient to the new static one, right ?

The documentation is wrong. The code in hostnamed sets the kernel
hostname when setting the static one. This was changed in
https://github.com/systemd/systemd/commit/c779a44222:

commit c779a44222161155c039a7fd2fd304c006590ac7
Author: Stef Walter 
Date:   Wed Feb 12 09:46:31 2014 +0100

   hostnamed: Fix the way that static and transient host names interact
   
   It is almost always incorrect to allow DHCP or other sources of
   transient host names to override an explicitly configured static host
   name.
   
   This commit changes things so that if a static host name is set, this
   will override the transient host name (eg: provided via DHCP). Transient
   host names can still be used to provide host names for machines that have
   not been explicitly configured with a static host name.
   
   The exception to this rule is if the static host name is set to
   "localhost". In those cases we act as if no
   static host name has been explicitly set.

We need to reconcile the code and the docs. I'd go for updating the docs
to match the code, because this is a long-standing behaviour and people
haven't been complaining about it. (I'm assuming you're not unhappy, just
confused by the unexpected results...). Opinions?

Zbyszek
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemctl restart changes permission.

2020-04-30 Thread Lennart Poettering
On Do, 30.04.20 09:35, Kaushal Shriyan (kaushalshri...@gmail.com) wrote:
65;6001;1c
> Hi,
>
> I am running CentOS Linux release 7.8.2003 (Core)
> with php72u-fpm-7.2.30-1.el7.ius.x86_64 version. I am facing the below
> permission denied issue. I also did the below steps
> #cd /run
> #chown -Rc nginx.nginx php-fpm
> changed ownership of ‘php-fpm/php-fpm.pid’ from root:root to nginx:nginx
> changed ownership of ‘php-fpm’ from root:root to nginx:nginx
> #systemctl restart php-fpm again changes it from nginx.nginx to root.root
> user.

Typically, these per-service dirs in /run/ get recreated during
restart, by the service in question. In addition, if RuntimeDirectory=
is used in the service, systemd will automatically clear the specified
directory when a service goes down. It's generally assumed to be the
job of the daemon in question to create and make sure things are
properly owned on next start if it uses different users and drops
privs on its own.

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Crash and size problem with systemd-repart

2020-04-30 Thread Emmanuel Garette

Le 30/04/2020 à 09:56, Lennart Poettering a écrit :

On So, 26.04.20 17:12, Emmanuel Garette (egare...@cadoles.com) wrote:


Hi,

I'm experimenting with the new tool systemd-repart and I think I have found
a bug.

Sorry for the late reply.


The problem is in line "xsz = partition_max_size(a->after);"

xsz is the max size of "a->after" instead current partition (so "p").

Am I right? I'll make a PR if yes.

Yes, this is correct. To make things quick I turned this into a PR for
you, but given that you found it, tracked it down and proposed a fix I
took the liberty to attribute the git commit to you, so that you get
attribution.

I hope that's OK?


Everything is ok for me.

regards,



https://github.com/systemd/systemd/pull/15639

Lennart

--
Lennart Poettering, Berlin


--
Emmanuel Garette
 Ingénieur en informatique libre
 
 Cadoles (http://www.cadoles.com)

 Experts EOLE, Envole et logiciels libres

<>___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Crash and size problem with systemd-repart

2020-04-30 Thread Lennart Poettering
On So, 26.04.20 17:12, Emmanuel Garette (egare...@cadoles.com) wrote:

> Hi,
>
> I'm experimenting with the new tool systemd-repart and I think I have found
> a bug.

Sorry for the late reply.

> The problem is in line "xsz = partition_max_size(a->after);"
>
> xsz is the max size of "a->after" instead current partition (so "p").
>
> Am I right? I'll make a PR if yes.

Yes, this is correct. To make things quick I turned this into a PR for
you, but given that you found it, tracked it down and proposed a fix I
took the liberty to attribute the git commit to you, so that you get
attribution.

I hope that's OK?

https://github.com/systemd/systemd/pull/15639

Lennart

--
Lennart Poettering, Berlin
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel