Re: [CentOS] dbus/systemd failure on startup (CentOS 7.7)

2020-02-25 Thread Johnny Hughes
On 1/23/20 9:33 AM, James Pearson wrote:
> Simon Matter wrote:
>>
>>> However, we would still like to know what the issue is and get a 'real'
>>> fix - I guess we could try creating a bug report with Redhat ...
>>
>> By bug report you mean BZ or a support request as paying RHEL customer?
> 
> A BZ ...
> 
>> Unfortunately I'm not too happy anymore with how BZs are handled these
>> days. Am I alone with this feeling?
> 
> I've had mixed results with BZs - it appears if a bug 'tickles the
> fancy' of someone at Red Hat that sees the ticket, then you can get good
> results - otherwise, they just sit there until the release goes out of
> support and they get dropped :-)
> 

Starting with CentOS-8 Stream, you will be able to fix issues like
this yourself and then submit a pull request for review to get the fix
rolled into CentOS Stream and then into RHEL proper.

Also, you can figure out what is wrong and submit the fix WITH the BZ ..
I mean, that is why the CentOS community exists .. to submit community
fixes.



___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] dbus/systemd failure on startup (CentOS 7.7)

2020-02-05 Thread James Pearson

James Pearson wrote:

> We are seeing a problem that occurs ~5% of the time when rebooting
> CentOS 7.7 where systemd gets a 'Connection timed out' to D-Bus just
> after the D-Bus service starts - from 'journalctl -x' :
>
> ...
> Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System
> Message Bus.
> -- Subject: Unit dbus.service has finished start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit dbus.service has finished starting up.
> --
> -- The start-up result is done.
> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to register match
> for Disconnected message: Connection timed out
> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to initialize
> D-Bus connection: Connection timed out
> ...
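
The gap between the two systemd[1] entries in the excerpt above can be measured mechanically, which may help when comparing many boots. This is only a sketch: the two sample lines are the wrapped journal lines rejoined, the year is an assumption (the default journalctl output omits it), and the function names are made up for illustration.

```python
from datetime import datetime

# Two entries from the journal excerpt above, with the wrapped lines rejoined.
JOURNAL = """\
Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System Message Bus.
Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to initialize D-Bus connection: Connection timed out
"""


def stamp(line, year=2020):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp; the year is assumed."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def gap_seconds(text, start_marker, fail_marker):
    """Seconds from the first line containing start_marker to the first
    later line containing fail_marker, or None if either is missing."""
    started = None
    for line in text.splitlines():
        if started is None and start_marker in line:
            started = stamp(line)
        elif started is not None and fail_marker in line:
            return (stamp(line) - started).total_seconds()
    return None


gap = gap_seconds(
    JOURNAL,
    "Started D-Bus System Message Bus",
    "Failed to initialize D-Bus connection",
)
print(gap)  # 25.0 for the excerpt above
```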


We've managed to work out what the problem is - it is the same issue as 
given in https://bugzilla.redhat.com/show_bug.cgi?id=1531486


We have a legacy use of NIS for groups - which can cause a boot time 
deadlock:


 systemd->dbus->nis(glibc)->rpcbind->systemd

A workaround is given in https://access.redhat.com/solutions/3900301
(account needed to view), but it essentially just reverts the
changes made to /usr/lib/systemd/system/rpcbind.socket between 7.5 and 7.6
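
The chain above is a loop, which is why it can deadlock. As a rough illustration, here is a small depth-first search over the same four names. Reading each arrow as "is blocked waiting on" is an assumption about the notation, and WAITS_ON and find_cycle are hypothetical names, not anything from systemd or the BZ.

```python
# Edges taken from the chain quoted above; each arrow is read here as
# "is blocked waiting on", which is an assumption about the notation.
WAITS_ON = {
    "systemd": ["dbus"],
    "dbus": ["nis(glibc)"],
    "nis(glibc)": ["rpcbind"],
    "rpcbind": ["systemd"],
}


def find_cycle(graph, start):
    """Return the nodes of a cycle reachable from start (first node repeated
    at the end), or None if no cycle is reachable."""
    path = []          # current DFS path, in order
    on_path = set()    # same nodes as path, for O(1) membership tests
    visited = set()    # nodes already fully explored or in progress

    def visit(node):
        if node in on_path:
            return path[path.index(node):] + [node]
        if node in visited:
            return None
        visited.add(node)
        path.append(node)
        on_path.add(node)
        for nxt in graph.get(node, []):
            found = visit(nxt)
            if found:
                return found
        path.pop()
        on_path.discard(node)
        return None

    return visit(start)


print(" -> ".join(find_cycle(WAITS_ON, "systemd")))
```

Removing any one edge, for example making rpcbind no longer wait on systemd, leaves find_cycle returning None, which is the general shape of a fix that breaks the loop.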


James Pearson


Re: [CentOS] dbus/systemd failure on startup (CentOS 7.7)

2020-01-23 Thread James Pearson

Simon Matter wrote:

>> However, we would still like to know what the issue is and get a 'real'
>> fix - I guess we could try creating a bug report with Redhat ...
>
> By bug report you mean BZ or a support request as paying RHEL customer?

A BZ ...

> Unfortunately I'm not too happy anymore with how BZs are handled these
> days. Am I alone with this feeling?

I've had mixed results with BZs - it appears if a bug 'tickles the
fancy' of someone at Red Hat that sees the ticket, then you can get good
results - otherwise, they just sit there until the release goes out of
support and they get dropped :-)


James Pearson


Re: [CentOS] dbus/systemd failure on startup (CentOS 7.7)

2020-01-23 Thread Simon Matter via CentOS
> Simon Matter via CentOS wrote:
>>
>>> We are seeing a problem that occurs ~5% of the time when rebooting
>>
>> I see such issues on a quite large multi user system but when this
>> happens, after forced restarts for kernel updates, I usually don't have
>> the time to analyze and play doctor on it. My "solution" now is to simply
>> reboot the server again in such a case, AKA the systemd way :-)
>>
>>> CentOS 7.7 where systemd gets a 'Connection timed out' to D-Bus just
>>> after the D-Bus service starts - from 'journalctl -x' :
>>>
>>> ...
>>> Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System
>>> Message Bus.
>>> -- Subject: Unit dbus.service has finished start-up
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit dbus.service has finished starting up.
>>> --
>>> -- The start-up result is done.
>>> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to register match
>>> for Disconnected message: Connection timed out
>>> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to initialize
>>> D-Bus connection: Connection timed out
>>> ...
>>>
>>> This then has a knock-on effect that causes other services to fail -
>>> e.g.
>>>
>>> -- Unit gdm.service has begun starting up.
>>> Jan 21 16:10:39 linux7-7.mpc.local dbus[817]: [system] Activating
>>> systemd to hand-off: service name='org.freedesktop.login1'
>>> unit='dbus-org.freedesktop.login1.service'
>>> Jan 21 16:10:50 linux7-7.mpc.local dbus[817]: [system] Failed to
>>> activate service 'org.freedesktop.systemd1': timed out
>>> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to
>>> enable subscription: Failed to activate service
>>> 'org.freedesktop.systemd1': timed out
>>> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to fully
>>> start up daemon: Connection timed out
>>> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: systemd-logind.service:
>>> main process exited, code=exited, status=1/FAILURE
>>> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: Failed to start Login
>>> Service.
>>> -- Subject: Unit systemd-logind.service has failed
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit systemd-logind.service has failed.
>>> --
>>> -- The result is failed.
>>>
>>> Whatever the issue is, it appears that polkit might be involved - if we
>>> restart the polkit service, things appear to return to normal (e.g. gdm
>>> starts up etc)
>>>
>>> We can't find any similar reports of this happening elsewhere with
>>> CentOS 7.7 - but we were wondering if anyone else had come across a
>>> problem like this?
>>
>> I think the root of the problem is that there are missing definitions in
>> some of the systemd scripts. They allow things to work in 95% or greater
>> of the cases but this happens by chance, not because of perfect process
>> handling and system control. Small delays somewhere or uncommon system
>> environments then lead to intermittent failures which are difficult to
>> diagnose - at least for me.
>>
>> The good news is that you can just fiddle with the systemd scripts the
>> same way we fiddled with init scripts in the past. That way you can try
>> and error until you find a solution. Doesn't sound like being in full
>> control of things but better than not finding a solution at all.
>
> Yeah, we found that by introducing a small delay before the ExecStart in
> the dbus.service unit - even a delay of just 0.01 seconds (via
> 'ExecStartPre=/usr/bin/sleep 0.01') _seems_ to work around the issue ...

Nice that you found at least a workaround. I think I remember that dbus is
quite special here because systemd starts it but also depends on it. At
least I remember cases where dbus went crazy for whatever reason: the
result was that systemd became completely unresponsive and unmanageable
and the whole system went down the drain, slowly but steadily. Ever tried to
shut down a box if systemd doesn't listen to you anymore? The perfect
Windows experience on Linux ;-)

> However, we would still like to know what the issue is and get a 'real'
> fix - I guess we could try creating a bug report with Redhat ...

By bug report you mean BZ or a support request as paying RHEL customer?

Unfortunately I'm not too happy anymore with how BZs are handled these
days. Am I alone with this feeling?

Regards,
Simon



Re: [CentOS] dbus/systemd failure on startup (CentOS 7.7)

2020-01-23 Thread James Pearson

Simon Matter via CentOS wrote:

>> We are seeing a problem that occurs ~5% of the time when rebooting
>
> I see such issues on a quite large multi user system but when this
> happens, after forced restarts for kernel updates, I usually don't have
> the time to analyze and play doctor on it. My "solution" now is to simply
> reboot the server again in such a case, AKA the systemd way :-)
>
>> CentOS 7.7 where systemd gets a 'Connection timed out' to D-Bus just
>> after the D-Bus service starts - from 'journalctl -x' :
>>
>> ...
>> Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System
>> Message Bus.
>> -- Subject: Unit dbus.service has finished start-up
>> -- Defined-By: systemd
>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>> --
>> -- Unit dbus.service has finished starting up.
>> --
>> -- The start-up result is done.
>> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to register match
>> for Disconnected message: Connection timed out
>> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to initialize
>> D-Bus connection: Connection timed out
>> ...
>>
>> This then has a knock-on effect that causes other services to fail - e.g.
>>
>> -- Unit gdm.service has begun starting up.
>> Jan 21 16:10:39 linux7-7.mpc.local dbus[817]: [system] Activating
>> systemd to hand-off: service name='org.freedesktop.login1'
>> unit='dbus-org.freedesktop.login1.service'
>> Jan 21 16:10:50 linux7-7.mpc.local dbus[817]: [system] Failed to
>> activate service 'org.freedesktop.systemd1': timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to
>> enable subscription: Failed to activate service
>> 'org.freedesktop.systemd1': timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to fully
>> start up daemon: Connection timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: systemd-logind.service:
>> main process exited, code=exited, status=1/FAILURE
>> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: Failed to start Login
>> Service.
>> -- Subject: Unit systemd-logind.service has failed
>> -- Defined-By: systemd
>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>> --
>> -- Unit systemd-logind.service has failed.
>> --
>> -- The result is failed.
>>
>> Whatever the issue is, it appears that polkit might be involved - if we
>> restart the polkit service, things appear to return to normal (e.g. gdm
>> starts up etc)
>>
>> We can't find any similar reports of this happening elsewhere with
>> CentOS 7.7 - but we were wondering if anyone else had come across a
>> problem like this?
>
> I think the root of the problem is that there are missing definitions in
> some of the systemd scripts. They allow things to work in 95% or greater
> of the cases but this happens by chance, not because of perfect process
> handling and system control. Small delays somewhere or uncommon system
> environments then lead to intermittent failures which are difficult to
> diagnose - at least for me.
>
> The good news is that you can just fiddle with the systemd scripts the
> same way we fiddled with init scripts in the past. That way you can try
> and error until you find a solution. Doesn't sound like being in full
> control of things but better than not finding a solution at all.


Yeah, we found that by introducing a small delay before the ExecStart in
the dbus.service unit - even a delay of just 0.01 seconds (via
'ExecStartPre=/usr/bin/sleep 0.01') _seems_ to work around the issue ...
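
One way to apply such a delay without editing the packaged unit file would be a systemd drop-in. The sketch below assumes that approach, which the message above does not confirm. The file name 10-startup-delay.conf and the helper write_dropin are made up for illustration; only the ExecStartPre line comes from the message.

```python
from pathlib import Path

# Drop-in contents: only the ExecStartPre line is taken from the message
# above; delivering it as a drop-in is an assumption.
DROP_IN = """\
[Service]
ExecStartPre=/usr/bin/sleep 0.01
"""


def write_dropin(unit="dbus.service",
                 root="/etc/systemd/system",
                 name="10-startup-delay.conf"):
    """Write DROP_IN to <root>/<unit>.d/<name> and return the file's path.

    The file name is hypothetical; systemd reads any *.conf file found in
    the unit's .d directory.
    """
    directory = Path(root) / f"{unit}.d"
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / name
    target.write_text(DROP_IN)
    return target
```

Calling write_dropin() as root and then running 'systemctl daemon-reload' should make the delay take effect on the next start of dbus.service. Passing a different root, such as a scratch directory, lets the output be inspected first.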


However, we would still like to know what the issue is and get a 'real' 
fix - I guess we could try creating a bug report with Redhat ...


Thanks

James Pearson


Re: [CentOS] dbus/systemd failure on startup (CentOS 7.7)

2020-01-23 Thread Simon Matter via CentOS
> We are seeing a problem that occurs ~5% of the time when rebooting

I see such issues on a quite large multi user system but when this
happens, after forced restarts for kernel updates, I usually don't have
the time to analyze and play doctor on it. My "solution" now is to simply
reboot the server again in such a case, AKA the systemd way :-)

> CentOS 7.7 where systemd gets a 'Connection timed out' to D-Bus just
> after the D-Bus service starts - from 'journalctl -x' :
>
> ...
> Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System
> Message Bus.
> -- Subject: Unit dbus.service has finished start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit dbus.service has finished starting up.
> --
> -- The start-up result is done.
> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to register match
> for Disconnected message: Connection timed out
> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to initialize
> D-Bus connection: Connection timed out
> ...
>
> This then has a knock-on effect that causes other services to fail - e.g.
>
> -- Unit gdm.service has begun starting up.
> Jan 21 16:10:39 linux7-7.mpc.local dbus[817]: [system] Activating
> systemd to hand-off: service name='org.freedesktop.login1'
> unit='dbus-org.freedesktop.login1.service'
> Jan 21 16:10:50 linux7-7.mpc.local dbus[817]: [system] Failed to
> activate service 'org.freedesktop.systemd1': timed out
> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to
> enable subscription: Failed to activate service
> 'org.freedesktop.systemd1': timed out
> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to fully
> start up daemon: Connection timed out
> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: systemd-logind.service:
> main process exited, code=exited, status=1/FAILURE
> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: Failed to start Login
> Service.
> -- Subject: Unit systemd-logind.service has failed
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit systemd-logind.service has failed.
> --
> -- The result is failed.
>
> Whatever the issue is, it appears that polkit might be involved - if we
> restart the polkit service, things appear to return to normal (e.g. gdm
> starts up etc)
>
> We can't find any similar reports of this happening elsewhere with
> CentOS 7.7 - but we were wondering if anyone else had come across a
> problem like this?

I think the root of the problem is that there are missing definitions in
some of the systemd scripts. They allow things to work in 95% or greater
of the cases but this happens by chance, not because of perfect process
handling and system control. Small delays somewhere or uncommon system
environments then lead to intermittent failures which are difficult to
diagnose - at least for me.

The good news is that you can just fiddle with the systemd scripts the
same way we fiddled with init scripts in the past. That way you can try
and error until you find a solution. Doesn't sound like being in full
control of things but better than not finding a solution at all.

Regards,
Simon
