Hey everyone,

I figured it out. A faulty SFP was causing an IOPS bottleneck on the storage,
so the VRs could not write to their log directory, which cascaded into the
DHCP outage.

Best regards,
Jordan  

-----Original Message-----
From: Yordan Kostov <yord...@nsogroup.com> 
Sent: 09 August 2021 14:50
To: users@cloudstack.apache.org
Subject: slow vm start and dhcp log full?


Hello everyone,

We are running CloudStack 4.15 + XCP-ng 8.2 + virtual router template 4.15.
We have only about 15 VMs running, mostly for backup tests or for people
trying the platform out.

Recently I noticed considerable sluggishness in our environment: it took
about 5-10 minutes to create a new VM or start an existing one.
On one of our networks VM creation stopped working altogether, apparently
because the virtual router was not handing out addresses.

After some troubleshooting I found the following issues:

  *   The virtual router that did not give out IP addresses had its
/run/log/journal directory fill the whole /run partition with logs. It
seems that when this happens the router stops handing out IP addresses.
  *   The same virtual router plus one more were putting heavy load on the
storage (20-25 MB/s), consuming all the IOPS they could get.
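
For anyone wanting to watch for the first symptom, here is a minimal sketch of
how the fill level of /run and the size of the journal directory could be
checked from inside a VR. It is only an illustration, not something CloudStack
ships: the function names and the 90% warning threshold are my own
assumptions, and it uses nothing beyond the Python standard library.

```python
import os
import shutil


def usage_percent(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total if total else 0.0


def dir_size_bytes(path):
    """Sum the sizes of all files below `path` (0 if it does not exist)."""
    size = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                size += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file was rotated or removed while walking
    return size


if __name__ == "__main__":
    WARN_PERCENT = 90.0  # assumed threshold, adjust to taste
    partition = "/run"
    journal_dir = "/run/log/journal"
    if os.path.isdir(partition):
        pct = usage_percent(partition)
        journal_mib = dir_size_bytes(journal_dir) / (1024 * 1024)
        print(f"{partition}: {pct:.1f}% used, journal: {journal_mib:.1f} MiB")
        if pct >= WARN_PERCENT:
            print("warning: /run is nearly full")
```

On the VR itself, `journalctl --disk-usage` reports the journal size directly,
and `journalctl --vacuum-size=` can trim it.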


Let's say issue number one is by design. What causes issue number two?
The VR logs ( journalctl -p 3 -x --file
/run/log/journal/5212989feea04bb6b13843e7b0c9d2b3/system.journal ) show this
issue repeating:

Aug 09 11:41:22 r-39-VM systemd[1]: Failed to start User Manager for UID 0.
-- Subject: A start job for unit user@0.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit user@0.service has finished with a failure.
--
-- The job identifier is 588 and the job result is failed.
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_load_conf_file: unable to open config for /etc/pam.d/null
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM error loading (null)
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_init_handlers: error reading /etc/pam.d/systemd-user
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM _pam_init_handlers: [Critical error - immediate abort]
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM error reading PAM configuration file
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM pam_start: failed to initialize handlers
Aug 09 11:41:29 r-39-VM systemd[1607]: PAM failed: Critical error - immediate abort
Aug 09 11:41:29 r-39-VM systemd[1607]: user@0.service: Failed to set up PAM session: Operation not permitted
Aug 09 11:41:29 r-39-VM systemd[1607]: user@0.service: Failed at step PAM spawning /lib/systemd/systemd: Operation not permitted
-- Subject: Process /lib/systemd/systemd could not be executed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The process /lib/systemd/systemd could not be executed and failed.
--
-- The error number returned by this process is ERRNO.

After rebooting the VMs things are back to normal, at least for now.
Any advice on why the VRs behave like this and why PAM is complaining?

Best regards,
Jordan
