The postscripts and postbootscripts are invoked during the start of a system service named "xcatpostinit1.service". Since systemd starts services in parallel while taking their dependencies into account, a deadlock is possible when one service tries to start or restart another service while it is itself still starting. Someone has reported a similar situation: https://unix.stackexchange.com/questions/359941/starting-systemd-service-inside-systemd-service-causes-deadlock . What did you see in `journalctl -u xcatpostinit1` when it hung?
Back to your scenario @Brian,
>The ganglia packages get installed in otherpkgs, and the ganglia postscript edits the /etc/ganglia/gmond.conf file with our custom cluster info and attempts to enable and start the service.
For diskless nodes, it is not good practice to install and configure packages with `otherpkgs` and `ospkgs`. Instead, install the packages into the rootimg directory using the `otherpkgdir` and `otherpkglist` attributes, and configure them with `postinstall` scripts run by genimage.
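As a minimal sketch of that suggestion, a genimage `postinstall` script could set the cluster name in gmond.conf and enable gmond inside the image, so that nothing has to be started from a postbootscript at boot. This assumes that genimage passes the rootimg directory as the first argument (check how your xCAT version invokes postinstall scripts), that the gmond package is already in the image via `otherpkglist`, and that the stock gmond.conf still contains `name = "unspecified"` in its `cluster` section. `CLUSTER_NAME` and `configure_gmond_in_rootimg` are hypothetical names.

```sh
#!/bin/sh
# Sketch of a genimage postinstall script for gmond (not an xCAT-provided
# script). ASSUMPTION: genimage passes the rootimg directory as $1.

# Hypothetical cluster name; replace with your own cluster info.
CLUSTER_NAME="mycluster"

configure_gmond_in_rootimg() {
    rootimg="$1"
    conf="$rootimg/etc/ganglia/gmond.conf"

    # Replace the placeholder cluster name; assumes the stock gmond.conf
    # line 'name = "unspecified"' (other "unspecified" keys are untouched).
    sed -i "s/^\([[:space:]]*name = \)\"unspecified\"/\1\"$CLUSTER_NAME\"/" "$conf" || return 1

    # Enable the unit offline inside the image: with --root, systemctl only
    # creates the unit symlinks under $rootimg and does not start anything.
    systemctl --root="$rootimg" enable gmond.service
}

# Act only when a rootimg path is given, as genimage is assumed to do.
if [ -n "${1:-}" ]; then
    configure_gmond_in_rootimg "$1" || exit 1
fi
```

With `--root`, the enable step amounts to creating the gmond.service symlink under the image directory, which is essentially what Brian did by hand, and it never asks a running systemd to start a job.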
------------------------------------------------------------------------------
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
----- Original message -----
From: Michael Robbert <mrobb...@mines.edu>
To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] Diskless postboot script unable to start gmond service
Date: Tue, Mar 5, 2019 8:28 AM
I'll note that I have seen similar problems with physical hosts doing a diskful install. If I put a "systemctl start $servicename" in any of my postscripts, they hang during the postinstall process. I have taken to removing the start step and just enabling the service; then, once I've determined that the postinstall is complete, I reboot the node and all services start as expected. I have seen it with gmond and slurmd, so I know that it isn't specific to Ganglia.
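The enable-only workaround amounts to a postscript step like the following sketch; `enable_services` and the `QUEUE_START` switch are hypothetical names. The optional `--no-block` start was not tested by anyone in this thread: it tells systemctl to enqueue the start job and return without waiting for it, which should keep the postscript from blocking on a job that cannot run until xcatpostinit1 itself has finished. Depending on unit ordering, the service may then only come up after the postbootscripts are done.

```sh
#!/bin/sh
# Sketch of a postscript step following the enable-only workaround.
# enable_services and QUEUE_START are hypothetical names.

# "no": only enable (the service starts on the next reboot).
# "yes": also queue a start without waiting for it (untested assumption).
QUEUE_START="${QUEUE_START:-no}"

enable_services() {
    for svc in "$@"; do
        # Enabling only creates symlinks, so it never waits on a start job.
        systemctl enable "$svc" || return 1
        if [ "$QUEUE_START" = "yes" ]; then
            # --no-block enqueues the start job and returns immediately
            # instead of waiting for the job to finish.
            systemctl start --no-block "$svc" || return 1
        fi
    done
}
```

In a postscript this would be called as `enable_services gmond slurmd`, setting `QUEUE_START=yes` beforehand only if the non-blocking start is wanted.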
The other workaround that I'm working on implementing is to move everything that the postscripts are doing to Ansible.
Mike
On 3/4/19 4:42 PM, Brian Joiner wrote:
We're deploying diskless nodes in vSphere and installing Ganglia monitoring tools. The ganglia packages get installed in otherpkgs, and the ganglia postscript edits the /etc/ganglia/gmond.conf file with our custom cluster info and attempts to enable and start the service.
`systemctl enable gmond` works.
`systemctl start gmond` causes the script to hang indefinitely, until I log into the node and kill it. Then the script completes and allows other postbootscripts to run.
Why is systemctl hanging on service start? If we remove that command from the script, it completes, but the service doesn't auto-start, so manual intervention is required. Is this unique to a diskless install? We got around it by creating the gmond.service file and symlink in the rootimg dir of the diskless image, but were wondering if there's a way to get a service to start the normal way.
HOST: vSphere, diskless, CentOS 7.5
Ganglia 3.7
xCAT 2.14
--
Thanks,
Brian Joiner
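To answer the question about what the node shows while the postbootscript is hung, the state could be collected from a second login on the node. This is a hypothetical helper (`collect_hang_info` is not an xCAT command). `systemctl list-jobs` shows queued and running systemd jobs, so a pending start job for gmond.service alongside a still-running start job for xcatpostinit1.service would be consistent with the deadlock explanation, though it would not prove it.

```sh
#!/bin/sh
# Hypothetical helper (not an xCAT command): run from a second session on
# the node while the postbootscript is hung.

collect_hang_info() {
    # List queued and running systemd jobs.
    systemctl list-jobs --no-pager
    # Show the most recent log lines of the unit that runs the postbootscripts.
    journalctl -u xcatpostinit1 --no-pager -n 50
}
```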
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user