YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193
----- Original message -----
From: David Johnson <david_john...@brown.edu>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] go-xcat accidentally run with /install mounted from production xcat master
Date: Tue, Jul 3, 2018 3:48 AM
Well, I went through and restored all updated files in /install/postscripts back to their previous versions, and now
my node boots again. Something in one of the scripts probably depends on something in the newer binaries
that were not updated. Thanks for the debugging tips.
-- ddj

On Jul 2, 2018, at 3:24 PM, David Johnson <david_john...@brown.edu> wrote:
...
Running command on mgt5.oscar.ccv.brown.edu: chmod -R a+r /install/postscripts 2>&1
mgt5.oscar.ccv.brown.edu: Internal call command: xdsh node552 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 1 -m 172.20.0.6 'setupntp' --tftp /tftpboot --installdir /install --nfsv4 no -c -V
Running command on mgt5.oscar.ccv.brown.edu: /bin/hostname 2>&1
Running command on mgt5.oscar.ccv.brown.edu: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
Running command on mgt5.oscar.ccv.brown.edu: hostname 2>&1
Running command on mgt5.oscar.ccv.brown.edu: /opt/xcat/bin/pping node552 2>&1
node552: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Error: node552 remote shell had error code: 255
[root@mgt5 xcat]#

On Jul 2, 2018, at 3:20 PM, David Johnson <david_john...@brown.edu> wrote:
After the chdef command and reboot the message in /var/log/xcat/xcat.log is exactly the same
[root@node552 xcat]# cat xcat.log
Mon Jul 2 15:18:10 EDT 2018 [info]: xcat.xcatdsklspost: trying to download postscripts...
Mon Jul 2 15:18:26 EDT 2018 [err]: xcat.xcatdsklspost: failed to download the postscripts from the xCAT server for node node552.oscar.ccv.brown.edu
[root@mgt5 xcat]# tabdump site | grep debug
"xcatdebugmode","1",,
[root@mgt5 xcat]#

On Jul 2, 2018, at 3:13 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:
updatenode issues the command from the management node and runs the postscripts on the compute node after the node has booted.
The postscripts in /xcatpost/ on the compute node will be downloaded again from MN:/install/postscripts (not sure if you want to do this).
]# updatenode mid21tor24cn01 setupntp -V
[boston01]: Running command on boston01: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
[boston01]: Running command on boston01: chmod -R a+r /install/postscripts 2>&1
[boston01]: boston01: Internal call command: xdsh mid21tor24cn01 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 1 -m 172.16.37.1 'setupntp' --tftp /tftpboot --installdir /install --nfsv4 no -c -V
[boston01]: Running command on boston01: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
[boston01]: Running command on boston01: hostname 2>&1
[boston01]: Running command on boston01: /opt/xcat/bin/pping mid21tor24cn01 2>&1
[boston01]: mid21tor24cn01: Running /tmp/filez8tn6J.dsh 1 -m 172.16.37.1 setupntp --tftp /tftpboot --installdir /install --nfsv4 no -c -V
[boston01]: mid21tor24cn01: trying to download postscripts...
[boston01]: mid21tor24cn01: trying to download postscripts from http://172.16.37.1/install/postscripts/
[boston01]: mid21tor24cn01: postscripts are downloaded from 172.16.37.1 successfully.
[boston01]: mid21tor24cn01: postscripts downloaded successfully
[boston01]: mid21tor24cn01: trying to get mypostscript from 172.16.37.1...
[boston01]: mid21tor24cn01: trying to download http://172.16.37.1/tftpboot/mypostscripts/mypostscript.mid21tor24cn01...
[boston01]: mid21tor24cn01: mypostscript.mid21tor24cn01 is downloaded successfully.
[boston01]: mid21tor24cn01: Running //xcatpost/mypostscript
[boston01]: mid21tor24cn01: Mon Jul 2 14:39:39 EDT 2018 Running postscript: setupntp
[boston01]: mid21tor24cn01: Failed to set time zone: Invalid time zone 'America/New_York'
[boston01]: mid21tor24cn01: inactive
[boston01]: mid21tor24cn01: syncing the clock ...
[boston01]: mid21tor24cn01: WARNING: NTP Sync Failed before timeout. ntp server will try to sync...
[boston01]: mid21tor24cn01: Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
[boston01]: mid21tor24cn01: postscript: setupntp exited with code 0
[boston01]: mid21tor24cn01: //xcatpost/mypostscript return with 0
[boston01]: mid21tor24cn01: Running of postscripts has completed.
The -V option shows more debug information; hopefully it will show why the postscripts failed to download.
By the way, after you rebooted the node, did you see any error messages in /var/log/xcat/xcat.log?
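To pull just the error entries out of that log, something like the following might help. It is only a sketch: the function name `xcat_log_errors` is made up for illustration, and it assumes error entries carry the `[err]:` tag seen in the log excerpts in this thread.

```shell
# Print only the error-level entries from an xCAT node log.
# Entries in this thread look like:
#   <date> [err]: xcat.xcatdsklspost: <message>
# The default path is the log file mentioned above; pass another path to override.
xcat_log_errors() {
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F '[err]:' "${1:-/var/log/xcat/xcat.log}"
}
```

On the compute node, `xcat_log_errors` with no argument reads /var/log/xcat/xcat.log.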
...................................................................
Casandra Hong Qiu
Phone: (845) 433-9291, t/l 293-9291
Office: Building 8, 3-B-04
cxh...@us.ibm.com
From: David Johnson <david_john...@brown.edu>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/02/2018 02:41 PM
Subject: Re: [xcat-user] go-xcat accidentally run with /install mounted from production xcat master
I have never used updatenode. Just reboot. Would that help debug?

On Jul 2, 2018, at 2:16 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:
In the site table, turn on "xcatdebugmode":
chdef -t site xcatdebugmode=1
Did you try running "updatenode" again for the current compute node? Add the verbose option "-V" and you should be able to get more logs.
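One way to confirm the setting took effect is to check the site table row. A small sketch, assuming the row has the form `"xcatdebugmode","1",,` as shown in the `tabdump site` output elsewhere in this thread; the helper name `debugmode_enabled` is hypothetical.

```shell
# Read `tabdump site` output on stdin and succeed only if the
# xcatdebugmode row is present with the value "1".
debugmode_enabled() {
    grep -q '^"xcatdebugmode","1"'
}

# Intended use on the management node (not run here):
#   tabdump site | debugmode_enabled && echo "xcatdebugmode is on"
```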
Thanks,
Casandra Qiu
From: David Johnson <david_john...@brown.edu>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/02/2018 11:37 AM
Subject: Re: [xcat-user] go-xcat accidentally run with /install mounted from production xcat master
Thanks, it’s good to know I’ve got a hundred or so copies with changes back to 9/2017 on our compute nodes.
However even after restoring postscripts/hostkeys, postscripts/_xcat and postscripts/_ssh and checking the
files we created are still there, the diskless boot still fails:
Mon Jul 2 10:27:35 EDT 2018 [info]: xcat.xcatdsklspost: trying to download postscripts...
Mon Jul 2 10:28:14 EDT 2018 [err]: xcat.xcatdsklspost: failed to download the postscripts from the xCAT server for node node552.oscar.ccv.brown.edu
I will try restoring all the changed scripts, but would like to know at what point it decided it was a failure.
All the /xcatboot files seem to be there. I tried prepopulating the /etc/ssh directory in the image, but that didn’t help.
Where do I turn up the debugging?
Thanks!
-- ddj

On Jul 2, 2018, at 5:29 AM, Song BJ Yang <yang...@cn.ibm.com> wrote:
Hi David Johnson,
In the scenario you described, I think the xCAT installation will affect the contents of `/install/postscripts`:
1) files that have the same name as files shipped by xCAT under `/install/postscripts` will be overwritten
2) the credentials under "/install/postscripts/hostkeys", "/install/postscripts/_xcat", "/install/postscripts/_ssh/" and "/install/postscripts/ca/" will be overwritten
You can restore the original "/install/postscripts" directory from the "/xcatpost" directory on compute nodes (non-hierarchical cluster) or service nodes (hierarchical cluster) that have not been provisioned since xCAT was accidentally reinstalled with `go-xcat`.
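Before copying anything back, it may be worth listing what actually differs. The following is a rough sketch, assuming a copy of a healthy node's `/xcatpost` has already been saved somewhere on the management node; the function name `changed_postscripts` and the example path are hypothetical. Files that exist only on the node side (for example the generated `mypostscript`) will show up as "Only in" lines and can be ignored.

```shell
# List files that differ between a saved copy of the postscripts
# (first argument) and the live directory (second argument).
# diff -r recurses into subdirectories such as _ssh and hostkeys;
# -q reports only which files differ, not the differences themselves.
# diff exits with status 1 when any difference is found.
changed_postscripts() {
    saved="$1"
    live="$2"
    diff -rq "$saved" "$live"
}

# Intended use on the management node (not run here):
#   changed_postscripts /tmp/xcatpost.saved /install/postscripts
```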
good luck
------------------------------------------------------------------------------
----- Original message -----
From: "Victor Hu" <v...@us.ibm.com>
To: xcat-user@lists.sourceforge.net
Cc:
Subject: Re: [xcat-user] go-xcat accidentally run with /install mounted from production xcat master
Date: Fri, Jun 29, 2018 11:11 PM
Hi Dave,
Thanks for reporting this, can you open an issue here? https://github.com/xcat2/xcat-core/issues
If it's inconvenient, please let me know and I'll open one on your behalf.
Thanks,
Victor
----- Original message -----
From: David Johnson <david_john...@brown.edu>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: [xcat-user] go-xcat accidentally run with /install mounted from production xcat master
Date: Thu, Jun 28, 2018 3:05 PM
I’m trying to set up a separate cluster with a new xcat master node, and downloaded and ran the go-xcat script.
I realized afterwards that it had updated a whole bunch of files in /install, which I had forgotten was nfs-mounted
from our main cluster’s master node.
Right off the bat, I’m thinking there might be a problem with ssh keys, and ca.pem.
Any other likely trouble spots? I put all our netboot stuff in /install/custom, and we
have added postinstall scripts rather than customizing ones from the distribution.
I can probably go out to any of the nodes still running and grab individual files back.
Thanks,
― ddj
Dave Johnson
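A guard like the one below could catch this situation before go-xcat is run. It is a sketch only, not part of xCAT: the function names are hypothetical, it relies on `findmnt` from util-linux, and it treats any filesystem type starting with `nfs` as a remote mount.

```shell
# Succeed if the filesystem type is an NFS variant (nfs, nfs4, ...).
is_nfs_fstype() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Print the type of the filesystem that holds the given path.
fstype_of() {
    findmnt -n -o FSTYPE -T "$1"
}

# Succeed only if the directory is not on NFS.
# Returns 1 if it is on NFS, 2 if the type could not be determined.
# The default path /install is the one discussed in this thread.
install_dir_is_local() {
    dir="${1:-/install}"
    fstype="$(fstype_of "$dir")" || return 2
    if is_nfs_fstype "$fstype"; then
        echo "$dir is on $fstype; it may belong to another management node" >&2
        return 1
    fi
}
```

Calling `install_dir_is_local /install || exit 1` at the top of a wrapper script, before go-xcat is invoked, would stop the run while /install is still NFS-mounted.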
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user