Dario & Brodsky,

I'm running into the same problem here on a fresh install of CentOS 5.6, with xCAT-2.6.8-snap201109140900 installed from the xcat-core and xcat-deps repos on SourceForge. DNS also seems to be correct: I can resolve both the master and node host names from the node undergoing installation. I added some code to my pre.rh script to enable ssh during the install. After the installation finishes, the system just sits there and never reboots.

I see messages like this in the logs on the master node:

Oct 17 15:39:39 itrc-adm2 xCAT: start up sshd
Oct 17 15:39:39 itrc-adm2 xCAT: /xcatpost/syncfiles: there is no sync file template for the node
Oct 17 15:39:39 itrc-adm2 xcat: addsiteyum: repos/centos5.7/x86_64 is not a directory
Oct 17 15:39:39 itrc-adm2 xcat: Retrying flag update
Oct 17 15:40:19 itrc-adm2 last message repeated 4 times
Oct 17 15:41:29 itrc-adm2 last message repeated 7 times

I logged into the node itrc-adm2 and ran ps to see what was running, and I came across this:

3609 root 1004 S /bin/awk -f /xcatpost/updateflag.awk 10.250.19.1 3002

The IP address shown is correct for the master node. This is interesting because the directory /xcatpost doesn't exist on itrc-adm2. I looked around and there are xcatpost files in /mnt/sysimage/xcatpost/.

If you need me to post any more information, please let me know.

Regards,
Jamie I. Fargen
Systems Administrator
Research Computing
University of South Florida
[email protected]
813-974-4108

________________________________________
From: [email protected] [[email protected]]
Sent: Monday, October 17, 2011 1:17 PM
To: [email protected]
Subject: xCAT-user Digest, Vol 26, Issue 10

Send xCAT-user mailing list submissions to
    [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.sourceforge.net/lists/listinfo/xcat-user
or, via email, send a message with subject or body 'help' to
    [email protected]

You can reach the person managing the list at
    [email protected]

When replying, please edit your Subject line so it is more specific than "Re: Contents of xCAT-user digest..."

Today's Topics:

   1. updateflag.awk hangs forever? (Dario Dorella)
   2. Re: updateflag.awk hangs forever? (Russell Jones)
   3. Re: updateflag.awk hangs forever? (Brodsky Denis-RM08520)
   4. Re: updateflag.awk hangs forever? (Lissa Valletta)
   5. updateflag.awk hangs forever? (Dario Dorella)
   6. updateflag.awk hangs forever? (Dario Dorella)
   7. updateflag.awk hangs forever? (Dario Dorella)
   8. Re: updateflag.awk hangs forever? (Lissa Valletta)

----------------------------------------------------------------------

Message: 1
Date: Sun, 16 Oct 2011 11:46:07 +0200
From: Dario Dorella <[email protected]>
Subject: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello list,

I am trying to install a CentOS5 cluster using xCAT, and looking at what happens during installation, it seems that updateflag.awk never receives its "done" message from xcatd and keeps looping.

Does anybody have an idea what I might be doing wrong and how to debug this?

Thx,
Dario

------------------------------

Message: 2
Date: Sun, 16 Oct 2011 20:11:33 -0500
From: Russell Jones <[email protected]>
Subject: Re: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Sounds like perhaps a postscript may be hanging. Do you have any custom postscripts? Any interesting messages in the other terminal windows during install?

Are you truly doing diskful node installation, or are these diskless nodes? There's a known issue with CentOS 5.5 and the xcatdsklspost script that will cause a stateless node to hang during boot.

------------------------------

Message: 3
Date: Mon, 17 Oct 2011 06:35:21 +0000
From: Brodsky Denis-RM08520 <[email protected]>
Subject: Re: [xcat-user] updateflag.awk hangs forever?
To: xCAT Users Mailing list <[email protected]>
Message-ID: <cecdca94ddf5884a84ea374a38c714a301042...@039-sn1mpn1-003.039d.mgd.msft.net>
Content-Type: text/plain; charset="us-ascii"

Hello,

I have the same problem, still no fix.

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2d-oct
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user

------------------------------

Message: 4
Date: Mon, 17 Oct 2011 07:29:58 -0400
From: Lissa Valletta <[email protected]>
Subject: Re: [xcat-user] updateflag.awk hangs forever?
To: xCAT Users Mailing list <[email protected]>
Cc: [email protected]
Message-ID: <of1822366a.668d3d32-on8525792c.003eff15-8525792c.003f2...@us.ibm.com>
Content-Type: text/plain; charset=US-ASCII

Many times this happens because, at that point during the install, the node cannot contact the Management Server at the provided IP address. Check the site table master attribute (the IP address as known by the node) and /etc/resolv.conf on the node.

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102

------------------------------

Message: 5
Date: Mon, 17 Oct 2011 15:56:55 +0200
From: Dario Dorella <[email protected]>
Subject: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I looked on the installing node, under /tmp/ and in ~root, for log files and messages. The problem seems to be that xCATd never answers with "ready"; all I get is an empty response from $MASTER:3002.

I was able to replicate the problem in a working environment by breaking name resolution, but on the machines where this happens on its own, node name resolution seems fine.

Is there any way I can trace what's happening from the xCATd point of view? I want to know why, when it receives the call on 3002, it answers with an empty string.

Thx,
Dario

------------------------------

Message: 6
Date: Mon, 17 Oct 2011 16:02:42 +0200
From: Dario Dorella <[email protected]>
Subject: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Lissa,

I don't think this is network-related: "tcpdump" and "netstat" showed the right stuff. Again, how can I get some insight into what xCATd is thinking?

Thx,
Dario

------------------------------

Message: 7
Date: Mon, 17 Oct 2011 15:47:54 +0200
From: Dario Dorella <[email protected]>
Subject: [xcat-user] updateflag.awk hangs forever?
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I looked on the installing node, under /tmp/ and in ~root, for log files and messages. The problem seems to be that xCATd never answers with "ready"; all I get is an empty response from $MASTER:3002.

I was able to replicate the problem in a working environment by breaking name resolution, but on the machines where this happens on its own, node name resolution seems fine.

Is there any way I can trace what's happening from the xCATd point of view? I want to know why, when it receives the call on 3002, it answers with an empty string.

Thx,
Dario

------------------------------

Message: 8
Date: Mon, 17 Oct 2011 13:06:54 -0400
From: Lissa Valletta <[email protected]>
Subject: Re: [xcat-user] updateflag.awk hangs forever?
To: xCAT Users Mailing list <[email protected]>
Cc: [email protected]
Message-ID: <of698cbd29.781aaf5b-on8525792c.005d7216-8525792c.005e0...@us.ibm.com>
Content-Type: text/plain; charset=US-ASCII

If you look on the node, you will find /tmp/mypostscript; this is the script that runs after the install. The last thing it does is run

    updateflag.awk $MASTER 3002 "installstatus booted"

which sends the status over port 3002 to $MASTER. $MASTER should be defined earlier in the script in an export, as in the example below. If for some reason the node cannot contact the Management Node at the address given there, the booted status never gets set.

For example, /tmp/mypostscript:

    .
    .
    MASTER=10.16.0.103
    export MASTER
    .
    .
    .
    updateflag.awk $MASTER 3002 "installstatus booted"

You could also check /var/log/xcat/xcat.log on the node.

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102

------------------------------

End of xCAT-user Digest, Vol 26, Issue 10
*****************************************
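
For anyone who wants to double-check the address that Lissa's message refers to, here is a rough sketch in Python 3 (an assumption; it would have to run on a machine on the node's network that has Python 3 available). It pulls the MASTER value out of the text of a mypostscript-style script, assuming the MASTER=... layout from the example in the thread, and then tries a plain TCP connect to port 3002. The function names find_master and probe_port are made up for this illustration. The probe only shows whether the port accepts connections; it does not speak whatever protocol updateflag.awk and xcatd use, so a successful connect does not prove the status update would be accepted.

```python
import re
import socket


def find_master(postscript_text):
    """Return the value assigned to MASTER in the script text, or None.

    Assumes a line such as MASTER=10.16.0.103 or export MASTER=10.16.0.103,
    as in the /tmp/mypostscript example quoted in the thread.
    """
    match = re.search(
        r'^\s*(?:export\s+)?MASTER=["\']?([^"\'\s;]+)',
        postscript_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def probe_port(host, port=3002, timeout=5.0):
    """Attempt a plain TCP connect and return (ok, detail).

    This checks reachability only; nothing is sent to the server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # refused, timed out, name not resolvable, ...
        return False, str(exc)


# Example with the script fragment from the thread (no network access here).
sample = 'MASTER=10.16.0.103\nexport MASTER\nupdateflag.awk $MASTER 3002 "installstatus booted"\n'
print("MASTER =", find_master(sample))
```

To use it against a real node, read the contents of /tmp/mypostscript into a string, pass it to find_master, and call probe_port on the result. If the connect fails, that points at the addressing or name-resolution problem described above; if it succeeds, the next place to look is the xcatd side.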
