You are correct. Everything in /install/postscripts and its subdirectories must be world-readable, and at least executable by root. If not, the wget of all the postscripts fails. For your information, when you run updatenode, one of the first things it does is the following, to make sure the directory permissions are correct:

    chmod -R a+r /install/postscripts
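A minimal sketch of that check for the management node. It lists anything under the postscripts tree that lacks the world-read bit, applies the same chmod -R a+r fix that updatenode uses, and scans the Apache access log for 403 responses to postscript downloads (the kind of "permission denied" Nate found). The function names are hypothetical helpers, not xCAT commands. The sketch assumes httpd runs as a user other than the file owner, and that the access log uses Apache's common/combined format (request path in field 7, status in field 9).

```shell
#!/usr/bin/env bash
# Management-node sketch. All function names are hypothetical helpers.

# List entries under the postscripts tree that lack the world-read bit.
# ASSUMPTION: httpd is not the file owner, so such files cannot be served to wget.
find_unreadable_postscripts() {
    local dir="${1:-/install/postscripts}"
    # -perm -o+r matches entries with the "other" read bit set; ! negates it.
    find "$dir" ! -perm -o+r -print
}

# Apply the same permission fix that updatenode runs, per this thread.
fix_postscript_perms() {
    local dir="${1:-/install/postscripts}"
    chmod -R a+r "$dir"
}

# Print client address and path for postscript requests answered with 403.
# ASSUMPTION: Apache common/combined log format ($7 = request path, $9 = status).
forbidden_postscript_requests() {
    local log="${1:-/var/log/httpd/access_log}"
    awk '$9 == 403 && index($7, "/install/postscripts/") == 1 { print $1, $7 }' "$log"
}

# Example on the MN:
#   find_unreadable_postscripts            # e.g. a 700 script created under a strict umask
#   forbidden_postscript_requests          # did wget from the node get 403s?
#   fix_postscript_perms                   # then rebuild the node
```

Note that chmod a+r only adds read bits; a script that must also be executable (Nate used 755) still needs its execute bits set separately.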
You probably would have seen the error in /tmp/wget.log on the node.

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/22/2014 01:33 PM
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid (UNCLASSIFIED)

Classification: UNCLASSIFIED
Caveats: NONE

Thanks so much for your suggestions. I did check the httpd log and found that one of the new postscripts I created was getting "permission denied". My default umask is restrictive, so that script (/install/postscripts/install_nvidia_driver) was set to 700. (NOTE: this script wasn't yet listed in the postscripts section, as I wasn't ready to add it...)

Anyway, I changed the script permissions to 755 and kicked off another rebuild, which completed successfully. So... it appears everything in /install/postscripts/* needs to be world-readable??

-nate

-----Original Message-----
From: Linda Mellor [mailto:mel...@us.ibm.com]
Sent: Tuesday, July 22, 2014 11:58 AM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid (UNCLASSIFIED)

Ok. So, after install, it looks like the node is not able to communicate at all with the xcatd daemon on your management node.

On your management node, the /install/autoinst/gpu4 file (your kickstart config file for the node) has a '%post' section. When this runs on your node, it will do something along the lines of the following (this may not be exactly the same for your version of xCAT, but should be somewhat similar):

- set your node's hostname
  ===> did that work?
- wget a copy of all the postscripts from your xCAT MN and put them in the /xcatpost directory
  ===> do you have a full set of postscripts? Check /tmp on the failing node for wget errors. Check /var/log/httpd/error_log and access_log on the MN for responses to the wget request.
- depending on a few different factors, this may bring down an initial copy of /xcatpost/mypostscript. If it does not, it will run /xcatpost/getpostscript.awk to bring down a copy, in which case you should see a msg in /var/log/messages on the MN about xcatd receiving the getpostscript request from that node.
- if mypostscript came down correctly, the file /opt/xcat/xcatinfo should be created on the node and contain your xCAT MN as XCATSERVER=...
  ===> did you get this far? Is there anything else in your xcatinfo file?
- adds a bunch of subroutines, etc., to /xcatpost/mypostscript
  ===> you probably see all of those regardless of what went on before
- creates /etc/init.d/xcatpostinit1 to run postscripts after the reboot
  ===> I'm sure this got created and was run on reboot, based on where you saw failures happening
- creates /opt/xcat/xcatinstallpost, which gets run by xcatpostinit1
  ===> again, sure this got created and run, based on your failures
- adds the 'updateflag.awk $MASTER 3002 installstatus booted' cmd to /xcatpost/mypostscript.post
  <<at this point it is the literal string '$MASTER', not the substituted value>>
- creates /opt/xcat/xcatdsklspost
- runs /xcatpost/mypostscript to run your node's postscripts. In your case these should be: syslog,remoteshell,syncfiles,otherpkgs,setupntp

Depending on what did not work correctly above, a few common things to check:
- make sure your node's networking is set up correctly and it can ping your MN as you expect
- can you ssh to the node from the MN without a password? (i.e. did the remoteshell postscript run correctly?)
- is name resolution working well? On the node, can you resolve the MN hostname and get the correct IP?
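Two of the node-side questions in the checklist above (what XCATSERVER does xcatinfo record, and did mypostscript.post receive a MASTER value) can be scripted. This is a sketch, not part of xCAT: get_xcatserver and has_master are hypothetical helpers. It assumes the XCATSERVER=... line format described above, and it assumes that a healthy mypostscript.post contains a MASTER=value (or export MASTER=value) assignment at the start of a line.

```shell
#!/usr/bin/env bash
# Node-side sketch. Function names are hypothetical helpers, not xCAT commands.

# Print the XCATSERVER value recorded in xcatinfo (last match wins, quotes stripped).
get_xcatserver() {
    local file="${1:-/opt/xcat/xcatinfo}"
    sed -n 's/^XCATSERVER=//p' "$file" | tail -n 1 | tr -d "'\""
}

# Succeed only if the file assigns MASTER a non-empty value.
# ASSUMPTION: the assignment looks like MASTER=value, MASTER='value',
# or export MASTER=value at the start of a line.
has_master() {
    local file="${1:-/xcatpost/mypostscript.post}"
    grep -Eq "^[[:space:]]*(export[[:space:]]+)?MASTER=['\"]?[^'\"[:space:]]+" "$file"
}

# Example on the failing node:
#   ping -c 1 "$(get_xcatserver)"
#   has_master || echo "MASTER is not set in /xcatpost/mypostscript.post"
```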
You can try to run updateflag manually to see where things may be breaking down. On the node:

    /xcatpost/updateflag.awk <your MN IP> 3002 installstatus testing

Finally, we did notice that you are installing RHELS 6.5 using xCAT 2.8.1. We did not add formal xCAT support for RH 6.5 until our latest release, xCAT 2.8.4, so you may be running into some glitches with that.

Linda

From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/21/2014 02:52 PM
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid (UNCLASSIFIED)

No. On the failing node, /var/log/xcat doesn't exist, and there are no xcatd errors on the management node. After the kickstart and reboot, it errors on /xcatpost/updateflag.awk (and doesn't run any of the postscripts to complete the setup).

Initially, gpu4 wasn't defined in the postscripts table. For testing, I disabled the entry for the gpu group (which gpu4 is a member of) and manually added a record to the table for gpu4. Was that the correct way to do that? The very first rebuild to 6.5 actually worked and ran the postscripts, but when I kicked it off a second time with some additional postscripts for testing, it started failing with this error.
[root@admin ~]# nodels gpu4 groups
gpu4: ipmi,gpu,all

(tabdump of gpu-related postscripts)
#node,postscripts,postbootscripts,comments,disable
"xcatdefaults","syslog,remoteshell,syncfiles,otherpkgs",,,
"gpu","nvidia_drvr_install,setupntp","mlnxofed_ib_install,configiba,gpfs_updates",,"1"
"gpu4","setupntp","gpfs_yum_update,install_torque_client,install_ofed_drivers,configiba,gpfs_setup,csa_harden,syncfiles",,

-----Original Message-----
From: Linda Mellor [mailto:mel...@us.ibm.com]
Sent: Monday, July 21, 2014 2:24 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid (UNCLASSIFIED)

On the failing node, do you see anything in /var/log/xcat/xcat.log that might give you some clues? Also, on the management node, are any xcatd errors showing up in /var/log/messages?

Linda

From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/21/2014 02:02 PM
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid (UNCLASSIFIED)

I've primarily been using the documentation from the wiki site as a reference, and then the man pages. Master is defined in the site table, and I've rebuilt other nodes before this without making any changes to that table.
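The site-table check mentioned in this thread can be done non-interactively. A sketch, assuming tabdump site prints the same quoted-CSV layout as the postscripts dump above ("key","value",comments,disable). site_master is a hypothetical helper: it reads that output on stdin, prints the master value, and exits non-zero when no non-empty master row exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "tabdump site" output on stdin and print the master value.
# ASSUMPTION: rows use the quoted-CSV layout "key","value",comments,disable.
site_master() {
    # Split on double quotes: field 2 is the key, field 4 is the value.
    awk -F'"' '$2 == "master" && $4 != "" { print $4; found = 1 }
               END { exit !found }'
}

# Example on the MN:
#   tabdump site | site_master || echo "no master entry in the site table"
```

The printed value can then be compared with the xcatmaster attribute that lsdef reports for the node (10.10.100.254 for gpu4 in this thread).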
I've noticed that on the failing node, /xcatpost/mypostscript.post doesn't list any postscripts or environment variables (like MASTER).

[root@admin ~]# lsxcatd -a
Version 2.8.1 (svn r16213, built Tue May 7 22:55:07 EDT 2013)
This is a Management Node
dbengine=SQLite

[root@admin ~]# lsdef gpu4
Object name: gpu4
    arch=x86_64
    bmc=gpu4-imm
    chain=runcmd=bmcsetup,shell
    currchain=boot
    currstate=install rhels6.5-x86_64-gpu.rhels6-5
    groups=ipmi,gpu,all
    initrd=xcat/osimage/gpu-6.5/initrd.img
    installnic=mac
    kcmdline=quiet repo=http://10.10.100.254:80/install/rhels6.5/x86_64 ks=http://10.10.100.254:80/install/autoinst/gpu4 ksdevice=40:f2:e9:03:a2:b4 cmdline console=tty0 console=ttyS0,115200n8r
    kernel=xcat/osimage/gpu-6.5/vmlinuz
    mac=40:f2:e9:03:a2:b4
    mgt=ipmi
    mtm=7912AC1
    netboot=xnba
    nodetype=osi
    ondiscover=nodediscover
    os=rhels6.5
    postbootscripts=gpfs_yum_update,install_torque_client,install_ofed_drivers,configiba,gpfs_setup,csa_harden,syncfiles
    postscripts=syslog,remoteshell,syncfiles,otherpkgs,setupntp
    profile=gpu.rhels6-5
    provmethod=gpu-6.5
    serial=KQ9MN99
    serialflow=hard
    serialport=0
    serialspeed=115200
    status=installing
    statustime=07-21-2014 08:35:35
    switch=bnt8000
    switchport=11
    updatestatus=synced
    updatestatustime=07-08-2014 12:20:11
    xcatmaster=10.10.100.254

-----Original Message-----
From: Lissa Valletta [mailto:lis...@us.ibm.com]
Sent: Monday, July 21, 2014 1:22 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid (UNCLASSIFIED)

What level of xCAT are you using? Give us an lsdef <nodename> of one of the nodes that fails.

Many of the environment variables, like MASTER, come from the site table. Run tabdump site and see if master is defined. It should be the IP or hostname of the management node as known by the node you are installing.

What document have you been following to set up xCAT?

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
Cc: "Motsko, Mark J CTR USARMY ARL \(US\)" <mark.j.motsko....@mail.mil>
Date: 07/21/2014 12:45 PM
Subject: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid (UNCLASSIFIED)

Hi all. I'm a new xCAT user (please note this) and have run into an issue that I'm unsure how to resolve. I've been rebuilding nodes successfully (upgrading to rhels6.5), and then for one of the rebuilds I get the error:

awk: /xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid

(It consistently happens now, and I suspect it has something to do with a modification to the postscripts table, which I've also been making for other, successful rebuilds. I've backed out that change, but it hasn't resolved the issue for me.)

Anyway, I've traced this down to the /xcatpost/mypostscript.post file missing any references to MASTER (plus other environment variables), and this causes the postscripts to fail to execute. lsdef still shows a status of "installing"...

Would anyone have some advice on what to check?
Thanks in advance,

Nate Matter, ARL/DSRC Local Support Team Lead
Lockheed Martin, IS&GS - Defense
US Army Research Lab DOD Supercomputing Resource Center
Aberdeen Proving Ground, MD 21005
410-278-6942 (Office)
717-546-4043 (Cell)
410-278-8799 (Fax)
nathaniel.b.mat...@lmco.com; nathaniel.b.matter....@mail.mil

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
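The wording of the error in the original report is consistent with MASTER being empty on the node. Assuming the generated line in mypostscript.post resembles updateflag.awk $MASTER 3002 "installstatus booted", an empty, unquoted $MASTER disappears during word splitting, so the awk script receives 3002 in the host position and "installstatus booted" in the port position: exactly the pair the error reports. The sketch below demonstrates that shift with a hypothetical show_args helper, and adds a hypothetical require_master guard that stops early with a clearer message.

```shell
#!/usr/bin/env bash
# Print each argument on its own line, in brackets, so word splitting is visible.
show_args() {
    printf '[%s]\n' "$@"
}

# Hypothetical guard: refuse to continue when MASTER is empty or unset.
require_master() {
    if [ -z "${MASTER:-}" ]; then
        echo "MASTER is empty; check the site table and /xcatpost/mypostscript.post" >&2
        return 1
    fi
}

MASTER=""   # the situation on the failing node

# Unquoted, as assumed for the generated command line: the empty MASTER vanishes,
# so 3002 shifts into the host slot and "installstatus booted" into the port slot.
# shellcheck disable=SC2086
show_args $MASTER 3002 "installstatus booted"
# [3002]
# [installstatus booted]

# Quoted: the empty value keeps its slot, which makes the problem easier to spot.
show_args "$MASTER" 3002 "installstatus booted"
# []
# [3002]
# [installstatus booted]

require_master || echo "would stop here instead of calling updateflag.awk"
```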