Ok. So, after install, it looks like the node is not able to communicate
at all with the xcatd daemon on your management node.
On your management node, the /install/autoinst/gpu4 file (your kickstart
config file for the node) has a '%post' section. When this runs on your
node, it will do something along the lines of (this may not be exactly the
same for your version of xCAT, but should be somewhat similar):
- set your node's hostname ===> did that work?
- wget a copy of all the postscripts from your xCAT MN and put them in
  the /xcatpost directory ===> do you have a full set of postscripts?
    - check /tmp on the failing node for wget errors
    - check /var/log/httpd/error_log and access_log on the MN for
      responses to the wget request
- depending on a few different factors, this may bring down an initial
  copy of /xcatpost/mypostscript. If it does not, it will run
  /xcatpost/getpostscript.awk to bring down a copy, in which case you
  should see a message in /var/log/messages on the MN about xcatd
  receiving the getpostscript request from that node.
- if mypostscript came down correctly, the file /opt/xcat/xcatinfo
  should be created on the node and contain your xCAT MN as
  XCATSERVER=... ===> did you get this far? Is there anything else in
  your xcatinfo file?
- adds a bunch of subroutines, etc., to /xcatpost/mypostscript ===> you
  probably see all of those regardless of what went on before
- creates /etc/init.d/xcatpostinit1 to run postscripts after the reboot
  ===> I'm sure this got created and was run on reboot, based on where
  you saw failures happening
- creates /opt/xcat/xcatinstallpost, which gets run by xcatpostinit1
  ===> again, I'm sure this got created and run, based on your failures
- adds the 'updateflag.awk $MASTER 3002 installstatus booted' cmd to
  /xcatpost/mypostscript.post <<at this point it is the literal string
  '$MASTER', not the substituted value>>
- creates /opt/xcat/xcatdsklspost
- runs /xcatpost/mypostscript to run your node's postscripts. In your
  case it should be: syslog,remoteshell,syncfiles,otherpkgs,setupntp
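To see quickly how far the %post steps above got, something like the following could be run on the failing node. It is only a rough sketch: check_xcat_files is a made-up helper name, the file list is taken from the steps above and may differ in your xCAT version, and the check assumes xcatinfo holds a line beginning with XCATSERVER=.

```shell
#!/usr/bin/env bash
# check_xcat_files [ROOT]: report which files named in the steps above exist.
# ROOT defaults to "" (the live filesystem); pass a directory to inspect a
# copied or mounted node image instead. (Hypothetical helper, not part of xCAT.)
check_xcat_files() {
    local root=${1:-} missing=0 f
    for f in /xcatpost/mypostscript \
             /xcatpost/mypostscript.post \
             /xcatpost/updateflag.awk \
             /opt/xcat/xcatinfo \
             /opt/xcat/xcatinstallpost \
             /opt/xcat/xcatdsklspost \
             /etc/init.d/xcatpostinit1; do
        if [ -e "$root$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=$((missing + 1))
        fi
    done
    # Assumption: xcatinfo names the MN on a line starting with XCATSERVER=
    if grep -q '^XCATSERVER=.' "$root/opt/xcat/xcatinfo" 2>/dev/null; then
        echo "ok       XCATSERVER entry in /opt/xcat/xcatinfo"
    else
        echo "MISSING  XCATSERVER entry in /opt/xcat/xcatinfo"
        missing=$((missing + 1))
    fi
    # Exit status is 0 only when nothing is missing
    [ "$missing" -eq 0 ]
}
```

Calling check_xcat_files with no argument checks the running node. The first MISSING line gives a hint about which step to look at.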
Depending on what did not work correctly above, a few common things to
check:
- make sure your node's networking is set up correctly and it can ping
  your MN as you expect
- can you ssh to the node from the MN without a password? (i.e. the
  remoteshell postscript ran correctly)
- is name resolution working well? On the node, can you resolve the MN
  hostname and get the correct IP?
You can try to run updateflag manually to see where things may be breaking
down:
- on the node:
/xcatpost/updateflag.awk <your MN IP> 3002 installstatus testing
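The node-side checks above (name resolution, ping, and whether the MN accepts connections on port 3002) can be bundled into one helper. This is a sketch only: check_mn is a made-up name, it relies on bash's /dev/tcp redirection plus the getent, ping and timeout utilities being present on the node, and a successful TCP connect only shows that something is listening on the port, not that xcatd accepted the request.

```shell
#!/usr/bin/env bash
# check_mn HOST [PORT]: resolve HOST, ping it once, then try a TCP connect.
# PORT defaults to 3002, the port used in the updateflag.awk call above.
# (Hypothetical helper, not part of xCAT.)
check_mn() {
    local host=${1:-} port=${2:-3002}
    if [ -z "$host" ]; then
        echo "usage: check_mn <MN hostname or IP> [port]" >&2
        return 2
    fi
    # Name resolution as the node sees it
    getent hosts "$host" || echo "warning: cannot resolve $host"
    # Basic reachability: one ping, 2 second wait
    ping -c 1 -W 2 "$host" >/dev/null 2>&1 || echo "warning: no ping reply from $host"
    # TCP connect through bash's /dev/tcp redirection (bash only, not plain sh)
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "ok: $host:$port accepts TCP connections"
    else
        echo "FAIL: cannot connect to $host:$port"
        return 1
    fi
}
```

On the node, check_mn followed by your MN's IP or hostname runs all three checks in order.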
Finally, we noticed that you are installing RHELS 6.5 using xCAT 2.8.1.
Formal xCAT support for RH 6.5 did not go in until xCAT 2.8.4, our latest
release, so you may be running into some glitches because of that.
Linda
From: "Matter, Nathaniel B CTR (US)"
<nathaniel.b.matter....@mail.mil>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/21/2014 02:52 PM
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host
and port information (3002, installstatus booted) invalid
(UNCLASSIFIED)
Classification: UNCLASSIFIED
Caveats: NONE
No. On the failing node, /var/log/xcat doesn't exist, and there are no
xcatd errors on the management node.
After the kickstart and reboot, it errors on /xcatpost/updateflag.awk (and
doesn't run any of the postscripts to complete the setup)
Initially, gpu4 wasn't defined in the postscripts table. For testing, I
disabled the gpu group (that gpu4 is a member of) and manually added a
record to the table for gpu4. Was that the correct way to do that? The
very first rebuild to 6.5 actually worked and ran the postscripts, but when
I kicked it off a second time with some additional postscripts for testing,
it started failing with this error.
[root@admin ~]# nodels gpu4 groups
gpu4: ipmi,gpu,all
(tabdump of gpu related postscripts)
#node,postscripts,postbootscripts,comments,disable
"xcatdefaults","syslog,remoteshell,syncfiles,otherpkgs",,,
"gpu","nvidia_drvr_install,setupntp","mlnxofed_ib_install,configiba,gpfs_updates",,"1"
"gpu4","setupntp","gpfs_yum_update,install_torque_client,install_ofed_drivers,configiba,gpfs_setup,csa_harden,syncfiles",,
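To double-check which rows of a dump like the one above are still active, a small filter can pull the postscripts column for one key and skip rows whose disable column is 1 or yes. This is a sketch under assumptions: postscripts_for is a made-up helper, it expects each row on a single line with the first two columns quoted as tabdump prints them, and it does not try to reproduce how xCAT itself resolves group rows against node rows.

```shell
#!/usr/bin/env bash
# postscripts_for KEY FILE: print the postscripts column of the row whose
# first column is KEY, unless that row is disabled (last column 1 or yes).
# FILE holds saved `tabdump postscripts` output. (Hypothetical helper.)
postscripts_for() {
    local key=$1 file=$2 line
    local row_re='^"([^"]*)","([^"]*)"'     # first two quoted columns
    local off_re=',"?(1|yes)"?$'            # disable column set
    while IFS= read -r line; do
        if [[ $line =~ $off_re ]]; then
            continue                        # skip disabled rows
        fi
        if [[ $line =~ $row_re ]]; then
            if [[ ${BASH_REMATCH[1]} == "$key" ]]; then
                printf '%s\n' "${BASH_REMATCH[2]}"
            fi
        fi
    done < "$file"
}
```

With the dump saved to a file, asking for xcatdefaults and then gpu4 gives syslog,remoteshell,syncfiles,otherpkgs and setupntp, while gpu prints nothing because that row is disabled.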
-----Original Message-----
From: Linda Mellor [mailto:mel...@us.ibm.com]
Sent: Monday, July 21, 2014 2:24 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host
and port information (3002, installstatus booted) invalid (UNCLASSIFIED)
On the failing node, do you see anything in /var/log/xcat/xcat.log that
might give you some clues?
Also, on the management node, any xcatd errors showing up in
/var/log/messages?
Linda
From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/21/2014 02:02 PM
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host
and port information (3002, installstatus booted) invalid (UNCLASSIFIED)
________________________________
Classification: UNCLASSIFIED
Caveats: NONE
I've primarily been using the documentation from the wiki site as a
reference and then the man pages.
Master is defined in the site table, and I've rebuilt other nodes previous
to this without making any changes to that table.
I've noticed on the failing node: /xcatpost/mypostscript.post doesn't list
any postscripts or environment variables (like MASTER)
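A quick way to confirm that observation is to test the file for a non-empty assignment. This is a sketch under assumptions: has_var is a made-up helper, and it assumes variables appear as NAME=value lines, optionally prefixed with export, which may not match every xCAT version.

```shell
#!/usr/bin/env bash
# has_var NAME FILE: succeed if FILE assigns a non-empty value to NAME,
# with or without a leading "export". (Hypothetical helper.)
has_var() {
    grep -Eq "^[[:space:]]*(export[[:space:]]+)?$1=[\"']?[^\"'[:space:]]" "$2"
}

# Example use on the node:
#   has_var MASTER /xcatpost/mypostscript.post || echo "MASTER is not set"
```

The same check can be pointed at /xcatpost/mypostscript to compare the two files.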
[root@admin ~]# lsxcatd -a
Version 2.8.1 (svn r16213, built Tue May 7 22:55:07 EDT 2013)
This is a Management Node
dbengine=SQLite
[root@admin ~]# lsdef gpu4
Object name: gpu4
arch=x86_64
bmc=gpu4-imm
chain=runcmd=bmcsetup,shell
currchain=boot
currstate=install rhels6.5-x86_64-gpu.rhels6-5
groups=ipmi,gpu,all
initrd=xcat/osimage/gpu-6.5/initrd.img
installnic=mac
kcmdline=quiet repo=http://10.10.100.254:80/install/rhels6.5/x86_64 ks=http://10.10.100.254:80/install/autoinst/gpu4 ksdevice=40:f2:e9:03:a2:b4 cmdline console=tty0 console=ttyS0,115200n8r
kernel=xcat/osimage/gpu-6.5/vmlinuz
mac=40:f2:e9:03:a2:b4
mgt=ipmi
mtm=7912AC1
netboot=xnba
nodetype=osi
ondiscover=nodediscover
os=rhels6.5
postbootscripts=gpfs_yum_update,install_torque_client,install_ofed_drivers,configiba,gpfs_setup,csa_harden,syncfiles
postscripts=syslog,remoteshell,syncfiles,otherpkgs,setupntp
profile=gpu.rhels6-5
provmethod=gpu-6.5
serial=KQ9MN99
serialflow=hard
serialport=0
serialspeed=115200
status=installing
statustime=07-21-2014 08:35:35
switch=bnt8000
switchport=11
updatestatus=synced
updatestatustime=07-08-2014 12:20:11
xcatmaster=10.10.100.254
-----Original Message-----
From: Lissa Valletta [mailto:lis...@us.ibm.com]
Sent: Monday, July 21, 2014 1:22 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host
and port information (3002, installstatus booted) invalid (UNCLASSIFIED)
What level of xCAT are you using?
Give us an lsdef <nodename> of one of the nodes that fails.
Many of the environment variables like MASTER come from the site table.
Run tabdump site and see if master is defined. It should be the IP or
hostname as known by the node you are installing.
What document have you been following to set up xCAT?
Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102
From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
Cc: "Motsko, Mark J CTR USARMY ARL \(US\)" <mark.j.motsko....@mail.mil>
Date: 07/21/2014 12:45 PM
Subject: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and
port information (3002, installstatus booted) invalid (UNCLASSIFIED)
________________________________
Classification: UNCLASSIFIED
Caveats: NONE
Hi All.
I'm a new xcat user (please note this) and have run into an issue that
I'm unsure how to resolve.
I've been rebuilding nodes successfully (upgrading to rhels6.5) and then
for one of the rebuilds I get the error:
awk: /xcatpost/updateflag.awk:22: fatal: remote host and port information
(3002, installstatus booted) invalid
(It consistently happens now and I suspect it has something to do with a
modification to the postscripts table, which I've been doing for other
successful rebuilds. I've backed out that change, but it hasn't resolved
the issue for me)
Anyways, I've traced this down to the /xcatpost/mypostscript.post file
missing any references to MASTER (plus other environment variables), and
this causes the postscripts to fail to execute. Lsdef still shows a status
of "installing"...
Would anyone have some advice on what to check?
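One possible reading of the error text, not confirmed anywhere in this thread: awk reports the host as 3002 and the port as "installstatus booted", which is what you would expect if $MASTER expanded to an empty string and the remaining arguments shifted left by one. The sketch below only illustrates that shell behaviour. show_args is a made-up stand-in for updateflag.awk, and it assumes the status string is passed as a single quoted argument, as the error text suggests.

```shell
#!/usr/bin/env bash
# Stand-in for updateflag.awk: just report how the first two arguments arrive.
# (Hypothetical; the real script opens a connection to host:port.)
show_args() {
    printf 'host=<%s> port=<%s>\n' "${1:-}" "${2:-}"
}

# $MASTER is deliberately left unquoted to show the effect of an empty value.
MASTER=10.10.100.254
show_args $MASTER 3002 "installstatus booted"   # host=<10.10.100.254> port=<3002>

MASTER=
show_args $MASTER 3002 "installstatus booted"   # host=<3002> port=<installstatus booted>
```

With MASTER empty, the unquoted expansion disappears entirely, so 3002 lands in the host position, matching the pair shown in the error.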
Thanks in advance,
Nate Matter, ARL/DSRC Local Support Team Lead
Lockheed Martin, IS&GS - Defense
US Army Research Lab DOD Supercomputing Resource Center
Aberdeen Proving Ground, MD 21005
410-278-6942 (Office)
717-546-4043 (Cell)
410-278-8799 (Fax)
nathaniel.b.mat...@lmco.com; nathaniel.b.matter....@mail.mil
Classification: UNCLASSIFIED
Caveats: NONE
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user