Classification: UNCLASSIFIED
Caveats: NONE

Thanks so much for your suggestions.  I checked the httpd log and found that
one of the new postscripts I created was getting permission denied.  My
default umask is restrictive, so that script
(/install/postscripts/install_nvidia_driver) was created with mode 700.
(NOTE: this script wasn't yet listed in the postscripts section, as I wasn't
ready to add it ... )

Anyways, I changed the script permissions to 755 and kicked off another
rebuild, which completed successfully.

So... it appears all /install/postscripts/* need to be readable by world?? 
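
Since the node pulls the postscripts from the MN over HTTP (the wget step
described in the quoted reply below), any file the web server cannot read is
a candidate for this failure.  A minimal sketch for spotting such files,
assuming the default /install/postscripts location; the function names are
made up for illustration and are not part of xCAT:

```shell
#!/bin/sh
# Sketch only: find postscripts that lack the world-read bit.
# The directory defaults to the path named in this thread.

# Print every regular file under $1 that "other" cannot read.
list_not_world_readable() {
    find "${1:-/install/postscripts}" -type f ! -perm -004 -print
}

# Apply the same fix used in this thread (mode 755) to each file found.
fix_not_world_readable() {
    list_not_world_readable "$1" | while IFS= read -r f; do
        chmod 755 "$f"
    done
}
```

Running list_not_world_readable with no argument on the MN before a rebuild
should print nothing if every postscript is readable.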

-nate

-----Original Message-----
From: Linda Mellor [mailto:mel...@us.ibm.com] 
Sent: Tuesday, July 22, 2014 11:58 AM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and
port information (3002, installstatus booted) invalid (UNCLASSIFIED)

Ok.  So, after install, it looks like the node is not able to communicate at
all with the xcatd daemon on your management node.  

On your management node, the /install/autoinst/gpu4 file (your kickstart
config file for the node) has a '%post' section.  When this runs on your
node, it will do something along the lines of (this may not be exactly the
same for your version of xCAT, but should be somewhat similar):

- set your node's hostname ==> did that work?
- wget a copy of all the postscripts from your xCAT MN and put them in the
/xcatpost directory  ===> do you have a full set of postscripts?  check /tmp
on the failing node for wget errors.  check /var/log/httpd/error_log and
access_log on the MN for responses to the wget request
-  depending on a few different factors, this may bring down an initial copy
of /xcatpost/mypostscript.  If it does not, it will run
/xcatpost/getpostscript.awk to bring down a copy, in which case you should
see a msg in /var/log/messages on the MN about xcatd receiving the
getpostscript request from that node.
- if the mypostscript came down correctly, the file /opt/xcat/xcatinfo
should be created on the node, and contain your xCAT MN as XCATSERVER=...
===> did you get this far??  is there anything else in your xcatinfo file?
- adds in a bunch of subroutines, etc., to /xcatpost/mypostscript   ===> you
probably see all of those regardless of what went on before
- creates /etc/init.d/xcatpostinit1  to run postscripts after the reboot
===> I'm sure this got created and was run on reboot based on where you saw
failures happening
- creates /opt/xcat/xcatinstallpost  which gets run by xcatpostinit1  ===>
again, sure this got created and run based on your failures
- adds the 'updateflag.awk $MASTER 3002 installstatus booted' cmd to
/xcatpost/mypostscript.post   <<at this point it is the literal string
'$MASTER', not the substituted value>>
- creates /opt/xcat/xcatdsklspost
- runs /xcatpost/mypostscript to run your node's postscripts.  In your case
it should be:  syslog,remoteshell,syncfiles,otherpkgs,setupntp
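
To see quickly how far the %post section got, something along these lines
could be run on the node.  It is only a sketch: the file list is taken from
the steps above, and the function name and the optional root-prefix argument
are hypothetical conveniences, not xCAT features.

```shell
#!/bin/sh
# Sketch only: report which of the files named in the steps above exist.
# $1 is an optional root prefix (e.g. a mounted image or a test tree);
# leave it empty to inspect the live node.
check_post_artifacts() {
    root=${1:-}
    for f in /xcatpost/mypostscript \
             /xcatpost/mypostscript.post \
             /opt/xcat/xcatinfo \
             /etc/init.d/xcatpostinit1 \
             /opt/xcat/xcatinstallpost \
             /opt/xcat/xcatdsklspost
    do
        if [ -e "$root$f" ]; then
            echo "ok      $f"
        else
            echo "MISSING $f"
        fi
    done
    # xcatinfo should name the management node as XCATSERVER=...
    if grep -q '^XCATSERVER=' "$root/opt/xcat/xcatinfo" 2>/dev/null; then
        echo "ok      XCATSERVER set in xcatinfo"
    else
        echo "MISSING XCATSERVER in xcatinfo"
    fi
}
```

The first MISSING line is a hint about which step to look at; it does not
prove that the earlier steps ran correctly.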


Depending on what did not work correctly above, a few common things to
check:
- make sure your node's networking is set up correctly and it can ping your
MN as you expect
- can you ssh to the node from the MN without a password?  (i.e. the
remoteshell postscript ran correctly)
- is name resolution working well?   on the node, can you resolve the MN
hostname and get the correct IP?
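
The three checks above can be scripted roughly as follows.  This is a sketch
with made-up function names; it assumes a Linux ping (the -W timeout option),
getent, and an OpenSSH client are available.

```shell
#!/bin/sh
# Run a command quietly and print PASS or FAIL with a label.
run_check() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
    fi
}

# On the node: can it reach and resolve the management node ($1)?
checks_on_node() {
    run_check "ping $1" ping -c 1 -W 2 "$1"
    run_check "resolve $1" getent hosts "$1"
    # Print the mapping so the IP can be compared by eye.
    getent hosts "$1"
}

# On the management node: does passwordless ssh to the node ($1) work?
# BatchMode makes ssh fail instead of prompting for a password.
checks_on_mn() {
    run_check "ssh $1" ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, checks_on_node <your MN hostname> on the node and
checks_on_mn gpu4 on the MN.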

You can try to run updateflag manually to see where things may be breaking
down: 

        - on the node: 

                /xcatpost/updateflag.awk <your MN IP> 3002 installstatus testing



Finally, we noticed that you are installing RHELS 6.5 using xCAT 2.8.1.
Formal xCAT support for RHELS 6.5 did not arrive until our latest release,
xCAT 2.8.4, so you may be running into some glitches because of that.


Linda


From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/21/2014 02:52 PM
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and
port information (3002, installstatus booted) invalid (UNCLASSIFIED)

________________________________




Classification: UNCLASSIFIED
Caveats: NONE

No.  On the failing node, /var/log/xcat doesn't exist, and there are no
xcatd errors on the management node.

After the kickstart and reboot, it errors on /xcatpost/updateflag.awk (and
doesn't run any of the postscripts to complete the setup)

Initially, gpu4 wasn't defined in the postscripts table.  For testing, I
disabled the gpu group (that gpu4 is a member of) and manually added a
record to the table for gpu4.  Was that the correct way to do that?  The
very first rebuild to 6.5 actually worked and ran the postscripts, but when
I kicked it off a second time with some additional postscripts for testing,
it started failing with this error.

[root@admin ~]# nodels gpu4 groups
gpu4: ipmi,gpu,all

(tabdump of gpu related postscripts)
#node,postscripts,postbootscripts,comments,disable
"xcatdefaults","syslog,remoteshell,syncfiles,otherpkgs",,,
"gpu","nvidia_drvr_install,setupntp","mlnxofed_ib_install,configiba,gpfs_updates",,"1"
"gpu4","setupntp","gpfs_yum_update,install_torque_client,install_ofed_drivers,configiba,gpfs_setup,csa_harden,syncfiles",,

-----Original Message-----
From: Linda Mellor [mailto:mel...@us.ibm.com]
Sent: Monday, July 21, 2014 2:24 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and
port information (3002, installstatus booted) invalid (UNCLASSIFIED)

On the failing node, do you see anything in /var/log/xcat/xcat.log that
might give you some clues?

Also, on the management node, any xcatd errors showing up in
/var/log/messages?


Linda



From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 07/21/2014 02:02 PM
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and
port information (3002, installstatus booted) invalid (UNCLASSIFIED)

________________________________




Classification: UNCLASSIFIED
Caveats: NONE

I've primarily been using the documentation from the wiki site as a
reference and then the man pages.

Master is defined in the site table, and I've rebuilt other nodes previous
to this without making any changes to that table.

I've noticed on the failing node: /xcatpost/mypostscript.post doesn't list
any postscripts or environment variables (like MASTER)

[root@admin ~]# lsxcatd -a
Version 2.8.1 (svn r16213, built Tue May  7 22:55:07 EDT 2013)
This is a Management Node
dbengine=SQLite

[root@admin ~]# lsdef gpu4
Object name: gpu4
  arch=x86_64
  bmc=gpu4-imm
  chain=runcmd=bmcsetup,shell
  currchain=boot
  currstate=install rhels6.5-x86_64-gpu.rhels6-5
  groups=ipmi,gpu,all
  initrd=xcat/osimage/gpu-6.5/initrd.img
  installnic=mac
  kcmdline=quiet repo=http://10.10.100.254:80/install/rhels6.5/x86_64 ks=http://10.10.100.254:80/install/autoinst/gpu4 ksdevice=40:f2:e9:03:a2:b4 cmdline console=tty0 console=ttyS0,115200n8r
  kernel=xcat/osimage/gpu-6.5/vmlinuz
  mac=40:f2:e9:03:a2:b4
  mgt=ipmi
  mtm=7912AC1
  netboot=xnba
  nodetype=osi
  ondiscover=nodediscover
  os=rhels6.5

  postbootscripts=gpfs_yum_update,install_torque_client,install_ofed_drivers,configiba,gpfs_setup,csa_harden,syncfiles
  postscripts=syslog,remoteshell,syncfiles,otherpkgs,setupntp
  profile=gpu.rhels6-5
  provmethod=gpu-6.5
  serial=KQ9MN99
  serialflow=hard
  serialport=0
  serialspeed=115200
  status=installing
  statustime=07-21-2014 08:35:35
  switch=bnt8000
  switchport=11
  updatestatus=synced
  updatestatustime=07-08-2014 12:20:11
  xcatmaster=10.10.100.254



-----Original Message-----
From: Lissa Valletta [mailto:lis...@us.ibm.com]
Sent: Monday, July 21, 2014 1:22 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and
port information (3002, installstatus booted) invalid (UNCLASSIFIED)

What level of xCAT are you using?
Give us an lsdef <nodename>  of one of the nodes that fails.

Many of the environment variables like MASTER come from the site table.
Run tabdump site and see if master is defined.   It should be the IP or
hostname as known by the node you are installing.
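
A small helper for pulling a single key out of the tabdump output might look
like this.  It is a sketch that assumes the usual quoted CSV layout
("key","value",comments,disable) with a leading # header line; site_value is
a hypothetical name, not an xCAT command.

```shell
#!/bin/sh
# Read tabdump-style CSV on stdin and print the value column for key $1.
site_value() {
    awk -F'","' -v key="$1" '
        /^#/ { next }                      # skip the header line
        { k = $1; sub(/^"/, "", k) }       # strip the leading quote
        k == key { v = $2; sub(/".*$/, "", v); print v; exit }
    '
}
```

On the MN this would be used as: tabdump site | site_value master
An empty result means no master row was found in the input.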

What document have you been following to setup xCAT?



Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102




From: "Matter, Nathaniel B CTR (US)" <nathaniel.b.matter....@mail.mil>
To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
Cc: "Motsko, Mark J CTR USARMY ARL \(US\)" <mark.j.motsko....@mail.mil>
Date: 07/21/2014 12:45 PM
Subject: [xcat-user] /xcatpost/updateflag.awk:22: fatal: remote host and
port information (3002, installstatus booted) invalid (UNCLASSIFIED)

________________________________




Classification: UNCLASSIFIED
Caveats: NONE

Hi All.

I'm a new xcat user (please note this) and have run into an issue that I'm
unsure how to resolve.

I've been rebuilding nodes successfully (upgrading to rhels6.5) and then for
one of the rebuilds I get the error:

awk: /xcatpost/updateflag.awk:22: fatal: remote host and port information
(3002, installstatus booted) invalid

(It consistently happens now and I suspect it has something to do with a
modification to the postscripts table, which I've also been making for other
successful rebuilds.  I've backed out that change, but it hasn't resolved
the issue for me.)

Anyways, I've traced this down to the /xcatpost/mypostscript.post file
missing any references to MASTER (plus other environment variables), and this
causes the postscripts to fail to execute.  lsdef still shows a status of
"installing"...
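
The error text is consistent with an empty MASTER: it names 3002 as the host
and "installstatus booted" as the port, which is what would happen if an
unquoted $MASTER expanded to nothing and every argument shifted left by one.
A minimal shell illustration follows; show_args is a stand-in for
updateflag.awk, and passing the status as one quoted argument is an
assumption based on how the error prints it.

```shell
#!/bin/sh
# Stand-in for updateflag.awk: report what lands in the host and port slots.
show_args() {
    echo "host=$1 port=$2"
}

MASTER=10.10.100.254
# $MASTER is deliberately left unquoted to show the effect.
show_args $MASTER 3002 "installstatus booted"   # host=10.10.100.254 port=3002

MASTER=
show_args $MASTER 3002 "installstatus booted"   # host=3002 port=installstatus booted
```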

Would anyone have some advice on what to check?

Thanks in advance,

Nate Matter, ARL/DSRC Local Support Team Lead
Lockheed Martin, IS&GS - Defense
US Army Research Lab DOD Supercomputing Resource Center
Aberdeen Proving Ground, MD 21005
410-278-6942 (Office)
717-546-4043 (Cell)
410-278-8799 (Fax)
nathaniel.b.mat...@lmco.com; nathaniel.b.matter....@mail.mil



------------------------------------------------------------------------------
Want fast and easy access to all the code in your enterprise? Index and
search up to 200,000 lines of code with a free copy of Black Duck Code Sight
- the same software that powers the world's largest code search on Ohloh,
the Black Duck Open Hub! Try it now.
http://p.sf.net/sfu/bds
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


