Hi Christopher,
When a node has been installed successfully, its status is "booted"; you can check this with the command "lsdef <node> -i status". If "confignics -s -r" fails in postscripts or postbootscripts, the status will not be "booted".
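As a rough sketch of how to spot the affected nodes in one pass, the shell helper below filters lsdef output and prints every node whose status is not "booted". The function name not_booted is hypothetical, and it assumes that lsdef prints an "Object name: <node>" line followed by an indented "status=<value>" line for each node; adjust the patterns if your output looks different.

```shell
# Print the names of nodes whose status is not "booted".
# Reads lsdef-style output on stdin. ASSUMED input format:
#   Object name: node01
#       status=booted
# A node with no status line at all is also reported.
not_booted() {
    awk '
        function report() { if (seen && status != "booted") print node }
        /^Object name:/  { report(); node = $3; status = ""; seen = 1; next }
        /^[ \t]*status=/ { sub(/^[ \t]*status=/, ""); status = $0 }
        END              { report() }
    '
}
```

For example, assuming a node group called compute, you could run: lsdef compute -i status | not_booted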
Since you mentioned that the 17 failed nodes still had a DHCP address and that you can use updatenode as a workaround, I think your OS provisioning did finish. You can find all xCAT-related logs on the management node under the /var/log/xcat directory; the provisioning log is named computes.log*.
You can also use xcatdebugmode to debug problems, especially during OS provisioning. Use the command "chdef -t site xcatdebugmode=1" to enable basic debug mode; after you repeat the failing actions, you can find the logs under /var/log/xcat.
Here is the related documentation: https://xcat-docs.readthedocs.io/en/stable/troubleshooting/index.html
"confignics -s -r" configures the installnic and clears the configuration of the other NICs, so any scripts that run after "confignics -s -r" in postscripts or postbootscripts and that depend on the NICs or the network may be affected. The logs should show whether this happened.
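To narrow the logs down, a small helper like the one below could pull out only the lines that mention both a given node and confignics. This is only a sketch: the function name confignics_lines is hypothetical, and it assumes that each log line carries the node name as plain text and that the files you pass are uncompressed.

```shell
# Show log lines that mention both a node name and "confignics".
# Usage: confignics_lines <node> <logfile>...
# ASSUMPTION: the node name appears literally on each relevant log line.
confignics_lines() {
    node=$1
    shift
    # First keep lines naming the node (fixed string, no filename prefix),
    # then keep only those that also mention confignics.
    grep -hF -- "$node" "$@" | grep -F -- 'confignics'
}
```

For example: confignics_lines node01 /var/log/xcat/computes.log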
Have you seen any other problems during installation? If you can give us more information, we can make a better assessment.
Thanks.
Best Regards
--------------------------------------------------
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193
IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193
--------------------------------------------------
----- Original message -----
From: Christopher Walker <c.j.wal...@qmul.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>, Yuan Y Bai <by...@cn.ibm.com>
Cc:
Subject: Re: [xcat-user] confignics -s -r
Date: Mon, Mar 25, 2019 6:22 PM
On 21/03/2019 02:37, Yuan Y Bai wrote:
> Hi Christopher,
> Thanks for your answers.
> I think you can put "confignics -s -r" in postbootscripts, not
> postscripts, in the definition of your failed nodes.
> I think you'd better upgrade xCAT.
We plan to do this soon.
> The reason is that in xCAT 2.12.4, "confignics -s" runs
> "ifdown <installnic>", generates the configuration files, then runs
> "ifup <installnic>"; these actions make configuring the installnic
> unstable in the postscripts stage.
Thanks. Is the instability something that just affects installnic, or
does it affect other parts of the install too?
Chris
> Best Regards
> --------------------------------------------------
> Yuan Bai (白媛)
>
> CSTL HPC System Management Development
> Tel:86-10-82451401
> E-mail: by...@cn.ibm.com
> Address: IBM ZGC Campus. Ring Building 28,
> ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
> Beijing P.R.China 100193
>
> IBM环宇大厦
> 北京市海淀区东北旺西路8号,中关村软件园28号楼
> 邮编:100193
>
> ----- Original message -----
> From: Christopher Walker <c.j.wal...@qmul.ac.uk>
> To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: Re: [xcat-user] confignics -s -r
> Date: Thu, Mar 21, 2019 7:24 AM
> On 20/03/2019 02:52, Yuan Y Bai wrote:
> > Hi Christopher,
> > Could you try to use "confignics -s -r" in postbootscripts?
>
> We could, yes.
>
> > In the postscripts stage, "-r" shuts down a NIC if it is up and
> > removes its interface configuration at the same time. When the install
> > NIC is taken down with ifdown, this may be unreliable.
>
> It sounds like this may well be the issue.
>
> Are you saying there's a potential race between the "-s" and the "-r"
> options?
>
>
> > In order to help us understand what happened on your failed nodes,
> > could you share the following information?
> > You have 10 nodes that succeeded and 17 that failed; are all these
> > nodes installing the same OS?
>
> Yes.
>
> Furthermore, they were all of the same hardware type plugged into the
> same switches.
>
> > Which OS do you use?
>
> Centos 7.4
>
> > We have different code
> > logic for different OSes.
> > I think you want to use "-r" to "deconfigure other network cards". You
> > mentioned there was only one network, so I think the other network cards
> > were not configured in the postscripts stage,
>
> Correct, though they get the default config from Centos - which is to
> DHCP. We'd prefer that the config were removed - otherwise we
> potentially end up with two IPs on the same network (though it's
> probably sensible to disable the network ports too).
>
>
> > is "confignics -s" enough
> > here?
>
> No, we wish to remove the config for the other nics.
>
> We can, I guess, put confignics -s in postscripts and confignics -r in
> postbootscripts (or vice versa). Is that what you'd suggest?
>
> > Do you have any other comments here? Please feel free to
> > contact us, thanks.
> > 10 ran it successfully
> > 17 failed, so nodes still had a dhcp address
>
> Yes indeed.
>
> Thanks,
>
> Chris
>
>
> > Best Regards
> > --------------------------------------------------
> > Yuan Bai (白媛)
> >
> > CSTL HPC System Management Development
> > Tel:86-10-82451401
> > E-mail: by...@cn.ibm.com
> > Address: IBM ZGC Campus. Ring Building 28,
> > ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
> > Beijing P.R.China 100193
> >
> > IBM环宇大厦
> > 北京市海淀区东北旺西路8号,中关村软件园28号楼
> > 邮编:100193
> >
> > ----- Original message -----
> > From: Christopher Walker <c.j.wal...@qmul.ac.uk>
> > To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
> > Cc:
> > Subject: [xcat-user] confignics -s -r
> > Date: Tue, Mar 19, 2019 7:25 PM
> > We have a problem with "confignics -s -r" not running reliably in a
> > postscript.
> >
> > While we have some infiniband nodes, the majority use only one network
> > for install and as the single network for the nodes.
> >
> > On node install, we wish to assign a static IP address on the install
> > nic, and deconfigure other network cards.
> >
> > updatenode <nodename> confignics -s -r
> >
> >
> > Does this just fine.
> >
> > However, it seems unreliable when run as a postscript. On a recent
> > reinstall of 30 nodes:
> >
> > 10 ran it successfully
> > 17 failed, so nodes still had a dhcp address
> > 3 failed for other reasons (telling the bios which image to boot).
> >
> > I've no idea what causes this - could it be a race condition somewhere?
> > If so, is there a timer I could increase to make it less likely to
> > happen?
> >
> > The workaround is to run
> > updatenode <nodename> confignics -s -r
> >
> > by hand afterwards.
> >
> > We are running a relatively old version of xCAT - 2.12.4 - and do plan
> > to upgrade soon.
> >
> > Chris
> >
> > --
> > Dr Christopher J. Walker
> > ITS Research
> > Queen Mary University of London, E1 4NS
> > +44 20 7882 5969
> >
> > _______________________________________________
> > xCAT-user mailing list
> > xCAT-user@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/xcat-user
>
>
> --
> Dr Christopher J. Walker
> ITS Research
> Queen Mary University of London, E1 4NS
> +44 20 7882 5969
--
Dr Christopher J. Walker
ITS Research
Queen Mary University of London, E1 4NS
+44 20 7882 5969