Hi Christopher,
Thanks for your answers.
I think you can put "confignics -s -r" in postbootscripts rather than postscripts in the definitions of your failed nodes.
I also think you should upgrade xCAT. In xCAT 2.12.4, "confignics -s" runs "ifdown <installnic>", generates the configuration files, then runs "ifup <installnic>". Taking the install NIC down and back up like this makes configuring it unreliable during the postscripts stage.
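For example, moving the script on the failed nodes could look roughly like the sketch below. It is untested here. The function name move_confignics_to_postboot is hypothetical, and the sketch assumes "confignics -s -r" was added to the node definitions themselves. If it was added to a group, change the group instead with "chdef -t group".

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): move "confignics -s -r" from a
# node's postscripts list to its postbootscripts list, so that it runs after
# the installed OS has booted instead of during the install itself.
# Run on the xCAT management node; assumes chdef is on PATH.
move_confignics_to_postboot() {
    noderange="$1"
    script="confignics -s -r"
    # -m removes the value from a list attribute ...
    chdef -t node -o "$noderange" -m postscripts="$script" || return 1
    # ... and -p appends it to another list attribute.
    chdef -t node -o "$noderange" -p postbootscripts="$script"
}

# Example: move_confignics_to_postboot <noderange>
```

Running "lsdef <nodename>" afterwards should show the script under postbootscripts and no longer under postscripts.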
Best Regards
--------------------------------------------------
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193
IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
Postcode: 100193
--------------------------------------------------
----- Original message -----
From: Christopher Walker <c.j.wal...@qmul.ac.uk>
To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] confignics -s -r
Date: Thu, Mar 21, 2019 7:24 AM
On 20/03/2019 02:52, Yuan Y Bai wrote:
> Hi Christopher,
> Could you try to use "confignics -s -r" in postbootscripts?
We could, yes.
> In the postscripts stage, "-r" shuts down the NIC if it is up and removes
> its interface configuration at the same time. When it runs ifdown on the
> install NIC, this may make the configuration unreliable.
It sounds like this may well be the issue.
Are you saying there's a potential race between the "-s" and the "-r"
options?
> To help us understand what happened on your failed nodes, could you
> share the following information?
> 10 of your nodes succeeded and 17 failed. Are all these nodes
> installing the same OS?
Yes.
Furthermore, they were all of the same hardware type plugged into the
same switches.
> Which OS do you use?
CentOS 7.4
> We have different code
> logic for different OSes.
> I think you want to use "-r" to "deconfigure other network cards". You
> mentioned there was only one network, so I think the other network cards
> were not configured in the postscripts stage.
Correct, though they get the default config from CentOS, which is to use
DHCP. We'd prefer that the config were removed; otherwise we
potentially end up with two IPs on the same network (though it's
probably sensible to disable the network ports too).
> Is "confignics -s" enough
> here?
No, we wish to remove the config for the other nics.
I guess we could put confignics -s in postscripts and confignics -r in
postbootscripts (or vice versa). Is that what you'd suggest?
> Do you have any other comments? Please feel free to
> contact us, thanks.
> 10 ran it successfully
> 17 failed, so nodes still had a dhcp address
Yes indeed.
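To spot which nodes were left in this state, a check along the following lines could be run on each node (for example via xdsh). This is only a sketch: the function name list_dhcp_nics is hypothetical, and it assumes CentOS 7 style ifcfg files under /etc/sysconfig/network-scripts.

```shell
#!/bin/sh
# Hypothetical check (not part of xCAT): print the name of every interface
# whose ifcfg file still requests DHCP, e.g. after confignics failed.
# Takes an optional directory; defaults to the CentOS 7 network-scripts dir.
list_dhcp_nics() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/ifcfg-*; do
        # Skip the unexpanded glob when no ifcfg files exist.
        [ -f "$f" ] || continue
        # Match BOOTPROTO=dhcp, optionally double-quoted, in any case.
        if grep -qiE '^[[:space:]]*BOOTPROTO="?dhcp"?[[:space:]]*$' "$f"; then
            # Strip the directory and "ifcfg-" prefix, leaving e.g. "eth1".
            echo "${f##*/ifcfg-}"
        fi
    done
}
```

Called with no argument, it prints one interface name per line for each ifcfg file still set to DHCP; no output means none were found.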
Thanks,
Chris
> Best Regards
> ----- Original message -----
> From: Christopher Walker <c.j.wal...@qmul.ac.uk>
> To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: [xcat-user] confignics -s -r
> Date: Tue, Mar 19, 2019 7:25 PM
> We have a problem with "confignics -s -r" not running reliably in a
> postscript.
>
> While we have some infiniband nodes, the majority use only one network
> for install and as the single network for the nodes.
>
> On node install, we wish to assign a static IP address on the install
> nic, and deconfigure other network cards.
>
> updatenode <nodename> confignics -s -r
>
>
> Does this just fine.
>
> However, it seems unreliable when run as a postscript. On a recent
> reinstall of 30 nodes:
>
> 10 ran it successfully
> 17 failed, so nodes still had a dhcp address
> 3 failed for other reasons (telling the BIOS which image to boot).
>
> I've no idea what causes this - could it be a race condition somewhere?
> If so, is there a timer I could increase to make it less likely to
> happen?
>
> The workaround is to run
> updatenode <nodename> confignics -s -r
>
> by hand afterwards.
>
> We are running a relatively old version of xCAT - 2.12.4 - and do plan
> to upgrade soon.
>
> Chris
>
> --
> Dr Christopher J. Walker
> ITS Research
> Queen Mary University of London, E1 4NS
> +44 20 7882 5969
>
> _______________________________________________
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user
>