Hi Christopher,
 
Thanks for your answers.
 
I think you can put "confignics -s -r" in postbootscripts rather than postscripts in the definition of your failed nodes.
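
As a rough sketch of that change (not a tested procedure), the helper below only prints the two chdef commands so they can be reviewed first. It assumes the node's postscripts attribute currently contains the exact entry "confignics -s -r" set on the node itself rather than inherited from a group; the function name and the node name in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the chdef commands instead of running them.
# Assumption: the node's own postscripts attribute holds the exact entry
# "confignics -s -r". If it is set at group level, change it there instead.
print_move_confignics() {
    node="$1"
    entry='confignics -s -r'
    # -m removes the entry from the comma-separated postscripts list
    printf 'chdef -t node -o %s -m postscripts="%s"\n' "$node" "$entry"
    # -p appends the entry to the postbootscripts list
    printf 'chdef -t node -o %s -p postbootscripts="%s"\n' "$node" "$entry"
}
```

For example, "print_move_confignics node01" shows the commands for a node called node01; once they look right, they can be run on the management node, and "lsdef node01 -i postscripts,postbootscripts" should show the result.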
 
I also think you should upgrade xCAT. In xCAT 2.12.4, "confignics -s" runs "ifdown <installnic>", generates the configuration files, then runs "ifup <installnic>". This sequence makes configuring the install NIC unreliable in the postscripts stage.
 
 
Best Regards
--------------------------------------------------
Yuan Bai (白媛)

CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193

IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193
 
 
----- Original message -----
From: Christopher Walker <c.j.wal...@qmul.ac.uk>
To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] confignics -s -r
Date: Thu, Mar 21, 2019 7:24 AM
 
On 20/03/2019 02:52, Yuan Y Bai wrote:
> Hi Christopher,
> Could you try to use "confignics -s -r" in postbootscripts?

We could, yes.

> In the postscripts stage, "-r" shuts down a NIC if it is up and removes
> its interface configuration at the same time. When it runs ifdown on the
> install NIC, the result may be unreliable.

It sounds like this may well be the issue.

Are you saying there's a potential race between the "-s" and the "-r"
options?


> In order to help us know what happened in your failed nodes, could you
> share the following information?
> You had 10 nodes succeed and 17 fail. Are all of these nodes
> installing the same OS?

Yes.

Furthermore, they were all of the same hardware type plugged into the
same switches.

> Which OS do you use?  

CentOS 7.4

> We have different code
> logic for different OSes.
> I think you want to use "-r" to "deconfigure other network cards". You
> mentioned there was only one network, so I think the other network cards
> were not configured in the postscripts stage,

Correct, though they get the default config from CentOS, which is to
use DHCP. We'd prefer that the config were removed; otherwise we
potentially end up with two IPs on the same network (though it's
probably sensible to disable the network ports too).


> is "confignics -s" enough
> here?

No, we wish to remove the config for the other nics.

We can, I guess, put confignics -s in postscripts and confignics -r in
postbootscripts (or vice versa). Is that what you'd suggest?

>  Do you have any other comments here? Please feel free to
> contact us, thanks.
>      10 ran it successfully
>      17 failed, so nodes still had a dhcp address

Yes indeed.

Thanks,

Chris


>     ----- Original message -----
>     From: Christopher Walker <c.j.wal...@qmul.ac.uk>
>     To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
>     Cc:
>     Subject: [xcat-user] confignics -s -r
>     Date: Tue, Mar 19, 2019 7:25 PM
>     We have a problem with "confignics -s -r" not running reliably in a
>     postscript.
>
>     While we have some infiniband nodes, the majority use only one network
>     for install and as the single network for the nodes.
>
>     On node install, we wish to assign a static IP address on the install
>     nic, and deconfigure other network cards.
>
>     updatenode <nodename> confignics -s -r
>
>
>     Does this just fine.
>
>     However, it seems unreliable when run as a postscript. On a recent
>     reinstall of 30 nodes:
>
>          10 ran it successfully
>          17 failed, so nodes still had a dhcp address
>          3 failed for other reasons (telling the bios which image to boot).
>
>     I've no idea what causes this - could it be a race condition somewhere?
>     If so, is there a timer I could increase to make it less likely to
>     happen?
>
>     The workaround is to run
>          updatenode <nodename> confignics -s -r
>
>     by hand afterwards.
>
>     We are running a relatively old version of xCAT - 2.12.4 - and do plan
>     to upgrade soon.
>
>     Chris
>
>     --
>     Dr Christopher J. Walker
>     ITS Research
>     Queen Mary University of London, E1 4NS
>     +44 20 7882 5969
>
>     _______________________________________________
>     xCAT-user mailing list
>     xCAT-user@lists.sourceforge.net
>     https://lists.sourceforge.net/lists/listinfo/xcat-user
>


--
Dr Christopher J. Walker
ITS Research
Queen Mary University of London, E1 4NS
+44 20 7882 5969

