On 21/03/2019 02:37, Yuan Y Bai wrote:
> Hi Christopher,
> Thanks for your answers.
> I think you can put "confignics -s -r" in postbootscripts, not
> postscripts, in your failed nodes' definition.
> I think you'd better upgrade xCAT.
We plan to do this soon.

> Since in xCAT 2.12.4, "confignics -s" did the actions "ifdown
> <installnic>, generate the configuration files, then ifup <installnic>",
> these actions made configuring the installnic in the postscripts stage
> unstable.

Thanks. Is the instability something that just affects installnic, or
does it affect other parts of the install too?

Chris

> Best Regards
> --------------------------------------------------
> Yuan Bai (白媛)
>
> CSTL HPC System Management Development
> Tel:86-10-82451401
> E-mail: by...@cn.ibm.com
> Address: IBM ZGC Campus. Ring Building 28,
> ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
> Beijing P.R.China 100193
>
> IBM环宇大厦
> 北京市海淀区东北旺西路8号,中关村软件园28号楼
> 邮编:100193
>
> ----- Original message -----
> From: Christopher Walker <c.j.wal...@qmul.ac.uk>
> To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: Re: [xcat-user] confignics -s -r
> Date: Thu, Mar 21, 2019 7:24 AM
>
> On 20/03/2019 02:52, Yuan Y Bai wrote:
> > Hi Christopher,
> > Could you try to use "confignics -s -r" in postbootscripts?
>
> We could, yes.
>
> > In the postscripts stage, "-r" shuts down the NIC if it is up and
> > removes the interface configuration at the same time. When it runs
> > ifdown on the install NIC, this may be unreliable.
>
> It sounds like this may well be the issue.
>
> Are you saying there's a potential race between the "-s" and the "-r"
> options?
>
> > In order to help us know what happened on your failed nodes, could you
> > share the following information?
> > You had 10 nodes succeed and 17 fail; are all these nodes
> > installing the same OS?
>
> Yes.
>
> Furthermore, they were all of the same hardware type plugged into the
> same switches.
>
> > Which OS do you use?
>
> Centos 7.4
>
> > We have different code
> > logic for different OS.
> > I think you want to use "-r" to "deconfigure other network cards". You
> > mentioned there was only one network, so I think other network cards
> > were not configured in the postscripts stage,
>
> Correct, though they get the default config from Centos - which is to
> DHCP. We'd prefer that the config were removed - otherwise we
> potentially end up with two IPs on the same network (though it's
> probably sensible to disable the network ports too).
>
> > is "confignics -s" enough
> > here?
>
> No, we wish to remove the config for the other nics.
>
> We can, I guess, put confignics -s in postscripts and confignics -r in
> postbootscripts (or vice versa). Is that what you'd suggest?
>
> > Do you have different comments here? Please feel free to
> > contact us, thanks.
> > 10 ran it successfully
> > 17 failed, so nodes still had a dhcp address
>
> Yes indeed.
>
> Thanks,
>
> Chris
>
> > Best Regards
> > Yuan Bai (白媛)
> >
> > ----- Original message -----
> > From: Christopher Walker <c.j.wal...@qmul.ac.uk>
> > To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
> > Cc:
> > Subject: [xcat-user] confignics -s -r
> > Date: Tue, Mar 19, 2019 7:25 PM
> >
> > We have a problem with "confignics -s -r" not running reliably in a
> > postscript.
> >
> > While we have some infiniband nodes, the majority use only one network
> > for install and as the single network for the nodes.
> >
> > On node install, we wish to assign a static IP address on the install
> > nic, and deconfigure other network cards.
> >
> >     updatenode <nodename> confignics -s -r
> >
> > Does this just fine.
> >
> > However, it seems unreliable when run as a postscript. On a recent
> > reinstall of 30 nodes:
> >
> >     10 ran it successfully
> >     17 failed, so nodes still had a dhcp address
> >     3 failed for other reasons (telling the bios which image to boot).
> >
> > I've no idea what causes this - could it be a race condition somewhere?
> > If so, is there a timer I could increase to make it less likely to
> > happen?
> >
> > The workaround is to run
> >
> >     updatenode <nodename> confignics -s -r
> >
> > by hand afterwards.
> >
> > We are running a relatively old version of xCAT - 2.12.4 - and do plan
> > to upgrade soon.
> >
> > Chris
> >
> > --
> > Dr Christopher J. Walker
> > ITS Research
> > Queen Mary University of London, E1 4NS
> > +44 20 7882 5969
> >
> > _______________________________________________
> > xCAT-user mailing list
> > xCAT-user@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/xcat-user
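
For readers following the thread, the suggestion to run "confignics -s -r" from postbootscripts rather than postscripts could be applied roughly as sketched below. This is a minimal sketch, not a tested procedure from the thread: the function name, the DRY_RUN switch and the node name "node01" are hypothetical, and it assumes the entry is set on the node object itself (an entry inherited from a group or from the xcatdefaults entry would have to be changed there instead). By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. The function name, the DRY_RUN switch and the node name
# "node01" are hypothetical; lsdef and chdef are standard xCAT commands,
# but check their man pages against your xCAT version before use.

# Print the command when DRY_RUN is 1 (the default here); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Move "confignics -s -r" from postscripts to postbootscripts for one node.
move_confignics_to_postboot() {
    node="$1"
    # Show the node's current postscripts and postbootscripts lists.
    run lsdef "$node" -i postscripts,postbootscripts
    # Remove the entry from the node's postscripts list (-m = minus) ...
    run chdef -t node -o "$node" -m postscripts="confignics -s -r"
    # ... and append it to postbootscripts instead (-p = plus).
    run chdef -t node -o "$node" -p postbootscripts="confignics -s -r"
}
```

Calling `move_confignics_to_postboot node01` prints the three commands; setting `DRY_RUN=0` on the management node would execute them. The split Chris asks about ("confignics -s" in one stage and "confignics -r" in the other) would use the same chdef calls with the two entries listed separately, but the thread does not confirm whether that split is recommended.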