Hi,

I have a theory about the "sometimes" failure (issue 2):
Does the node have a BIOS entry named PostAttemptsLimit?
This entry is supposed to reset the BIOS to some defaults after a fixed number of failed POSTs. I'm not sure it is relevant at all: anything OS-related should happen only after POST, so no POST failures should be detected. Still, a BIOS reset of the NICs could cause NetworkManager to fail...

The other, probably better, explanation is that NetworkManager is buggy as hell and should have been erased before it began. If there were a way to disable NetworkManager at the core (i.e. from the installation initrd, as a flag during install), that would probably solve the issue. A simple check: try deploying an older OS (~RH 5.2) that doesn't use NetworkManager during installation, and see whether the issue persists.

As for the first issue, I'm thinking of creating a simple rinstall wrapper that would first run "rsetboot net" against the node and then continue with the install. I would suggest this as a standard feature; I really can't see the downside.
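
To make the idea concrete, here is a rough sketch of such a wrapper in Python. It is only an illustration, not anything shipped with xCAT: the function names (build_commands, run, main) are made up for this mail, and it assumes rsetboot and rinstall are on the PATH of the management node and take the noderange as shown. Whether the M4 firmware actually honours the "net" request over the UEFI boot entry is exactly what would need testing.

```python
"""Hypothetical rinstall wrapper: request a network boot, then reinstall."""
import subprocess
import sys


def build_commands(noderange, rinstall_args=()):
    """Return the xCAT commands to run, in order, as argument lists."""
    return [
        # Ask the node(s) to boot from the network on the next boot.
        ["rsetboot", noderange, "net"],
        # Then hand over to the normal rinstall, passing options through.
        ["rinstall"] + list(rinstall_args) + [noderange],
    ]


def run(noderange, rinstall_args=(), dry_run=False):
    """Run the commands in order; stop at the first one that fails."""
    for cmd in build_commands(noderange, rinstall_args):
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            continue
        rc = subprocess.call(cmd)
        if rc != 0:
            return rc
    return 0


def main(argv):
    """argv is [rinstall options...] <noderange>, without the program name."""
    if not argv:
        sys.stderr.write("usage: [rinstall options] <noderange>\n")
        return 2
    return run(argv[-1], argv[:-1])
```

Calling run("node01", dry_run=True) just prints the two commands without touching the node, which is a cheap way to check the ordering first. To use it as a command, add sys.exit(main(sys.argv[1:])) at the bottom of the file.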

Regards
On 02/08/2012 15:45, Gilad Berman wrote:
Hello,

We're installing an M4 machine using xCAT 2.7.3 (upgraded from 2.6.9) and xnba. The OS is RH 6.2, and we are facing two issues:

1. After a successful installation, the OS UEFI boot option is added as the first entry in the boot sequence (before PXE). After that, the node will not boot from PXE, and there is no way to re-install the node or do anything else that requires a network boot. The node always boots to the OS.

2. When installing a node we *sometimes* get the following error:

mounting /tmp as tmprunning install...
running /sbin/loader
detecting hardware...
waiting for hardware to initialize...
detecting hardware...
waiting for hardware to initialize...
Waiting for NetworkManager to configure eth0.
There was an error configuring your network interface.
This cannot be corrected in cmdline mode.
Halting.
Loader exited unexpectedly! Backtrace:
/sbin/loader[0x409c73]
/lib64/libc.so.6(exit+0xe2)[0x7f7d45fffdb2]
/sbin/loader[0x42028b]
/sbin/loader[0x420975]
/sbin/loader[0x41ba7f]
/sbin/loader[0x413e4b]
/sbin/loader[0x40d013]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f7d45fe8cdd]
/sbin/loader[0x4079c9]
install exited abnormally [1/1]
The system will be rebooted when you press Ctrl-C or Ctrl-Alt-Delete.



"Sometimes" is in bold because sometimes it does work (on the same node) and the node installs successfully. Strange.

Any ideas?

Thanks in advance.


Regards,

Gilad Berman
HPC Architect
IBM System & Technology Group. Israel

E-mail: gil...@il.ibm.com
Tel:    972-3-9188262
Mobile: 972-52-2554262

The information contained in this email is being provided by IBM as a matter of courtesy and provided "AS-IS" without any direct and implied warranty; IBM assumes no liability. It is your responsibility to ensure that any resulting customer proposal has been correctly designed to meet your clients' requirements and to have an active review process which ensures an appropriate level of solution assurance is performed for all proposals. IBM does not take responsibility for the solution or solution assurance.

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
