Hi,
I have a theory about the "sometimes" failure:
Does the node have a BIOS entry of the form
PostAttemptsLimit?
This entry is supposed to reset the BIOS to some defaults after a fixed
number of failed POSTs.
I'm not sure it is relevant at all, since anything OS-like should happen only
after POST, so no POST failures should be detected, but a BIOS reset of
the NICs could cause NetworkManager to fail...
The other, probably better, explanation is that NetworkManager is buggy
as hell and should have been erased before it began. If there were
any way to disable NetworkManager at the core (i.e. from the installation
initrd, as a flag during install), that would probably solve the
issue.
A simple check: try deploying an older (~RH 5.2) OS, which doesn't use
NetworkManager during installation, and see if the issue persists.
As for the first issue: I'm thinking of creating a simple rinstall
wrapper that would first run rsetboot net on the node and then continue. I
would suggest this as a standard feature; I really can't see the downside.
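
A rough sketch of what such a wrapper might look like, in Python. This is
only an illustration under a few assumptions: the function names
(build_commands, rinstall_netboot) are made up here, extra rinstall options
are simply passed through ahead of the noderange, and nothing in it has been
tested against a real xCAT management node.

```python
import subprocess


def build_commands(noderange, rinstall_args=()):
    """Return the xCAT commands to run, in order, for a reinstall.

    Assumption: any extra rinstall options are passed through unchanged,
    placed before the noderange.
    """
    return [
        # Force the next boot to the network, so the UEFI OS entry that
        # the installer put first in the boot order is bypassed once.
        ["rsetboot", noderange, "net"],
        # Then hand over to the normal rinstall.
        ["rinstall", *rinstall_args, noderange],
    ]


def rinstall_netboot(noderange, rinstall_args=(), run=subprocess.check_call):
    """Run 'rsetboot <noderange> net' followed by rinstall.

    'run' is injectable so the sequence can be checked without xCAT
    installed. With the default (subprocess.check_call), a non-zero exit
    from rsetboot raises and rinstall is never started.
    """
    for cmd in build_commands(noderange, rinstall_args):
        run(cmd)
```

Calling rinstall_netboot("node01") (node01 being a placeholder name) would
run "rsetboot node01 net" and then "rinstall node01". Stopping at the first
failure is a deliberate choice: if the boot device cannot be set, starting
the install would just boot the node back into the existing OS.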
Regards
On 02/08/2012 15:45, Gilad Berman wrote:
Hello,
We're installing an M4 machine using xCAT 2.7.3 (upgraded from 2.6.9)
and xnba. The OS is RH 6.2. We are facing two issues:
1. After a successful installation, the OS UEFI boot option is added as the
first entry in the boot sequence (before PXE). After that the node will
not boot to PXE, and there is no way to re-install the node or
do anything else that requires network boot. The node will always boot
to the OS.
2. When installing a node we *sometimes* get the following error:
mounting /tmp as tmp
running install...
running /sbin/loader
detecting hardware...
waiting for hardware to initialize...
detecting hardware...
waiting for hardware to initialize...
Waiting for NetworkManager to configure eth0.
There was an error configuring your network interface.
This cannot be corrected in cmdline mode.
Halting.
Loader exited unexpectedly! Backtrace:
/sbin/loader[0x409c73]
/lib64/libc.so.6(exit+0xe2)[0x7f7d45fffdb2]
/sbin/loader[0x42028b]
/sbin/loader[0x420975]
/sbin/loader[0x41ba7f]
/sbin/loader[0x413e4b]
/sbin/loader[0x40d013]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f7d45fe8cdd]
/sbin/loader[0x4079c9]
install exited abnormally [1/1]
The system will be rebooted when you press Ctrl-C or Ctrl-Alt-Delete.
"Sometimes" is in bold because sometimes it does work (on the same node)
and the node installs successfully. Strange.
Any ideas?
thx in advance.
Regards,
Gilad Berman
HPC Architect
IBM System & Technology Group. Israel
E-mail: gil...@il.ibm.com
Tel: 972-3-9188262
Mobile: 972-52-2554262