nodels <noderange> nodehm.serialflow
?
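As far as I understand, the serial settings in nodehm can end up in the console= arguments on the installed kernel's command line, so it may be worth checking from the node side as well. A minimal sketch, assuming the node can still be booted (for example with the IMM back on the production network); console_args is a hypothetical helper name, not an xCAT command:

```shell
#!/bin/sh
# Hypothetical helper: print the console= arguments found in a
# kernel command line string passed as $1, one per line.
console_args() {
    # Split on spaces, keep only console= entries.
    # "|| true" keeps the exit status zero when there is no match.
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^console=' || true
}

# On the node itself, inspect the command line of the running kernel.
if [ -r /proc/cmdline ]; then
    console_args "$(cat /proc/cmdline)"
fi
```

A serial console entry ending in "r" (for example console=ttyS0,115200n8r) asks the kernel to use RTS/CTS hardware flow control on that port. Whether that is related to the hang here is only a guess on my part.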
From: Rundall, Jacob D [mailto:rund...@illinois.edu]
Sent: Thursday, July 27, 2017 10:12 AM
To: xCAT Users Mailing list
Subject: [xcat-user] stateful nodes won't boot after IMM network settings are
changed or IMM is put on different network
I ran into something that seems strange to me. I was working with a few Lenovo
System x3650 M5 nodes yesterday that were deployed with xCAT. They’re running
CentOS 7. In order to do some testing I need to move them over to some
different networks:
- remove public/routed network from the OS;
- move the OS to new, unrouted management network (for SSH & deployment), with
a different IP address, without any xCAT servers;
- move the IMM to new, unrouted service/IPMI network, with a different IP
address, also without any xCAT servers.
Also note that I elected to keep their current stateful OS installation rather
than redeploying them.
After I did this with the first node I found that it wouldn’t fully boot
anymore. Specifically it attempted to boot from the on-disk OS, proceeded
through kernel selection, but then would get stuck with “Probing EDD”… showing.
(It likely was proceeding past this but just not showing anything else on the
crash cart display, more on that later.)
Through some other experimentation I found that any of the following conditions
would prevent the node from booting in the same way:
1) without reconfiguring the IMM’s network settings, unplugging the network
cable
2) without reconfiguring the IMM’s network settings, connecting the IMM to a
switch port on the new service/IPMI network
3) reconfiguring the IP of the IMM (or resetting to factory then reconfiguring
on a new IP)
-- this case applies even if I leave the machine connected to our original,
production service/IPMI network
But in any case, if I make sure the IMM’s IP is reverted (or doesn’t change)
and connect it back up to our production service/IPMI network, then the node
will boot again.
I guess my questions are as follows: Is there something on an xCAT-provisioned
node (perhaps specific to our hardware, perhaps generally speaking) that
requires the IMM/BMC to not have its IP and/or network changed in order to
complete its boot? Is there some kind of communication between the OS and the
IMM/BMC that depends on the network connectivity of the IMM/BMC? Is
communication with the xCAT master involved (remember the xCAT master is on the
production service/IPMI network but not on the new test service/IPMI network)?
And are there any ways around this issue?
A few more details:
A) I did disable EDD probing on one of these machines and found that it
actually still got stuck (this time with a flashing cursor), so in the previous
cases, “Probing EDD”… was simply the last thing that showed on the (crash cart)
screen before it got stuck. I also configured tty0 as console on that node to
get some more verbose output about what was occurring. Unfortunately this
output didn’t appear to make it into /var/log/messages like it seems to on a
successful boot (probably because the machine didn’t get far enough along in
the boot process to pass it on). But here is a photo:
https://uofi.box.com/s/dt16qvigbgtp0gbz2t0b26m41huv7wo7
I checked quite a few of the lines that appear here and they seem to show up in
/var/log/messages after a successful boot as well, so I haven’t uncovered
anything that is indicative of the failure I’m seeing. Perhaps there’s
something that’s not showing that is telling but I don’t know what it is.
B) Rebuilding plain CentOS 7 from a USB drive allows the nodes to boot with new
IMM network settings, with IMMs on the new service/IPMI network.
Thanks much,
Jake Rundall