To add to Daniel's response: I work with Daniel, so this issue affects me as
well.
While debugging the issue, we added a lot of debug messages to the postscripts
and postbootscripts, in order to determine the exact step that the installation
gets stuck on. These debug messages are written to a file, which is accessible
through SSH (the node is accessible via SSH while running the postscripts).
While examining the file during the installation, we can see it fill up with
debug messages. We log a debug message at the beginning and end of each
script. The last debug message marks the end of the last script in
"postscripts". Since the node does not reboot, we never reach the
postbootscripts.
In addition, running "lsdef <node> -i currchain" during the hang returns
"boot". When we forcibly reboot the node, it boots up as expected, runs the
postbootscripts and finishes. It seems that the node does report its state to
xCAT, but it does not reboot.
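For anyone who wants to record how "currchain" changes during the install
rather than checking it by hand, here is a rough sketch of a poller. It assumes
the usual lsdef output layout (an "Object name:" line followed by indented
attr=value lines); the helper names, the interval and the node name "node01"
are made up for illustration and are not part of xCAT.

```python
import subprocess
import time


def parse_currchain(lsdef_output):
    """Return the currchain value from lsdef text output, or None if absent.

    Assumes lines of the form '    currchain=boot' (attr=value).
    """
    for line in lsdef_output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "currchain":
            return value
    return None


def watch_currchain(node, interval=10, samples=6):
    """Print a timestamped currchain value every `interval` seconds.

    Runs the same command quoted above: lsdef <node> -i currchain.
    Must be run on the management node, where lsdef is available.
    """
    for _ in range(samples):
        result = subprocess.run(
            ["lsdef", node, "-i", "currchain"],
            capture_output=True,
            text=True,
            check=False,
        )
        print(time.strftime("%H:%M:%S"), parse_currchain(result.stdout))
        time.sleep(interval)
```

Calling watch_currchain("node01") on the management node while the node is
installing should show whether the value moves to "boot" at the point where
the postscripts finish.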
As Daniel stated in his previous mail, turning on "xcatdebugmode" in the "site"
table makes the whole process work.
We would like to understand what is causing this problem.
Thanks
On Jan 8, 2018 16:27, Daniel Letai <d...@letai.org.il> wrote:
I can ssh to the node, but when trying to ssh back from the node to the xcat
server it requires a password.
ssh_keys is set to postbootscripts. Should I move it to postscripts?
On 01/01/2018 17:36, Russ Auld wrote:
Ensure that the node can ssh back to the MN in the anaconda environment. The
updateflag.awk script can hang trying to update the node's status at the end of
postscripts.
On Jan 1, 2018 9:32 AM, Daniel Letai <d...@letai.org.il> wrote:
Hello,
I have encountered a strange issue where sending any node to rinstall "hangs"
after finishing the postscripts - it never reboots, and therefore never
continues to the postbootscripts.
Trying to diagnose the issue led to the strange bit.
Setting xcatdebugmode=1 in the site table SOLVED the issue, while still not
showing any error in any log.
We have verified this is indeed the case - setting it to 0 reverts to a
non-functioning rinstall; re-setting it to 1 makes rinstall work without an
issue.
We would like to work without debugmode - what might be the issue and how can
we solve this?
xCAT version - 2.13.8
xCAT node OS - RHEL 7.4
Nodes OS - RHEL 6.5
Thanks,
Daniel Letai
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user