All/any, I just spent an unfortunate amount of time debugging a CloudStack (4.15) install on what turned out to be a self-caused race condition, and I wanted others to be aware in case they run into the same issue:
I set up a [very ugly] Ansible playbook to install the 4.15 management server on a clean CentOS 8 install (yes, I know that is EOL at the end of this year). But I did something too fast and caused a race condition: I rebooted as soon as the "cloudstack-setup-management" command completed. It turns out I was interrupting the database upgrade scripts, and I would reboot to a non-functional manager that threw weird, non-deterministic errors when it tried to re-upgrade the database (lots of duplicate column, foreign key constraint, etc. errors). The exact errors depended only on how far the migrations had gotten before the reboot occurred.

It was really hard to track down because I had assumed that once "cloudstack-setup-management" finished, the system was in a stable state for service stops/restarts and reboots. Obviously I was wrong.

Hopefully someone finds this useful and saves themselves some time/frustration by being patient and watching the log file until all the database upgrade steps are complete.

Thanks,
-Nathan McGarvey
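P.S. For anyone automating this, here is a minimal sketch of the "watch the log and be patient" step. It is my own helper, not an official CloudStack tool, and it assumes the stock management-server log location on CentOS/RHEL (/var/log/cloudstack/management/management-server.log). Rather than parsing specific upgrade messages, it simply waits until the log stops growing for a configurable quiet period before you proceed with a reboot:

```shell
# wait_for_quiet_log: block until a log file stops growing.
# Usage: wait_for_quiet_log [log_path] [quiet_seconds]
# Defaults are assumptions for a stock CentOS/RHEL install; adjust for yours.
wait_for_quiet_log() {
    log="${1:-/var/log/cloudstack/management/management-server.log}"
    quiet_secs="${2:-60}"

    prev_size=-1
    while :; do
        # Current log size in bytes; treat a missing file as size 0.
        size=$(wc -c < "$log" 2>/dev/null || echo 0)
        if [ "$size" -eq "$prev_size" ]; then
            # No growth during the whole quiet window.
            echo "no new log output for ${quiet_secs}s; DB upgrade likely done"
            return 0
        fi
        prev_size=$size
        sleep "$quiet_secs"
    done
}
```

In an Ansible playbook, the same idea could be a shell task run before the reboot handler; a quiet window of a minute or two is a heuristic, not a guarantee, so it's still worth eyeballing the log once.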