On Fri, Dec 21, 2018 at 4:18 AM Konstantin Khorenko
<khore...@virtuozzo.com> wrote:
>
> On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> >
> > I'm performing a post-mortem on our [failed] disaster recovery procedures.
> >
> > We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> > project website and wiki. Our backup job in /etc/cron.daily has not
> > been executing (nor has other cron jobs, like yum-daily.cron). We
> > cannot find mention of the failures in dmesg or other logs in
> > /var/log.
> >
> > It looks like things broke sometime around December 2017 based on the
> > date of our last backup. (It is embarrassing, but like I said there
> > were no logged failures so I did not know to investigate). I don't
> > keep change control logs, but the best I can tell our last two major
> > configuration changes were:
> >
> > * Migrate OpenVZ 7.1 -> 7.2, June 2016
> > * Enable CentOS SCL, December 2017
> ...
> unfortunately i have not heard about issues related with OpenVZ + SCL,
> seems you are challenged to investigate it.
>
> i'd start with checking if cron service is run at all,

Thanks Konstantin.

I tracked it down to a daily cron job. The backup ran for 7 seconds but
did not log its error:

    Dec 19 08:04:43 ftpit run-parts(/etc/cron.daily)[14217]: starting gdrive-backup
    Dec 19 08:04:50 ftpit run-parts(/etc/cron.daily)[17162]: finished gdrive-backup

Our site is static, so a 7 second backup seems reasonable to me for an
incremental. (https://www.cryptopp.com)

In reality this is what was happening (from the command line):

    # duplicity --allow-source-mismatch ... sftp://XXXX:y...@zonk.example.com:22480/backup
    ...
    Failed: No module named paramiko

There is a Paramiko in the original Python. However, I failed to
install Paramiko for the SCL version of Python. And exercising
duplicity from the command line failed to reveal the problem:

    # duplicity --version
    duplicity 0.7.18.2

In the end it looks like an exercise in why airplanes crash...

1. CentOS 7 ships with antique software
   - users have to do something special to get into a good state
   - users must enable SCL

2. SCL is missing software
   - users have to do something special to get into a good state
   - Components like Duplicity have to be built from sources

3. Linux paths are still broken
   - users have to do something special to get into a good state
   - 20 years or so and counting

4. Cron misreports job results
   - swallows exceptions and errors

5. User (me) configured machine incorrectly
   - SCL configuration was wrong

6. User (me) monitored machine incorrectly
   - Did not detect cron job failures

I'd like to strangle the idiot who thought it was a good idea to allow
Cron to swallow exceptions and allow things to silently fail. I bet
that genius is a CTO of a Fortune 500 company.

Jeff
_______________________________________________
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users
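
For anyone hitting the same silent failure, here is a minimal sketch of a wrapper that a cron.daily script could call instead of running the backup command directly. It is an illustration only, not what was deployed on this machine. The names run_checked and cron_entry and the list of failure markers are hypothetical. It also does not assume duplicity exits non-zero on "Failed: ...", which is why it scans the captured output as well as checking the exit status.

```python
import subprocess
import sys

# Hypothetical markers; "Failed:" matches the duplicity message quoted above.
FAILURE_MARKERS = ("Failed:", "Traceback")


def run_checked(cmd, markers=FAILURE_MARKERS):
    """Run cmd (a list of arguments) and return (ok, output).

    ok is False if the exit status is non-zero or if any failure
    marker appears in the combined stdout/stderr text.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    output = proc.stdout
    ok = proc.returncode == 0 and not any(m in output for m in markers)
    return ok, output


def cron_entry(cmd):
    """Return an exit status for cron; print the output only on failure."""
    ok, output = run_checked(cmd)
    if not ok:
        # Anything written here ends up in cron's mail or the job's log.
        sys.stderr.write("backup command failed: %s\n" % " ".join(cmd))
        sys.stderr.write(output)
        return 1
    return 0
```

A caller would pass the real command as a list, for example cron_entry(["duplicity", ...]) with the actual arguments filled in, and exit with the returned status so that whatever monitors the job sees the failure.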