Re: [Users] OpenVZ, CentOS SCL and failed cron jobs?
On Fri, Dec 21, 2018 at 5:03 AM Jeffrey Walton wrote:
>
> On Fri, Dec 21, 2018 at 4:18 AM Konstantin Khorenko wrote:
> >
> > On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> > >
> > > I'm performing a post-mortem on our [failed] disaster recovery procedures.
> > >
> > > We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> > > project website and wiki. Our backup job in /etc/cron.daily has not
> > > been executing (nor have other cron jobs, like yum-daily.cron). We
> > > cannot find mention of the failures in dmesg or other logs in
> > > /var/log.
> > >
> > > It looks like things broke sometime around December 2017 based on the
> > > date of our last backup. (It is embarrassing, but like I said there
> > > were no logged failures so I did not know to investigate). I don't
> > > keep change control logs, but the best I can tell our last two major
> > > configuration changes were:
> > >
> > >   * Migrate OpenVZ 7.1 -> 7.2, June 2016
> > >   * Enable CentOS SCL, December 2017
> > ...
> > Unfortunately I have not heard of issues related to OpenVZ + SCL;
> > it seems it is up to you to investigate.
> >
> > I'd start by checking whether the cron service is running at all,
>
> Thanks Konstantin.
>
> I tracked it down to a daily cron job. Backup ran for 7 seconds but
> did not log its error:
>
>     Dec 19 08:04:43 ftpit run-parts(/etc/cron.daily)[14217]: starting gdrive-backup
>     Dec 19 08:04:50 ftpit run-parts(/etc/cron.daily)[17162]: finished gdrive-backup
>
> Our site is static so a 7 second backup seems reasonable to me for an
> incremental. (https://www.cryptopp.com)
>
> In reality this is what was happening (from the command line):
>
>     # duplicity --allow-source-mismatch ... sftp://:y...@zonk.example.com:22480/backup
>     ...
>     Failed: No module named paramiko
>
> There is a Paramiko in the original Python. However, I failed to
> install Paramiko for the SCL version of Python. And exercising
> duplicity from the command line failed to reveal the problem:
>
>     # duplicity --version
>     duplicity 0.7.18.2
>
> In the end it looks like an exercise in why airplanes crash...
>
> 1. CentOS 7 ships with antique software
>     - users have to do something special to get into a good state
>     - users must enable SCL
> 2. SCL is missing software
>     - users have to do something special to get into a good state
>     - Components like Duplicity have to be built from sources
> 3. Linux paths are still broken
>     - users have to do something special to get into a good state
>     - 20 years or so and counting
> 4. Cron misreports job results
>     - swallows exceptions and errors
> 5. User (me) configured machine incorrectly
>     - SCL configuration was wrong
> 6. User (me) monitored machine incorrectly
>     - Did not detect cron job failures
>
> I'd like to strangle the idiot who thought it was a good idea to allow
> Cron to swallow exceptions and allow things to silently fail. I bet
> that genius is a CTO of a Fortune 500 company.

Re:

> There is a Paramiko in the original Python. However, I failed to
> install Paramiko for the SCL version of Python.

Looking at reports like
https://bugs.launchpad.net/ubuntu/+source/duplicity/+bug/959089 , this
is not a one-off problem for us. It's a chronic problem across distros
that has not been fixed.

Packages and software need to be in a good state. They have to "just
work" out of the box.

When are distros going to learn that RTFM does not work? If it was
going to work it would have happened in the last 50 years or so.

The engineers responsible for this mess meet the definition of insane.
They keep doing the same thing over and over again expecting a
different outcome. It is completely irrational behavior.

(end gripe)

Jeff
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users
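The "No module named paramiko" failure above depends on which Python interpreter the job runs under, so a check is only meaningful when executed by that same interpreter and in the same environment as the cron job. A minimal sketch of such a check follows; the names missing_modules and report are illustrative and are not part of duplicity or SCL, and the code sticks to calls available in both Python 2.7 and Python 3.

```python
import importlib
import sys


def missing_modules(names):
    """Return the names this interpreter fails to import, in input order."""
    missing = []
    for name in names:
        try:
            # Actually import the module, as the backup tool would.
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def report(names):
    """Summarize the check, naming the interpreter that performed it."""
    missing = missing_modules(names)
    status = "missing: " + ", ".join(missing) if missing else "all present"
    return "%s: %s" % (sys.executable, status)


# "paramiko" is the module named in the error message quoted above.
print(report(["paramiko"]))
```

Running the same file once from an interactive shell and once from a test cron entry would show whether the two environments resolve to different interpreters or different module sets.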
Re: [Users] OpenVZ, CentOS SCL and failed cron jobs?
On Fri, Dec 21, 2018 at 4:18 AM Konstantin Khorenko wrote:
>
> On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> >
> > I'm performing a post-mortem on our [failed] disaster recovery procedures.
> >
> > We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> > project website and wiki. Our backup job in /etc/cron.daily has not
> > been executing (nor have other cron jobs, like yum-daily.cron). We
> > cannot find mention of the failures in dmesg or other logs in
> > /var/log.
> >
> > It looks like things broke sometime around December 2017 based on the
> > date of our last backup. (It is embarrassing, but like I said there
> > were no logged failures so I did not know to investigate). I don't
> > keep change control logs, but the best I can tell our last two major
> > configuration changes were:
> >
> >   * Migrate OpenVZ 7.1 -> 7.2, June 2016
> >   * Enable CentOS SCL, December 2017
> ...
> Unfortunately I have not heard of issues related to OpenVZ + SCL;
> it seems it is up to you to investigate.
>
> I'd start by checking whether the cron service is running at all,

Thanks Konstantin.

I tracked it down to a daily cron job. Backup ran for 7 seconds but
did not log its error:

    Dec 19 08:04:43 ftpit run-parts(/etc/cron.daily)[14217]: starting gdrive-backup
    Dec 19 08:04:50 ftpit run-parts(/etc/cron.daily)[17162]: finished gdrive-backup

Our site is static so a 7 second backup seems reasonable to me for an
incremental. (https://www.cryptopp.com)

In reality this is what was happening (from the command line):

    # duplicity --allow-source-mismatch ... sftp://:y...@zonk.example.com:22480/backup
    ...
    Failed: No module named paramiko

There is a Paramiko in the original Python. However, I failed to
install Paramiko for the SCL version of Python. And exercising
duplicity from the command line failed to reveal the problem:

    # duplicity --version
    duplicity 0.7.18.2

In the end it looks like an exercise in why airplanes crash...

1. CentOS 7 ships with antique software
    - users have to do something special to get into a good state
    - users must enable SCL
2. SCL is missing software
    - users have to do something special to get into a good state
    - Components like Duplicity have to be built from sources
3. Linux paths are still broken
    - users have to do something special to get into a good state
    - 20 years or so and counting
4. Cron misreports job results
    - swallows exceptions and errors
5. User (me) configured machine incorrectly
    - SCL configuration was wrong
6. User (me) monitored machine incorrectly
    - Did not detect cron job failures

I'd like to strangle the idiot who thought it was a good idea to allow
Cron to swallow exceptions and allow things to silently fail. I bet
that genius is a CTO of a Fortune 500 company.

Jeff
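Because the run-parts log lines above record only "starting" and "finished", one way to make a failure visible is to launch the backup through a small wrapper that reports a non-zero exit status. What follows is a sketch in Python 3, not what the original job did: run_and_report and main are hypothetical names, and it assumes the wrapped command signals failure through its exit code and writes its error text to stderr or stdout.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a command; return (exit_code, report). report is "" on success."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text
        )
    except OSError as exc:
        # The command could not be started at all (e.g. not found on PATH).
        return 127, "could not start %r: %s" % (argv[0], exc)
    if proc.returncode == 0:
        return 0, ""
    # Prefer stderr for the explanation, fall back to stdout.
    details = (proc.stderr or proc.stdout).strip()
    return proc.returncode, "%r exited with status %d: %s" % (
        argv[0], proc.returncode, details)


def main(argv):
    """Print the failure report to stderr and return the exit code."""
    code, report = run_and_report(argv)
    if report:
        sys.stderr.write(report + "\n")
    return code
```

A cron script could end with sys.exit(main([...])) where the list holds the real backup command line. Whether the message then reaches anyone still depends on how the system is configured to mail or log job output, so the wrapper addresses only item 4 in the list above, not item 6.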
Re: [Users] OpenVZ, CentOS SCL and failed cron jobs?
On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> Hi Everyone,
>
> I'm performing a post-mortem on our [failed] disaster recovery procedures.
>
> We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> project website and wiki. Our backup job in /etc/cron.daily has not
> been executing (nor have other cron jobs, like yum-daily.cron). We
> cannot find mention of the failures in dmesg or other logs in
> /var/log.
>
> It looks like things broke sometime around December 2017 based on the
> date of our last backup. (It is embarrassing, but like I said there
> were no logged failures so I did not know to investigate). I don't
> keep change control logs, but the best I can tell our last two major
> configuration changes were:
>
>   * Migrate OpenVZ 7.1 -> 7.2, June 2016
>   * Enable CentOS SCL, December 2017
>
> The SCL is Software Collections,
> https://wiki.centos.org/AdditionalResources/Repositories/SCL .
> We needed it because of the ancient versions of Apache, Python and
> PHP provided with CentOS 7.
>
> My question is, is there a bad interaction or adverse relationship
> with OpenVZ, SCL repos and cron?

Hi Jeffrey,

Unfortunately I have not heard of issues related to OpenVZ + SCL;
it seems it is up to you to investigate.

I'd start by checking whether the cron service is running at all and
looking at its logs via "systemctl status crond.service", then run the
crond binary under strace to check exactly which configuration files
it reads. Maybe also configure a very simple cron job like
"echo 0 > /lalala" just to make sure it is executed at all (maybe you
run, say, a backup Python script which requires the new Python, but
the correct PATH is not set).

https://stackoverflow.com/questions/4984725/how-to-test-a-weekly-cron-job
may be useful if you want to test daily jobs.

Logs: most probably logging is just disabled by default; check
/etc/rsyslog.conf for "cron.none".

Hope that helps.
--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team
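The rsyslog check suggested above can be sketched as a small scan for rules that mention the cron facility. This assumes the legacy "selector action" rule syntax; SAMPLE is an illustrative fragment and not the poster's actual /etc/rsyslog.conf, and cron_selectors and excluded_actions are hypothetical helper names.

```python
SAMPLE = """\
# Illustrative legacy-syntax rules; not taken from the poster's system.
*.info;mail.none;authpriv.none;cron.none    /var/log/messages
cron.*                                      /var/log/cron
"""


def cron_selectors(config_text):
    """Return (line_number, selector, action) for active rules naming cron."""
    hits = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blanks, comments, and "$..." global directives.
        if not line or line.startswith(("#", "$")):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        selector, action = parts
        if "cron." in selector:
            hits.append((number, selector, action))
    return hits


def excluded_actions(config_text):
    """Actions whose selector explicitly drops cron messages via cron.none."""
    return [action for _, selector, action in cron_selectors(config_text)
            if "cron.none" in selector.split(";")]
```

To inspect a real system, the text of /etc/rsyslog.conf can be read from disk and passed to cron_selectors. A "cron.none" entry only means cron messages are excluded from that particular destination; a separate rule such as "cron.*" may still send them elsewhere, so both kinds of hit are worth reading.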