Re: [Users] OpenVZ, CentOS SCL and failed cron jobs?

2018-12-21 Thread Jeffrey Walton
On Fri, Dec 21, 2018 at 5:03 AM Jeffrey Walton wrote:
> ...

Re:

> There is a Paramiko in the original Python. However, I failed to
> install Paramiko for the SCL version of Python.

Looking at reports like
https://bugs.launchpad.net/ubuntu/+source/duplicity/+bug/959089, this
is not a one-off problem for us. It's a chronic problem across distros
that has not been fixed.

Packages and software need to be in a good state. They have to "just
work" out of the box. When are distros going to learn that RTFM does
not work? If it were going to work, it would have done so sometime in
the last 50 years or so.

The engineers responsible for this mess meet the definition of insane.
They keep doing the same thing over and over again expecting a
different outcome. It is completely irrational behavior.

(end gripe)

Jeff
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] OpenVZ, CentOS SCL and failed cron jobs?

2018-12-21 Thread Jeffrey Walton
On Fri, Dec 21, 2018 at 4:18 AM Konstantin Khorenko wrote:
>
> On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> >
> > I'm performing a post-mortem on our [failed] disaster recovery procedures.
> >
> > We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> > project website and wiki. Our backup job in /etc/cron.daily has not
> > been executing (nor have other cron jobs, like yum-daily.cron). We
> > cannot find mention of the failures in dmesg or other logs in
> > /var/log.
> >
> > It looks like things broke sometime around December 2017 based on the
> > date of our last backup. (It is embarrassing, but as I said, there
> > were no logged failures, so I did not know to investigate.) I don't
> > keep change control logs, but the best I can tell our last two major
> > configuration changes were:
> >
> > * Migrate OpenVZ 7.1 -> 7.2, June 2016
> > * Enable CentOS SCL, December 2017
> ...
> Unfortunately, I have not heard of any issues related to OpenVZ + SCL,
> so it seems you will have to investigate it yourself.
>
> I'd start with checking whether the cron service runs at all,

Thanks Konstantin.

I tracked it down to a daily cron job. The backup ran for 7 seconds but
did not log its error:

Dec  19 08:04:43 ftpit run-parts(/etc/cron.daily)[14217]: starting gdrive-backup
Dec  19 08:04:50 ftpit run-parts(/etc/cron.daily)[17162]: finished gdrive-backup

Our site (https://www.cryptopp.com) is static, so a 7-second run seemed
reasonable to me for an incremental backup.

In reality, this is what was happening (reproduced from the command line):

# duplicity --allow-source-mismatch ... sftp://:y...@zonk.example.com:22480/backup
... Failed: No module named paramiko

Paramiko is installed for the system Python. However, I failed to
install Paramiko for the SCL version of Python. And running duplicity
from the command line did not reveal the problem:

# duplicity --version
duplicity 0.7.18.2
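As the output above shows, `duplicity --version` can succeed even when the interpreter duplicity actually runs under cannot import Paramiko. A minimal pre-flight sketch, assuming duplicity is a Python script whose first line is a `#!` shebang naming its interpreter (directly or via `/usr/bin/env`); the function name `check_python_module` is hypothetical:

```shell
# Hypothetical pre-flight check: find the interpreter named in a script's
# shebang line and confirm that interpreter can import a given module.
check_python_module() {
    local script="$1" module="$2"
    local path shebang
    local -a interp

    path="$(command -v "$script")" || {
        echo "not found on PATH: $script" >&2
        return 2
    }
    shebang="$(head -n 1 "$path")"
    if [ "${shebang:0:2}" != "#!" ]; then
        echo "no shebang line in $path" >&2
        return 2
    fi

    # Drop the leading "#!" and split into interpreter plus optional
    # arguments, e.g. "/usr/bin/env python" -> (/usr/bin/env python).
    read -r -a interp <<< "${shebang:2}"

    if "${interp[@]}" -c "import $module" 2>/dev/null; then
        echo "ok: ${interp[*]} can import $module"
    else
        echo "MISSING: ${interp[*]} cannot import $module" >&2
        return 1
    fi
}
```

Usage would be something like `check_python_module duplicity paramiko`, run with the same PATH and SCL setup as the cron job, so that a non-zero status can abort the backup script before it silently does nothing.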

In the end, it looks like an exercise in why airplanes crash: a chain of failures rather than a single one.

  1. CentOS 7 ships with antique software
     - Users have to do something special to get into a good state
     - Users must enable SCL
  2. SCL is missing software
     - Users have to do something special to get into a good state
     - Components like Duplicity have to be built from source
  3. Linux paths are still broken
     - Users have to do something special to get into a good state
     - 20 years or so and counting
  4. Cron misreports job results
     - It swallows exceptions and errors
  5. User (me) configured the machine incorrectly
     - The SCL configuration was wrong
  6. User (me) monitored the machine incorrectly
     - I did not detect the cron job failures
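Items 4 and 6 are what let the failure go unnoticed for about a year. One way to catch that independently of the job's own logging is to record successful runs and check their age. A sketch under assumptions: the stamp path and the names `mark_success` and `check_fresh` are hypothetical:

```shell
# Hypothetical freshness check. The backup job touches a stamp file only
# after a successful run; a separate check complains when the stamp is
# missing or too old. STAMP_FILE's default path is an assumption.
STAMP_FILE="${STAMP_FILE:-/var/lib/backup/last-success}"

# Call this only after the backup command has exited with status 0.
mark_success() {
    mkdir -p "$(dirname "$STAMP_FILE")" && touch "$STAMP_FILE"
}

# Succeeds if the stamp was modified less than MAX_AGE_DAYS*24 hours ago.
check_fresh() {
    local max_age_days="${1:-2}"
    if [ ! -e "$STAMP_FILE" ]; then
        echo "STALE: no successful backup recorded ($STAMP_FILE missing)" >&2
        return 1
    fi
    # find prints the path only if its age in whole days (rounded down)
    # is below the limit.
    if [ -n "$(find "$STAMP_FILE" -mtime -"$max_age_days")" ]; then
        echo "ok: last successful backup is recent"
    else
        echo "STALE: last successful backup is older than $max_age_days days" >&2
        return 1
    fi
}
```

For example, the job could end with `duplicity ... && mark_success`, and a separate daily job could run `check_fresh 2`, so a STALE message shows up in cron's output instead of a year later.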

I'd like to strangle the idiot who thought it was a good idea to let
cron swallow exceptions and allow things to fail silently. I bet that
genius is a CTO of a Fortune 500 company.
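The run-parts lines above only record that the job started and finished. A wrapper that records the exit status and the error text would have made this failure visible in syslog. A sketch, where `run_and_log` is a hypothetical name and `logger` is the standard util-linux syslog command:

```shell
# Hypothetical wrapper for a cron job: run a command, then record its exit
# status (and, on failure, its output) in syslog via logger.
run_and_log() {
    local tag="$1"; shift
    local output status=0

    # Capture stdout and stderr together so the error text is not lost.
    output="$("$@" 2>&1)" || status=$?

    if [ "$status" -ne 0 ]; then
        logger -t "$tag" -p cron.err "FAILED (exit $status): $output"
        # Also print to stderr, so cron/anacron can include it in the
        # job's mail if local mail delivery is configured.
        echo "$tag FAILED (exit $status): $output" >&2
    else
        logger -t "$tag" -p cron.info "succeeded"
    fi
    return "$status"
}
```

Inside /etc/cron.daily/gdrive-backup this might look like `run_and_log gdrive-backup duplicity --allow-source-mismatch ...`. Output of successful runs is discarded in this sketch; only failures are logged in full.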

Jeff


Re: [Users] OpenVZ, CentOS SCL and failed cron jobs?

2018-12-21 Thread Konstantin Khorenko
On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> Hi Everyone,
>
> I'm performing a post-mortem on our [failed] disaster recovery procedures.
>
> We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> project website and wiki. Our backup job in /etc/cron.daily has not
> been executing (nor have other cron jobs, like yum-daily.cron). We
> cannot find mention of the failures in dmesg or other logs in
> /var/log.
>
> It looks like things broke sometime around December 2017 based on the
> date of our last backup. (It is embarrassing, but as I said, there
> were no logged failures, so I did not know to investigate.) I don't
> keep change control logs, but the best I can tell our last two major
> configuration changes were:
>
> * Migrate OpenVZ 7.1 -> 7.2, June 2016
> * Enable CentOS SCL, December 2017
>
> SCL is Software Collections
> (https://wiki.centos.org/AdditionalResources/Repositories/SCL). We
> needed it because of the ancient versions of Apache, Python and PHP
> provided with CentOS 7.
>
> My question is: is there a bad interaction or adverse relationship
> between OpenVZ, the SCL repos and cron?

Hi Jeffrey,

Unfortunately, I have not heard of any issues related to OpenVZ + SCL,
so it seems you will have to investigate it yourself.

I'd start with checking whether the cron service runs at all,
checking its logs via "systemctl status crond.service", running the
crond binary under strace to see exactly which configuration files it
reads, and maybe configuring a very simple cron job like
"echo 0 > /lalala" just to make sure jobs are executed at all.
(Maybe you run, say, a backup Python script which requires a newer
Python, but the correct PATH is not set.)
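On the PATH point: cron does not run jobs through an interactive login shell, so an SCL collection enabled interactively (for example via `scl enable` or a shell profile) is not automatically enabled for the job. A sketch of a helper that sources the collection's `enable` script, assuming the usual SCL layout `/opt/rh/<collection>/enable`; `enable_scl` and the `SCL_ROOT` override are hypothetical:

```shell
# Hypothetical helper for a cron job that needs an SCL interpreter.
# SCL collections normally ship an "enable" script under /opt/rh/<name>/
# that exports PATH and related variables for that collection.
SCL_ROOT="${SCL_ROOT:-/opt/rh}"

enable_scl() {
    local collection="$1"
    local enable_script="$SCL_ROOT/$collection/enable"
    if [ ! -r "$enable_script" ]; then
        echo "SCL collection '$collection' not found at $enable_script" >&2
        return 1
    fi
    # Source it into the current shell so its exports apply to the
    # commands the job runs afterwards.
    . "$enable_script"
}
```

A job would then start with something like `enable_scl rh-python36 || exit 1` (the collection name is only an example) before calling the backup tool.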

This may be useful if you want to test daily jobs:
https://stackoverflow.com/questions/4984725/how-to-test-a-weekly-cron-job
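A quick way to reproduce a job's behavior by hand is to run it with a stripped-down environment instead of your interactive one. The PATH below is an assumption based on the stock CentOS 7 /etc/crontab and /etc/anacrontab; compare it with your own files. `run_like_cron` is a hypothetical name:

```shell
# Run a cron.daily script by hand with an environment close to cron's,
# instead of the current interactive shell's environment.
run_like_cron() {
    local job="$1" rc=0
    # env -i starts from an empty environment; only these variables are set.
    env -i \
        HOME=/root \
        LOGNAME=root \
        SHELL=/bin/sh \
        PATH=/sbin:/bin:/usr/sbin:/usr/bin \
        "$job" || rc=$?
    echo "exit status of $job: $rc" >&2
    return "$rc"
}
```

For example, `run_like_cron /etc/cron.daily/gdrive-backup` should show the same failure the scheduled run hits, including any missing-module error.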

Logs: most probably cron logging is just disabled by default;
check /etc/rsyslog.conf for "cron.none".
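Note that `cron.none` on a selector line only keeps cron messages out of that rule's destination; a separate `cron.*` rule can still route them elsewhere (the stock CentOS 7 config sends them to /var/log/cron). A small sketch that lists every uncommented rule mentioning the cron facility; `show_cron_log_rules` is a hypothetical name:

```shell
# List active rsyslog rules that mention the cron facility, so you can see
# whether cron messages are excluded ("cron.none") or routed ("cron.*").
show_cron_log_rules() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/rsyslog.conf
    fi
    # "cron." must appear before any "#", so commented-out rules are skipped.
    grep -n -E '^[^#]*cron\.' "$@"
}
```

Passing several files, e.g. `show_cron_log_rules /etc/rsyslog.conf /etc/rsyslog.d/*.conf`, also covers included configuration; grep then prefixes each match with its file name.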

Hope that helps.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team
