Just doing some post-crash detective work on our web server, and I've
attributed it to a possible race condition combined with not enough swap
space. The symptoms were that memory usage and load went through the roof
(load average 116 vs 0.4 normally) and eventually the kernel started
killing off processes at random to fix the situation. Nice of sendmail to
report the load in the logs :-)
In /etc/cron.daily there's an 'rpm' and an 'autorpm' job. autorpm is not
supplied with RedHat but it's a standard add-on we typically use for keeping
systems up-to-date (instead of the official up2date utility, or apt-rpm).
cron.daily kicks off at the usual 4am and starts the autorpm job, which has
# Start AutoRPM and tell it to wait up to 2 hours before actually
# looking for updates (backgrounds the process to avoid delaying
# other cron jobs)
/usr/sbin/autorpm --notty "auto --delay=7200" &
Now I looked in the code to see what they mean by 'up to 2 hours' and
yeah, it's effectively a random time between 0 and 2 hours.
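For anyone wondering what "a random time between 0 and 2 hours" amounts to, here's a rough sketch in shell. It is not AutoRPM's own code; the function name random_delay is made up for illustration, and only the 7200 figure comes from the cron job above.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from AutoRPM: print a whole number of
# seconds between 0 and $1 inclusive.
random_delay() {
    # rand() returns a value in [0, 1), so int() yields 0..max.
    # srand() seeds from the clock, so two calls within the same
    # second will print the same number.
    awk -v max="$1" 'BEGIN { srand(); print int(rand() * (max + 1)) }'
}

delay=$(random_delay 7200)
echo "Delaying $delay seconds..."
# sleep "$delay"   # left commented out so the sketch returns immediately
```

The point is that a delay of 0 is a perfectly valid outcome, so nothing stops autorpm from touching the rpm database while the other 4am job is still running.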
Also kicked off at 4am is the rpm job which has
rpm -qa --qf '%{name}-%{version}-%{release}.%{arch}.rpm\n' 2>&1 \
| sort > /var/log/rpmpkgs
Now as you know, rpm doesn't like two things accessing its database at
the same time (at least not writing to it), which may possibly explain this
email:
***********************************************
AutoRPM 3.3 on <censored hostname> started Fri Feb 13 04:02:02 EST 2004
Delaying 4476 seconds...
Reading Auto-Ignore list... Done.
Comparing to locally installed RPMs
ERROR: rpm -qa is hanging! Running rpm --rebuilddb to fix...
I'm thinking maybe we'll just comment out the line in autorpm which calls
--rebuilddb and just deal with it manually if it ever occurs. Also, maybe
autorpm could check whether other things are accessing the rpm database
and/or by default delay a minimum of 10 minutes or something.
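As a rough sketch of those last two ideas, something like the following could be wrapped around both cron jobs. This is an assumption-laden illustration, not anything autorpm or rpm provides: the lock directory path and the function names with_rpm_lock and bounded_delay are made up, and rpm itself knows nothing about this lock, so it only helps if every job that touches the database goes through the same wrapper.

```shell
#!/bin/sh
# Hypothetical lock directory; every cron job touching the rpm database
# would have to agree on this path.
LOCKDIR=${LOCKDIR:-/var/lock/rpmdb-cron.lock}

# with_rpm_lock TRIES COMMAND [ARGS...]
# Try up to TRIES times, one second apart, to take the lock, then run
# the command. mkdir is atomic, so only one caller can create the
# directory at a time.
with_rpm_lock() {
    tries=$1
    shift
    while ! mkdir "$LOCKDIR" 2>/dev/null; do
        tries=$((tries - 1))
        if [ "$tries" -le 0 ]; then
            echo "rpm database busy, giving up" >&2
            return 75   # arbitrary "try again later" status
        fi
        sleep 1
    done
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return "$status"
}

# bounded_delay MIN MAX
# Print a whole number of seconds between MIN and MAX inclusive, so the
# delay never drops below the chosen floor.
bounded_delay() {
    awk -v min="$1" -v max="$2" \
        'BEGIN { srand(); print min + int(rand() * (max - min + 1)) }'
}

# Example use in the two cron.daily scripts (commented out here):
# with_rpm_lock 600 sh -c "rpm -qa | sort > /var/log/rpmpkgs"
# sleep "$(bounded_delay 600 7200)"
```

One caveat: if a job is killed while holding the lock (say, by the kernel when memory runs out), the directory stays behind and has to be removed by hand.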
--
---<GRiP>---
Grant Parnell - senior consultant
EverythingLinux services - the consultant's backup & tech support.
Web: http://www.everythinglinux.com.au/services
We're also busybits.com.au and linuxhelp.com.au and elx.com.au.
Phone 02 8752 6622 to book service or discuss your needs.