Hello there! I am running a daemon on 3 machines running Ubuntu 11.04 Server, and it gets restarted on all of them at approximately the same time.
$ grep 'Restart' /tmp/celeryd.log
[2011-10-22 06:45:12,376: WARNING/Beat] Restarting celeryd (/usr/bin/celeryd --purge -l DEBUG -B)
[2011-10-22 06:45:12,381: WARNING/MainProcess] Restarting celeryd (/usr/bin/celeryd --purge -l DEBUG -B)

$ grep 'Restart' /tmp/celeryd.log
[2011-10-22 06:47:17,771: WARNING/Beat] Restarting celeryd (/usr/bin/celeryd --purge -l DEBUG -B)
[2011-10-22 06:47:17,775: WARNING/MainProcess] Restarting celeryd (/usr/bin/celeryd --purge -l DEBUG -B)

$ grep 'Restart' /tmp/celeryd.log
[2011-10-22 06:44:06,012: WARNING/Beat] Restarting celeryd (/usr/bin/celeryd --purge -l DEBUG -B)
[2011-10-22 06:44:06,012: WARNING/MainProcess] Restarting celeryd (/usr/bin/celeryd --purge -l DEBUG -B)
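
To compare the restart times across the three machines, the timestamps
can be pulled out of the log on their own. This is only a sketch:
restart_times is a hypothetical helper name, and it assumes that every
restart message sits on one line starting with a bracketed timestamp in
the format shown in the grep output.

```shell
# Hypothetical helper: print the distinct timestamps (to the second) of
# all "Restarting celeryd" messages in the given log file.
# Assumes each matching line starts with "[YYYY-MM-DD HH:MM:SS,mmm: ".
restart_times() {
    grep 'Restarting celeryd' "$1" \
        | sed -e 's/^\[\([^,]*\),.*$/\1/' \
        | sort -u
}
```

It would be called as "restart_times /tmp/celeryd.log" on each machine;
the Beat and MainProcess messages from the same second collapse into a
single line.
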
All machines have plenty of disk space, 64 GB of RAM and 32 core CPUs
(AMD Opteron Processor 8356).
At the time of the restarts these machines are performing pretty
heavy seismic calculations, and the load average on them is around 20.
AFAICT memory is not an issue, the swap is barely used.
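
For reference, the swap figure can be read from /proc/meminfo. The
helper below is a rough sketch with a hypothetical name (swap_used_kb);
it relies only on the SwapTotal and SwapFree fields and reports the
current state, so it would have to be run around the time of a restart
to say anything about memory pressure at that moment.

```shell
# Hypothetical helper: print the amount of swap in use, in kB.
# Reads a meminfo-style file (default: /proc/meminfo) and subtracts
# SwapFree from SwapTotal.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}
```
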
I am at a loss to find out why and how these restarts occur. Any advice
on how to analyse/diagnose this problem would be very much appreciated.
Please note also that the daemon in question is not started via an
/etc/init.d script but manually:
cd /usr/openquake && nohup celeryd --purge -l DEBUG -B \
    > /tmp/celeryd.log 2>&1 3>&1 &
P.S.: logrotate is not being used
Best regards/Mit freundlichen Grüßen
--
Muharem Hrnjadovic <[email protected]>
Public key id : B2BBFCFC
Key fingerprint : A5A3 CC67 2B87 D641 103F 5602 219F 6B60 B2BB FCFC
-- ubuntu-server mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-server More info: https://wiki.ubuntu.com/ServerTeam
