Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing
Package: munin
Version: 1.2.4-1
Severity: grave
Justification: renders package unusable

And of course it sends a ton of emails about it.

zz:~# ps aux | grep munin
munin  8769  0.0  0.0   6720  3296 ?      Ss   11:40   0:00 /bin/sh -c if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi
munin  8774  0.0  0.0   6720  3312 ?      S    11:40   0:00 /bin/sh /usr/bin/munin-cron
munin  8823  0.0  0.0   6720  1648 ?      S    11:40   0:00 /bin/sh /usr/bin/munin-cron
munin  8822  100  0.1  20336 15712 ?      RN   11:40  11:03 /usr/bin/perl -w /usr/share/munin/munin-graph --cron
root   9166  0.0  0.0   4016  1760 pts/1  S+   11:51   0:00 grep munin

and:

zz~# ls -l /proc/8822/fd/
razem 5
lr-x------ 1 munin munin 64 2006-08-14 11:45 0 -> pipe:[1618978]
l-wx------ 1 munin munin 64 2006-08-14 11:45 1 -> pipe:[1619363]
l-wx------ 1 munin munin 64 2006-08-14 11:45 2 -> pipe:[1619363]
l-wx------ 1 munin munin 64 2006-08-14 11:45 3 -> /var/log/munin/munin-graph.log
l-wx------ 1 munin munin 64 2006-08-14 11:45 4 -> /var/lib/munin/munin-graph.stats.tmp

and:

zz~# strace -p 8822
Process 8822 attached - interrupt to quit
Process 8822 detached
zz:~# strace -p 8774
Process 8774 attached - interrupt to quit
wait4(-1,  <unfinished ...>
Process 8774 detached
zz:~# strace -p 8823
Process 8823 attached - interrupt to quit
read(0,  <unfinished ...>
Process 8823 detached
drogowskaz:~#

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: ia64
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.15.1
Locale: LANG=pl_PL, LC_CTYPE=pl_PL (charmap=ISO-8859-2)

Versions of packages munin depends on:
ii  adduser                  3.87        Add and remove users and groups
pn  libdigest-md5-perl       <none>      (no description available)
ii  libhtml-template-perl    2.8-1       HTML::Template : A module for usin
ii  librrds-perl             1.2.11-0.5  Time-series data storage and displ
pn  libtime-hires-perl       <none>      (no description available)
ii  perl [libstorable-perl]  5.8.8-4     Larry Wall's Practical Extraction
ii  perl-modules             5.8.8-4     Core Perl modules
ii  rrdtool                  1.2.11-0.5  Time-series data storage and displ

Versions of packages munin recommends:
ii  libdate-manip-perl       5.44-2      a perl library for manipulating da
ii  munin-node               1.2.4-1     network-wide graphing framework (n

-- no debconf information

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe.
Trouble? Contact [EMAIL PROTECTED]
Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing
tags 382947 moreinfo
quit

Hmm, strange. It's the first time I've heard of such a problem, ever.
Can you see anything out of the ordinary in /var/log/munin/*.log? The
emails contain information about the lock files existing, I assume?

Are you able to reproduce the problem at will, or was this a one-time
occurrence? If so, could you try

  sudo -u munin /usr/share/munin/munin-graph --debug

and mail me the output? I have a suspicion the spinning happens
somewhere in RRDtool.

--
Tore Anderson
Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing
> Hmm, strange. It's the first time I've heard of such a problem, ever.
> Can you see anything out of the ordinary in /var/log/munin/*.log?

== munin-node.log ==
2006/08/14-15:21:06 Plugin squid_cache exited with status 28416.
cat: /proc/net/ip_conntrack: Nie ma takiego pliku ani katalogu
cat: /proc/net/ip_conntrack: Nie ma takiego pliku ani katalogu
No support for device type: thermal
Połączenie odrzucone at /etc/munin/plugins/squid_icp line 154.
2006/08/14-15:21:10 Plugin squid_icp exited with status 28416.
Połączenie odrzucone at /etc/munin/plugins/squid_requests line 149.
2006/08/14-15:21:10 Plugin squid_requests exited with status 28416.
Połączenie odrzucone at /etc/munin/plugins/squid_traffic line 151.
2006/08/14-15:21:10 Plugin squid_traffic exited with status 28416.

The other logs show nothing unusual.

> The emails contain information about the lock files existing, I
> assume?

Yep.

> Are you able to reproduce the problem at will, or was this a one-time
> occurrence? If so, could you try

It's totally repeatable. When I kill all the munin-* processes and wait
for the next cron job, it happens again.

>   sudo -u munin /usr/share/munin/munin-graph --debug
>
> and mail me the output? I have a suspicion the spinning happens
> somewhere in RRDtool.

Yep, I can, although the output is quite big. I have just run it and it
finishes fine, with no errors in fact.

One more important thing: this host acts as both a server and a client.
I have two locations. This machine is at one of them and graphs itself
and some other hosts. A second machine graphs this one plus some other
hosts. Maybe it's an issue with cross-updating or something.

Another weird thing is that every time it happens, there are two
munin-cron processes that were started at exactly the same time.

--
Best regards,
Radek Antoniuk
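A side note on the "exited with status 28416" lines in the log above. Assuming munin-node logs the raw 16-bit wait status (the way Perl's $? holds it) rather than the plain exit code, the exit code sits in the high byte, so 28416 decodes to 111. On Linux, errno 111 is usually ECONNREFUSED (connection refused), and a Perl script that dies typically exits with the current errno. The helper name decode_wait_status is made up for this sketch and is not part of Munin.

```shell
#!/bin/sh
# Sketch only: decode a raw wait status such as the 28416 in munin-node.log.
# Assumption: the logged number is the raw 16-bit status, not the exit code.
# decode_wait_status is a hypothetical helper, not part of Munin.
decode_wait_status() {
    status=$1
    signal=$(( status & 127 ))        # low 7 bits: terminating signal, 0 if none
    code=$(( (status >> 8) & 255 ))   # high byte: exit code
    if [ "$signal" -eq 0 ]; then
        echo "exit code $code"
    else
        echo "killed by signal $signal"
    fi
}

decode_wait_status 28416   # prints "exit code 111"
```

Read this way, the four squid plugins all failed with the same error instead of crashing in some unrelated manner.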
Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing
> == munin-node.log ==

I assume "Nie ma takiego pliku ani katalogu" means "no such file or
directory", but I'll need help with "Połączenie odrzucone"... :-)

> It's totally repeatable. When I kill all the munin-* processes and
> wait for the next cron job, it happens again.

Hmm. Could you change the cron job so it says:

  */5 * * * * munin if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron --debug > /tmp/debug-log 2>&1; fi

Hopefully it'll get stuck, and with any luck we'll see what happens
near the end of /tmp/debug-log.

> Yep, I can, although the output is quite big. I have just run it and
> it finishes fine, with no errors in fact.

Hmm. Let's try the above first.

> Maybe it's an issue with cross-updating or something.

That setup should work... It seems it's munin-graph that gets stuck, so
I don't think it has anything to do with the node or the update
process. Hopefully the log will tell us some more.

> Another weird thing is that every time it happens, there are two
> munin-cron processes that were started at exactly the same time.

Hmm. Could you take a look with ps axuf next time, to see if one is the
child of the other?

--
Tore Anderson
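The parent/child question can also be answered straight from /proc, much as the file descriptors of PID 8822 were inspected in the original report. This is a minimal sketch, assuming a Linux /proc where /proc/PID/status contains a "PPid:" line. The function names parent_of and is_child_of are invented for this example; 8823 and 8774 are the two munin-cron PIDs from the report.

```shell
#!/bin/sh
# Sketch only; assumes Linux, where /proc/PID/status has a "PPid:" line.
# parent_of and is_child_of are hypothetical helper names.

# Print the parent PID of process $1 (prints nothing if it has exited).
parent_of() {
    sed -n 's/^PPid:[[:space:]]*//p' "/proc/$1/status" 2>/dev/null
}

# Succeed if process $1 is a direct child of process $2.
is_child_of() {
    [ "$(parent_of "$1")" = "$2" ]
}

# Example with the PIDs from the report:
#   is_child_of 8823 8774 && echo "8823 is a child of 8774"
```

If the second munin-cron turns out to be a child of the first, it is most likely a subshell of the same script rather than a second cron invocation.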
Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing
On 8/14/06, Tore Anderson [EMAIL PROTECTED] wrote:

> Hmm. Could you take a look with ps axuf next time, to see if one is
> the child of the other?

Uhm, that was obvious and I forgot about it.

"Połączenie odrzucone" is "Connection refused" ;))

Now, the problem is that I have resolved the problem... or maybe not
the problem itself. I thought about the two cron jobs running
simultaneously, and then I thought: OK, let's try restarting crond.
And it worked. It seems cron was running two things at a time, or
something like that, and they got mixed up.

Supposedly the problem may happen again soon, but we will see about
that.

--
Best regards,
Radek Antoniuk
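If two munin-cron runs really do start at the same moment, one generic way to keep them from overlapping is to wrap the job in flock(1) from util-linux. This is only a sketch and not Munin's own locking, which this thread mentions only through the lock-file emails. The function name run_exclusive and the lock path in the example are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only; requires flock(1) from util-linux.
# run_exclusive is a hypothetical wrapper, not part of Munin.

# run_exclusive LOCKFILE COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive lock on LOCKFILE. With -n, flock
# gives up immediately (non-zero exit) if another process holds the lock.
run_exclusive() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}

# Example (the lock path is an assumption):
#   run_exclusive /tmp/munin-cron.lock /usr/bin/munin-cron
```

The kernel drops the lock once the job has exited, so a killed job leaves no stale lock behind. A job that hangs, as munin-graph did here, keeps holding it, so later runs are skipped instead of piling up.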
Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing
severity 382947 normal
tags 382947 unreproducible
quit

* Radek Antoniuk:

> Now, the problem is that I have resolved the problem... or maybe not
> the problem itself. I thought about the two cron jobs running
> simultaneously, and then I thought: OK, let's try restarting crond.
> And it worked. It seems cron was running two things at a time, or
> something like that, and they got mixed up.
>
> Supposedly the problem may happen again soon, but we will see about
> that.

Okay. In the meantime, I'll downgrade the severity of this bug, as I
don't think it'll happen very often or affect many users. Therefore
it's not worth dropping Munin from the next version of Debian over it.
I hope you agree.

If it happens again, though, let me know.

Regards
--
Tore Anderson