Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing

2006-08-14 Thread Radek Antoniuk
Package: munin
Version: 1.2.4-1
Severity: grave
Justification: renders package unusable


Munin hangs when the cron job is executed twice, generates 100% CPU load while
doing nothing, and of course sends a ton of emails about it.

zz:~# ps aux | grep munin
munin     8769  0.0  0.0   6720  3296 ?        Ss   11:40   0:00 /bin/sh -c if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi
munin     8774  0.0  0.0   6720  3312 ?        S    11:40   0:00 /bin/sh /usr/bin/munin-cron
munin     8823  0.0  0.0   6720  1648 ?        S    11:40   0:00 /bin/sh /usr/bin/munin-cron
munin     8822  100  0.1  20336 15712 ?        RN   11:40  11:03 /usr/bin/perl -w /usr/share/munin/munin-graph --cron
root      9166  0.0  0.0   4016  1760 pts/1    S+   11:51   0:00 grep munin

and:
zz~# ls -l /proc/8822/fd/
total 5
lr-x------ 1 munin munin 64 2006-08-14 11:45 0 -> pipe:[1618978]
l-wx------ 1 munin munin 64 2006-08-14 11:45 1 -> pipe:[1619363]
l-wx------ 1 munin munin 64 2006-08-14 11:45 2 -> pipe:[1619363]
l-wx------ 1 munin munin 64 2006-08-14 11:45 3 -> /var/log/munin/munin-graph.log
l-wx------ 1 munin munin 64 2006-08-14 11:45 4 -> /var/lib/munin/munin-graph.stats.tmp

and:
zz~# strace -p 8822
Process 8822 attached - interrupt to quit
Process 8822 detached

zz:~# strace -p 8774
Process 8774 attached - interrupt to quit
wait4(-1,  <unfinished ...>
Process 8774 detached
zz:~# strace -p 8823
Process 8823 attached - interrupt to quit
read(0,  <unfinished ...>
Process 8823 detached
drogowskaz:~#


-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: ia64
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.15.1
Locale: LANG=pl_PL, LC_CTYPE=pl_PL (charmap=ISO-8859-2)

Versions of packages munin depends on:
ii  adduser                  3.87        Add and remove users and groups
pn  libdigest-md5-perl       <none>      (no description available)
ii  libhtml-template-perl    2.8-1       HTML::Template : A module for usin
ii  librrds-perl             1.2.11-0.5  Time-series data storage and displ
pn  libtime-hires-perl       <none>      (no description available)
ii  perl [libstorable-perl]  5.8.8-4     Larry Wall's Practical Extraction
ii  perl-modules             5.8.8-4     Core Perl modules
ii  rrdtool                  1.2.11-0.5  Time-series data storage and displ

Versions of packages munin recommends:
ii  libdate-manip-perl       5.44-2      a perl library for manipulating da
ii  munin-node               1.2.4-1     network-wide graphing framework (n

-- no debconf information


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing

2006-08-14 Thread Tore Anderson
tags 382947 moreinfo
quit

* Radek Antoniuk

> and of course sends a ton of emails about it.
>
> zz:~# ps aux | grep munin
> munin     8769  0.0  0.0   6720  3296 ?        Ss   11:40   0:00 /bin/sh -c if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi
> munin     8774  0.0  0.0   6720  3312 ?        S    11:40   0:00 /bin/sh /usr/bin/munin-cron
> munin     8823  0.0  0.0   6720  1648 ?        S    11:40   0:00 /bin/sh /usr/bin/munin-cron
> munin     8822  100  0.1  20336 15712 ?        RN   11:40  11:03 /usr/bin/perl -w /usr/share/munin/munin-graph --cron
> root      9166  0.0  0.0   4016  1760 pts/1    S+   11:51   0:00 grep munin
>
> and:
> zz~# ls -l /proc/8822/fd/
> total 5
> lr-x------ 1 munin munin 64 2006-08-14 11:45 0 -> pipe:[1618978]
> l-wx------ 1 munin munin 64 2006-08-14 11:45 1 -> pipe:[1619363]
> l-wx------ 1 munin munin 64 2006-08-14 11:45 2 -> pipe:[1619363]
> l-wx------ 1 munin munin 64 2006-08-14 11:45 3 -> /var/log/munin/munin-graph.log
> l-wx------ 1 munin munin 64 2006-08-14 11:45 4 -> /var/lib/munin/munin-graph.stats.tmp
>
> and:
> zz~# strace -p 8822
> Process 8822 attached - interrupt to quit
> Process 8822 detached
>
> zz:~# strace -p 8774
> Process 8774 attached - interrupt to quit
> wait4(-1,  <unfinished ...>
> Process 8774 detached
> zz:~# strace -p 8823
> Process 8823 attached - interrupt to quit
> read(0,  <unfinished ...>
> Process 8823 detached
> drogowskaz:~#

  Hmm, strange.  It's the first time I've heard of such a problem, ever.
 Can you see anything out of the ordinary in /var/log/munin/*.log?
 The emails contain information about the lock files already existing, I
 assume?

  Are you able to reproduce the problem at will, or was this a one-time
 occurrence?  If you can reproduce it, could you try running
 "sudo -u munin /usr/share/munin/munin-graph --debug" and mail me the
 output?  I have a suspicion the spinning happens somewhere in RRDtool.

-- 
Tore Anderson






Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing

2006-08-14 Thread Radosław Antoniuk

>  Hmm, strange.  It's the first time I've heard of such a problem, ever.
> Can you see anything out of the ordinary in /var/log/munin/*.log?


== munin-node.log ==
2006/08/14-15:21:06 Plugin squid_cache exited with status 28416. 
cat: /proc/net/ip_conntrack: Nie ma takiego pliku ani katalogu
cat: /proc/net/ip_conntrack: Nie ma takiego pliku ani katalogu
No support for device type: thermal
Połączenie odrzucone at /etc/munin/plugins/squid_icp line 154.
2006/08/14-15:21:10 Plugin squid_icp exited with status 28416. 
Połączenie odrzucone at /etc/munin/plugins/squid_requests line 149.
2006/08/14-15:21:10 Plugin squid_requests exited with status 28416. 
Połączenie odrzucone at /etc/munin/plugins/squid_traffic line 151.
2006/08/14-15:21:10 Plugin squid_traffic exited with status 28416. 


The rest of the logs look clean.


> The emails contain information about the lock files already existing, I
> assume?


Yep.


>  Are you able to reproduce the problem at will, or was this a one-time
> occurrence?  If you can reproduce it, could you try


It's completely reproducible. When I kill all the munin-* processes and wait
for the next cron job, it happens again.


> sudo -u munin /usr/share/munin/munin-graph --debug and mail me the
> output?  I have a suspicion the spinning happens somewhere in RRDtool..

Yep, I can, although the output is quite big. I have just run it by hand and
it actually finishes fine, with no errors.

One more important thing: this host acts as both a server and a client.
I have two locations. This machine is one of them; it graphs itself and
some other hosts. A second machine graphs this one and some other hosts.
Maybe it's an issue with cross-updating or something like that.

Another weird thing is that every time it happens, there are two
munin-cron processes running at exactly the same time.

--
Best regards,
Radek Antoniuk


Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing

2006-08-14 Thread Tore Anderson

> == munin-node.log ==

  I assume "Nie ma takiego pliku ani katalogu" means "no such file or
 directory", but I'll need help with "Połączenie odrzucone"...  :-)

> It's completely reproducible. When I kill all the munin-* processes and wait
> for the next cron job, it happens again.

  Hmm.  Could you change the cronjob so it says:

  */5 * * * * munin if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron --debug > /tmp/debug-log 2>&1; fi

  Hopefully it'll get stuck, and with any luck we'll see what happens
 near the end of /tmp/debug-log.
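
A minimal sketch of the redirection the cron entry above relies on, assuming a
POSIX shell. The helper name run_logged is hypothetical and not part of
Munin; it only illustrates how "> file 2>&1" sends both stdout and stderr to
the same log, so that the last lines written before a hang end up in one place.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with stdout and stderr both written
# to LOGFILE (truncating it first), and returns COMMAND's exit status.
run_logged() {
    log=$1
    shift
    "$@" > "$log" 2>&1
}

# Possible use on a live system (not run here):
#   run_logged /tmp/debug-log /usr/bin/munin-cron --debug
#   tail -n 20 /tmp/debug-log
```

If the job gets stuck, the tail of the log should show roughly where it
stopped, which is the point of the exercise.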

> Yep, I can, although the output is quite big. I have just run it by hand and
> it actually finishes fine, with no errors.

  Hmm.  Let's try the above first.

> Maybe it's an issue with cross-updating or something like that.

  That setup should work...  It seems it's munin-graph that gets stuck,
 so I don't think it has anything to do with the node or the update
 process.  Hopefully the log will tell us some more.

> Another weird thing is that every time it happens, there are two
> munin-cron processes running at exactly the same time.

  Hmm.  Next time, could you take a look with "ps axuf" to see if
 one is the child of the other?
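
The parent/child check can also be scripted. The following is only a sketch,
assuming input in the "PID PPID COMMAND" layout that "ps -eo pid,ppid,args"
prints; the function name munin_cron_children is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID PPID COMMAND..." lines on stdin and
# reports every munin-cron process whose parent is also a munin-cron
# process. Lines mentioning awk are skipped so the filter itself is not
# counted when fed from a live ps listing.
munin_cron_children() {
    awk '
        /munin-cron/ && !/awk/ {
            seen[$1] = 1        # remember this PID
            parent[$1] = $2     # and its parent PID
            order[++n] = $1     # keep input order for the report
        }
        END {
            for (i = 1; i <= n; i++) {
                p = order[i]
                if (parent[p] in seen)
                    printf "%s is a child of %s\n", p, parent[p]
            }
        }
    '
}

# Possible use on a live system (not run here):
#   ps -eo pid,ppid,args | munin_cron_children
```

If both munin-cron processes show up as a parent and its child, the second one
is most likely just a subshell of the first rather than a second cron run.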

-- 
Tore Anderson




Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing

2006-08-14 Thread Radosław Antoniuk

On 8/14/06, Tore Anderson [EMAIL PROTECTED] wrote:

>  Hmm.  Next time, could you take a look with "ps axuf" to see if
> one is the child of the other?


Uhm, that was obvious and I forgot about it...
"Połączenie odrzucone" means "Connection refused" ;))

Now, the thing is that I have resolved the problem... or maybe not
the problem itself.
I thought about the two cron jobs running simultaneously, and then I
thought, OK, let's try restarting crond. And it worked.
It seems cron was running two things at a time, or something like that,
and they got mixed up.
Presumably the problem may happen again soon, but we will see about that.
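
For illustration only, here is a sketch of how a wrapper can refuse to start a
second copy of a job while the first is still running. It uses an atomic
"mkdir" as the lock. This is an assumption for the example and not how Munin's
own lock files work; the name with_lock and the lock path are hypothetical.

```shell
#!/bin/sh
# with_lock LOCKDIR COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND only if LOCKDIR can be created.
# mkdir either creates the directory or fails, atomically, so two
# simultaneous callers cannot both get past it.
with_lock() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir already held, skipping this run" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lockdir"    # release the lock once the command has finished
    return $status
}

# Possible use from a cron entry (not run here):
#   with_lock /tmp/my-job.lock /usr/bin/munin-cron
```

Note that a job killed with SIGKILL would leave the lock directory behind, so
a real setup would also need some way to clean up stale locks.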


--
Best regards,
Radek Antoniuk


Bug#382947: munin: Munin hangs on double cron-job execution and generates 100% cpu load doing nothing

2006-08-14 Thread Tore Anderson
severity 382947 normal
tags 382947 unreproducible
quit

* Radek Antoniuk

> Now, the thing is that I have resolved the problem... or maybe not
> the problem itself.
> I thought about the two cron jobs running simultaneously, and then I
> thought, OK, let's try restarting crond. And it worked.
> It seems cron was running two things at a time, or something like that,
> and they got mixed up.
> Presumably the problem may happen again soon, but we will see about
> that.

  Okay.  In the meantime, I'll downgrade the severity of this bug, as I
 don't think it'll happen very often or affect many users.  Therefore
 it's not worth dropping Munin from the next version of Debian over it.
 I hope you agree.  If it happens again, though, let me know.

Regards
-- 
Tore Anderson


