[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
No thank you sir! The only problem remains how to explain to people that their data has turned upside down after they upgrade. Fortunately that's no problem of mine :-) At any rate, as far as I'm concerned that about fixes this bug...
--
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to munin in Ubuntu.
https://bugs.launchpad.net/bugs/919429
Title:
  Munin IO Service Time graph gives completely implausible numbers
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/munin/+bug/919429/+subscriptions
--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
For what it's worth, I've attached a graph of the same machine, running normally, with IO Service time over a day, after modification -- no random downward streaks to zero, much more readable, and the numbers actually make sense.
** Attachment added: IO Service Time - by day (modified plugin)
   https://bugs.launchpad.net/ubuntu/+source/munin/+bug/919429/+attachment/2694148/+files/iostat_ios-day.png
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
Here's the Disk Latency graph as a comparison -- similar information, but not exactly the same.
** Attachment added: Disk Latency per device - by day
   https://bugs.launchpad.net/ubuntu/+source/munin/+bug/919429/+attachment/2694149/+files/diskstats_latency-day.png
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
Here is my suggestion, taking into account that U means NaN. I also note that you can get deep negative spikes when the counters clock over (unless I'm mistaken they are 32 bit counters, even on a 64 bit kernel, and yes, I do seem to have managed to clock at least one over).
LINE 202: print("${dev}_rtime.value ", ($rio_diff > 0 and $rtime_diff > 0) ? ($rtime_diff / $rio_diff) : 'U', "\n",
LINE 203:       "${dev}_wtime.value ", ($wio_diff > 0 and $wtime_diff > 0) ? ($wtime_diff / $wio_diff) : 'U', "\n",
LINE 204: );
I've tested this for about half a day and it looks much nicer than the old version, with less clutter. When no IO is happening on a drive, the trace goes away, and I think that's what should happen (people might, for example, use a drive only for backups or some other intermittent activity).
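The guard in the suggestion above can be sketched outside the plugin. This is only an illustration in Python (the plugin itself is Perl); the helper name service_time and the counter values are made up for the example and are not part of iostat_ios. It assumes, as the comment does, that a non-positive difference means either an idle interval or a wrapped counter.

```python
def service_time(time_diff, io_diff):
    """Average time per IO over one polling interval.

    Hypothetical helper, not part of the iostat_ios plugin.
    Returns the string "U" (the unknown marker used in the suggestion above)
    when either difference is zero or negative.
    """
    if io_diff > 0 and time_diff > 0:
        return time_diff / io_diff
    return "U"


# All counter differences below are invented for illustration.
print("sda_rtime.value", service_time(time_diff=1000, io_diff=200))  # busy interval
print("sda_rtime.value", service_time(time_diff=0, io_diff=0))       # idle interval
print("sda_rtime.value", service_time(time_diff=-5000, io_diff=200)) # counter wrapped
```

With this guard an idle or wrapped interval produces no data point rather than a zero or a negative spike.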
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
I agree that the IO Service time graph is made somewhat redundant by the Disk Latency graph, which shows much the same information. However, IO Service time itemises read and write, and after coming this far, it seems a shame not to make the small fix to get it working properly.
[Bug 919429] [NEW] Munin IO Service Time graph gives completely implausible numbers
Public bug reported:
Recently a process wrote lots of data to the disk (and I'll take responsibility for that) but the Disk IOs per device went up (makes sense) and the Disk latency per device also went up (yup, so far so good) but the IO Service time strangely went down! Yes, it showed more latency but less service time. That's incredible. I'm incredulous.
I've been suspicious of this for some time, because I absolutely know that /dev/sdc is a faster device than /dev/sda and /dev/sdb and for a long time it has been showing the lowest latency and the highest service time. It always did seem weird, but now I'm sure these numbers are bogus.
** Affects: munin (Ubuntu)
   Importance: Undecided
   Status: New
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
** Attachment added: Versions and stuff from apport-cli
   https://bugs.launchpad.net/bugs/919429/+attachment/2684644/+files/bug.apport
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
Here is the graph of the latency, also from munin. It shows the exact opposite of the other graph. This one gives an answer that I believe is correct (or at least it is plausible). Note that /dev/sdc is the fastest drive, and shows the lowest latency. Also note that when loaded heavily, the latency on /dev/sda and /dev/sdb goes up (they are a RAID mirror, so they move together).
** Attachment added: Disk latency per device - by week
   https://bugs.launchpad.net/ubuntu/+source/munin/+bug/919429/+attachment/2684669/+files/diskstats_latency-week.png
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
Here is the offending graph, from munin. NOTE: the purple line is the fast drive, and the yellow line is a slower drive, but for some strange reason the yellow line goes DOWN under additional load (should go UP I would expect). Also, the results are completely the other way to the latency graph.
** Attachment added: IO Service time - by week
   https://bugs.launchpad.net/ubuntu/+source/munin/+bug/919429/+attachment/2684645/+files/iostat_ios-week.png
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
Just for reference, here is a graph of IO operations, showing the additional load. The green line is /dev/md0 but that is a RAID mirror of /dev/sda and /dev/sdb so they are taking the load. NOTE: the load on /dev/sdc is constant and would typically be the drive taking the most load. This graph seems very plausible to me.
** Attachment added: Disk IOs per device - by week
   https://bugs.launchpad.net/ubuntu/+source/munin/+bug/919429/+attachment/2684671/+files/diskstats_iops-week.png
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
The kernel version may well be significant here:
Linux version 2.6.32-33-server (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #70-Ubuntu SMP Thu Jul 7 22:28:30 UTC 2011
linux-headers-2.6.32-33          2.6.32-33.70
linux-headers-2.6.32-33-server   2.6.32-33.70
linux-headers-server             2.6.32.33.39
linux-image-2.6.32-33-server     2.6.32-33.70
linux-image-server               2.6.32.33.39
linux-server                     2.6.32.33.39
Also, see the attached CPU info. I'm not sure whether that affects the iostat numbers, but possibly it does.
** Attachment added: /proc/cpuinfo
   https://bugs.launchpad.net/ubuntu/+source/munin/+bug/919429/+attachment/2684681/+files/proc_cpuinfo.text
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
I can see one problem in the file /usr/share/munin/plugins/iostat_ios, as follows:
LINE 202: print("${dev}_rtime.value ", ($rtime_diff != 0) ? ($rio_diff / $rtime_diff) : 0, "\n",
LINE 203:       "${dev}_wtime.value ", ($wtime_diff != 0) ? ($wio_diff / $wtime_diff) : 0, "\n",
LINE 204: );
Here it divides the number of IO operations by the time in milliseconds. However, the graph has its vertical axis in seconds. Dividing a count by a time gives a rate (Hz), never a time in seconds. So the graph results are really in Hz (not a useful unit given the context we are working with here). I suggest that the calculation should be:
LINE 202: print("${dev}_rtime.value ", ($rio_diff != 0) ? ($rtime_diff / $rio_diff) : 0, "\n",
LINE 203:       "${dev}_wtime.value ", ($wio_diff != 0) ? ($wtime_diff / $wio_diff) : 0, "\n",
LINE 204: );
There's another (minor) problem: returning 0 when no IO has occurred is a lie. It should return NaN or NA, but the Munin protocol specification (see link below) claims "Output must be integer or decimal number", so they have no provision for a plugin saying "hey, this value does not exist right now". That's strange, because the RRD system does support NaN for missing values -- but that's a bigger problem for another day.
http://munin-monitoring.org/wiki/protocol-config
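To make the unit problem concrete, here is a small Python sketch of the two calculations (the plugin itself is Perl). The function names and the sample counter differences are invented for illustration; only the two divisions mirror the lines quoted above.

```python
def original_calc(io_diff, time_diff_ms):
    # IO count divided by time: a rate (operations per millisecond).
    return io_diff / time_diff_ms if time_diff_ms != 0 else 0


def proposed_calc(io_diff, time_diff_ms):
    # Time divided by IO count: milliseconds per operation.
    return time_diff_ms / io_diff if io_diff != 0 else 0


# Made-up example: both drives complete 200 reads in the interval,
# but the slow drive spends four times as long doing so.
fast = {"io_diff": 200, "time_diff_ms": 1000}
slow = {"io_diff": 200, "time_diff_ms": 4000}

print("original:", original_calc(**fast), original_calc(**slow))
print("proposed:", proposed_calc(**fast), proposed_calc(**slow))
```

Under these assumed numbers the original formula reports a larger value for the faster drive, which matches the inverted behaviour described in this bug, while the proposed formula reports a larger value for the slower drive.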
[Bug 919429] Re: Munin IO Service Time graph gives completely implausible numbers
http://munin-monitoring.org/browser/trunk/node/node.d.linux/iostat_ios.in?rev=121 (lines 147, 148, 149)
Ha ha, sysadmins have been muddling over bogus values for the past 8 years. No one noticed all the readings were backwards, but I bet there have been plenty of reports handed to bosses over those years with charts full of meaningless squiggly lines.