Hi Cameron, [adding the developers list]
OK:
1) We write the unmodified data in line 233 to capture the raw counters. That
is what we use in line 227 for the comparison.
2) ns is created and returned by hash_lookup
3) The ULONG_MAX logic in line 231 handles a counter that has wrapped around:
it ensures that the computed difference is always positive. This is needed
because the variables are unsigned.
4) update_ifdata is called once by metric_init and then every time one of
the byte/pkts_in/out collectors fires
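The counter logic in points 1 and 3 can be sketched as follows. This is a minimal reconstruction from the description above, not the actual metrics.c code; the helpers `accumulate_delta` and `modular_delta` are hypothetical, and the sketch assumes the kernel counter has the same width as `unsigned long`.

```c
#include <limits.h>

/* Hypothetical helper mirroring the described update_ifdata() logic:
 * compare the new raw counter with the previous raw sample, add the
 * difference to a running total, then store the raw sample. */
unsigned long accumulate_delta(unsigned long *prev, unsigned long cur,
                               unsigned long *total)
{
    unsigned long delta;

    if (cur >= *prev)
        delta = cur - *prev;               /* normal case: counter went up */
    else
        delta = ULONG_MAX - *prev + cur;   /* wraparound branch, as in line 231 */

    *total += delta;
    *prev = cur;                           /* always keep the raw sample (line 233) */
    return delta;
}

/* Exact modular difference: unsigned subtraction already wraps modulo
 * ULONG_MAX + 1, so no branch is needed. Whenever a wrap occurs this is
 * one larger than the ULONG_MAX branch above. */
unsigned long modular_delta(unsigned long prev, unsigned long cur)
{
    return cur - prev;
}
```

The one-unit difference per wrap is harmless; the real trouble starts when the device counter and `unsigned long` have different widths, as discussed below for 32-bit versus 64-bit builds.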
Now this does not solve your problem ... Question: do you see any of the debug
messages that should be created by update_ifdata in case of something
unusual?
That should give an idea of what the interface counters on your machine(s)
look like. Look in /var/log/messages, or just start gmond non-interactively.
Hmm. Another question: do you compile gmond in 64-bit or 32-bit mode? The
ULONG_MAX logic may (or will) fail in 32-bit mode if the kernel is 64-bit. It
could even be that the interface counters on 32-bit kernels are written as
64-bit values.
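To illustrate why a width mismatch produces huge spikes, assume a 64-bit `unsigned long` (as on 64-bit Linux) and a NIC whose byte counter is only 32 bits wide, as reported below for the Broadcom cards. When the 32-bit counter wraps, the ULONG_MAX branch adds nearly 2^64 instead of a few bytes. A width-aware variant that reduces the difference modulo the counter width gives the expected small delta. Both helper names and the `width_bits` parameter are hypothetical; this is a sketch, not a patch.

```c
#include <stdint.h>

/* Wrap correction as in line 231, using the 64-bit maximum (the value of
 * ULONG_MAX on 64-bit Linux) on a counter that is really only 32 bits. */
uint64_t delta_ulong_max64(uint64_t prev, uint64_t cur)
{
    if (cur >= prev)
        return cur - prev;
    return UINT64_MAX - prev + cur;   /* about 1.8e19 after a 32-bit wrap */
}

/* Hypothetical width-aware delta: reduce the difference modulo
 * 2^width_bits, where width_bits (1..64) is the real counter width. */
uint64_t delta_width(uint64_t prev, uint64_t cur, unsigned width_bits)
{
    uint64_t mask = (width_bits >= 64) ? UINT64_MAX
                                       : (((uint64_t)1 << width_bits) - 1);
    return (cur - prev) & mask;
}
```

For example, a 32-bit counter going from 4294967290 to 10 is a real change of 16 bytes, but `delta_ulong_max64` reports 18446744069414584335.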
Hope this helps
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
From: Cameron L. Spitzer cspit...@nvidia.com
To: ganglia-gene...@lists.sourceforge.net
Sent: Thu, April 28, 2011 3:21:04 AM
Subject: [Ganglia-general] revisiting bogus spikes
Once again I've been asked to make Ganglia usable on Linux hosts with the
Broadcom NIC that has 32-bit byte counters.
E.g., the HP ProLiant 580 G5, a rather popular machine where Ganglia doesn't
work out of the box.
So I'm trying to understand ganglia-3.1.7/libmetrics/linux/metrics.c again.
In update_ifdata(), we parse /proc/net/dev for the current bytes and packets
in and out.
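For reference, a data line in /proc/net/dev holds the interface name, a colon, eight receive fields (bytes, packets, errs, drop, fifo, frame, compressed, multicast) and eight transmit fields starting with bytes and packets. The following is a minimal parsing sketch, not the metrics.c parser; `parse_net_dev_line` is a hypothetical name. It splits on the colon rather than on whitespace because on some kernels a large counter runs directly into the name (e.g. `eth1:123456789`).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one data line of /proc/net/dev into the
 * interface name and the rx/tx byte and packet counters.
 * Returns 0 on success, -1 for header lines or malformed input. */
int parse_net_dev_line(const char *line, char *ifname, size_t ifname_len,
                       unsigned long *rbi, unsigned long *rpi,
                       unsigned long *rbo, unsigned long *rpo)
{
    const char *colon = strchr(line, ':');
    const char *start = line;
    size_t n;

    if (colon == NULL)
        return -1;                      /* the two header lines have no ':' */
    while (start < colon && (*start == ' ' || *start == '\t'))
        start++;                        /* names are right-aligned with spaces */
    n = (size_t)(colon - start);
    if (n == 0 || n >= ifname_len)
        return -1;
    memcpy(ifname, start, n);
    ifname[n] = '\0';

    /* rx: bytes packets, skip errs drop fifo frame compressed multicast;
     * tx: bytes packets (the remaining tx fields are ignored). */
    if (sscanf(colon + 1, "%lu %lu %*u %*u %*u %*u %*u %*u %lu %lu",
               rbi, rpi, rbo, rpo) != 4)
        return -1;
    return 0;
}
```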
There's a structure ns (declared where?) of type net_dev_stats, representing
the previous sample?
I'm not sure exactly what ns represents.
There's a sanity check at line 227, if ( rbi >= ns->rbi ), for whether the
counter went up or down. If it went down, we assume the counter rolled
around, guess the value is negative, and invert it at line 231:
l_bytes_in += ULONG_MAX - ns->rbi + rbi;
(I don't understand how that is supposed to work.)
Then, regardless of whether the sample passed or failed the sanity check, it's
saved in the ns structure at line 233: ns->rpi = rpi;
After the parsing is all done, and the crazy value is in ns, an optional
reasonableness test (REMOVE_BOGUS_SPIKES)
returns early if any of the numbers are extremely large. Otherwise it updates
the static running counts and then returns.
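One possible way to keep out-of-scale samples out of the graphs, sketched here as an alternative to a fixed "extremely large" threshold, is to reject any interval whose implied rate exceeds what the link could physically carry. The function name, the `link_bytes_per_sec` bound and the policy of dropping the interval are all assumptions for illustration; this is not the REMOVE_BOGUS_SPIKES code.

```c
#include <stdbool.h>

/* Hypothetical plausibility check: a byte delta observed over elapsed_sec
 * seconds is accepted only if the implied rate does not exceed the link's
 * maximum rate, link_bytes_per_sec (e.g. 125000000 for 1 Gbit/s). */
bool delta_is_plausible(double delta_bytes, double elapsed_sec,
                        double link_bytes_per_sec)
{
    if (elapsed_sec <= 0.0)
        return false;                   /* no time has passed: cannot judge */
    return delta_bytes / elapsed_sec <= link_bytes_per_sec;
}
```

If the caller still stores the raw counter as the previous sample, rejecting one interval loses only that interval's traffic instead of corrupting later deltas.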
On our HP 580G5s, defining REMOVE_BOGUS_SPIKES had no effect. The network
traffic graphs become useless within a minute of starting gmond.
The part I don't understand is when the line 227 check fails, we put the
known-bad data in ns anyway.
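Storing the raw sample even after a failed check is what keeps the next delta correct, which is the point of Martin's answer 1. This small simulation contrasts the two policies on a counter that wraps once; `count_wraps` and its `save_always` flag are hypothetical and only count how many intervals look like a wrap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulation: walk raw counter samples and count intervals
 * where cur < prev. With save_always, prev is updated on every sample (as
 * update_ifdata() does); otherwise only when the sanity check passes. */
int count_wraps(const unsigned long *samples, size_t n, bool save_always)
{
    int wraps = 0;
    unsigned long prev;

    if (n == 0)
        return 0;
    prev = samples[0];
    for (size_t i = 1; i < n; i++) {
        unsigned long cur = samples[i];
        bool ok = cur >= prev;

        if (!ok)
            wraps++;
        if (ok || save_always)
            prev = cur;
    }
    return wraps;
}
```

With the samples 4000000000, 4200000000, 100, 200, 300, always saving reports the single real wrap, while skipping the save makes every later sample look like another wrap (three in total), each of which would produce a new spike.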
I'd appreciate it if someone familiar with update_ifdata() could explain its
logic. When is this routine called?
(I can see modules/network/mod_net.c calls it via bytes_in_func(), but I
haven't figured out when net_metric_handler() is called. Maybe that would
explain why bogus data in ns doesn't matter.)
Is there any way to keep way out-of-scale data out of these graphs?
Thanks for any help.
-Cameron in Los Gatos
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers