Re: [Ganglia-developers] RRD_update illegal attempt to update using time 1252671437 when last update time is 1252671437 (minimum one second step)

2009-09-11 Thread Rick Cobb
Two things to check before concluding it's the code (though I think  
your points are valid):

If you have a data source that's misconfigured, with a cluster name  
that matches a different data source, you'll get this problem, but  
only on __SummaryInfo__ files.

If you have a 3.0 system, and the same metric name is sent via both  
gmetric & gmond for the same host, you'll get this problem.  I suspect  
that can still happen in 3.1, as long as the metric comes from two  
different modules.

I found both these by looking at the pattern of the names of the  
files. If it's really happening randomly to any rrdfile, I'd suspect  
the code, but if it's clustering on specific ones, I'd suspect  
configuration.
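
One way to look for that pattern, as a minimal sketch rather than anything taken from Ganglia: pull the RRD path out of each log line and check whether it is a summary file. The helper names extract_rrd_path and is_summary_rrd are hypothetical, and the line format is assumed from the syslog message quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Ganglia.
 * Copies the RRD path out of a "RRD_update (PATH): ..." log line.
 * Returns 1 on success, 0 if the line does not match the assumed
 * format or the output buffer is too small. */
int extract_rrd_path(const char *line, char *out, size_t out_size)
{
    static const char marker[] = "RRD_update (";
    const char *start;
    const char *end;
    size_t len;

    assert(line != NULL && out != NULL && out_size > 0);

    start = strstr(line, marker);
    if (start == NULL)
        return 0;
    start += sizeof(marker) - 1;      /* skip past the marker itself */

    end = strstr(start, "): ");       /* the path ends at "): " */
    if (end == NULL)
        return 0;

    len = (size_t)(end - start);
    if (len >= out_size)              /* leave room for the terminator */
        return 0;

    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Hypothetical helper: returns 1 if the path names a summary file. */
int is_summary_rrd(const char *path)
{
    assert(path != NULL);
    return strstr(path, "__SummaryInfo__") != NULL;
}
```

Feeding every offending syslog line through these two functions and counting the distinct paths should show quickly whether the errors cluster on a few files or are spread across all of them.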

-- ReC
On Sep 11, 2009, at 6:21 AM, Spike Spiegel wrote:

> Hi,
>
> our gmetad boxes (2 of them) with 12 data sources, 6 of which are
> gmetad and 6 gmonds, are spamming syslog like mad with the following
> message:
>
> Sep  6 06:33:32 localhost.localdomain /usr/sbin/gmetad[2526]:
> RRD_update (/var/lib/ganglia/rrds/...metric.rrd): illegal attempt to
> update using time 1252244010 when last update time is 1252244010
> (minimum one second step)
>
> This happens for both metrics and summary graphs.
>
> Looking at the hosts, everything appears to be fine to me, and ntp is
> running everywhere and in sync.
>
> Looking at the code instead, both gmetad/gmetad.c and
> gmetad/data_thread.c have a possibly suspicious call to sleep:
>
> in gmetad.c:417
> sleep_time = 10 + ((30-10)*1.0) * rand()/(RAND_MAX + 1.0);
> sleep(sleep_time);
>
> in data_thread.c:193
> sleep_time = (d->step - 5) + (10 * (rand()/(float)RAND_MAX))
>     - (end.tv_sec - start.tv_sec);
> if( sleep_time > 0 )
>     sleep(sleep_time);
>
> Two observations:
> - based on man 3 sleep, if any signal is sent to gmetad, the sleep
>   can return early, so the effective sleep interval can be close to 0
> - end.tv_sec - start.tv_sec could come to a considerably high
>   number that, along with a short step, could result in a
>   sleep_time <= 0.
>
> thoughts?
>
> thanks
>
> -- 
> "Behind every great man there's a great backpack" - B.
>
> --
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008  
> 30-Day
> trial. Simplify your report design, integration and deployment - and  
> focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> ___
> Ganglia-developers mailing list
> Ganglia-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-developers




Re: [Ganglia-developers] gmetad spamming logs with "unable to write root epilog"

2009-09-11 Thread Rick Cobb
+1; we patched like that, too, and monitor the same way.

-- ReC
On Sep 11, 2009, at 6:43 AM, Spike Spiegel wrote:

> Hi,
>
> recently we added better monitoring for our ganglia infrastructure, and
> one of the checks for gmetad contacts it on port 8651, looks for some
> XML string and exits (receiving 20+ MB of XML every time we run the
> check isn't an option). The 'exits' part means sending a RST before
> gmetad has sent all its data, which causes root_report_end() to fail
> and the message 'server_thread() %d unable to write root epilog' to be
> logged. Is it really necessary to log an error message if the
> client goes away early? After all, it's not ganglia/gmetad
> malfunctioning or anything, and we could still keep the message for
> debug mode. If that makes sense to you, the one-line patch is below.
>
> thanks
>
> Index: server.c
> ===
> --- server.c (revision 2058)
> +++ server.c (working copy)
> @@ -639,7 +639,7 @@
>
>     if(root_report_end(&client))
>       {
> -       err_msg("server_thread() %d unable to write root epilog", pthread_self() );
> +       debug_msg("server_thread() %d unable to write root epilog", pthread_self() );
>       }
>
>     close(client.fd);
>
> --  
> "Behind every great man there's a great backpack" - B.
>




[Ganglia-developers] gmetad spamming logs with "unable to write root epilog"

2009-09-11 Thread Spike Spiegel
Hi,

recently we added better monitoring for our ganglia infrastructure, and
one of the checks for gmetad contacts it on port 8651, looks for some
XML string and exits (receiving 20+ MB of XML every time we run the
check isn't an option). The 'exits' part means sending a RST before
gmetad has sent all its data, which causes root_report_end() to fail
and the message 'server_thread() %d unable to write root epilog' to be
logged. Is it really necessary to log an error message if the
client goes away early? After all, it's not ganglia/gmetad
malfunctioning or anything, and we could still keep the message for
debug mode. If that makes sense to you, the one-line patch is below.

thanks

Index: server.c
===
--- server.c (revision 2058)
+++ server.c (working copy)
@@ -639,7 +639,7 @@

    if(root_report_end(&client))
      {
-       err_msg("server_thread() %d unable to write root epilog", pthread_self() );
+       debug_msg("server_thread() %d unable to write root epilog", pthread_self() );
      }

    close(client.fd);
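
A possible refinement, offered only as a sketch and not part of the patch above: instead of demoting every failure to debug level, the caller could demote only the failures that mean the client hung up. This assumes the errno from the failed write is still available where the message is logged, which this thread does not confirm; client_went_away is a hypothetical helper, not an existing gmetad function.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper, not part of gmetad.
 * Returns 1 if the errno value from a failed write indicates that the
 * peer closed or reset the connection, 0 for any other value. */
int client_went_away(int err)
{
    assert(err >= 0);   /* errno values are non-negative */
    return err == EPIPE || err == ECONNRESET;
}
```

With something like this, a disconnect caused by a monitoring check would go to debug_msg(), while an unexpected write failure would still reach err_msg().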

-- 
"Behind every great man there's a great backpack" - B.



[Ganglia-developers] RRD_update illegal attempt to update using time 1252671437 when last update time is 1252671437 (minimum one second step)

2009-09-11 Thread Spike Spiegel
Hi,

our gmetad boxes (2 of them) with 12 data sources, 6 of which are
gmetad and 6 gmonds, are spamming syslog like mad with the following
message:

Sep  6 06:33:32 localhost.localdomain /usr/sbin/gmetad[2526]:
RRD_update (/var/lib/ganglia/rrds/...metric.rrd): illegal attempt to
update using time 1252244010 when last update time is 1252244010
(minimum one second step)

This happens for both metrics and summary graphs.

Looking at the hosts, everything appears to be fine to me, and ntp is
running everywhere and in sync.

Looking at the code instead, both gmetad/gmetad.c and
gmetad/data_thread.c have a possibly suspicious call to sleep:

in gmetad.c:417
 sleep_time = 10 + ((30-10)*1.0) * rand()/(RAND_MAX + 1.0);
 sleep(sleep_time);

in data_thread.c:193
 sleep_time = (d->step - 5) + (10 * (rand()/(float)RAND_MAX))
     - (end.tv_sec - start.tv_sec);
 if( sleep_time > 0 )
     sleep(sleep_time);

Two observations:
- based on man 3 sleep, if any signal is sent to gmetad, the sleep
  can return early, so the effective sleep interval can be close to 0
- end.tv_sec - start.tv_sec could come to a considerably high
  number that, along with a short step, could result in a
  sleep_time <= 0.
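
To make the two observations concrete, here is a minimal sketch of how both cases could be guarded against. It is not the gmetad code: clamp_sleep_time and sleep_fully are hypothetical helpers, the jitter is passed in as a whole number of seconds for simplicity, and whether either case actually causes the RRD_update error is not established here.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper, not part of gmetad.
 * Mirrors the data_thread.c expression (step - 5) + jitter - elapsed,
 * but never returns less than 1, so two polls of the same source
 * cannot be scheduled within the same second. */
int clamp_sleep_time(int step, int jitter, int elapsed)
{
    int sleep_time;

    assert(jitter >= 0 && elapsed >= 0);

    sleep_time = (step - 5) + jitter - elapsed;
    return sleep_time < 1 ? 1 : sleep_time;
}

/* Hypothetical helper, not part of gmetad.
 * sleep(3) returns the number of unslept seconds when a signal handler
 * interrupts it, so keep sleeping until nothing is left. */
void sleep_fully(unsigned int seconds)
{
    while (seconds > 0)
        seconds = sleep(seconds);
}
```

A caller would then use sleep_fully(clamp_sleep_time(d->step, jitter, elapsed)) in place of the conditional sleep, at the cost of sometimes polling up to a second later than the step asks for.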

thoughts?

thanks

-- 
"Behind every great man there's a great backpack" - B.



[Ganglia-developers] Solaris OpenCSW update

2009-09-11 Thread Daniel Pocock

If anyone is interested in testing the Ganglia build on the OpenCSW 
build farm, it is now possible.  This provides access to a range of 
Solaris machines including version 8.

libconfuse is now packaged and pre-installed on the OpenCSW machines, so 
all the dependencies for building Ganglia are there.

I've also updated the OpenCSW Makefile (in the OpenCSW SVN) so that a 
Ganglia 3.1.2 (or maybe 3.1.3 when released) package will appear in 
their collection shortly.  Any suggestions to improve this are welcome.

https://gar.svn.sourceforge.net/svnroot/gar/csw/mgar/pkg/ganglia/trunk



