Re: [Ganglia-developers] RRD_update illegal attempt to update using time 1252671437 when last update time is 1252671437 (minimum one second step)
Two things to check before concluding it's the code (though I think your points are valid):

If you have a data source that's misconfigured, with a cluster name that matches a different data source, you'll get this problem, but only on __SummaryInfo__ files.

If you have a 3.0 system and the same metric name is sent via both gmetric and gmond for the same host, you'll get this problem. I suspect that can still happen in 3.1, as long as the metric comes from two different modules.

I found both of these by looking at the pattern of the file names. If it's really happening randomly to any RRD file, I'd suspect the code, but if it's clustering on specific ones, I'd suspect configuration.

-- ReC

On Sep 11, 2009, at 6:21 AM, Spike Spiegel wrote:
> [...]
--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now.
http://p.sf.net/sfu/bobj-july
_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
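ReC's diagnostic, checking whether the failures cluster on particular file names, can be done mechanically. The following is a minimal C sketch, assuming each gmetad syslog line has the "RRD_update (PATH): illegal attempt ..." shape shown in the original report; extract_rrd_path, tally_path, tally_stream and the MAX_PATHS/PATH_LEN limits are hypothetical names, not part of Ganglia.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATHS 1024   /* hypothetical cap on distinct RRD files tracked */
#define PATH_LEN  512    /* hypothetical cap on path length */

struct path_count {
    char path[PATH_LEN];
    unsigned long count;
};

/* Copy PATH out of "... RRD_update (PATH): illegal attempt ..." into out.
 * Returns 1 on success, 0 if the line does not match or PATH is too long. */
static int extract_rrd_path(const char *line, char *out, size_t outlen)
{
    const char *prefix = "RRD_update (";
    const char *start = strstr(line, prefix);
    const char *end;
    size_t len;

    if (start == NULL)
        return 0;
    start += strlen(prefix);

    /* The path ends where the error text begins. */
    end = strstr(start, "): illegal attempt");
    if (end == NULL)
        return 0;

    len = (size_t)(end - start);
    if (len + 1 > outlen)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Count one more hit for path; returns the new number of distinct entries
 * (unchanged if path is new but the table is already full). */
static size_t tally_path(struct path_count *table, size_t used, const char *path)
{
    size_t i;
    for (i = 0; i < used; i++) {
        if (strcmp(table[i].path, path) == 0) {
            table[i].count++;
            return used;
        }
    }
    if (used == MAX_PATHS)
        return used;
    snprintf(table[used].path, PATH_LEN, "%s", path);
    table[used].count = 1;
    return used + 1;
}

/* Read syslog lines from fp and tally RRD_update failures per file. */
static size_t tally_stream(FILE *fp, struct path_count *table)
{
    char line[2048];
    char path[PATH_LEN];
    size_t used = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (extract_rrd_path(line, path, sizeof path))
            used = tally_path(table, used, path);
    }
    return used;
}
```

Sorting the resulting table by count would show whether the hits concentrate on __SummaryInfo__ files (a data source whose cluster name collides with another), on a few per-host metrics (the same metric arriving from two sources), or are spread evenly, which would point back at the code.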
Re: [Ganglia-developers] gmetad spamming logs with "unable to write root epilog"
+1; we patched like that, too, and monitor the same way.

-- ReC

On Sep 11, 2009, at 6:43 AM, Spike Spiegel wrote:
> [...]
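A check like the one described in this thread (connect to gmetad's XML port 8651, look for an XML string, then disconnect) might look like the C sketch below. It is an illustration under assumptions, not the poster's actual check: check_xml_port, the marker_scan helper and the 256-byte marker limit are hypothetical. The scanner carries the last few bytes of each chunk forward so that a marker split across two recv() calls is still found.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MARKER_MAX 256   /* hypothetical limit on marker length */
#define CHUNK 4096

/* Streaming substring search over data that arrives in pieces. */
struct marker_scan {
    const char *marker;
    size_t len;
    char tail[MARKER_MAX];  /* last len-1 bytes of the previous chunk */
    size_t tail_len;
};

static int scan_init(struct marker_scan *s, const char *marker)
{
    s->marker = marker;
    s->len = strlen(marker);
    s->tail_len = 0;
    return (s->len > 0 && s->len < MARKER_MAX) ? 0 : -1;
}

/* Returns 1 once the marker has appeared anywhere in the data fed so far. */
static int scan_feed(struct marker_scan *s, const char *buf, size_t n)
{
    char window[MARKER_MAX + CHUNK];
    while (n > 0) {
        size_t take = n < CHUNK ? n : CHUNK;
        size_t wlen, keep, i;
        memcpy(window, s->tail, s->tail_len);
        memcpy(window + s->tail_len, buf, take);
        wlen = s->tail_len + take;
        for (i = 0; i + s->len <= wlen; i++)
            if (memcmp(window + i, s->marker, s->len) == 0)
                return 1;
        /* Carry over just enough bytes to catch a marker split across calls. */
        keep = s->len - 1 < wlen ? s->len - 1 : wlen;
        memcpy(s->tail, window + wlen - keep, keep);
        s->tail_len = keep;
        buf += take;
        n -= take;
    }
    return 0;
}

/* Connect to host:port and read until marker appears, then close.
 * Returns 0 if found, 1 if the stream ended without it, -1 on error. */
static int check_xml_port(const char *host, const char *port, const char *marker)
{
    struct addrinfo hints, *res, *ai;
    struct marker_scan s;
    char buf[CHUNK];
    ssize_t got = 0;
    int fd = -1, found = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;
    if (scan_init(&s, marker) != 0) {
        close(fd);
        return -1;
    }
    while (!found && (got = recv(fd, buf, sizeof buf, 0)) > 0)
        found = scan_feed(&s, buf, (size_t)got);
    /* Closing with unread data pending typically makes the TCP stack send
     * an RST; gmetad's next write then fails. */
    close(fd);
    if (found)
        return 0;
    return got < 0 ? -1 : 1;
}
```

For example, check_xml_port("localhost", "8651", "<CLUSTER") would return 0 once that string shows up in the stream; the marker here is only an example.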
[Ganglia-developers] gmetad spamming logs with "unable to write root epilog"
Hi,

Recently we added better monitoring for our Ganglia infrastructure. One of the checks for gmetad connects to it on port 8651, looks for some XML string, and exits (receiving 20+ MB of XML every time we run the check isn't an option). The 'exits' part means sending an RST before gmetad has sent all the data, which causes root_report_end() to fail and the message 'server_thread() %d unable to write root epilog' to be logged.

Is it really necessary to log an error message if the client goes away early? After all, it's not ganglia/gmetad malfunctioning, and we could still keep the message for debug mode. If that makes sense to you, the one-line patch is below.

thanks

Index: server.c
===================================================================
--- server.c	(revision 2058)
+++ server.c	(working copy)
@@ -639,7 +639,7 @@
 
     if(root_report_end(&client))
     {
-        err_msg("server_thread() %d unable to write root epilog", pthread_self() );
+        debug_msg("server_thread() %d unable to write root epilog", pthread_self() );
     }
 
     close(client.fd);

--
"Behind every great man there's a great backpack" - B.
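An alternative to demoting the message unconditionally is to downgrade only the errors that mean the client disconnected. This C sketch assumes (not verified against server.c) that errno still holds the socket error when root_report_end() fails, and that SIGPIPE is ignored so a write to a reset socket fails with EPIPE instead of terminating the process. fprintf stands in for gmetad's err_msg()/debug_msg(), and both helper names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if a failed write to a client socket only means
 * the peer went away (closed or reset the connection), else 0. */
static int client_went_away(int err)
{
    return err == EPIPE || err == ECONNRESET;
}

/* Sketch of the logging decision after root_report_end() fails.
 * `debug` stands in for gmetad's debug level.
 * Returns the level used: 0 = debug only, 1 = error. */
static int log_epilog_failure(int err, int debug)
{
    if (client_went_away(err)) {
        if (debug)
            fprintf(stderr, "server_thread() client left early: %s\n",
                    strerror(err));
        return 0;
    }
    fprintf(stderr, "server_thread() unable to write root epilog: %s\n",
            strerror(err));
    return 1;
}
```

This keeps other write failures at error level, while monitoring probes that hang up early only show up in debug mode.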
[Ganglia-developers] RRD_update illegal attempt to update using time 1252671437 when last update time is 1252671437 (minimum one second step)
Hi,

our gmetad boxes (2 of them), with 12 data sources, 6 of which are gmetads and 6 gmonds, are spamming syslog like mad with the following message:

Sep 6 06:33:32 localhost.localdomain /usr/sbin/gmetad[2526]: RRD_update (/var/lib/ganglia/rrds/...metric.rrd): illegal attempt to update using time 1252244010 when last update time is 1252244010 (minimum one second step)

This happens for both metrics and summary graphs.

Looking at the hosts, everything appears to be fine to me, and ntp is running everywhere and in sync.

Looking at the code instead, both gmetad/gmetad.c and gmetad/data_thread.c have a possibly suspicious call to sleep:

in gmetad.c:417

    sleep_time = 10 + ((30-10)*1.0) * rand()/(RAND_MAX + 1.0);
    sleep(sleep_time);

in data_thread.c:193

    sleep_time = (d->step - 5) + (10 * (rand()/(float)RAND_MAX))
                 - (end.tv_sec - start.tv_sec);
    if( sleep_time > 0 )
        sleep(sleep_time);

Two observations:
- according to sleep(3), if a signal is delivered to gmetad, sleep() returns early, so the actual sleep interval can be close to 0
- end.tv_sec - start.tv_sec could be large enough that, combined with a short step, sleep_time ends up <= 0, in which case the thread skips the sleep entirely

thoughts?

thanks

--
"Behind every great man there's a great backpack" - B.
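If the code does turn out to be the cause, both observations have standard remedies. sleep(3) returns the number of unslept seconds when a signal interrupts it, and nanosleep(2) fails with EINTR and stores the remaining time in its second argument, so the interval can be resumed in a loop. The C sketch below shows that loop together with a clamped version of the data_thread.c interval; sleep_fully, poll_sleep_time and the min_sleep floor are hypothetical, not gmetad code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for the full number of seconds even if signals interrupt it:
 * on EINTR, nanosleep() writes the unslept remainder into rem. */
static int sleep_fully(unsigned int seconds)
{
    struct timespec req, rem;
    req.tv_sec = (time_t)seconds;
    req.tv_nsec = 0;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;   /* resume with whatever time is left */
    }
    return 0;
}

/* Interval shaped like data_thread.c: (step - 5) plus up to 10 s of
 * jitter (jitter01 in [0, 1]) minus the seconds the last poll took,
 * but never below min_sleep (a hypothetical floor). */
static int poll_sleep_time(int step, double jitter01, long elapsed, int min_sleep)
{
    int t = (int)((step - 5) + 10.0 * jitter01 - (double)elapsed);
    return t < min_sleep ? min_sleep : t;
}
```

With step = 15 and a poll that took 40 s, the original expression goes negative and the thread skips the sleep; poll_sleep_time(15, 0.0, 40, 1) returns 1 instead, so the next poll of that source starts at least a second later.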
[Ganglia-developers] Solaris OpenCSW update
If anyone is interested in testing the Ganglia build on the OpenCSW build farm, it is now possible. This provides access to a range of Solaris machines, including Solaris 8. libconfuse is now packaged and pre-installed on the OpenCSW machines, so all the dependencies for building Ganglia are there.

I've also updated the OpenCSW Makefile (in the OpenCSW SVN) so that a Ganglia 3.1.2 (or maybe 3.1.3, when released) package will appear in their collection shortly. Any suggestions to improve this are welcome.

https://gar.svn.sourceforge.net/svnroot/gar/csw/mgar/pkg/ganglia/trunk