Re: [omd-users] var/rrdcached growing

Alexander Rusa Tue, 23 Oct 2012 01:58:55 -0700

Thanks!
But I have already restarted rrdcached and the whole of OMD so many times...
I still don't understand which process is supposed to do what exactly with
this journal data.
Based on your info I now suspect that rrdcached itself is supposed to read this
data back from there and process it, but I'm still not entirely sure.


        "On startup, the daemon will check for journal files in this directory. 
        If found, all updates therein will be read into memory before the
        daemon starts accepting new connections."

Hmm... all 11GB get loaded into RAM before the daemon accepts new
connections? That could get tight... ;-)
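
To put a number on that before the next restart, the journal directory can be
tallied first. A minimal Python sketch, assuming the journal files match
rrd.journal.* as in the listing quoted further down; the function name
journal_usage and the command-line handling are hypothetical and not part of
OMD or rrdcached:

```python
import glob
import os
import sys


def journal_usage(journal_dir, pattern="rrd.journal.*"):
    """Return (file_count, total_bytes) for journal files in journal_dir.

    The default pattern is an assumption taken from the file names
    mentioned in this thread.
    """
    paths = glob.glob(os.path.join(journal_dir, pattern))
    files = [p for p in paths if os.path.isfile(p)]
    total = sum(os.path.getsize(p) for p in files)
    return len(files), total


if __name__ == "__main__":
    # Directory is passed as the first argument; defaults to the current one.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    count, total = journal_usage(directory)
    print(f"{count} journal files, {total / 2**30:.2f} GiB")
```

Comparing the total with the free RAM on the machine gives a rough idea of
whether a replay is feasible. How much memory the replay really needs is not
something this sketch can tell.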

This data possibly piles up when the server is temporarily overloaded, and
beyond a certain size it simply can no longer keep up with processing it.
Before the disk fills up I will probably delete all journal files and
concentrate on reducing the load on the server.
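
To check whether the overload theory holds, the timeouts in perfdata.log can be
counted per hour and compared with the times at which the journal files grow.
A rough sketch, assuming one log entry per line in the format quoted further
down (the quoted lines are wrapped by the mail client); timeouts_per_hour is a
made-up helper name:

```python
import re
import sys
from collections import Counter

# Matches only the "Timeout after N secs" entry, so each timed-out
# process_perfdata.pl run is counted once, not once per TIMEOUT line.
TIMEOUT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} \[\d+\] \[\d+\] "
    r"\*\*\* TIMEOUT: Timeout after"
)


def timeouts_per_hour(lines):
    """Count timeout entries, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is passed as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hour, count in sorted(timeouts_per_hour(log).items()):
            print(f"{hour}:00  {count} timeouts")
```

Hours with many timeouts that coincide with fast journal growth would support
the theory; it would not prove it.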

Regards, Alex

On 23.10.2012 at 10:34, Thomas Kladaric <[email protected]> wrote:

> Hello,
> 
> here is an excerpt from the PNP4Nagios documentation:
> The -j option defines the path to a journal directory. All jobs are logged
> there and, if the rrdcached daemon crashes, replayed at the next start.
> 
> -j /var/cache/rrdcached
> Maybe that helps you.
> 
> Best regards
> 
> Thomas Kladaric
> Systems Consulting
> 
> ITeratio GmbH
> Hollweghstr. 22-26
> D-51103 Köln
> 
> Tel.:         +49 (0) 221 829 18 60
> Fax.: +49 (0) 221 829 18 61
> Web:  http://www.iteratio.com
> 
> Managing directors: Rolf Assenmacher, Hardy Düttmann, Thomas Glöckner
> 
> Registered office: Köln
> Court of registration: Köln, HRB 35517
> VAT ID no. DE 215 675 338
> 
> On 23.10.2012 at 10:13, Alexander Rusa wrote:
> 
>> Hello,
>> 
>> this morning I discovered that timeout errors hardly occur any more and
>> everything looks OK in perfdata.log.
>> 
>> But what I simply cannot understand is why the rrdcached journal data keeps
>> growing and which process is supposed to do what exactly with this data!
>> Can someone please help me understand this?
>> 
>> It seems to me that this part is somehow missing from the diagram at
>> http://omdistro.org/wiki/omd/Pnp4nagios
>> 
>> I now have over 160 files totalling more than 11GB in
>> omd/sites/.../var/rrdcached/rrd.journal.* and they do not seem to be getting
>> any fewer.
>> 
>> Regards, Alex
>> 
>> On 22.10.2012 at 16:42, Alexander Rusa <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> My /opt/omd/sites/.../var/rrdcached directory is growing very fast.
>>> At the moment it contains 151 files with a total of ~9GB.
>>> Currently I am running version 0.56.
>>> It looks like this problem has existed since upgrading to 0.52.
>>> 
>>> Last week I tried to find the source of the problem and ended up deleting
>>> everything inside var/pnp4nagios/perfdata/. I had found that there were
>>> some problems because RRD_STORAGE_TYPE had been changed to MULTIPLE, and
>>> after spending some hours trying to convert the old RRD files I gave up
>>> and deleted the whole performance data history.
>>> 
>>> Now the disk space is critical again and I have no idea what the problem
>>> could be!
>>> 
>>> We are monitoring about 4000 Services.
>>> 
>>> The var/pnp4nagios/log/perfdata.log shows nothing but timeouts:
>>> 
>>> #####
>>> ...
>>> 2012-10-22 16:25:29 [20877] [1] process_perfdata.pl-0.6.19 starting in BULK 
>>> Mode called by NPCD
>>> 2012-10-22 16:25:29 [20877] [1] Found Performance Data for server1 / _HOST_ 
>>> (rta=0.241ms;200.000;500.000;0; pl=0%;40;80;; rtmax=0.298ms;;;; 
>>> rtmin=0.198ms;;;;) 
>>> 2012-10-22 16:25:29 [20879] [1] process_perfdata.pl-0.6.19 starting in BULK 
>>> Mode called by NPCD
>>> 2012-10-22 16:25:29 [20879] [1] Found Performance Data for server2 / 
>>> CPU_load (load1=8.13;20;40;0; load5=8.8;20;40;0; load15=9.12;20;40;0;) 
>>> 2012-10-22 16:25:44 [20877] [0] *** TIMEOUT: Timeout after 15 secs. ***
>>> 2012-10-22 16:25:44 [20877] [0] *** TIMEOUT: Deleting current file to avoid 
>>> NPCD loops
>>> 2012-10-22 16:25:44 [20877] [0] *** TIMEOUT: Please check your 
>>> process_perfdata.cfg
>>> 2012-10-22 16:25:44 [20877] [0] *** TIMEOUT: 
>>> /omd/sites/emerion/var/pnp4nagios/spool//perfdata.1350915913-PID-20877 
>>> deleted
>>> 2012-10-22 16:25:44 [20877] [0] *** Timeout while processing Host: 
>>> "server1" Service: "_HOST_"
>>> 2012-10-22 16:25:44 [20877] [0] *** process_perfdata.pl terminated on 
>>> signal ALRM
>>> ...
>>> #####
>>> 
>>> Can anyone tell me where I could find the root of the problem?
>>> 
>>> One thing I know is that the server sometimes has a very high load and we
>>> are planning to move some services away from this machine, but even when I
>>> stop some resource-eating services only timeouts are showing up in the
>>> perfdata.log
>>> 
>>> Best regards,
>>> 
>>> Alex
>>> _______________________________________________
>>> omd-users mailing list
>>> [email protected]
>>> http://lists.mathias-kettner.de/mailman/listinfo/omd-users
