Re: [Nagios-users-br] Nagios on a LARGE, VERY LARGE network.

2010-08-23 Thread Marcel
With 2 services per host, you are presumably just pinging your network to
determine reachability, right?

Which plugin do you use? Which version of the plugins? Which version of
Nagios? Which

There are recommendations that would improve the investigation, but some
issues can be tackled without any additional information.

1) Your average check execution time is rather high: 402.97 seconds. How
often are you checking your hosts? Try lengthening the check interval for
all checks (from 5 to 10 minutes) and watch how Nagios behaves.
2) Check for more than one parent process (PPID=1). If there is more than
one parent process, they can interfere with Nagios, because they will share
objects.cache, retention.dat and status.dat, and that always causes
problems.
3) If my assumption holds that the 2 services per host are pings, try
replacing the check_ping plugin with check_icmp.
4) If none of the alternatives above points to a root cause, upgrade Nagios
and apply the tuning recommendations:
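
Item 2 above can be checked by hand with ps. The following is only a minimal sketch of that check, assuming a POSIX ps that accepts -eo and a daemon whose process name is exactly "nagios" (some packages use a different name); the function names are made up for illustration.

```python
import subprocess


def parse_ps(ps_output):
    """Parse `ps -eo pid,ppid,comm` output into (pid, ppid, command) tuples."""
    rows = []
    for line in ps_output.strip().splitlines():
        parts = line.split(None, 2)
        # The header row is skipped because its first two fields are not numeric.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def top_level_daemons(rows, name="nagios"):
    """Return the PIDs of processes called `name` whose parent is PID 1."""
    return [pid for pid, ppid, comm in rows if comm == name and ppid == 1]


def list_processes():
    """Run ps locally (assumption: a POSIX ps supporting -eo is installed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling top_level_daemons(parse_ps(list_processes())) on the monitoring server should normally return a single PID; two or more would match the situation described in item 2.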

Hope this helps,

2010/8/21 Everton Pestana

 Dear all,

 I work at a large company and have a large fleet of servers and
 services to monitor.

 I need some help, because Nagios is behaving very strangely.

 Today I run Nagios on a single processing node with 2 GB of RAM.

 It has approximately 3000 hosts and 6000 services.


 Services Actively Checked:
  Time Frame            Services Checked
  <= 1 minute:          147 (2.6%)
  <= 5 minutes:         5574 (99.5%)
  <= 15 minutes:        5574 (99.5%)
  <= 1 hour:            5574 (99.5%)
  Since program start:  5574 (99.5%)

  Metric                 Min.      Max.        Average
  Check Execution Time:  0.00 sec  23.26 sec   0.402 sec
  Check Latency:         0.00 sec  402.97 sec  0.872 sec
  Percent State Change:  0.00%

 Check Statistics:
  Type  Last 1 Min  Last 5 Min  Last 15 Min
  Active Scheduled Service

 What has been happening is that at certain moments the machine seems
 to go completely idle: traffic on the interfaces drops drastically
 (almost to zero) and the load consequently drops too.

 At that point I noticed that Nagios keeps running, but no child
 processes are executed any more; the machine seems dead.
 If I reload Nagios everything goes back to normal, but a few hours
 later the same problem happens again. The times I noticed it, it
 usually happened at 4 a.m.

 I looked through every Nagios and system log imaginable and did not
 find any error, nothing that could explain this behavior.

 Thank you very much in advance for the help.


 Everton Pestana

2010-08-23 Thread Everton Pestana

Re: [Nagios-users-br] Nagios on a LARGE, VERY LARGE network.

2010-08-23 Thread Leonardo Carneiro
2010/8/21 Everton Pestana

 I work at a large company and have a large fleet of servers and
 services to monitor.

 I need some help, because Nagios is behaving very strangely.

 Today I run Nagios on a single processing node with 2 GB of RAM.

 It has approximately 3000 hosts and 6000 services.


 Services Actively Checked:
  Time Frame            Services Checked
  <= 1 minute:          147 (2.6%)
  <= 5 minutes:         5574 (99.5%)
  <= 15 minutes:        5574 (99.5%)
  <= 1 hour:            5574 (99.5%)
  Since program start:  5574 (99.5%)

  Metric                 Min.      Max.        Average
  Check Execution Time:  0.00 sec  23.26 sec   0.402 sec
  Check Latency:         0.00 sec  402.97 sec  0.872 sec
  Percent State Change:  0.00%

 Check Statistics:
  Type  Last 1 Min  Last 5 Min  Last 15 Min
  Active Scheduled Service Checks  22526008

 What has been happening is that at certain moments the machine seems
 to go completely idle: traffic on the interfaces drops drastically
 (almost to zero) and the load consequently drops too.

 At that point I noticed that Nagios keeps running, but no child
 processes are executed any more; the machine seems dead.
 If I reload Nagios everything goes back to normal, but a few hours
 later the same problem happens again. The times I noticed it, it
 usually happened at 4 a.m.

 I looked through every Nagios and system log imaginable and did not
 find any error, nothing that could explain this behavior.

 Thank you very much in advance for the help.


 Everton Pestana

Hi Everton, on the international Nagios list there is a discussion about
a problem very similar to yours: stability and scalability problems in
very large Nagios instances.

I suggest you take a look at the archives, because people there had a
long discussion with many tips on how to solve the problem.

As far as I remember, no single action fixed this kind of problem;
rather, several changes together made Nagios more efficient at
processing services and hosts.
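
To put the figures quoted in this thread in perspective, here is a rough back-of-the-envelope sketch. It assumes the roughly 6000 services are spread evenly over the check interval and uses the 0.402 sec average execution time from the statistics above; the function names are hypothetical, and real Nagios scheduling is more involved than this.

```python
def checks_per_second(num_checks, interval_minutes):
    """Average number of checks that must start every second."""
    return num_checks / (interval_minutes * 60)


def average_concurrency(num_checks, interval_minutes, avg_exec_seconds):
    """Average number of checks running at once (arrival rate x duration)."""
    return checks_per_second(num_checks, interval_minutes) * avg_exec_seconds


for interval in (5, 10):
    rate = checks_per_second(6000, interval)
    running = average_concurrency(6000, interval, 0.402)
    print(f"{interval} min interval: {rate:.1f} checks/s, "
          f"about {running:.1f} running at once")
```

Doubling the interval from 5 to 10 minutes halves both numbers, which is why lengthening the interval is a cheap first experiment.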

[Nagios-users] How can we configure into nagios dynamic thresholds depending on timeframes

2010-08-23 Thread Alex Peeters

Dear Sir,

How can we configure dynamic thresholds in Nagios that depend on the
timeframe?

Example: -w = 80 -c = 90 during business hours, but -w = 90 -c = 95
outside business hours.
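
The example can be written down as a small selection function. This is only an illustration of the request, not something Nagios provides: it assumes business hours are Monday to Friday, 09:00-17:00, and the function name is hypothetical. A wrapper script around a plugin could use it to choose the -w and -c values.

```python
from datetime import datetime, time

# Assumption: business hours are Monday-Friday, 09:00-17:00.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def thresholds_for(now):
    """Return (warning, critical) thresholds for the given datetime."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_hours = BUSINESS_START <= now.time() < BUSINESS_END
    if is_weekday and in_hours:
        return (80, 90)
    return (90, 95)


warning, critical = thresholds_for(datetime.now())
print(f"-w {warning} -c {critical}")
```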

How can we configure into nagios dynamic thresholds depending on timeframes: Part II

define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Current Users
        check_command           check_local_users!20!50
        check_period            nonworkhours
        notification_period     nonworkhours
        }

define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     Current Users
        check_command           check_local_users!40!60
        check_period            workhours
        notification_period     workhours
        }

In my example above I configured the same check twice. The two
timeperiods 'nonworkhours' and 'workhours' together equal 24x7.

Is this way of configuring allowed? It would solve my problem.

1) How will the Nagios scheduling react to this configuration?

2) How will the display react to this configuration?

# 'workhours' timeperiod definition
define timeperiod{
        timeperiod_name workhours
        alias           Normal Working Hours
        monday          09:00-17:00
        tuesday         09:00-17:00
        wednesday       09:00-17:00
        thursday        09:00-17:00
        friday          09:00-17:00
        }

# 'nonworkhours' timeperiod definition
define timeperiod{
        timeperiod_name nonworkhours
        alias           Non-Work Hours
        sunday          00:00-24:00
        monday          00:00-09:00,17:00-24:00
        tuesday         00:00-09:00,17:00-24:00
        wednesday       00:00-09:00,17:00-24:00
        thursday        00:00-09:00,17:00-24:00
        friday          00:00-09:00,17:00-24:00
        saturday        00:00-24:00
        }
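
Whether two timeperiods really add up to 24x7 can be verified mechanically. The sketch below is an assumption-laden illustration: it only understands plain weekday entries in the HH:MM-HH:MM[,HH:MM-HH:MM] form, the two dictionaries are typed in by hand for a Monday-to-Friday 09:00-17:00 split, and all names are hypothetical.

```python
DAYS = ["sunday", "monday", "tuesday", "wednesday",
        "thursday", "friday", "saturday"]


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight ('24:00' becomes 1440)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_ranges(spec):
    """Turn '00:00-09:00,17:00-24:00' into [(0, 540), (1020, 1440)]."""
    ranges = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        ranges.append((to_minutes(start), to_minutes(end)))
    return ranges


def coverage_report(*periods):
    """Per weekday, count minutes covered by no period and by several."""
    report = {}
    for day in DAYS:
        counts = [0] * 1440  # one counter per minute of the day
        for period in periods:
            spec = period.get(day)
            if spec is None:
                continue
            for start, end in parse_ranges(spec):
                for minute in range(start, end):
                    counts[minute] += 1
        report[day] = {
            "gaps": counts.count(0),
            "overlaps": sum(1 for c in counts if c > 1),
        }
    return report


# Hand-typed illustration data, not read from any Nagios file.
workhours = {day: "09:00-17:00" for day in DAYS[1:6]}
nonworkhours = {day: "00:00-09:00,17:00-24:00" for day in DAYS[1:6]}
nonworkhours.update({"sunday": "00:00-24:00", "saturday": "00:00-24:00"})

for day, result in coverage_report(workhours, nonworkhours).items():
    print(day, result)
```

A day reported with a non-zero "gaps" value is never checked by either service; a non-zero "overlaps" value means both definitions are active at the same time.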

check_period: This directive is used to specify the short name of the
time period during which active checks of this host can be made.

check_period: This directive is used to specify the short name of the
time period during which active checks of this service can be made.

If you do not use the check_period directive to specify a timeperiod,
Nagios will be able to schedule active checks of the host or service
anytime it needs to. This is essentially a 24x7 monitoring scenario.

Specifying a timeperiod in the check_period directive allows you to
restrict the time that Nagios performs regularly scheduled, active
checks of the host or service. When Nagios attempts to reschedule a
host or service check, it will make sure that the next check falls
within a valid time range within the defined timeperiod. If it doesn’t,
Nagios will adjust the next check time to coincide with the next valid
time in the specified timeperiod. This means that the host or service
may not get checked again for another hour, day, or week, etc.

Exclusions and Host/Service Checks - There is a bug in the service/host
check scheduling logic that rears its head when you use timeperiod
definitions that use the exclude directive. The problem occurs when
Nagios Core tries to re-schedule the next check. In this case, the
scheduling logic may incorrectly schedule the next check further out in
the future than it should. In essence, it skips over the (missing)
logic where it could determine an earlier possible time using the
exception times. Imperfect Solution: Don’t use timeperiod definitions
that exclude other timeperiods for your host/service check periods. A
fix is being worked on, and will hopefully make it into a 3.4.x release.

Kind regards,

-- Alex Peeters

2010-08-23 Thread Alex Peeters

Re: [Nagios-users] contactgroup definition

2010-08-23 Thread Assaf Flatto
You should get at least a warning for the missing contactgroups.

 Is it normal that there is NO error on startup when I
 - in a contact definition
 - add a contactgroups directive
 - for a contactgroup that has not been defined?

 I only get an error when I try to use a non-existing contact group in a
 service definition. This makes me wonder what is going on.

 [FEATURE REQUEST] No matter how it finally works, it would be nice to be
 able to generate the groups on-the-fly as mentioned above, but with the
 possibility to view the resulting groups.
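
Until Nagios itself reports this, the situation described above can be caught with a small external check. The sketch below is an assumption, not a Nagios feature: it uses a deliberately simple parser that only handles plain "define type{ ... }" blocks (no templates, no include files), and the function names are hypothetical.

```python
import re

# Matches "define <type>{ ... }" blocks; simple configs only.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return a list of (object_type, {directive: value}) pairs."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        attrs = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            attrs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, attrs))
    return objects


def undefined_contactgroups(text):
    """Map contact_name -> contactgroups it references that are not defined."""
    objects = parse_objects(text)
    defined = {attrs.get("contactgroup_name")
               for obj_type, attrs in objects if obj_type == "contactgroup"}
    missing = {}
    for obj_type, attrs in objects:
        if obj_type != "contact":
            continue
        for group in attrs.get("contactgroups", "").split(","):
            group = group.strip()
            if group and group not in defined:
                missing.setdefault(attrs.get("contact_name", "?"), []).append(group)
    return missing
```

Running undefined_contactgroups() over the concatenated object files before a restart gives a list of contacts that point at groups nobody defined.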


Re: [Nagios-users] Can NRPE Output Be Graphed

2010-08-23 Thread Assaf Flatto
Robert Jackson wrote:

 I’m in the process of setting up NRPE daemons on remote Linux servers 
 to enable me to monitor them. I’ve set-up a couple of checks (zombie 
 and total processes) as per the documentation. Everything is working 
 fine and Nagios reports correctly the numbers involved. I’m now 
 wondering if these checks can/will output to PNP4Nagios to enable them 
 to be graphed?

 Also what else can I use NRPE for to monitor Linux servers?

Hello Robert,

To answer your first question: yes, PNP4Nagios will be able to graph the
data.

As for the second question, the short answer is: anything you want. NRPE
is only the middleman between the Nagios server and the plugins executed
on the remote machines, so it can run any check you write a plugin for
and pass the results back to Nagios.
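
One detail worth keeping in mind: PNP4Nagios graphs the performance data a plugin prints after the "|" character, so a custom plugin run through NRPE needs to emit that part. The sketch below shows the conventional output shape (status text, pipe, label=value;warn;crit;min) and the usual exit codes; the function name and the "zombies" label are made up for illustration.

```python
def format_plugin_output(label, value, warn, crit, minimum=0):
    """Return (exit_code, output_line) in the usual plugin output shape."""
    if value >= crit:
        code, status = 2, "CRITICAL"
    elif value >= warn:
        code, status = 1, "WARNING"
    else:
        code, status = 0, "OK"
    # Performance data: label=value;warn;crit;min
    perfdata = f"{label}={value};{warn};{crit};{minimum}"
    return code, f"{status} - {label} is {value} | {perfdata}"


code, line = format_plugin_output("zombies", 3, 5, 10)
print(line)
```

A real plugin would print the line and then exit with the returned code (for example via sys.exit(code)), so that Nagios sees both the state and the value to graph.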


Re: [Nagios-users] Send_nsca problem

2010-08-23 Thread Assaf Flatto
Eric Anderson wrote:

 I have what I think is a very basic problem but I cannot seem to locate it. I 
 believe it to be a permissions issue.
 I'm attempting to use send_nsca to forward received syslog traffic to the 
 Nagios process.

 This article describes using syslog to forward to send_nsca and then to 
 a Nagios server running nsca:

 I'm attempting something similar except with a twist; the basic idea is this:
 1. Syslog receives messages from clients.
 2. A script parses syslog and sends the info to send_nsca process on the same 
 3. Send_nsca sends to nsca running on this host
 4. NSCA forwards to Nagios.

 I've successfully got Nagios and NSCA running. At this point I want to test 
 send_nsca with the following command:
 send_nsca locahost -c /usr/local/nagios/nsca-2.7.2/sample-config/  

 The test file contains:
 localhost<tab>TestMessage<tab>0<tab>This is a test

 After running this file, I get this in my /var/log/messages file:
 Aug 12 22:16:54 . nsca[25712]: Handling the connection...
 Aug 12 22:16:54 . nsca[25712]: SERVICE CHECK - Host name: 'localhost', 
 Service Description: 'Test Message', Return Code: '0', Output: 'This is a 
 test message.'
 Aug 12 22:16:54 . nsca[25712]: Command file 
 '/var/spool/nagios/cmd/nagios.cmd' does not exist, attempt to use alternate 
 dump file '/var/spool/nagios/cmd/nsca.dump' for output
 Aug 12 22:16:54 . nsca[25712]: Could not open alternate dump file 
 '/var/spool/nagios/cmd/nsca.dump' for appending
 Aug 12 22:16:54 . nsca[25712]: End of connection...

 Can anyone suggest where I may be going wrong here?



 If I ls -haltr on /var/spool/nagios/cmd I get the following:
 drwxr-xr-x 3 nagios nagios 4.0k 2009-11-10 11:09 ..
 prw-rw-r-- 1 nagios nagcmd 0 2010-08-12 15:16 nagios.cmd
 prw-rw-r-- 1 nagios nagcmd 0 2010-08-12 15:25 nsca.dump

Is nagios part of the nagcmd group?

Are you executing as root or as the nagios user?
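
The two questions above boil down to the classic owner/group/other permission check on the command pipe. Here is a minimal sketch of that logic, assuming a Unix system; it ignores ACLs, SELinux and the permissions of the parent directories, and the function names are hypothetical.

```python
import grp
import os
import pwd
import stat


def write_allowed(mode, owner_uid, owner_gid, uid, gids):
    """Apply the owner/group/other write bits for a user with the given ids."""
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


def user_can_write(path, username):
    """Look up the user's ids and the file's mode, then apply write_allowed."""
    st = os.stat(path)
    user = pwd.getpwnam(username)
    # Primary group plus every group that lists the user as a member.
    gids = {user.pw_gid} | {g.gr_gid for g in grp.getgrall()
                            if username in g.gr_mem}
    return write_allowed(st.st_mode, st.st_uid, st.st_gid, user.pw_uid, gids)
```

With a pipe owned by nagios:nagcmd and mode prw-rw-r--, as in the listing above, a user who is neither nagios nor a member of nagcmd falls through to the "other" bits and cannot write. Calling user_can_write("/var/spool/nagios/cmd/nagios.cmd", name) for the account that runs the nsca daemon would show which case applies.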

Re: [Nagios-users] single email alert to multiple contacts?

2010-08-23 Thread Parish, Brent
Agree totally!  

All alerts from Nagios go to the same post-processing script we built
and that's where they get shuffled off where they need to go, based on
user preferences.

We built a database and simple CGI interface (within Nagios pages).
Users click on the preferences link and subscribe to systems they are
interested in receiving alerts from.  They can then decide what email
address to send to, based on time of day, hostname, alert level, etc.

That takes virtually all the alert management off the Nagios maintainer
(me!) and allows people to modify their own contact information (e.g. at
work, send to instant messenger; on vacation, send to phone; at home,
send to home email; etc.).

There are fall through rules that can optionally send to an admin
group mailbox (with appropriate verbiage in the alert message indicating
the fall through) if no one is subscribed to get the alert.

Finally, we built a quick & dirty reporting page that lists all
contacts for all services for all hosts, so we can glance through and
pin down gaps.
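
For readers who want to picture the routing step, here is a rough sketch of the idea described above. It is not the script from this post: the data layout, field names and addresses are all invented for illustration, and a real version would read subscriptions from the database rather than from a list.

```python
def route_alert(alert, subscriptions, fallback="admins@example.com"):
    """Return (recipients, message) for one alert.

    `alert` and each subscription are plain dicts with hypothetical keys:
    host, service, level, hour / host, levels, hours, address.
    """
    recipients = []
    for sub in subscriptions:
        if sub["host"] != alert["host"]:
            continue
        if alert["level"] not in sub["levels"]:
            continue
        start, end = sub["hours"]  # half-open range of hours, e.g. (9, 17)
        if start <= alert["hour"] < end:
            recipients.append(sub["address"])
    message = f'{alert["level"]}: {alert["host"]}/{alert["service"]}'
    if not recipients:
        # Fall-through rule: nobody subscribed, so notify the admin mailbox.
        return [fallback], message + " (fall-through: no subscriber matched)"
    return recipients, message
```

The fall-through branch mirrors the rule mentioned above: if no subscription matches, the alert goes to a group mailbox and says so in its text.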

Re: [Nagios-users] single email alert to multiple contacts?

2010-08-23 Thread Sean McAfee
Parish, Brent wrote:
 Agree totally!  
 All alerts from Nagios go to the same post-processing script we built
 and that's where they get shuffled off where they need to go, based on
 user preferences.

This sounds amazing!

Is there any chance this could be released to the community?

Sean McAfee
Senior Systems Engineer

Re: [Nagios-users] single email alert to multiple contacts?

2010-08-23 Thread Julian Hein

On 23.08.10 21:22, Sean McAfee wrote:

 Parish, Brent wrote:
 Agree totally!  
 All alerts from Nagios go to the same post-processing script we built
 and that's where they get shuffled off where they need to go, based on
 user preferences.
 This sounds amazing!
 Is there any chance this could be released to the community?

You could look at NoMa (Notification Manager), which does the same:


