Re: [opsview-users] Cancel downtime seems to kill nagios

unix Fri, 21 May 2010 04:36:07 -0700

We have had this problem in versions 3.0, 3.1 and now in 3.5.2.
We run RHEL 5.3 in a distributed environment: a clustered master and 2 slaves.
The cluster service always starts Opsview up again after it has crashed,
so our only problem is that a lot of service results are stale when it happens.
About 10-20% of our Cancel downtime operations crash Opsview.
We also had a single Opsview server, and on that server Opsview never crashed.
But of course it works flawlessly at the moment.

If we can find a way to provoke it so that it happens every time, we will trace it.
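
A minimal sketch of how the reproduction Rafael describes in the quoted message (schedule downtime for a host group, then delete it) could be scripted, so the crash can be provoked repeatedly. Assumptions, none of them confirmed in this thread: the command pipe lives at /usr/local/nagios/var/rw/nagios.cmd, the DEL_HOSTGROUP_SVC_DOWNTIME command takes only the host group name (copied from the nagios.log line quoted below), and the helper names are made up for illustration. The SCHEDULE_HOSTGROUP_SVC_DOWNTIME argument order follows the standard Nagios external command format.

```python
import time

# Assumed location of the Nagios command pipe; adjust for your install.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_external_command(name, args, timestamp=None):
    """Build one external command line: [timestamp] NAME;arg1;arg2;..."""
    if timestamp is None:
        timestamp = int(time.time())
    return "[%d] %s;%s" % (timestamp, name, ";".join(str(a) for a in args))


def schedule_hostgroup_svc_downtime(hostgroup, start, duration, author, comment):
    """Fixed downtime for all services in a host group, no trigger."""
    end = start + duration
    return format_external_command(
        "SCHEDULE_HOSTGROUP_SVC_DOWNTIME",
        # hostgroup;start;end;fixed;trigger_id;duration;author;comment
        [hostgroup, start, end, 1, 0, duration, author, comment],
        timestamp=start,
    )


def delete_hostgroup_svc_downtime(hostgroup, timestamp=None):
    """Argument layout assumed from the log line quoted in this thread."""
    return format_external_command(
        "DEL_HOSTGROUP_SVC_DOWNTIME", [hostgroup], timestamp
    )


def submit(line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        pipe.write(line + "\n")
```

To use it, call submit(schedule_hostgroup_svc_downtime(...)), wait for Nagios to process the command, then call submit(delete_hostgroup_svc_downtime(...)), and repeat on a test host group until the crash shows up.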


On 2010-05-20 19:56, Ton Voon wrote:

On 20 May 2010, at 20:35, Rafael Carneiro wrote:
> It's a distributed environment, where everything but 20 boxes are
> monitored by slaves (2 clusters of 2 slaves, about 600 hosts being
> monitored).
>
> I seem to be able to replicate that by scheduling and then deleting
> downtime for a host group.
>
> I've changed debug_level=-1 and am still only able to see this in
> the nagios.log before it crashes: [1274383762] EXTERNAL COMMAND:
> DEL_HOSTGROUP_SVC_DOWNTIME;hostgroup_name
>
> I had core dumps enabled, but don't know where to look for them
> (not sure if they're being created).
They should be created in the /usr/local/nagios/etc directory.

An strace would be helpful.
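
A rough sketch of both steps, in case it helps: list any core files in the directory mentioned above, and build an strace invocation that attaches to the running nagios process. The "core*" file name pattern, the output path /tmp/nagios.strace and the function names are assumptions for illustration. The strace flags used (-f to follow forks, -tt for timestamps, -o for the output file, -p to attach to a PID) are standard.

```python
import glob
import os


def find_core_files(directory="/usr/local/nagios/etc"):
    """Return files named core* in the directory, newest first."""
    # Assumes the kernel writes cores as "core" or "core.<pid>".
    candidates = glob.glob(os.path.join(directory, "core*"))
    files = [p for p in candidates if os.path.isfile(p)]
    return sorted(files, key=os.path.getmtime, reverse=True)


def strace_command(pid, output="/tmp/nagios.strace"):
    """Build the argument list for attaching strace to a running process."""
    return ["strace", "-f", "-tt", "-o", output, "-p", str(pid)]
```

The list returned by strace_command can be passed to subprocess.call as root before cancelling the downtime, so the trace covers the moment of the crash.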

Ton

_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/lists/listinfo/opsview-users
