Re: [opsview-users] Cancel downtime seems to kill nagios

Ton Voon Fri, 21 May 2010 05:09:37 -0700

I've raised https://secure.opsera.com/jira/browse/OPS-1165 for this problem.

I would appreciate it if we could have more information about this particular problem so we can look into why it is happening.


If you can consistently reproduce it, can we have access to your system?

Ton

On 21 May 2010, at 12:34, unix wrote:

We have had this problem in versions 3.0, 3.1 and now in 3.5.2.
Running RHEL 5.3 in a distributed environment: master clustered and 2 slaves. The cluster service always starts up opsview again after it has crashed. So our only problem is that a lot of service results are stale when it happens.
About 10-20% of our Cancel downtime operations crash opsview.
We also had a single opsview server, and on this server opsview never crashed.
But of course it works flawlessly at the moment.
If we can find a way to provoke it so it happens every time, we will trace it.
On 2010-05-20 19:56, Ton Voon wrote:
On 20 May 2010, at 20:35, Rafael Carneiro wrote:
> It's a distributed environment, where everything but 20 boxes are monitored by slaves (2 clusters of 2 slaves, about 600 hosts being monitored).
>
> I seem to be able to replicate that by scheduling and then deleting downtime for a host group.
>
> I've changed debug_level=-1 and am still only able to see this in the nagios.log before it crashes: [1274383762] EXTERNAL COMMAND: DEL_HOSTGROUP_SVC_DOWNTIME;hostgroup_name
>
> I had core dumps enabled, but don't know where to look for them (not sure if they're being created).
They should be created in the /usr/local/nagios/etc directory.
An strace would be helpful.
Ton
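
For anyone collecting evidence for the ticket, here is a minimal sketch of pulling the last external command out of nagios.log before a crash. It assumes log lines have the shape quoted above ("[epoch] EXTERNAL COMMAND: NAME;args"); the function names are hypothetical and not part of Opsview or Nagios.

```python
import re
from datetime import datetime, timezone

# Matches lines such as:
# [1274383762] EXTERNAL COMMAND: DEL_HOSTGROUP_SVC_DOWNTIME;hostgroup_name
EXTERNAL_COMMAND = re.compile(r"^\[(\d+)\] EXTERNAL COMMAND: ([A-Z_]+)(?:;(.*))?$")


def parse_external_command(line):
    """Return (utc_time, command, args) for an external-command line, else None."""
    match = EXTERNAL_COMMAND.match(line.strip())
    if match is None:
        return None
    epoch, command, raw_args = match.groups()
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    args = raw_args.split(";") if raw_args else []
    return when, command, args


def last_external_command(lines):
    """Return the last external command seen in an iterable of log lines."""
    last = None
    for line in lines:
        parsed = parse_external_command(line)
        if parsed is not None:
            last = parsed
    return last
```

Passing the open log file to last_external_command() (the path depends on your installation) gives the timestamp, command name and arguments of the last command Nagios logged, which can be attached to the report alongside an strace.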


