I've raised https://secure.opsera.com/jira/browse/OPS-1165 for this problem.

I would appreciate it if you could send more information about this particular problem so we can look into why it is happening.

If you can consistently reproduce it, can we have access to your system?

Ton

On 21 May 2010, at 12:34, unix wrote:

We have had this problem in versions 3.0, 3.1 and now 3.5.2.
We run RHEL 5.3 in a distributed environment with a clustered master and 2 slaves. The cluster service always starts Opsview up again after it has crashed, so our only problem is that a lot of service results are stale when it happens.
About 10-20% of our downtime cancellations crash Opsview.
We also had a single Opsview server, and on that server Opsview never crashed.
But of course it works flawlessly at the moment.
If we can find a way to provoke it so that it happens every time, we will trace it.
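
A minimal sketch of how such a reproduction could be scripted, assuming the crash is triggered by scheduling and then deleting downtime for a host group, as described in the quoted message below. The command file path is the stock Nagios default and is an assumption here; an Opsview install may use a different one. The DEL_HOSTGROUP_SVC_DOWNTIME name and its single argument are copied from the nagios.log line in this thread; I am not certain it exists in stock Nagios. The helper names, the default author and the default comment are hypothetical.

```python
import time

# Assumption: stock Nagios default location; check command_file in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_external_command(name, args, timestamp=None):
    """Return one line in Nagios external command format: [time] NAME;arg1;arg2"""
    if timestamp is None:
        timestamp = int(time.time())
    fields = ";".join(str(a) for a in args)
    return f"[{int(timestamp)}] {name};{fields}\n"


def build_repro_commands(hostgroup, now, duration=3600,
                         author="admin", comment="downtime crash repro"):
    """Build a schedule command followed by the delete command seen in the log."""
    start, end = now, now + duration
    schedule = format_external_command(
        "SCHEDULE_HOSTGROUP_SVC_DOWNTIME",
        # hostgroup; start; end; fixed=1; trigger_id=0; duration; author; comment
        [hostgroup, start, end, 1, 0, duration, author, comment],
        timestamp=now,
    )
    # Argument list taken from the log line in this thread (hostgroup name only).
    delete = format_external_command(
        "DEL_HOSTGROUP_SVC_DOWNTIME", [hostgroup], timestamp=now
    )
    return [schedule, delete]


def submit(lines, command_file=COMMAND_FILE):
    """Write the command lines to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        for line in lines:
            pipe.write(line)


if __name__ == "__main__":
    # Only prints the commands; call submit() deliberately on a test system.
    for line in build_repro_commands("hostgroup_name", now=int(time.time())):
        print(line, end="")
```

Running this in a loop on a test system, with a pause between the schedule and the delete, might show whether the crash can be provoked on demand.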

On 2010-05-20 19:56, Ton Voon wrote:
On 20 May 2010, at 20:35, Rafael Carneiro wrote:
> It's a distributed environment, where everything but 20 boxes are
> monitored by slaves (2 clusters of 2 slaves, about 600 hosts being
> monitored).
>
> I seem to be able to replicate that by scheduling and then deleting
> downtime for a host group.
>
> I've changed debug_level=-1 and am still only able to see this in
> the nagios.log before it crashes:
> [1274383762] EXTERNAL COMMAND: DEL_HOSTGROUP_SVC_DOWNTIME;hostgroup_name
>
> I had core dumps enabled, but don't know where to look for them
> (not sure if they're being created).
They should be created in the /usr/local/nagios/etc directory.
An strace would be helpful.
Ton
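
As a rough sketch of the two checks suggested above: the helpers below look for core files in a directory and build an strace command line for attaching to a running process. The function names are hypothetical, and the "core" / "core.<suffix>" naming is an assumption, since the actual name depends on the kernel's core_pattern setting. Note that core_dumps_enabled() reports the limit of the process running the script, not of the Nagios daemon itself.

```python
import os
import resource


def core_dumps_enabled():
    """True if the soft RLIMIT_CORE of this process is non-zero."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


def find_core_files(directory):
    """Return sorted paths of files named 'core' or 'core.<suffix>'."""
    found = []
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        if os.path.isfile(path) and (entry == "core" or entry.startswith("core.")):
            found.append(path)
    return sorted(found)


def strace_command(pid, output_file):
    """Build an strace argv that attaches to a running process.

    -f follows child processes, -tt adds timestamps, -o writes to a file.
    """
    return ["strace", "-f", "-tt", "-p", str(pid), "-o", output_file]
```

For example, find_core_files("/usr/local/nagios/etc") would list any candidate core files in the directory mentioned above, and the list from strace_command() can be passed to subprocess.run() once the Nagios process ID is known.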

_______________________________________________
Opsview-users mailing list
Opsview-users@lists.opsview.org
http://lists.opsview.org/lists/listinfo/opsview-users
