We have had this problem in versions 3.0, 3.1 and now 3.5.2.
We are running RHEL 5.3 in a distributed environment: a clustered master and 2 slaves.
The cluster service always starts up opsview again after it has crashed.
So our only problem is that a lot of service results are stale when it happens.
About 10-20% of our downtime cancellations crash Opsview.
We also had a single Opsview server, and on that server Opsview never crashed.
But of course it works flawlessly at the moment.
If we can find a way to provoke the crash so that it happens every time, we will
trace it.
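
For anyone who wants to try provoking it on a test system, here is a rough
sketch of the reproduction Rafael describes in the quoted message below
(schedule service downtime for a host group, then delete it). It only formats
lines in the Nagios external command syntax. The DEL_HOSTGROUP_SVC_DOWNTIME
name is taken from the nagios.log line quoted below; the helper names, the
author and comment values, and the command file location passed to submit()
are assumptions of mine, not something confirmed in this thread.

```python
import time


def external_command(name, *args, now=None):
    """Format one line in Nagios external command syntax: [epoch] NAME;a;b"""
    ts = int(time.time() if now is None else now)
    fields = ";".join([name] + [str(a) for a in args])
    return "[%d] %s\n" % (ts, fields)


def downtime_repro_lines(hostgroup, duration=600, author="admin",
                         comment="crash repro", now=None):
    """Build a schedule line followed by a delete line for one host group.

    author and comment are placeholder values (assumptions).
    """
    start = int(time.time() if now is None else now)
    end = start + duration
    schedule = external_command(
        "SCHEDULE_HOSTGROUP_SVC_DOWNTIME",
        hostgroup, start, end,
        1,         # fixed downtime
        0,         # no triggering downtime
        duration,
        author, comment,
        now=start,
    )
    # Command name and argument as they appear in the quoted nagios.log line.
    delete = external_command("DEL_HOSTGROUP_SVC_DOWNTIME", hostgroup, now=start)
    return [schedule, delete]


def submit(lines, command_file):
    """Write the lines to the Nagios command file (path supplied by caller)."""
    with open(command_file, "w") as fh:
        for line in lines:
            fh.write(line)
```

Calling downtime_repro_lines("some_hostgroup") and inspecting the result before
passing it to submit() is probably the safest way to use this. A pause between
the two commands may be needed so the downtime exists before it is deleted.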
On 2010-05-20 19:56, Ton Voon wrote:
On 20 May 2010, at 20:35, Rafael Carneiro wrote:
> It's a distributed environment, where everything but 20 boxes are
> monitored by slaves (2 clusters of 2 slaves, about 600 hosts being
> monitored).
>
> I seem to be able to replicate that by scheduling and then deleting
> downtime for a host group.
>
> I've changed debug_level=-1 and am still only able to see this in
> the nagios.log before it crashes: [1274383762] EXTERNAL COMMAND:
> DEL_HOSTGROUP_SVC_DOWNTIME;hostgroup_name
>
> I had core dumps enabled, but don't know where to look for them
> (not sure if they're being created).
They should be created in the /usr/local/nagios/etc directory.
An strace would be helpful.
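
A small sketch of how one might check both points, in case it helps. It looks
for core files in the directory mentioned above and builds an strace command
line for attaching to a running process. The function names and the default
output path are hypothetical. Note that core_dumps_enabled() reports the limit
of the process running the script, which is not necessarily the limit the
nagios daemon was started with.

```python
import glob
import os
import resource


def core_dumps_enabled():
    """True if the soft core-size limit of *this* process is non-zero."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


def find_core_files(directory="/usr/local/nagios/etc"):
    """Return files named core* in the directory, newest first."""
    paths = glob.glob(os.path.join(directory, "core*"))
    return sorted(paths, key=os.path.getmtime, reverse=True)


def strace_argv(pid, output="/tmp/nagios.strace"):
    """Build an strace command that attaches to pid.

    -f follows child processes, -tt adds timestamps, -o writes to a file.
    The output path is an arbitrary choice.
    """
    return ["strace", "-f", "-tt", "-p", str(pid), "-o", output]
```

The list from strace_argv() can be handed to subprocess.call() as root once the
pid of the nagios process is known; the trace then needs to be running before
the downtime is cancelled.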
Ton
_______________________________________________
Opsview-users mailing list
Opsview-users@lists.opsview.org
http://lists.opsview.org/lists/listinfo/opsview-users