I've raised https://secure.opsera.com/jira/browse/OPS-1165 for this
problem.
I would appreciate it if we could have more information about this
particular problem so we can look into why it is happening.
If you can consistently reproduce it, can we have access to your system?
Ton
On 21 May 2010, at 12:34, unix wrote:
We have had this problem in versions 3.0, 3.1 and now in 3.5.2.
We are running RHEL 5.3 in a distributed environment: a clustered master
and 2 slaves.
The cluster service always starts up opsview again after it has
crashed.
So our only problem is that a lot of service results are stale when it
happens.
About 10-20% of our Cancel downtime operations crash Opsview.
We also had a single Opsview server, and on that server Opsview never
crashed.
But of course it works flawlessly at the moment.
If we can find a way to provoke it so that it happens every time, we will
trace it.
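
One way to provoke it repeatedly might be to script the schedule-then-delete cycle described further down the thread. The following is only a sketch under assumptions: the command file path is the usual Nagios default and may differ on your install, the helper names are made up for illustration, and the DEL_HOSTGROUP_SVC_DOWNTIME line simply mirrors the form seen in the quoted nagios.log entry.

```python
import time


def external_command(name, *args, now=None):
    """Format one line in the Nagios external command file syntax:
    [timestamp] COMMAND;arg1;arg2..."""
    ts = int(time.time() if now is None else now)
    fields = (name,) + tuple(str(a) for a in args)
    return "[%d] %s\n" % (ts, ";".join(fields))


def schedule_hostgroup_svc_downtime(hostgroup, start, end, author, comment, now=None):
    """Fixed downtime (fixed=1, trigger_id=0) for all services in a host group."""
    return external_command(
        "SCHEDULE_HOSTGROUP_SVC_DOWNTIME",
        hostgroup, start, end, 1, 0, end - start, author, comment,
        now=now,
    )


def delete_hostgroup_svc_downtime(hostgroup, now=None):
    """Same form as the command logged just before the crash."""
    return external_command("DEL_HOSTGROUP_SVC_DOWNTIME", hostgroup, now=now)


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command to the command pipe (path is an assumption).
    Opening the pipe blocks if the daemon is not reading from it."""
    with open(command_file, "w") as fh:
        fh.write(line)
```

Calling submit() with the schedule line, waiting a few seconds, then submitting the delete line in a loop should approximate what the web interface does, although it is not confirmed that this triggers the crash.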
On 2010-05-20 19:56, Ton Voon wrote:
On 20 May 2010, at 20:35, Rafael Carneiro wrote:
> It's a distributed environment, where everything but 20 boxes are
> monitored by slaves (2 clusters of 2 slaves, about 600 hosts being
> monitored).
>
> I seem to be able to replicate that by scheduling and then deleting
> downtime for a host group.
>
> I've changed debug_level=-1 and am still only able to see this in
> the nagios.log before it crashes:
> [1274383762] EXTERNAL COMMAND: DEL_HOSTGROUP_SVC_DOWNTIME;hostgroup_name
>
> I had core dumps enabled, but don't know where to look for them
> (not sure if they're being created).
They should be created in the /usr/local/nagios/etc directory.
An strace would be helpful.
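
A rough sketch of how both could be checked from a script, in case it helps. The function names and the strace output path are hypothetical; the core limit reported is that of the shell running the script, which is only a hint about the limit the nagios daemon itself runs under.

```python
import glob
import os
import resource


def core_dumps_enabled():
    """True if the soft core-size limit of this process is non-zero.
    The daemon may have been started with a different limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


def find_core_files(directory="/usr/local/nagios/etc"):
    """List files named core* in the directory, oldest first."""
    matches = glob.glob(os.path.join(directory, "core*"))
    return sorted(matches, key=os.path.getmtime)


def strace_argv(pid, output="/tmp/nagios.strace"):
    """Build an strace command line that attaches to a running process:
    -f follows forks, -tt adds timestamps, -o writes to a file."""
    return ["strace", "-f", "-tt", "-o", output, "-p", str(pid)]
```

The list returned by strace_argv() can be passed to subprocess.run() as root with the pid of the nagios process, started before the downtime is cancelled.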
Ton