On 2 August 2011 at 15:35, "CAVAGNINI Damien" <damien.cavagn...@thalesgroup.com>
wrote:
>
> Hello,
>
> At my company, we are testing both gearmand and shinken for our next
> monitoring infrastructure.
> We are having problems with the shinken pollers: lots of checks end up as
> defunct processes (via Nagios Perl plugins, both official ones and our
> own).
> Sometimes we have up to 800 zombies at a time.
>
> It seems the zombies show up as snmp_timeout in the log.
>
> I tried different values for service_check_timeout and host_check_timeout,
> without success.
>
> Currently, both values are set to 60.
>
> The same plugins are used by nagios and gearmand, and show no problem.
>
> We are checking 16000 services with one poller. The same poller is used
> for shinken or nagios / gearmand (not at the same time, of course ;)
>
> 16000 checks show no problem with nagios / gearmand.
>
>
>
> The poller is an 8-core (16 hardware threads) machine with 12 GB of RAM.
> We have another physical server as arbiter, broker (ndo and NPCD),
> receiver and reactionner, and some VMs (from 1 up to 3) for schedulers.
Hi,
In fact, each poller worker launches up to processes_by_worker checks and
polls them every 0.001 to 0.1 seconds (a slow-start-like algorithm). So yes,
this implies a higher number of temporary zombies when a lot of checks arrive
in the same second, but there should not be that many of them, and they
should not last long.
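
To illustrate where the temporary zombies come from, here is a minimal Python sketch of one worker that launches a batch of checks and then polls them. It is not Shinken's actual code: the function name run_checks is hypothetical, and it assumes the "slow start" means the poll interval doubles from 0.001 s up to 0.1 s (the real schedule may differ). A check that has exited stays a zombie until the next poll() call reaps it.

```python
import subprocess
import sys
import time


def run_checks(commands, min_sleep=0.001, max_sleep=0.1):
    """Launch every command at once, then poll until all have exited.

    Hypothetical sketch: the poll interval starts at min_sleep and doubles
    up to max_sleep (assumed slow-start-like backoff). Between a child's
    exit and the next pass, that child is a zombie (defunct) process.
    """
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL) for cmd in commands]
    results = {}
    sleep = min_sleep
    while len(results) < len(procs):
        for i, p in enumerate(procs):
            if i not in results:
                rc = p.poll()  # non-blocking; reaps the child if it has exited
                if rc is not None:
                    results[i] = rc
        if len(results) < len(procs):
            time.sleep(sleep)
            sleep = min(sleep * 2, max_sleep)  # back off towards max_sleep
    return [results[i] for i in range(len(procs))]


if __name__ == "__main__":
    # Ten fake "checks" that exit with codes 0, 1, 2, 0, 1, 2, ...
    cmds = [[sys.executable, "-c", f"import sys; sys.exit({n % 3})"] for n in range(10)]
    print(run_checks(cmds))  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
```

The more children a single worker has to scan, and the longer the interval has grown, the longer an exited check can sit as a zombie before it is reaped.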
You can try increasing min_workers (tip: set it to 0 and it will use the
number of CPUs, so 8 I think; I don't think the hyper-threads are taken into
account) and decreasing processes_by_worker (from 256 to 128).
That way, more worker processes will be waiting on their children for the
same total number of children, so there should be fewer zombies :)
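
A rough way to see the effect of that tuning: the sketch below uses hypothetical helpers (not part of Shinken) that spread the in-flight checks evenly over the workers and compute how many children each worker has to poll. It assumes min_workers = 0 means one worker per CPU, as described above; the load of 1000 in-flight checks and the starting value of 4 workers are made-up numbers for illustration.

```python
import math
import os


def effective_workers(min_workers, cpu_count=None):
    """Hypothetical sizing rule: min_workers == 0 means one worker per CPU.

    The real poller may count CPUs differently (e.g. including hyper-threads).
    """
    if min_workers > 0:
        return min_workers
    return cpu_count if cpu_count is not None else (os.cpu_count() or 1)


def children_per_worker(in_flight, workers, processes_by_worker):
    """Children each worker tracks if in-flight checks are spread evenly.

    At most workers * processes_by_worker checks run at once; the rest wait.
    """
    running = min(in_flight, workers * processes_by_worker)
    return math.ceil(running / workers)


if __name__ == "__main__":
    # Made-up load: 1000 checks in flight at the same moment.
    print(children_per_worker(1000, 4, 256))  # 250
    print(children_per_worker(1000, effective_workers(0, cpu_count=8), 128))  # 125
```

With those illustrative numbers, each worker has half as many children to scan on every pass, so exited checks should be reaped sooner.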
Jean
>
> Any ideas about the zombies?
>
> Regards
>
>
>
------------------------------------------------------------------------------
> BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
> The must-attend event for mobile developers. Connect with experts.
> Get tools for creating Super Apps. See the latest technologies.
> Sessions, hands-on labs, demos & much more. Register early & save!
> http://p.sf.net/sfu/rim-blackberry-1
> _______________________________________________
> Shinken-devel mailing list
> Shinken-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/shinken-devel
>