so, i'm still trying to wrap my brain around this issue. a couple of nights 
ago, i got another flood of pages in the middle of the night from one of the 
servers. all of these servers do some extremely intensive processing and data 
xfer over the wire during the night, and they're all about equally loaded during 
this time. yet only one server apparently caused zenoss to think something was 
wrong. in addition to the mongrel_rails processes running on each of these 
servers, there's also a single instance of memcached running on each server, so 
a given server has one memcached and fourteen mongrels. in zenoss, each server 
has been carefully modelled. the mongrel process is set up with:

'zignore parameters' true
'zalert on restart' false
'zcount procs' true (though i get the same results with 'false')
'zfail severity' error
'zmonitor' true

memcached is set up the same way.

last night, when the one server generated alerts, i noticed in the history that 
memcached was shown as down at exactly the same time the fourteen mongrels were 
reported down. but three minutes later, memcached self-reported that it was back 
up, whereas the mongrels never showed up again. because of that, no page was 
generated for memcached, but i got fourteen pages, one for each of the mongrels 
that was reported down. 

importantly, none of these processes was actually ever down. why zenoss thought 
they were down, i have no idea. but i'm confused as to why memcached would 
automatically self-report that it was back up, while the mongrels never do so. 

as i said, i'm still trying to wrap my brain around all this. i need to prevent 
these spurious alerts from being generated, as they're annoying, and i need my 
sleep if nothing is really wrong!

any help is gratefully appreciated. i wish the documentation were a little less 
dense in some places and a little more dense in others; then i might be able to 
figure this all out myself. for example, the 'count process' option: the admin 
guide says "Determines the number of instances of the process that are 
running." okey dokey! but what does that actually *do* for me? under what 
circumstances is it desirable to count the processes? what ramifications does 
that setting have for an alerting issue like the one i'm experiencing?
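to make my question concrete, here's a rough python model of what i *guess* 
'ignore parameters' and 'count procs' might mean. this is purely my own 
assumption, not zenoss's actual code: the function names (normalize, 
count_matches, evaluate) and the sample command lines are made up for 
illustration. in this guess, a process "matches" if its command line (minus 
its parameters, when those are ignored) matches a regex, and counting just 
means an event fires when the number of matches changes between polls.

```python
import re
from typing import List

# HYPOTHETICAL model only -- not how zenoss actually implements this.


def normalize(cmdline: str, ignore_parameters: bool) -> str:
    # guess: "ignore parameters" keeps only the first token of the command line
    parts = cmdline.split()
    if ignore_parameters and parts:
        return parts[0]
    return cmdline


def count_matches(process_table: List[str], pattern: str,
                  ignore_parameters: bool = True) -> int:
    # count how many command lines in the snapshot match the pattern
    regex = re.compile(pattern)
    return sum(1 for cmd in process_table
               if regex.search(normalize(cmd, ignore_parameters)))


def evaluate(previous: int, current: int, count_procs: bool) -> List[str]:
    # guess: zero matches means "down"; with counting on, any change in the
    # number of matches between two polls is also reported
    events = []
    if current == 0:
        events.append("down")
    elif count_procs and current != previous:
        events.append(f"count changed from {previous} to {current}")
    return events


if __name__ == "__main__":
    # made-up snapshot resembling one of my servers: 14 mongrels, 1 memcached
    table = [f"mongrel_rails start -p {8000 + i}" for i in range(14)]
    table.append("memcached -m 64 -p 11211")
    print(count_matches(table, r"mongrel_rails"))   # 14
    print(count_matches(table, r"memcached"))       # 1
    print(evaluate(14, 13, count_procs=True))       # ['count changed from 14 to 13']
    print(evaluate(14, 13, count_procs=False))      # []
    print(evaluate(14, 0, count_procs=False))       # ['down']
```

if that guess is roughly right, turning counting on only adds alerts when the 
number of instances changes, which is exactly the part i'd like a guru to 
confirm or correct.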

anyway, i'll stop chattering here, and maybe a guru can enlighten me.




-------------------- m2f --------------------

Read this topic online here:
http://community.zenoss.com/forums/viewtopic.php?p=22204#22204

-------------------- m2f --------------------



_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users
