so, i'm still trying to wrap my brain around this issue. a couple of nights ago, i got another flood of pages in the middle of the night from one of the servers. all of these servers do some extremely intensive processing and data transfer over the wire during the night, and they're all pretty equally loaded during this time. yet only one server apparently caused zenoss to think something was wrong.

in addition to the mongrel_rails processes running on each of these servers, there's also a single instance of memcached running on each server. so a given server has one memcached and fourteen mongrels.

in zenoss, each server has been carefully modelled. the mongrels process is set up with:
'zignore parameters' true
'zalert on restart' false
'zcount procs' true (though i get the same results with 'false')
'zfail severity' error
'zmonitor' true

memcached is set up the same way.

last night, when the one server generated alerts, i noticed in the history that memcached was shown to be down at exactly the same time as the fourteen mongrels were reported down. but three minutes later, it self-reported that it was back up - whereas the mongrels never showed up again. no page was generated for memcached because of that, but i got fourteen pages, one for each of the mongrels that were reported down.

importantly - none of these processes were actually ever down. why zenoss thought they were down, no idea. but i'm confused as to why memcached would automatically self-report that it was back up, while the mongrels never do so.

as i said, i'm still trying to wrap my brain around all this. i need to prevent these spurious alerts from being generated, as they're annoying, and i need my sleep if nothing is really wrong! any help gratefully appreciated.

i wish the documentation were a little less dense in some places, and a little more dense in others, as i might be able to figure this all out myself. for example, the 'count process' option - the admin guide says "Determines the number of instances of the process that are running.". okey dokey! but what does that actually *do* for me? under what circumstances is it desirable to count the processes? what ramifications does that setting have for an alerting issue like the one i'm experiencing?

anyway, i'll stop chattering here, and maybe a guru can enlighten me.
