On Oct 9, 2006, at 10:25 AM, Derek Crudgington wrote:

...
>> What about taking the output of a svcs -a, and stashing the STATE and
>> STIME columns?
>> ...
>>
>> Although a better data source would be to use svcprop so you don't
>> have to worry about the STIME representation changing over time.
...
> Shawn,
>
> Thanks for the ideas. Monitoring all services would be good.. and
> the exclude. One thing I've noticed is not all services have a log
> file, so I guess it would just have to use the /var/svc/log to
> check if they are available. I don't understand why smfalert now
> takes up that many resources.. is it because of the tail process?
> So with your idea using the STIME, are you saying have it run
> svcs -a every so often and check the STIME? This would seem more
> resource hungry to me but not sure..

Derek,

There would be a difference in resource consumption. Yes, each of those
processes has overhead. The way I am looking at it, there are tradeoffs...

From a configuration standpoint, I would rather just monitor them all
(easier to configure):
    Less time, better coverage, probably more noise
From a footprint standpoint, significantly less memory for more CPU

I could use less memory by using the existing smfalert and only
monitoring selected services:
    More configuration time, ongoing updates

Large zone installations... If I had 100 zones I would have some
problems with the current implementation (or really, at 16 I would
already be a bit tight; the overhead of the monitoring would make it an
untenable solution).

One other thing... you might want to make the from address a variable as
well. I don't think anyone has the address smfalert at sun.com, but they
might get annoyed if they start getting a bunch of bounces and
autoresponders. Possibly smfalert at hostname.

Running smfalert in both the global and a local zone for two days
(243 monitored services), you can see that the running scripts are not
using a horrendous amount of CPU time, but they are using a relatively
large amount of memory (on a 16GB system):

PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
   100      484 1314M  815M   4.4%   0:15:35 0.0% smfalert

Other interesting output can be seen from:

    ptree `pgrep smfalert`
    prstat -J -j smfalert     (if you happen to have made a project)

What you have is a tradeoff in utilization; you also have a tradeoff in
terms of what you can get information about. Do you want to monitor
only services a, b, and d but not c? Or do you want to monitor all
services except c?

If you were to use svcprop (which I would recommend over svcs for
programmatic purposes) you could use logic similar to the following
(this is off the cuff and not complete), e.g.
(a while loop cannot iterate over the command directly, so svcprop is
read through a pipe)

If you were to use svcs -a you would have to worry about timestamps
changing from times to dates:

    online         Oct_07   svc:/network/nfs/mapid:default
    online         12:51:45 svc:/system/system-log:default

    #!/usr/bin/perl
    use strict;
    use warnings;

    # placeholder: pattern of FMRIs to skip (as written it matches nothing)
    my $exclude = qr/(?!)/;
    my @props   = qw(restarter/state restarter/state_timestamp restarter/logfile);

    # collect the restarter properties of every instance, keyed by FMRI
    sub snapshot {
        my %snap;
        open( my $svcprop, '-|', 'svcprop', ( map { ( '-p', $_ ) } @props ), '*' )
            or die "cannot run svcprop: $!\n";
        while ( my $line = <$svcprop> ) {
            chomp $line;
            # each line: <fmri>/:properties/<group>/<property> <type> <value>
            my ( $name, $type, $value ) = split ' ', $line, 3;
            next unless defined $value;
            my ( $fmri, $prop ) = split m{/:properties}, $name, 2;
            next unless defined $prop;
            next if $fmri =~ $exclude;
            $snap{$fmri}{$prop} = $value;
        }
        close $svcprop;
        return \%snap;
    }

    # setup
    my $state = snapshot();

    while (1) {
        # you could run more frequently than this...but how much will you gain?
        sleep 60;

        # update
        my $now = snapshot();

        # check
        my @bad;
        foreach my $fmri ( keys %{$state} ) {
            next unless defined $now->{$fmri};
            my $old = $state->{$fmri}{'/restarter/state_timestamp'};
            my $new = $now->{$fmri}{'/restarter/state_timestamp'};
            push @bad, $fmri if defined $old && defined $new && $new > $old;
        }

        # alert
        foreach my $fmri (@bad) {
            # check for defined logfile, if so gather data, if not maybe
            # "No logfile for service"
            # or check the restarter log for interesting messages (maybe do
            # this anyway)
            # assemble error message, translate time into human readable form
            # send message(s)
            # you could/should break your alert code out of the detection
            # logic, it should make it easier to tweak the messages or
            # delivery methods
            warn "state change: $fmri\n";       # stand-in for the alert code

            # update state, so a restarted service is only reported once
            $state->{$fmri} = $now->{$fmri};
        }
    }

>
>
> This message posted from opensolaris.org
> _______________________________________________
> smf-discuss mailing list
> smf-discuss at opensolaris.org

--
Shawn Ferry              shawn.ferry at sun.com
Senior Principal Systems Engineer
Sun Managed Operations Delivery
703.579.1948
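
For the "assemble error message, translate time into human readable
form" step and the idea of making the from address a variable, a
minimal sketch might look like the following. The helper names
readable_time and alert_message are hypothetical, the message layout is
only an illustration, and it assumes that restarter/state_timestamp is
printed as seconds.nanoseconds since the epoch and that the host name
is acceptable as a mail domain.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Sys::Hostname qw(hostname);

# Hypothetical helper: turn a state_timestamp value (assumed to be
# "seconds.nanoseconds" since the epoch) into a readable local time.
sub readable_time {
    my ($timestamp) = @_;
    my ($seconds) = split /\./, $timestamp;
    return strftime( '%Y-%m-%d %H:%M:%S', localtime($seconds) );
}

# Hypothetical helper: assemble the alert text for one service from the
# properties gathered for it (keys carry a leading slash, as produced by
# splitting the svcprop property name on "/:properties").
sub alert_message {
    my ( $from, $fmri, $props ) = @_;
    my $logfile = $props->{'/restarter/logfile'};
    return join "\n",
        "From: $from",
        "Subject: SMF state change: $fmri",
        '',
        'State:   ' . $props->{'/restarter/state'},
        'Since:   ' . readable_time( $props->{'/restarter/state_timestamp'} ),
        'Logfile: ' . ( defined $logfile ? $logfile : 'No logfile for service' ),
        '';
}

# The from address is a variable rather than a fixed smfalert address.
my $from = 'smfalert@' . hostname();

# Sample property values, for illustration only.
print alert_message(
    $from,
    'svc:/system/system-log:default',
    {
        '/restarter/state'           => 'online',
        '/restarter/state_timestamp' => '1160412345.123456000',
    }
);
```

Keeping the message assembly in its own subroutine is one way to
separate the alert code from the detection logic, so the wording or the
delivery method can change without touching the polling loop.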