> Derek,
>
> I like it although I have some ideas.
>
>
> Some thoughts on usage:
>
> It might be a bit more useful on error if you output a reason for not
> starting.
> e.g. if you don't have all your options, which one(s) are missing or
> malformed
>
>
> Taking the usage to the extreme...
> Die on no logfile is not very friendly, it would seem reasonable to
> me to monitor all running services.
>
> I can see a use for an include 'all' option while also adding an
> exclude option.
>
> e.g.
> ./smfalert.pl -m send -r root -i "all" -e "gss:default"
>
> As trying to monitor everything at the moment is a bit unwieldy
> ./smfalert.pl -m send -r root -i "`svcs | awk '/gss|xfs|stfsloader|smserver|rstat|rusers/{next} /svc:/{print $3}' ORS=' '`"
>
>
> Some thoughts on implementation:
>
> I think there may be an alternate method to achieve the broader goal
> of service monitoring given the current capabilities of SMF, I would
> still like to see an include/exclude method in any case.
>
> Right now smfalert is a bit on the heavy side when run against all
> services with logfiles.
> (I created a project and launched smfalert in the new project to
> allow for easier tracking)
>
> projadd smfalert
> newtask -p smfalert ./smfalert.pl -m send -r root -i "`svcs -a | awk '/gss|xfs|stfsloader|smserver|rstat|rusers/{next} /svc:/{print $3}' ORS=' '`"
>
> prstat -J
> PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
>      1       19  190M   94M   0.6%   5:53:05 0.0% user.root
>    100      258  700M  434M   2.4%   0:00:00 0.0% smfalert
>      0       57  268M  196M   1.1%  29:10:32 0.0% system
>
> Or running in both the global and a non-global zone
> prstat -J
> PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
>      1       21  199M   99M   0.6%   5:53:06 0.0% user.root
>      0       57  268M  196M   1.1%  29:10:33 0.0% system
>    100      484 1314M  815M   4.4%   0:00:00 0.0% smfalert
>
> What about taking the output of a svcs -a, and stashing the STATE and
> STIME columns?
>
> You could then with one process per-zone parse the output from svcs -a
> and report on changes in status including the logfile output as needed
> and available.
>
> You could then have messages that contain:
> Hostname: foo
> Instance: system-log
> Previous State(Time): Online(May_31)
> Current State(Time): Online(17:37:55)
>
>
> Although a better data source would be to use svcprop so you don't
> have to worry about the STIME representation changing over time.
>
> First Run:
> svcprop -p restarter/state -p restarter/state_timestamp -p restarter/logfile system-log
> 1160256235.416468000
> online
>
>
> Subsequent (as you have cached the logfile if there is one):
> svcprop -p restarter/state -p restarter/state_timestamp system-log
> 1160256235.416468000
> online
>
>
> Shawn
>
> On Oct 7, 2006, at 11:09 AM, Derek Crudgington wrote:
>
> >> * Derek Crudgington <dacrud at gmail.com> [2006-10-04 08:34]:
> >>> If anyone is interested, I have written a Perl daemon that runs in the
> >>> background and monitors SMF services to e-mail you when something
> >>> happens. Check http://hell.jedicoder.net/?p=83 for more info.
> >>
> >> Cool. I had a brief look; you can simplify
> >> $grep = `svcs -l $smf | grep logfile`;
> >>
> >> to
> >> $grep = `svcprop -p restarter/logfile $smf`;
> >> (and probably change the variable name...). We should be able to get
> >> a proper API in place for transitions, so you needn't be forced to log
> >> scrape to see stops and starts.
> >> - Stephen
> >> -
> >> Stephen Hahn, PhD Solaris Kernel Development, Sun Microsystems
> >> stephen.hahn at sun.com http://blogs.sun.com/sch/
> >> _______________________________________________
> >> smf-discuss mailing list
> >> smf-discuss at opensolaris.org
> >>
> >
> > Thanks for the tip!
> >
> > This message posted from opensolaris.org
>
> --
> Shawn Ferry shawn.ferry at sun.com
> Senior Principal Systems Engineer
> Sun Managed Operations Delivery
> 703.579.1948

Shawn,

Thanks for the ideas. Monitoring all services would be good, and so would the exclude option. One thing I've noticed is that not all services have a log file, so I guess it would just have to check /var/svc/log to see whether one is available. I don't understand why smfalert now takes up that many resources. Is it because of the tail process?

So with your idea of using the STIME, are you saying to have it run svcs -a every so often and check the STIME? That would seem more resource hungry to me, but I'm not sure.
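
To make the polling idea concrete, here is a rough sketch of what "run svcs -a every so often and compare" could look like, with a single process per zone instead of one watcher per logfile. It is in Python rather than Perl only so it can be tried on sample text, and it is not how smfalert works today. The function names (parse_svcs, select_services, diff_snapshots, format_change) and the substring-based include/exclude matching are made up for illustration; the only thing assumed about svcs -a is its three-column STATE / STIME / FMRI layout.

```python
from typing import Dict, Iterable, List, Tuple

# FMRI -> (STATE, STIME), as printed by `svcs -a`
Snapshot = Dict[str, Tuple[str, str]]
Change = Tuple[str, Tuple[str, str], Tuple[str, str]]


def parse_svcs(text: str) -> Snapshot:
    """Parse `svcs -a` style output into a snapshot, skipping the header."""
    snapshot: Snapshot = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "STATE":
            continue  # header, blank line, or something unexpected
        state, stime, fmri = fields
        snapshot[fmri] = (state, stime)
    return snapshot


def select_services(snapshot: Snapshot,
                    include: Iterable[str] = ("all",),
                    exclude: Iterable[str] = ()) -> Snapshot:
    """Hypothetical include/exclude filter: plain substring match on the FMRI."""
    include, exclude = tuple(include), tuple(exclude)

    def matches(fmri: str, patterns: Tuple[str, ...]) -> bool:
        return any(p in fmri for p in patterns)

    return {
        fmri: value for fmri, value in snapshot.items()
        if ("all" in include or matches(fmri, include))
        and not matches(fmri, exclude)
    }


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Change]:
    """Return (fmri, old, new) for services whose STATE or STIME changed."""
    return [
        (fmri, previous[fmri], current[fmri])
        for fmri in sorted(current)
        if fmri in previous and previous[fmri] != current[fmri]
    ]


def format_change(hostname: str, change: Change) -> str:
    """Build a message body in the layout suggested earlier in the thread."""
    fmri, (old_state, old_time), (new_state, new_time) = change
    return (f"Hostname: {hostname}\n"
            f"Instance: {fmri}\n"
            f"Previous State(Time): {old_state}({old_time})\n"
            f"Current State(Time): {new_state}({new_time})")


# Sample text standing in for two polls. On a live system the text would
# come from running `svcs -a`, e.g. via subprocess.run(["svcs", "-a"], ...).
OLD = """STATE          STIME    FMRI
online         May_31   svc:/system/system-log:default
online         May_31   svc:/network/rpc/gss:default
"""
NEW = """STATE          STIME    FMRI
online         17:37:55 svc:/system/system-log:default
maintenance    17:40:02 svc:/network/rpc/gss:default
"""

if __name__ == "__main__":
    before = select_services(parse_svcs(OLD), exclude=("gss:default",))
    after = select_services(parse_svcs(NEW), exclude=("gss:default",))
    for change in diff_snapshots(before, after):
        print(format_change("foo", change))
```

Each poll costs one short-lived svcs -a run plus a dictionary comparison, so the per-service cost stays in memory rather than in extra processes. Whether that is lighter in practice than the current approach would still need measuring, for example with prstat -J as in the quoted message.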