[smf-discuss] Re: Re: SMF Alerter

Derek Crudgington Mon, 09 Oct 2006 07:25:56 PDT

> Derek,
> 
> I like it although I have some ideas.
> 
> 
> Some thoughts on usage:
> 
> It might be a bit more useful on error if you output a reason for not
> starting, e.g. if you don't have all your options, which one(s) are
> missing or malformed.
> 
> 
> Taking the usage to the extreme...
> Die on no logfile is not very friendly; it would seem reasonable to me
> to monitor all running services.
> 
> I can see a use for an include 'all' option while also adding an
> exclude option.
> 
> e.g.
> ./smfalert.pl -m send -r root -i "all" -e "gss:default"
> 
> As trying to monitor everything at the moment is a bit unwieldy:
> ./smfalert.pl -m send -r root -i "`svcs | awk '/gss|xfs|stfsloader|smserver|rstat|rusers/{next} /svc:/{print $3}' ORS=' '`"
> 
> 
> Some thoughts on implementation:
> 
> I think there may be an alternate method to achieve the broader goal
> of service monitoring given the current capabilities of SMF.  I would
> still like to see an include/exclude method in any case.
> 
> Right now smfalert is a bit on the heavy side when run against all
> services with logfiles.
> (I created a project and launched smfalert in the new project to
> allow for easier tracking)
> 
> projadd smfalert
> newtask -p smfalert ./smfalert.pl -m send -r root -i "`svcs -a | awk '/gss|xfs|stfsloader|smserver|rstat|rusers/{next} /svc:/{print $3}' ORS=' '`"
> 
> prstat -J
> PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
>      1       19  190M   94M   0.6%   5:53:05 0.0% user.root
>    100      258  700M  434M   2.4%   0:00:00 0.0% smfalert
>      0       57  268M  196M   1.1%  29:10:32 0.0% system
> 
> Or running in both the global and a non-global zone:
> prstat -J
> PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
>      1       21  199M   99M   0.6%   5:53:06 0.0% user.root
>      0       57  268M  196M   1.1%  29:10:33 0.0% system
>    100      484 1314M  815M   4.4%   0:00:00 0.0% smfalert
> 
> What about taking the output of svcs -a and stashing the STATE and
> STIME columns?
> 
> You could then, with one process per zone, parse the output from
> svcs -a and report on changes in status, including the logfile output
> as needed and available.
> 
> You could then have messages that contain:
> Hostname: foo
> Instance: system-log
> Previous State(Time): Online(May_31)
> Current State(Time): Online(17:37:55)
> 
> 
> Although a better data source would be to use svcprop so you don't
> have to worry about the STIME representation changing over time.
> 
> First Run:
> svcprop -p restarter/state -p restarter/state_timestamp -p restarter/logfile system-log
> 1160256235.416468000
> online
> 
> 
> Subsequent (as you have cached the logfile if there is one):
> svcprop -p restarter/state -p restarter/state_timestamp system-log
> 1160256235.416468000
> online
> 
> 
> Shawn
> 
> On Oct 7, 2006, at 11:09 AM, Derek Crudgington wrote:
> 
> >> * Derek Crudgington <dacrud at gmail.com> [2006-10-04 08:34]:
> >>> If anyone is interested, I have written a Perl daemon that runs in the
> >>> background and monitors SMF services to e-mail you when something
> >>> happens.  Check http://hell.jedicoder.net/?p=83 for more info.
> >>
> >>  Cool.  I had a brief look; you can simplify
> >>   $grep = `svcs -l $smf | grep logfile`;
> >>
> >>   to
> >>  $grep = `svcprop -p restarter/logfile $smf`;
> >> (and probably change the variable name...).  We should be able to get
> >> a proper API in place for transitions, so you needn't be forced to log
> >> scrape to see stops and starts.
> >>   - Stephen
> >> -
> >> Stephen Hahn, PhD  Solaris Kernel Development, Sun
> >> Microsystems
> >> stephen.hahn at sun.com  http://blogs.sun.com/sch/
> >> _______________________________________________
> >> smf-discuss mailing list
> >> smf-discuss at opensolaris.org
> >>
> >
> > Thanks for the tip!
> >
> >
> > This message posted from opensolaris.org
> 
> --
> Shawn Ferry                    shawn.ferry at sun.com
> Senior Principal Systems Engineer
> Sun Managed Operations Delivery
> 703.579.1948
> 
> 
>



Shawn,

Thanks for the ideas.  Monitoring all services would be good, and so would the 
exclude option.  One thing I've noticed is that not all services have a log 
file, so I guess it would just have to check /var/svc/log to see which ones are 
available.  I don't understand why smfalert takes up that many resources now; 
is it because of the tail process?  As for your STIME idea, are you saying to 
have it run svcs -a every so often and compare the STIME values?  That would 
seem more resource hungry to me, but I'm not sure.
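
For reference, the polling approach being discussed could be sketched roughly 
as below.  This is only an illustration in Python, not how smfalert.pl works: 
the function names (parse_svcs, selected, changes, snapshot) and the 
shell-style include/exclude patterns are made up for the example.  It assumes 
the output of `svcs -H -o state,stime,fmri -a`, which prints one service per 
line with no header.  It also inherits the caveat Shawn raised: the STIME 
column changes format over time (e.g. 17:37:55 later becomes a date like 
May_31), so comparing it can report changes that are not real restarts; 
restarter/state_timestamp from svcprop would avoid that.

```python
import fnmatch
import subprocess


def parse_svcs(text):
    """Map FMRI -> (state, stime) from 'svcs -H -o state,stime,fmri' output."""
    services = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        state, stime, fmri = fields
        services[fmri] = (state, stime)
    return services


def selected(fmri, include=("*",), exclude=()):
    """Hypothetical include/exclude filter; exclude patterns win."""
    if any(fnmatch.fnmatchcase(fmri, pat) for pat in exclude):
        return False
    return any(fnmatch.fnmatchcase(fmri, pat) for pat in include)


def changes(previous, current, include=("*",), exclude=()):
    """List (fmri, old, new) for selected services whose state or STIME differs."""
    result = []
    for fmri, new in sorted(current.items()):
        old = previous.get(fmri)
        if old is not None and old != new and selected(fmri, include, exclude):
            result.append((fmri, old, new))
    return result


def snapshot():
    """Run svcs once and parse its output; only works on a system with SMF."""
    out = subprocess.run(
        ["svcs", "-H", "-o", "state,stime,fmri", "-a"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_svcs(out)
```

A monitor built this way would call snapshot() every so often from a single 
process per zone, pass the previous and current results to changes(), and mail 
whatever comes back.  Whether that is lighter than the current approach would 
need measuring; the sketch only shows that each poll is one short-lived svcs 
run plus a dictionary comparison.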
 
 
This message posted from opensolaris.org
