[smf-discuss] Re: SMF Alerter

Shawn Ferry Sat, 07 Oct 2006 17:50:05 -0400

Derek,

I like it, although I have some ideas.

Some thoughts on usage:

It might be a bit more useful on error if you output a reason for not
starting, e.g. if you don't have all your options, which one(s) are
missing or malformed.
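
A rough sketch of the kind of error reporting I mean, in shell rather than Perl. This is not smfalert's actual option handling: check_options is a made-up name, and I am only assuming that -m, -r and -i are all required because the examples in this thread use them together. The -e option is the exclude flag proposed in this message and does not exist today.

```shell
# check_options: hypothetical sketch, not smfalert's real option parsing.
# Assumption: -m, -r and -i are required; -e (proposed exclude) is optional.
check_options() {
    mode= rcpt= include=
    OPTIND=1                          # reset so the function can be called again
    while getopts m:r:i:e: opt; do
        case $opt in
            m) mode=$OPTARG ;;
            r) rcpt=$OPTARG ;;
            i) include=$OPTARG ;;
            e) : ;;                   # optional, nothing to validate here
            *) return 2 ;;            # getopts has already printed the reason
        esac
    done
    # Collect every missing option so the user sees all of them at once.
    missing=
    [ -n "$mode" ]    || missing="$missing -m"
    [ -n "$rcpt" ]    || missing="$missing -r"
    [ -n "$include" ] || missing="$missing -i"
    if [ -n "$missing" ]; then
        echo "not starting: missing required option(s):$missing" >&2
        return 1
    fi
}
```

Called as check_options "$@" at startup, this names each missing option instead of exiting silently.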

Taking the usage to the extreme...
Dying when there is no logfile is not very friendly; it would seem
reasonable to me to monitor all running services.

I can see a use for an include 'all' option while also adding an
exclude option.

e.g.
./smfalert.pl -m send -r root -i "all" -e "gss:default"

Trying to monitor everything at the moment is a bit unwieldy:
./smfalert.pl -m send -r root -i "`svcs | awk '/gss|xfs|stfsloader|smserver|rstat|rusers/{next} /svc:/{print $3}' ORS=' '`"
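
To make the include/exclude idea concrete, here is a minimal sketch of an "everything except these" filter as a shell function. filter_fmris is a hypothetical name, not part of smfalert, and it assumes an awk that supports -v (nawk or a POSIX awk, not the old /usr/bin/awk).

```shell
# filter_fmris EXCLUDE_REGEX   (hypothetical helper, not part of smfalert)
# Reads svcs-style output (STATE STIME FMRI) on stdin and prints one FMRI
# per line, skipping the header, legacy-run entries, and anything whose
# FMRI matches EXCLUDE_REGEX. An empty regex excludes nothing.
filter_fmris() {
    awk -v excl="$1" '
        $3 !~ /^svc:/           { next }   # header and lrc: lines
        excl != "" && $3 ~ excl { next }   # excluded services
                                { print $3 }
    '
}
```

Usage would be along the lines of: svcs -a | filter_fmris 'gss|xfs|stfsloader|smserver|rstat|rusers'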

Some thoughts on implementation:

I think there may be an alternate method to achieve the broader goal
of service monitoring given the current capabilities of SMF. I would
still like to see an include/exclude method in any case.

Right now smfalert is a bit on the heavy side when run against all
services with logfiles.
(I created a project and launched smfalert in the new project to
allow for easier tracking.)

projadd smfalert
newtask -p smfalert ./smfalert.pl -m send -r root -i "`svcs -a | awk '/gss|xfs|stfsloader|smserver|rstat|rusers/{next} /svc:/{print $3}' ORS=' '`"

prstat -J
PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
      1       19  190M   94M   0.6%   5:53:05 0.0% user.root
    100      258  700M  434M   2.4%   0:00:00 0.0% smfalert
      0       57  268M  196M   1.1%  29:10:32 0.0% system

Or, running in both the global and a non-global zone:
prstat -J
PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
      1       21  199M   99M   0.6%   5:53:06 0.0% user.root
      0       57  268M  196M   1.1%  29:10:33 0.0% system
    100      484 1314M  815M   4.4%   0:00:00 0.0% smfalert

What about taking the output of svcs -a and stashing the STATE and
STIME columns?

You could then, with one process per zone, parse the output from
svcs -a and report on changes in status, including the logfile output
as needed and available.

You could then have messages that contain:
Hostname: foo
Instance: system-log
Previous State(Time): Online(May_31)
Current State(Time): Online(17:37:55)
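
The svcs -a snapshot idea could look roughly like the following. This is only a sketch: report_changes is a made-up name, the two arguments are files holding saved svcs -a output from the previous and current polls, and the optional third argument (the hostname to print) defaults to uname -n. It assumes an awk with -v support.

```shell
# report_changes PREV_SNAPSHOT CUR_SNAPSHOT [HOSTNAME]   (hypothetical helper)
# Each snapshot file is saved `svcs -a` output: STATE STIME FMRI.
# Prints a message block for every instance whose STATE or STIME changed.
report_changes() {
    awk -v host="${3:-$(uname -n)}" '
        $3 !~ /^svc:/ { next }                      # header, lrc: entries
        FILENAME == ARGV[1] {                       # previous snapshot
            state[$3] = $1; stime[$3] = $2; next
        }
        ($3 in state) && (state[$3] != $1 || stime[$3] != $2) {
            printf "Hostname: %s\nInstance: %s\n", host, $3
            printf "Previous State(Time): %s(%s)\n", state[$3], stime[$3]
            printf "Current State(Time): %s(%s)\n\n", $1, $2
        }
    ' "$1" "$2"
}
```

A single loop per zone would save svcs -a to a file, call report_changes against the previous file, mail any output, and then rotate the files.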

Although a better data source would be to use svcprop, so you don't
have to worry about the STIME representation changing over time.

First Run:
svcprop -p restarter/state -p restarter/state_timestamp -p restarter/logfile system-log
1160256235.416468000
online

Subsequent (as you have cached the logfile if there is one):
svcprop -p restarter/state -p restarter/state_timestamp system-log
1160256235.416468000
online
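
A minimal sketch of the svcprop-based check, assuming a per-instance cache file. check_instance and the cache directory layout are hypothetical. It queries one property per svcprop call so that only the bare value is printed and the result does not depend on the order in which several properties come back.

```shell
# check_instance FMRI CACHE_DIR   (hypothetical helper)
# Compares restarter/state and restarter/state_timestamp with the values
# cached on the previous call and prints one line if either changed.
check_instance() {
    fmri=$1
    # Flatten the FMRI into a file name (slashes and colons become "_").
    cache_file=$2/$(printf '%s' "$fmri" | tr '/:' '__')
    state=$(svcprop -p restarter/state "$fmri") || return 2
    stamp=$(svcprop -p restarter/state_timestamp "$fmri") || return 2
    current="$state $stamp"
    previous=
    [ -f "$cache_file" ] && previous=$(cat "$cache_file")
    printf '%s\n' "$current" > "$cache_file"       # remember for next poll
    # Stay silent on the first run and when nothing changed.
    if [ -n "$previous" ] && [ "$previous" != "$current" ]; then
        printf '%s: %s -> %s\n' "$fmri" "$previous" "$current"
    fi
}
```

Because the timestamp is a plain number of seconds, a restart shows up as a changed value even when the state is "online" both times.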

Shawn

On Oct 7, 2006, at 11:09 AM, Derek Crudgington wrote:

>> * Derek Crudgington <dacrud at gmail.com> [2006-10-04 08:34]:
>>> If anyone is interested, I have written a Perl daemon that runs in the
>>> background and monitors SMF services to e-mail you when something
>>> happens.  Check http://hell.jedicoder.net/?p=83 for more info.
>>
>>   Cool.  I had a brief look; you can simplify
>>
>>     $grep = `svcs -l $smf | grep logfile`;
>>
>>   to
>>
>>     $grep = `svcprop -p restarter/logfile $smf`;
>>
>>   (and probably change the variable name...).  We should be able to get
>>   a proper API in place for transitions, so you needn't be forced to log
>>   scrape to see stops and starts.
>>
>>   - Stephen
>>
>> --
>> Stephen Hahn, PhD  Solaris Kernel Development, Sun Microsystems
>> stephen.hahn at sun.com  http://blogs.sun.com/sch/
>>
>
> Thanks for the tip!
>
>

--
Shawn Ferry                    shawn.ferry at sun.com
Senior Principal Systems Engineer
Sun Managed Operations Delivery
703.579.1948
