[smf-discuss] Re: Re: Re: SMF Alerter

Derek Crudgington Mon, 09 Oct 2006 12:41:08 PDT

> 
> On Oct 9, 2006, at 10:25 AM, Derek Crudgington wrote:
> ...
> >> What about taking the output of a svcs -a, and
> >> stashing the STATE and
> >> STIME columns?
> >>
> ...
> >>
> >> Although a better data source would be to use
> svcprop
> >> so you don't
> >> have to worry about the STIME representation
> >> changing over time.
> ...
> > Shawn,
> >
> > Thanks for the ideas.  Monitoring all services
> would be good.. and  
> > the exclude.  One thing I've noticed is not all
> services have a log  
> > file, so I guess it would just have to use the
> /var/svc/log to  
> > check if they are available.  I don't understand
> why smfalert now  
> > takes up that many resources.. is it because of the
> tail process?   
> > So with your idea using the STIME, are you saying
> > have it run svcs -a every so often and check the STIME?
> > This would seem more resource hungry to me but not sure..
> 
> Derek,
> 
> There would be a difference in resource consumption.
> Yes, it comes from all of the processes; each one has overhead.
> 
> The way I am looking at it there are tradeoffs...
> From a configuration standpoint, I would rather just
> monitor them all (easier to configure)
>       Less time, better coverage, probably more noise
> 
> From a footprint standpoint, significantly less
>  memory for more CPU
> 
> I could use less memory by using the existing
> smfalert and only  
> monitoring selected services
>       More configuration time, ongoing updates
> 
> Large zone installations...If I had 100 zones I would
> have some  
> problems with the current implementation
> (or really 16 and I would be a bit tight, the
> overhead of the  
> monitoring would make it an untenable solution)
> 
> One other thing...you might want to make the from
> address a variable  
> as well. I don't think anyone has the address
> smfalert at sun.com, but they might get annoyed if they
> start getting a  
> bunch of bounces and auto responders.
> Possibly smfalert at hostname
> 
> 
> Running smfalert in both the global and a local
> zone for two days  
> (243 monitored services)
> you can see that the running scripts are not using a
> horrendous  
> amount of cpu time,
> but they are using a relatively large amount of
> memory.
> (from a 16GB system)
> 
> PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
>    100      484 1314M  815M   4.4%   0:15:35 0.0% smfalert
> Other interesting output can be seen from
> ptree `pgrep smfalert`
> prstat -J -j smfalert (if you happen to have made a
> project)
> 
> What you have is a tradeoff in utilization; you also
> have a tradeoff in terms of what you can get information about.
> 
> Do you want to monitor only services a, b, and d but
> not c?
> Or do you want to monitor all services except c?
> 
> If you were to use svcprop (which I would recommend
> over svcs for programmatic purposes) you could use logic
> similar to the following
> (this is off the cuff and not complete, e.g. the
> while won't work as shown)
> 
> If you were to use svcs -a you would have to worry
> about timestamps changing from times to dates:
> online         Oct_07   svc:/network/nfs/mapid:default
> online         12:51:45 svc:/system/system-log:default
> 
> 
> 
> # setup
> # (pseudocode: iterate over the lines printed by svcprop)
> while ( my $line = ( svcprop -p restarter/state -p
> restarter/state_timestamp -p restarter/logfile '*' ) )
> {
>          my ($fmri,$prop,$value) = (split(/\/:properties| /, $line))[0,1,-1];
>          next if ( $fmri =~ /$exclude/ );
>          $state->{$fmri}->{$prop} = $value;
> }
> 
> # update
> # (reset the snapshot once, before the loop, not on every line)
> $now = {};
> while ( my $line = ( svcprop -p restarter/state -p
> restarter/state_timestamp '*' ) )
> {
>          my ($fmri,$prop,$value) = (split(/\/:properties| /, $line))[0,1,-1];
>          next if ( $fmri =~ /$exclude/ );
>          $now->{$fmri}->{$prop} = $value;
> }
> 
> # check
> foreach my $fmri ( keys %{ $state } )
> {
>       if ( defined($now->{$fmri})
>               &&      $now->{$fmri}->{'/restarter/state_timestamp'}
>                       >
>                       $state->{$fmri}->{'/restarter/state_timestamp'}
>       ) {
>               push @bad, $fmri;
>       }
> }
> 
> 
> # alert
> 
> foreach my $fmri ( @bad )
> {
>       # check for defined logfile, if so gather data, if not maybe "No
>       # logfile for service"
>       #     or check the restarter log for interesting messages (maybe do
>       #     this anyway)
>       # assemble error message, translate time into human readable form
>       # send message(s)
>       # possibly update state, for restarted services.
>       # alert_code here
>
>       # you could/should break your alert code out of the detection logic,
>       # it should make it easier to tweak the messages or delivery methods
> }
> 
> # you could run more frequently than this...but how
> # much will you gain?
> sleep 60;
> 
> 
> >
> >
> > This message posted from opensolaris.org
> > _______________________________________________
> > smf-discuss mailing list
> > smf-discuss at opensolaris.org
> 
> --
> Shawn Ferry                    shawn.ferry at sun.com
> Senior Principal Systems Engineer
> Sun Managed Operations Delivery
> 703.579.1948
> 
> 
> _______________________________________________
> smf-discuss mailing list
> smf-discuss at opensolaris.org
>
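
A minimal sketch of the setup/update/check steps quoted above, written so
that it runs on canned input rather than a live system.  It assumes svcprop
prints one property per line in the form
<fmri>/:properties/<pg>/<prop> <type> <value>, which is what the split in
the quoted code implies.  The names parse_svcprop and changed_services and
the sample timestamps are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build { fmri => { '/pg/prop' => value } } from svcprop-style lines.
# Assumed line format: <fmri>/:properties/<pg>/<prop> <type> <value>
sub parse_svcprop {
    my ($exclude, @lines) = @_;
    my %snapshot;
    for my $line (@lines) {
        chomp $line;
        my ($fmri, $prop, $value) = (split m{/:properties| }, $line)[0, 1, -1];
        next unless defined $prop;                      # not a property line
        next if defined $exclude && $fmri =~ $exclude;  # skip excluded services
        $snapshot{$fmri}{$prop} = $value;
    }
    return \%snapshot;
}

# Return the FMRIs whose state timestamp is newer in $new than in $old.
sub changed_services {
    my ($old, $new) = @_;
    my $key = '/restarter/state_timestamp';
    my @changed;
    for my $fmri (sort keys %$old) {
        next unless exists $new->{$fmri};
        my ($was, $is) = ($old->{$fmri}{$key}, $new->{$fmri}{$key});
        next unless defined $was && defined $is;
        push @changed, $fmri if $is > $was;
    }
    return @changed;
}

# Canned input (hypothetical values); on a live system the lines would
# come from the svcprop command quoted above.
my @before = (
    'svc:/system/system-log:default/:properties/restarter/state astring online',
    'svc:/system/system-log:default/:properties/restarter/state_timestamp time 1160250705.100000000',
    'svc:/network/nfs/mapid:default/:properties/restarter/state_timestamp time 1160200000.000000000',
);
my @after = (
    'svc:/system/system-log:default/:properties/restarter/state astring online',
    'svc:/system/system-log:default/:properties/restarter/state_timestamp time 1160260000.500000000',
    'svc:/network/nfs/mapid:default/:properties/restarter/state_timestamp time 1160200000.000000000',
);

my $state = parse_svcprop(undef, @before);
my $now   = parse_svcprop(undef, @after);
my @bad   = changed_services($state, $now);
print "state changed: $_\n" for @bad;
```

Keeping the parse and compare steps in separate subs means the polling loop
only has to take a new snapshot, diff it against the old one, and hand the
changed FMRIs to whatever does the alerting.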


One thing I am seeing about the restarter/logfile property is that some 
services use alt_logfile instead.  For instance:

# svcs -l svc:/system/identity:node
fmri         svc:/system/identity:node
name         system identity (nodename)
enabled      true
state        online
next_state   none
state_time   Sun Oct 08 10:21:35 2006
alt_logfile  /etc/svc/volatile/system-identity:node.log
restarter    svc:/system/svc/restarter:default
dependency   require_any/none svc:/network/loopback (online)
dependency   optional_all/none svc:/network/physical (online)

but it also has a log file in /var/svc/log/system-identity\:node.log so I am 
not sure which one to use here.  Can you provide any info on this?
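
Until that is settled, one option is to record both properties and try them
in a fixed order.  The sketch below assumes the properties are stored under
the keys '/restarter/logfile' and '/restarter/alt_logfile' (the leading
slash comes from splitting svcprop output on "/:properties").  The sub name
logfile_candidates is hypothetical, and the order is only a guess, not
something taken from the SMF documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the log paths recorded for one service, in the order to try them.
# Assumption: prefer restarter/logfile, fall back to restarter/alt_logfile.
sub logfile_candidates {
    my ($props) = @_;
    return grep { defined($_) && length($_) }
           map  { $props->{$_} } '/restarter/logfile', '/restarter/alt_logfile';
}

# Properties as they might be collected for svc:/system/identity:node.
my %identity = (
    '/restarter/alt_logfile' => '/etc/svc/volatile/system-identity:node.log',
);

my @candidates = logfile_candidates(\%identity);

# Pick the first candidate that is actually readable on this host.
my ($readable) = grep { -r $_ } @candidates;
print defined $readable ? "use $readable\n" : "No logfile for service\n";
```

Returning every candidate instead of a single path leaves the choice to the
caller, so the alert code can still report "No logfile for service" when
neither file is readable.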
 
 
This message posted from opensolaris.org
