Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-15 Thread Brenda J. Butler


On Sun, Jul 14, 2013 at 10:46:10AM -0400, Peter Sjöberg wrote:
 On 07/13/2013 10:55 PM, Brenda J. Butler wrote:
  
  I'm curious why nagios/munin are overkill.  I think they exactly match
  your requirements.
 My requirement is not monitoring - that is managed in a different way.
 My problem is that something happened and I need to find out what and
 why. While nagios can alert that the load is high on a server, it wouldn't
 say exactly why, and by the time I get to the system the cause may be gone.

Ah ... How about argus then: http://argus.tcp4me.com/.  I haven't used
it (much) myself.  I first heard about it in relation to forensics -
some customer of the person describing it had installed it a few years
before an incident, and when the incident happened the investigator
had all the info s/he needed because argus had been quietly saving all
kinds of data.

I see the pages describe it as monitoring now, but I guess you don't
have to turn on alerts if you already have monitoring software.

bjb
___
Linux mailing list
Linux@lists.oclug.on.ca
http://oclug.on.ca/mailman/listinfo/linux


Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-15 Thread Brenda J. Butler


Well on closer look, it seems argus is more for network auditing.
http://www.qosient.com/argus/
Although I'm not sure if this is the same project as the one above ...
but it's more likely to be the one I read about a couple of years
ago.

nagios does keep a database of historical records - and hate to say it
but this is the sort of thing that log files are for.  Why can't you
have log files?  (no need to answer to me ... this is a question for
your employer/customer)  Log records can go to a log server on a
separate machine, in case space/confidentiality is an issue.
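
A minimal sketch of sending records to a log server on another machine, using Python's standard logging module. The host, port, and logger name are hypothetical, and it assumes a syslog daemon is listening for UDP on the remote side.

```python
import logging
import logging.handlers


def make_remote_logger(host, port=514, name="snapshots"):
    """Return a logger that forwards records to a syslog server over UDP.

    host, port and name are placeholders; point them at your own log server.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling make_remote_logger("loghost.example.org").info("load spike at 14:02") would then ship the record off the box, so it survives a reboot or a full local disk.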

bjb


Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-14 Thread Peter Sjöberg
On 07/13/2013 10:55 PM, Brenda J. Butler wrote:
 
 I'm curious why nagios/munin are overkill.  I think they exactly match
 your requirements.
My requirement is not monitoring - that is managed in a different way.
My problem is that something happened and I need to find out what and
why. While nagios can alert that the load is high on a server, it wouldn't
say exactly why, and by the time I get to the system the cause may be gone.
Having a program that collects the output of top -b -c -n 2 -i and
other similar commands to a file every minute would help me see what was
going on, and that's about all I need.
Besides that, there are other issues with nagios in our env. Nagios is a
centralized app with a web interface. There's no way we can install nagios
locally on every server, and network-wise there is no single server that can
talk to all of them. On top of that, we already have other ways to monitor
production servers.
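
A minimal sketch of the kind of collector described above, not a finished tool. Only the top invocation comes from this message; the vmstat entry, the function name, and the snapshot file naming are assumptions made for illustration.

```python
import subprocess
import time
from pathlib import Path

# Only the top invocation comes from the message above; the vmstat entry
# and the file naming are illustrative assumptions.
DEFAULT_COMMANDS = {
    "top": ["top", "-b", "-c", "-n", "2", "-i"],
    "vmstat": ["vmstat", "1", "2"],
}


def collect_once(out_dir, commands=None):
    """Run each command once and append its output to one timestamped file."""
    commands = DEFAULT_COMMANDS if commands is None else commands
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / time.strftime("snapshot-%Y%m%d-%H%M%S.log")
    with path.open("a", encoding="utf-8") as fh:
        for name, argv in commands.items():
            # A header line per command makes the file easy to grep later.
            fh.write(f"### {name}: {' '.join(argv)}\n")
            try:
                result = subprocess.run(
                    argv, capture_output=True, text=True, errors="replace"
                )
                fh.write(result.stdout)
                fh.write(result.stderr)
            except OSError as exc:
                # Command missing or not executable: record it and move on.
                fh.write(f"(failed: {exc})\n")
    return path
```

Running collect_once("/var/tmp/snapshots") from cron once a minute would give one plain-text file per minute that can be searched with grep after the fact. The directory is a placeholder.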

 Scheduling the tests and keeping track of the result in a scalable way
 can be a bit complicated - the actual tests are basically plugins.
 nagios and munin come with a few built-in tests (basically, the ones
 you want to see) and the rest are plugins, probably in separate
 packages.
We use nagios+nrpe in the lab to keep an eye on some non-prod servers for
ourselves, and have even written some small plugins to add monitoring of some
in-house apps.

 It's a bit annoying to learn nagios config language though, I have to
 admit.
I have managed to figure out some of it but it takes a while to get used to.
 Munin is way less complicated
Looked a little more at it and while it's not for the original issue I
think I will implement it at home.
, but the thinning of data as
 time goes by annoys me.  Then again, it was one of your requirements.
Actually, I simplified it and just drop it altogether after 2 days.
 The graphs are a bonus. 
Had some fun trying to interpret graphs from some collected SAR data
from a server that had something like 300 SAN LUNs over shared
paths, so every disk showed up 3 times, and the output graphs were one
per page in a huge pdf file. Running grep on the raw data was so much
easier to handle.
 You don't have to look at them if you don't
 want to.
 
 I haven't looked at zenoss, but will keep an eye open for it.
https://github.com/lpaseen/nyss
 
 bjb
 
 
 






Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-13 Thread Brenda J. Butler


I don't know oswatcher, but based on your description the following
would be useful for you:


munin (keeps a constant-sized database, which thins out as you look back
in time).

nagios

In both cases, if there is some test they don't already do, you can
write your own and have them use it.

bjb




Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-13 Thread Richard Guy Briggs
On Fri, Jul 12, 2013 at 10:28:59AM -0400, Brenda J. Butler wrote:
 I don't know oswatcher, but based on your description the following
 would be useful for you:
 
 munin (keeps a constant-sized database, which thins out as you look back
 in time).
 
 nagios
 
 In both cases, if there is some test they don't already do, you can
 write your own and have them use it.

My understanding is that Nagios can use mrtg or rrdtool databases, which
also stay a constant size like munin's (maybe munin actually uses one
of these two as well).
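
To illustrate what "thins out as you look back in time" means, here is a small sketch of the general idea: recent samples are kept as-is, older ones are averaged into coarser buckets. This is not rrdtool's or munin's actual implementation; the function name and parameters are made up for the example.

```python
def thin(samples, now, recent_window, bucket):
    """Keep recent samples at full resolution and average older ones.

    samples: list of (timestamp, value) pairs, timestamps in seconds.
    Samples newer than now - recent_window are kept untouched; older ones
    are grouped into bucket-second intervals and replaced by their mean.
    """
    cutoff = now - recent_window
    buckets = {}
    recent = []
    for ts, value in samples:
        if ts >= cutoff:
            recent.append((ts, value))
        else:
            start = ts - ts % bucket  # start of the interval holding ts
            buckets.setdefault(start, []).append(value)
    old = [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]
    return old + recent


samples = [(0, 1.0), (60, 3.0), (120, 5.0), (600, 7.0), (660, 9.0)]
print(thin(samples, now=700, recent_window=200, bucket=300))
# prints [(0, 3.0), (600, 7.0), (660, 9.0)]
```

The trade-off is the one discussed in this thread: the store stays small, but the per-minute detail needed for a post-mortem is gone once it has been averaged away.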

I should have a look at munin and nagios...  since I'm still using MRTG
and it is getting a bit out of hand and there is lots of that data that
would be better plotted on the same graph...
http://toccata2.tricolour.ca/mrtg/mrtg-rrd.cgi/


slainte mhath, RGB

--
Richard Guy Briggs -- hpv.tricolour.net
www.TriColour.net -- Ride yer bike!
Ottawa, ON, CANADA
Vote! -- greenparty.ca


[OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-12 Thread Peter Sjöberg
Just wondering if there's something already out there that does something
similar to what Oracle's oswatcher does?
What I'm looking for is some tool to use when analyzing server issues,
and while oswatcher could be good, its license is questionable and I don't
run Oracle at all on most of the servers where I need it.
The tool would collect the output of
ps,top,iostat,netstat,vmstat,mpstat,who,... every so often (like every
minute) to some kind of archive and then after so long (X hours) old
data is removed.
I can easily write something myself, but before doing that I want to see
if someone else has already taken the trouble of doing it.
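
The "after X hours old data is removed" part could be sketched roughly as below. It assumes the archive is a directory of files named snapshot-*.log, which is a hypothetical layout chosen for the example, and that file modification time is a good enough age indicator.

```python
import time
from pathlib import Path


def prune_old(archive_dir, max_age_hours, now=None):
    """Delete snapshot files older than max_age_hours; return their names.

    The snapshot-*.log pattern is an assumed naming scheme, not something
    any existing tool mandates.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for path in Path(archive_dir).glob("snapshot-*.log"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

For example, prune_old("/var/tmp/snapshots", 48) would keep roughly the last two days of data. The directory name is again a placeholder.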

-- 
---
Techwiz, Peter Sjoberg PGP key (12F506C8) on keyserver  homepage
Key fingerprint =  3DC2 CEBA 1590 B41A 3780  955A DB42 02BB 12F5 06C8
Homepage: http://www.techwiz.ca/~peters 
Pictures: http://www.flickr.com/photos/henahadu/






Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-12 Thread OddSox

Not sure if it's exactly what you're looking for but - http://www.nagios.org/ ?


On Jul 12, 2013, at 9:56 AM, Peter Sjöberg wrote:

 Just wondering if there's something already out there that does something
 similar to what Oracle's oswatcher does?



Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-12 Thread Jeffrey Moncrieff


You can also try zenoss. 

Jeffrey Dean Moncrieff
jeffrey.moncri...@yahoo.ca





Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

2013-07-12 Thread Peter Sjöberg
On 07/12/2013 10:28 AM, Brenda J. Butler wrote:
 
 
 I don't know oswatcher, but based on your description the following
 would be useful for you:
 
 
 munin (keeps a constant-sized database, which thins out as you look back
 in time).
After a 10-second look it seems like overkill, but I will look at it more.

 
 nagios
Definitely overkill. I'm using nagios for other things, but what I'm after is
not monitoring so much as a tool to use after the monitoring has alerted
that something is bad. At that point I want to know what led up to
all the memory being used up, or which process consumed all the cpu/io, since
once the alert happens it often gets resolved with a big shotgun
like a reboot (like when they accidentally started 40 instances of a
java app on a server designed for 4) and we are left to explain what
happened without logs.


On 07/12/2013 01:36 PM, Jeffrey Moncrieff wrote:
 You can also try zenoss.

Will check on that later

 
 In both cases, if there is some test they don't already do, you can
 write your own and have them use it.
 
Well, google did find https://github.com/stephenlang/scrutiny and that's
about the closest I've seen to what I'm looking for, but it's a bit too basic.

Since after all there's not that much to it, I started writing something
that I will try out over the weekend. I know one challenge will be
actually collecting anything when the system is crawling, but
anything is better than what we have now, which is nothing (besides 1
minute sar data, which tends to stop before the system dies).
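
One possible way to cope with a crawling system is to give every command a hard deadline, so a single hung command cannot stall the whole collection run. The sketch below is an assumption about how that might look, not what the tool mentioned above does; the function name and return shape are invented for the example.

```python
import subprocess
import time


def run_with_deadline(argv, deadline_s):
    """Run argv, giving up after deadline_s seconds.

    Returns (status, elapsed_seconds, output) where status is "ok",
    "timeout" or "error". On timeout, whatever output was captured
    before the child was killed is returned.
    """
    started = time.monotonic()
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, errors="replace",
            timeout=deadline_s,
        )
        status, output = "ok", result.stdout
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            # Partial output may arrive undecoded; decode defensively.
            partial = partial.decode(errors="replace")
        status, output = "timeout", partial
    except OSError as exc:
        status, output = "error", str(exc)
    return status, time.monotonic() - started, output
```

Logging the elapsed time alongside the output is useful in itself: a ps that normally takes a fraction of a second and suddenly needs 20 seconds is already evidence of trouble.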

/ps


