Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
On Sun, Jul 14, 2013 at 10:46:10AM -0400, Peter Sjöberg wrote:
> On 07/13/2013 10:55 PM, Brenda J. Butler wrote:
> > I'm curious why nagios/munin are overkill. I think they exactly match your requirements.
>
> My requirement is not monitoring - that is managed in a different way. My problem is that something happened and I need to find out what and why. While nagios can alert that the load is high on a server, it wouldn't say exactly why, and by the time I get to the system the cause may be gone.

Ah ... How about argus then: http://argus.tcp4me.com/. I haven't used it (much) myself.

I first heard about it in relation to forensics - some customer of the person describing it had installed it a few years before an incident, and when the incident happened the investigator had all the info s/he needed because argus had been quietly saving all kinds of data.

I see the pages describe it as monitoring now, but I guess you don't have to turn on alerts if you already have monitoring software.

bjb
___
Linux mailing list
Linux@lists.oclug.on.ca
http://oclug.on.ca/mailman/listinfo/linux
Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
On Sun, Jul 14, 2013 at 02:29:54PM -0400, Brenda J. Butler wrote:
> On Sun, Jul 14, 2013 at 10:46:10AM -0400, Peter Sjöberg wrote:
> > On 07/13/2013 10:55 PM, Brenda J. Butler wrote:
> > > I'm curious why nagios/munin are overkill. I think they exactly match your requirements.
> >
> > My requirement is not monitoring - that is managed in a different way. My problem is that something happened and I need to find out what and why. While nagios can alert that the load is high on a server, it wouldn't say exactly why, and by the time I get to the system the cause may be gone.
>
> Ah ... How about argus then: http://argus.tcp4me.com/. I haven't used it (much) myself.
>
> I first heard about it in relation to forensics - some customer of the person describing it had installed it a few years before an incident, and when the incident happened the investigator had all the info s/he needed because argus had been quietly saving all kinds of data.
>
> I see the pages describe it as monitoring now, but I guess you don't have to turn on alerts if you already have monitoring software.
>
> bjb

Well, on closer look, it seems argus is more for network auditing: http://www.qosient.com/argus/

I'm not sure this is the same project as the one above, but it's more likely to be the one I read about a couple of years ago.

nagios does keep a database of historical records - and I hate to say it, but this is the sort of thing that log files are for. Why can't you have log files? (No need to answer to me ... this is a question for your employer/customer.) Log records can go to a log server on a separate machine, in case space/confidentiality is an issue.

bjb
Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
On 07/13/2013 10:55 PM, Brenda J. Butler wrote:
> I'm curious why nagios/munin are overkill. I think they exactly match your requirements.

My requirement is not monitoring - that is managed in a different way. My problem is that something happened and I need to find out what and why. While nagios can alert that the load is high on a server, it wouldn't say exactly why, and by the time I get to the system the cause may be gone. Having a program that collects the output of "top -b -c -n 2 -i" and other similar commands to a file every minute would help me see what was going on, and that's about all I need.

Besides that, there are also other issues with nagios in our environment. Nagios is a centralized app with a web interface. There is no way we can install nagios locally on every server, and network-wise there is no single server that can talk to all of them. On top of that, we already have other ways to monitor production servers.

> Scheduling the tests and keeping track of the result in a scalable way can be a bit complicated - the actual tests are basically plugins. nagios and munin come with a few built-in tests (basically, the ones you want to see) and the rest are plugins, probably in separate packages.

We are using nagios+nrpe in the lab to keep an eye on some non-prod servers for ourselves, and have even written some small plugins to add monitoring of some in-house apps.

> It's a bit annoying to learn nagios config language though, I have to admit. I have managed to figure out some of it but it takes a while to get used to.
>
> Munin is way less complicated

I looked a little more at it, and while it's not for the original issue I think I will implement it at home.

> , but the thinning of data as time goes by annoys me. Then again, it was one of your requirements.

Actually, I simplified it and just drop it altogether after 2 days.

> The graphs are a bonus.

I had some fun trying to interpret graphs from collected SAR data from a server that had something like 300 SAN LUNs over shared paths, so every disk showed up 3 times, and the output graphs were one per page in a huge PDF file. Grepping the raw data was so much easier to handle.

> You don't have to look at them if you don't want to.
>
> I haven't looked at zenoss, but will keep an eye open for it.

https://github.com/lpaseen/nyss

> bjb
>
> On Fri, Jul 12, 2013 at 11:49:23PM -0400, Peter Sjöberg wrote:
> > On 07/12/2013 10:28 AM, Brenda J. Butler wrote:
> > > I don't know oswatcher, but based on your description the following would be useful for you:
> > >
> > > munin (keeps a constant-sized database, which thins out as you look back in time).
> >
> > A 10-second look and it looks like overkill, but I will look at it more.
> >
> > > nagios
> >
> > Definitely overkill. I'm using nagios for other things, but what I'm after is not monitoring so much as a tool to use after the monitoring has alerted that something is bad. At that point I want to know what led up to all memory being used up, or which process consumed all the cpu/io, since once the alert happens it often gets resolved with a big shotgun like a reboot (like when they accidentally started 40 instances of a java app on a server designed for 4), and we are left to tell what happened without logs.
> >
> > On 07/12/2013 01:36 PM, Jeffrey Moncrieff wrote:
> > > You can also try zenoss.
> >
> > Will check on that later.
> >
> > > In both cases, if there is some test they don't already do, you can write your own and have them use it.
> >
> > Well, google did find https://github.com/stephenlang/scrutiny and that's about the closest I've seen to what I'm looking for, but it's a bit too basic. Since after all there's not that much to it, I started writing something that I will try out over the weekend. I know one challenge will be actually collecting anything when the system is crawling, but anything is better than what we have now, which is nothing (besides 1-minute sar data, which tends to stop before the system dies).
> >
> > /ps
Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
I don't know oswatcher, but based on your description the following would be useful for you:

  munin (keeps a constant-sized database, which thins out as you look back in time).

  nagios

In both cases, if there is some test they don't already do, you can write your own and have them use it.

bjb

On Fri, Jul 12, 2013 at 09:56:02AM -0400, Peter Sjöberg wrote:
> Just wondering if there is something already out there that does something similar to what Oracle's oswatcher does?
>
> What I'm looking for is a tool to use when analyzing server issues. While oswatcher could be good, it has a questionable license, and I don't run Oracle at all on most of the servers where I need it.
>
> The tool would collect the output of ps, top, iostat, netstat, vmstat, mpstat, who, ... every so often (like every minute) to some kind of archive, and then after so long (X hours) old data is removed.
>
> I can easily write something myself, but before doing that I want to see if someone else has already taken the trouble of doing it.
>
> --
> Techwiz, Peter Sjoberg
> PGP key (12F506C8) on keyserver homepage
> Key fingerprint = 3DC2 CEBA 1590 B41A 3780 955A DB42 02BB 12F5 06C8
> Homepage: http://www.techwiz.ca/~peters
> Pictures: http://www.flickr.com/photos/henahadu/
Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
On Fri, Jul 12, 2013 at 10:28:59AM -0400, Brenda J. Butler wrote:
> I don't know oswatcher, but based on your description the following would be useful for you:
>
>   munin (keeps a constant-sized database, which thins out as you look back in time).
>
>   nagios
>
> In both cases, if there is some test they don't already do, you can write your own and have them use it.

My understanding is that Nagios can use mrtg or rrdtool databases, which also keep a constant-size database like munin (maybe munin actually uses one of these two as well).

I should have a look at munin and nagios... since I'm still using MRTG and it is getting a bit out of hand, and a lot of that data would be better plotted on the same graph...

http://toccata2.tricolour.ca/mrtg/mrtg-rrd.cgi/

> bjb
>
> On Fri, Jul 12, 2013 at 09:56:02AM -0400, Peter Sjöberg wrote:
> > Just wondering if there is something already out there that does something similar to what Oracle's oswatcher does?
> >
> > What I'm looking for is a tool to use when analyzing server issues. While oswatcher could be good, it has a questionable license, and I don't run Oracle at all on most of the servers where I need it.
> >
> > The tool would collect the output of ps, top, iostat, netstat, vmstat, mpstat, who, ... every so often (like every minute) to some kind of archive, and then after so long (X hours) old data is removed.
> >
> > I can easily write something myself, but before doing that I want to see if someone else has already taken the trouble of doing it.
> >
> > Techwiz, Peter Sjoberg
> > PGP key (12F506C8) on keyserver homepage

slainte mhath, RGB

--
Richard Guy Briggs
hpv.tricolour.net  www.TriColour.net
Ottawa, ON, CANADA
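
For readers unfamiliar with the "constant-size database that thins out as you look back in time" idea mentioned for munin, mrtg and rrdtool above, here is a minimal Python sketch of the principle. It is not rrdtool's file format or API; the class name `ThinningArchive` and its parameters are made up for illustration. Recent samples are kept at full resolution in a fixed-length ring, and every `factor` samples are averaged into one entry of a second, coarser ring.

```python
from collections import deque


class ThinningArchive:
    """Hypothetical fixed-size store: recent samples raw, older ones averaged."""

    def __init__(self, fine_slots=60, coarse_slots=24, factor=60):
        self.fine = deque(maxlen=fine_slots)      # newest raw samples
        self.coarse = deque(maxlen=coarse_slots)  # one average per `factor` raw samples
        self.factor = factor
        self._pending = []                        # raw samples awaiting consolidation

    def add(self, value):
        """Store one sample; consolidate once `factor` samples have accumulated."""
        self.fine.append(value)                   # the oldest raw sample falls off when full
        self._pending.append(value)
        if len(self._pending) == self.factor:
            self.coarse.append(sum(self._pending) / self.factor)
            self._pending.clear()

    def size(self):
        """Number of stored entries; never exceeds fine_slots + coarse_slots."""
        return len(self.fine) + len(self.coarse)
```

With `fine_slots=60`, `coarse_slots=24` and `factor=60`, one-minute samples would give an hour of raw data plus a day of hourly averages, and the store never grows beyond 84 entries. The averaging is also what loses the detail Peter wants for after-the-fact analysis.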
[OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
Just wondering if there is something already out there that does something similar to what Oracle's oswatcher does?

What I'm looking for is a tool to use when analyzing server issues. While oswatcher could be good, it has a questionable license, and I don't run Oracle at all on most of the servers where I need it.

The tool would collect the output of ps, top, iostat, netstat, vmstat, mpstat, who, ... every so often (like every minute) to some kind of archive, and then after so long (X hours) old data is removed.

I can easily write something myself, but before doing that I want to see if someone else has already taken the trouble of doing it.

--
---
Techwiz, Peter Sjoberg
PGP key (12F506C8) on keyserver homepage
Key fingerprint = 3DC2 CEBA 1590 B41A 3780 955A DB42 02BB 12F5 06C8
Homepage: http://www.techwiz.ca/~peters
Pictures: http://www.flickr.com/photos/henahadu/
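
To make the request in this post concrete, here is a rough Python sketch of such a collector: run a set of commands, write each output to a timestamped file, and delete files older than X hours. It is only an illustration of the idea, not oswatcher and not the tool Peter later wrote. The command list, the `/var/tmp/oscollect` directory, the file naming and the function names are all assumptions.

```python
import subprocess
import time
from pathlib import Path

# Hypothetical command set; the post lists ps, top, iostat, netstat, vmstat, mpstat, who.
COMMANDS = {
    "ps": ["ps", "auxww"],
    "vmstat": ["vmstat", "1", "2"],
    "who": ["who"],
}


def collect_once(commands, archive_dir, now=None):
    """Run each command once and write its output to <name>-<timestamp>.log."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    written = []
    for name, argv in commands.items():
        try:
            result = subprocess.run(argv, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT, text=True)
            output = result.stdout
        except OSError as exc:
            # Command missing or not executable: record that instead of crashing.
            output = f"failed to run {argv!r}: {exc}\n"
        path = archive / f"{name}-{stamp}.log"
        path.write_text(output)
        written.append(path)
    return written


def prune(archive_dir, max_age_hours, now=None):
    """Delete archived files whose modification time is older than max_age_hours."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for path in Path(archive_dir).glob("*.log"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


def run_forever(archive_dir="/var/tmp/oscollect", interval=60, max_age_hours=48):
    """Collect every `interval` seconds and prune old data after each round."""
    while True:
        collect_once(COMMANDS, archive_dir)
        prune(archive_dir, max_age_hours)
        time.sleep(interval)
```

Calling `run_forever()` from a small wrapper script (or calling `collect_once` and `prune` from cron once a minute) would give the one-minute snapshots described above. Plain files are used here so the data stays greppable.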
Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
Not sure if it's exactly what you're looking for, but - http://www.nagios.org/ ?

On Jul 12, 2013, at 9:56 AM, Peter Sjöberg wrote:
> Just wondering if there is something already out there that does something similar to what Oracle's oswatcher does?
Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
You can also try zenoss.

Jeffrey Dean Moncrieff
jeffrey.moncri...@yahoo.ca

From: OddSox oddsoxs...@gmail.com
To: peters-oc...@techwiz.ca
Cc: Ottawa Linux Users Group linux@lists.oclug.on.ca
Sent: Friday, July 12, 2013 1:29:12 PM
Subject: Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info

Not sure if it's exactly what you're looking for, but - http://www.nagios.org/ ?

On Jul 12, 2013, at 9:56 AM, Peter Sjöberg wrote:
> Just wondering if there is something already out there that does something similar to what Oracle's oswatcher does?
Re: [OCLUG-Tech] oswatcher alternative, collector of top/ps/iostat/vmstat/... info
On 07/12/2013 10:28 AM, Brenda J. Butler wrote:
> I don't know oswatcher, but based on your description the following would be useful for you:
>
> munin (keeps a constant-sized database, which thins out as you look back in time).

A 10-second look and it looks like overkill, but I will look at it more.

> nagios

Definitely overkill. I'm using nagios for other things, but what I'm after is not monitoring so much as a tool to use after the monitoring has alerted that something is bad. At that point I want to know what led up to all memory being used up, or which process consumed all the cpu/io, since once the alert happens it often gets resolved with a big shotgun like a reboot (like when they accidentally started 40 instances of a java app on a server designed for 4), and we are left to tell what happened without logs.

On 07/12/2013 01:36 PM, Jeffrey Moncrieff wrote:
> You can also try zenoss.

Will check on that later.

> In both cases, if there is some test they don't already do, you can write your own and have them use it.

Well, google did find https://github.com/stephenlang/scrutiny and that's about the closest I've seen to what I'm looking for, but it's a bit too basic. Since after all there's not that much to it, I started writing something that I will try out over the weekend. I know one challenge will be actually collecting anything when the system is crawling, but anything is better than what we have now, which is nothing (besides 1-minute sar data, which tends to stop before the system dies).

/ps
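
On the challenge of collecting anything while the system is crawling: one possible mitigation (an assumption on my part, not something proposed in the thread) is to give each command a deadline so that a hung `ps` or `iostat` cannot stall the whole collection round. A minimal Python sketch, with the hypothetical helper name `run_with_deadline`:

```python
import subprocess


def run_with_deadline(argv, timeout_s=10):
    """Run argv and return (status, output), waiting at most timeout_s seconds."""
    try:
        result = subprocess.run(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True,
                                timeout=timeout_s)
        return "ok", result.stdout
    except subprocess.TimeoutExpired as exc:
        # Keep whatever was captured before the deadline; it may be None or bytes.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return "timeout", partial
    except OSError as exc:
        # Command not found or not executable.
        return "error", str(exc)
```

A collector could log the "timeout" status next to the partial output, which is itself a useful data point about how loaded the box was. This does not help if the process is stuck in uninterruptible I/O and cannot be killed, so treat it as a partial answer only.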