Hi Clayton

> I need input on:
> 1. Number of devices you are managing logs for (large scale being over
> 10,000 devices)

We're currently managing logs for ~200 network devices and another
500-600 servers using syslog-ng; perhaps not big enough, but who knows
;)

> 2. What log levels you are sending from the devices ( i.e. 0-6 for normal
> operation, 0-7 when troubleshooting?)

We have a central loghost in each physical site which accepts and
queues logs, forwarding them onto our master box as long as it's
available.

On the master box, I have a piece of Ruby which filters out common
garbage we're unable to remove; mostly this comes from legacy software
which logs far too much at inappropriate levels and which we no longer
care much about.
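As a rough illustration of that kind of filter (a sketch only, not the
actual code; the patterns and method names are hypothetical), known
noise can be dropped with a list of regular expressions:

```ruby
# Hypothetical noise patterns; the real list is not given in this mail.
NOISE_PATTERNS = [
  /last message repeated \d+ times/,
  /legacyd\[\d+\]: heartbeat/
].freeze

# Returns true when a message matches any known-noise pattern.
def noise?(message)
  NOISE_PATTERNS.any? { |pattern| pattern.match?(message) }
end

# Keeps only the lines that are worth passing downstream.
def filter_noise(lines)
  lines.reject { |line| noise?(line) }
end
```

Keeping the patterns in one frozen array means adding a new piece of
garbage to ignore is a one-line change.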

In normal operating mode, we actively track 0-5; when we're tracing a
problem you can flip the mangler into a more verbose mode where it
inserts 0-6 into MySQL -- typically we don't look at the DEBUG level
at all, since if you're working with an application in that much depth
you might as well configure it to log locally for a while :)
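The normal/verbose switch could look something like the sketch below.
The mode names and method names are made up for illustration; the only
firm facts are the 0-5 and 0-6 ranges above and the standard syslog
rule that severity is the PRI value modulo 8.

```ruby
# Standard syslog severities: 0 = emerg ... 5 = notice, 6 = info, 7 = debug.
NORMAL_MAX_SEVERITY  = 5
VERBOSE_MAX_SEVERITY = 6

# Hypothetical mode switch: :verbose keeps 0-6, anything else keeps 0-5.
def max_severity(mode)
  mode == :verbose ? VERBOSE_MAX_SEVERITY : NORMAL_MAX_SEVERITY
end

# A PRI value encodes facility * 8 + severity, so severity is PRI mod 8.
def severity_from_pri(pri)
  pri % 8
end

# True when a message with this PRI should be inserted in the given mode.
def store?(pri, mode)
  severity_from_pri(pri) <= max_severity(mode)
end
```

For example, PRI 30 (daemon.info) is dropped in normal mode and stored
in verbose mode, while PRI 31 (daemon.debug) is dropped in both.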

> 3. What log levels you are reacting on (if not all).

Stated above.

> 4. How many people are assigned to look at log messages

Nobody is specifically assigned to the task; the severity of the
problem determines the response. Generally speaking it just creates a
ticket which someone on support rotation can take action upon; really
urgent stuff triggers SMS messages.
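A minimal sketch of that routing, under two loud assumptions: that
"really urgent" means severities 0-1 (emerg and alert), and that urgent
messages get a ticket as well as an SMS. Neither cut-off is stated
above, and the names are hypothetical.

```ruby
# Assumption: severities 0-1 (emerg, alert) count as "really urgent".
SMS_MAX_SEVERITY = 1

# Returns the list of actions for a message of the given severity.
# Every message raises a ticket; urgent ones additionally trigger an SMS.
def actions_for(severity)
  actions = [:ticket]
  actions << :sms if severity <= SMS_MAX_SEVERITY
  actions
end
```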

> 5. What program(s) are used to do log analysis

Bespoke bits and pieces written in Ruby.  We have a very interesting
'attack analysis' module in development which scans for common
dictionary-based attacks and, subject to certain conditions being met,
should be able to null route persistent buggers.
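To give a flavour of the idea (this is not the module itself, just a
sketch), one simple heuristic is to count how many distinct usernames
each source address has failed to log in as. The message format is the
usual OpenSSH "Failed password" line; the threshold and method name are
assumptions, and the sketch only reports candidates, it does not null
route anything.

```ruby
require 'set'

# Matches OpenSSH lines such as
#   "Failed password for invalid user admin from 10.0.0.5 port 4242 ssh2"
FAILED_LOGIN = /Failed password for (?:invalid user )?(\S+) from (\S+)/

# Assumption: this many distinct usernames from one source looks like a
# dictionary attack. The real conditions are not given in this mail.
MIN_DISTINCT_USERS = 3

# Returns the source addresses that failed logins for at least
# min_users different usernames.
def dictionary_attackers(lines, min_users = MIN_DISTINCT_USERS)
  users_by_source = Hash.new { |hash, key| hash[key] = Set.new }
  lines.each do |line|
    match = FAILED_LOGIN.match(line)
    next unless match
    users_by_source[match[2]] << match[1]
  end
  users_by_source.select { |_source, users| users.size >= min_users }.keys
end
```

Counting distinct usernames rather than raw failures avoids flagging a
single user who keeps mistyping their own password.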

> 6. How are you analyzing the logs? Are you doing a baseline analysis (based
> on number of events per device) or are you reacting on every incoming
> message...or do you just ignore them because there are too many to look at?

Baseline analysis to detect host downtime, i.e. we specify a minimum
number of INSERT queries per second we expect to see generated by each
host.
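The baseline check boils down to something like the sketch below. The
host names, rates and method name are hypothetical, and the INSERT
counts are assumed to have been gathered elsewhere for some time
window.

```ruby
# min_rates:      { host => minimum expected INSERTs per second }
# insert_counts:  { host => INSERTs actually seen in the window }
# window_seconds: length of the observation window in seconds
#
# Returns the hosts whose observed rate fell below their minimum.
# A host with no INSERTs at all is absent from insert_counts, so it is
# treated as zero.
def quiet_hosts(min_rates, insert_counts, window_seconds)
  below = min_rates.select do |host, min_rate|
    observed = insert_counts.fetch(host, 0).to_f / window_seconds
    observed < min_rate
  end
  below.keys
end
```

Iterating over the expected hosts rather than the observed ones matters
here: a host that is completely down generates no rows to iterate over.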

Apart from that, messages are filtered and classified after insertion.

> 7. Anything I missed?

Don't think so :)  I suppose my only other comment is that we mostly
use syslog-ng and PHP-Syslog-NG in a reactive fashion, to track down
the cause of problems and assess the extent of any potential issue -
active monitoring is handled by Nagios.

Best Regards,
Alex

_______________________________________________
Php-syslog-ng-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/php-syslog-ng-support
