On Fri, 26 Feb 2010, Rainer Gerhards wrote:

> Hi all,
>
> I have blogged about my quest for log normalization. I think there is some
> good information on the upcoming GPLed Adiscon LogAnalyzer and future
> directions for rsyslog in the blog post. So I thought I share the link:
>
> http://blog.gerhards.net/2010/02/syslog-normalization.html
>
> Please note that part of the effort requires community involvement. I would
> be very interested to learn if you think we could win enough support to make
> this a useful effort. I am asking for your feedback, because it will help me
> streamline my priorities for future rsyslog work.
A few comments (but remember that I usually deal with high data rates, so my concerns are biased in that direction).

Log analysis is usually done in batches rather than in real time. Some of this is due to the difficulty of doing it in real time, but a lot of it is the processing overhead (you don't want to take so long to process one message that you miss the next one to arrive).

At low volumes the idea of name-value pairs in the logs makes a lot of sense. But parsing a log with name-value pairs in arbitrary order costs significantly more than using a tree-parsing approach to analyze known log formats in a fixed order. The message size can also increase significantly. As a result, at high traffic volumes this starts to be a bad (or at least questionable) idea.

I would love to see rsyslog gain the ability to do tree-based parsing efficiently instead of regex parsing. Regex parsing is easy to understand and tinker with, but very expensive to execute. It may be that having something that 'compiles' a list of regex parsers into a tree parser is the right answer for usability.

I would save several hours of processing a day if I could easily (and efficiently) make rsyslog write different logs to different files (at high data rates, and with a few hundred condition-based variations on the syslog tag).

While there are some common events across different types of logs (logins, for example), they almost always contain slightly different data. I also have no faith at all that anyone is going to make much effort to clean up their logs to make them nicely parseable. Even if they do, I see even less chance that they will end up using the same terms for the same thing. As such, I see more value in collecting samples of logs and what they mean than in trying to define a normalized version to shoehorn the logs into.
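As a rough illustration of why per-tag routing need not get slower as the rule list grows: if each of the few hundred rules is an exact match on the program name, a hash table can pick the destination file in one lookup instead of testing the conditions one after another. This is a Python sketch, not rsyslog configuration. The tag-to-file table, the file names and the helper names are hypothetical, and it assumes the syslog tag has already been pulled out of the message header.

```python
from collections import defaultdict

# Hypothetical routing table: program name -> output file.
# A real deployment might hold a few hundred entries; lookup cost stays the same.
ROUTES = {
    "sshd": "auth.log",
    "sudo": "auth.log",
    "postfix/smtpd": "mail.log",
    "named": "dns.log",
}
DEFAULT_FILE = "other.log"


def program_name(tag: str) -> str:
    """Reduce a syslog tag such as 'sshd[1234]:' to the program name 'sshd'."""
    return tag.split("[", 1)[0].rstrip(":")


def route(tag: str) -> str:
    """Pick the output file with a single dict lookup instead of a chain of tests."""
    return ROUTES.get(program_name(tag), DEFAULT_FILE)


def split_by_tag(records):
    """Group (tag, message) pairs by the file each message would be written to."""
    out = defaultdict(list)
    for tag, msg in records:
        out[route(tag)].append(msg)
    return dict(out)
```

For example, `route("sshd[1234]:")` gives `"auth.log"`, and a tag with no entry falls through to `"other.log"`. Rules that need more than an exact tag match (substring or regex tests) don't reduce to one lookup. That is where a compiled tree of conditions would have to take over.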
It is worth doing this for some events (logins and failed logins, for example), but I think it's a mistake to expect this to cover all, or even the majority, of log messages. There's also the problem that the ideal output format depends on what you are doing with the output.

If I could wave a magic wand, I would look for something like this. The parser starts at the beginning of the message (at the priority) and can branch on priority/facility, timestamp, host, syslog tag and message. For the message, it can say whether it should be parsed into name-value pairs, or split on a character (or a character sequence, as Perl's split allows) into individually addressable elements (whitespace-separated by default). The format (and, if needed, the dynafile path/file components) could then be built from these variables. At any point in the parsing it should be possible to jump to another parser tree. That way you could say that sm-mta, sendmail, Sendmail, etc. as syslog tags all use the same parser for the message, without redefining the rules for each one.

With this capability, people could start writing parser 'branches' that understand a specific log type and output a 'standard' format (as such a format starts to be defined). This can be done in rsyslog today, but it is fairly difficult to define. As I understand it, it is also inefficient enough that it's not practical in real time under heavy load.

If this is fast enough, the next step would be to let the format/action be 'increment a counter for log type X', with a signal to rsyslog generating a report on these counters. At some point, though, it becomes better to feed the message into another open-source tool (SEC, the Simple Event Correlator, for example) instead of trying to do everything in rsyslog.
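Here is a minimal sketch of the parser-tree idea in Python, under several assumptions. The header (priority, timestamp, host, tag) is taken as already parsed, and the tree is flattened to a single branching level on the program name. The branch functions, the `JUMPS` alias table and the sendmail message shape are hypothetical illustrations, not an existing rsyslog feature. It shows a name-value branch, a whitespace-split branch, several tags jumping to one branch, and a per-type counter.

```python
from collections import Counter


def split_fields(msg, sep=None):
    """Split into addressable elements: whitespace by default, otherwise on a
    character or character sequence (roughly what Perl's split allows)."""
    return msg.split(sep)


def parse_name_value(text, sep=", "):
    """Turn 'a=1, b=2' into {'a': '1', 'b': '2'}; tokens without '=' are skipped."""
    fields = {}
    for token in split_fields(text, sep):
        key, eq, value = token.partition("=")
        if eq:
            fields[key.strip()] = value.strip()
    return fields


def sendmail_branch(msg):
    # Assumed (simplified) shape: "<queue-id>: from=<a@b>, size=123, ..."
    queue_id, _, rest = msg.partition(": ")
    fields = parse_name_value(rest)
    fields["queue_id"] = queue_id
    return fields


def sshd_branch(msg):
    # Whitespace split; element 0 is then e.g. "Accepted" or "Failed".
    return {"words": split_fields(msg)}


# Hypothetical tree: program name -> branch that parses the message body.
BRANCHES = {"sendmail": sendmail_branch, "sshd": sshd_branch}
# Several syslog tags jump to the same branch instead of redefining its rules.
JUMPS = {"sm-mta": "sendmail", "Sendmail": "sendmail"}

counters = Counter()  # per-log-type counts that a report could dump on request


def parse(program, msg):
    name = JUMPS.get(program, program)
    branch = BRANCHES.get(name)
    if branch is None:
        counters["unparsed"] += 1
        return {"raw": msg}
    counters[name] += 1
    return branch(msg)
```

For instance, `parse("sm-mta", "p1QABC123: from=<alice@example.com>, size=1234")` returns `{'from': '<alice@example.com>', 'size': '1234', 'queue_id': 'p1QABC123'}` and bumps `counters["sendmail"]`. The resulting fields are what an output format or dynafile path would be assembled from.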
Parsing the file to know what to do with it, and being able to re-format log messages, very definitely fits into the rsyslog model of receiving, formatting and delivering logs. Alerting on specific log entries, counting how many times something shows up in the logs, and similar tasks start pushing beyond the core of rsyslog, and it may be better to feed other tools instead.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

