The lines I'm working with are fairly long and gnarly, so instead of putting one inline, I've pasted a sample line here:
https://gist.github.com/33276cce196a639c016e

I can easily break it down into the relevant fields with the rsyslog property replacer and mmnormalize: I can get the timestamp, referring URL, etc. all out of it. I can also build a basic CEE-ish sort of output using a template. However, what I really want is to break it down and normalize it into something that looks like this (written in JSON format, but I really want access to it as individual properties):

{
  "programname": "rg_events",
  "host": "mediacast3",
  "request_time": "26/Dec/2012:19:18:36 +0000",
  "request_type": "GET",
  "rg_type": "2.4.5:info",
  "rg_player_type": "standard",
  "rg_publisher": "Vibe | Open Distribution Player | TheYBF",
  "rg_publisher_id": "896",
  …
  "response_code": 200,
  "referrer_url": "http://theybf.com/2012/05/15/confirmed-tia-mowry-leaves-the-game-to-focus-on-other-projects",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/534.56.5 (KHTML, like Gecko) Version/5.1.6 Safari/534.56.5"
}

You get the idea. Each of those "rg_<foo>" fields is a beacon in our events; they are not in every request, nor are they guaranteed to all be present or in any specific order. While it's easy for me to break the request_url field out of the log line (the entire GET /events?foo=1&bar=2&blahblahblah), I then want to break down that particular field when present, extract all the query params, and turn them into properties on the fly, so that not only can I emit them in a template to a remote store (Elasticsearch, Redis, etc.), I can also *route* on those specific properties. I.e., I want to then be able to say something like:

if %rg_publisher_id% == "896" then {
    # do something spiffy with it here
}

What I have been doing instead is coarse normalization in rsyslog, then emitting it to logstash and letting logstash chew through it, convert it, and stuff it into Elasticsearch. At the same time I fork messages from rsyslog to a ZeroMQ bus in a rough CEE-ish format and use zmq listeners that I've written to cull through a large data stream looking for things they are interested in. The routing to the ZMQ bus is necessarily coarse, since the only way I can route on the beacons right now is with regexps, and that's really not how I'd *like* to do it. It'd be so much cleaner to be able to treat each of those beacons as a property.

I'm also goofing around with preformatting it by using logstash to chew it *first* and normalize it, then sending the logstash output into rsyslog for routing. I'd like to short-circuit that, though, and use rsyslog to do the normalization as well and then route it with much more fine-grained control.

This might be possible already and I might simply be overlooking how to do it, but I couldn't get mmnormalize to do that deep level of normalization for me.

-- Gary F.

On Jan 31, 2013, at 7:08 AM, Rainer Gerhards <[email protected]> wrote:

> On Thu, 2013-01-31 at 06:58 -0800, Gary Foster wrote:
>> I am doing something similar, but the way I'm handling it is to push the
>> formatting upstream. I'm actually moving towards generating the log
>> messages in preparsed format (well structured JSON along the lines of CEE).
>>
>> For example, when an incoming GET request comes in on an nginx server, it
>> contains a huge number of potential params... GET /foo?bar=1&baz=2 etc.
>> The bar and baz params are what I'm really interested in (along with the
>> timestamp, url, etc of course), and they are moderately dynamic instead of
>> being a fixed pattern every time, so I'm pushing that out to the clients so
>> it becomes json like {"action": "GET", "url": "foo", "bar": "1", "baz": "2"}.
>
> Can you post a sample input log line, as rsyslog receives it? This is
> one of the hot topics for rsyslog currently and I would like to get a
> bit more insight into current use cases (maybe it's easy to write a
> parser module to do that work...).
>
> Rainer
>>
>> I am not even sure if it is completely possible to do that all entirely
>> within rsyslog right now, since the key/value pairs are dynamic, so I just
>> simply do it pre-rsyslog and then use rsyslog to route it on the JSON
>> keys. I'm routing about 500 per sec without even breaking a sweat, and
>> have tested it upwards of 30k per sec. It is more moving parts though,
>> which I am not particularly a fan of.
>>
>> -- Gary F.
>>
>> On Thu, Jan 31, 2013 at 5:44 AM, Rainer Gerhards
>> <[email protected]> wrote:
>>
>>> On Thu, 2013-01-31 at 14:51 +0200, Radu Gheorghe wrote:
>>>> Hi Ben,
>>>>
>>>> 2013/1/31 Ben Bradley <[email protected]>
>>>>
>>>>> Hi everyone
>>>>>
>>>>> I'm currently using logstash as the log collector from a few rsyslog
>>>>> sender clients. I'd like to use rsyslog to receive the remote logs
>>>>> instead of logstash. This means I'm keeping things simple and can
>>>>> possibly also use RELP.
>>>>>
>>>>> If the rsyslog receiver is doing a lot of regex parsing on each message
>>>>> received (i.e. parsing Apache logs into ElasticSearch fields) at what
>>>>> sort of volume of log messages would I start to notice performance
>>>>> problems?
>>>>>
>>>>> Eventually I'm expecting about 5-10GB per day to be received by our
>>>>> centralised rsyslog log server.
>>>>>
>>>>
>>>> I guess it all comes down to performance testing, but 10GB would probably
>>>> mean ~20M logs or something like that. If the majority of those will be
>>>> sent during the day (say 10 hours), my poor math says if you handle
>>>> 500-600 logs/sec you should be fine.
>>>
>>> seeing that number, I'd say it requires quite some regexpes to get
>>> rsyslog to sweat. HOWEVER... do we really need regexpes? Can you post a
>>> couple of samples?
>>>
>>> Rainer
>>>>
>>>> I've never used regex with rsyslog in a performance situation, so I can't
>>>> say, but it seems to me like it should easily handle that amount.
>>>>
>>>>>
>>>>> Should I actually get the rsyslog senders to parse the regex patterns
>>>>> of Apache logs into JSON then forward that JSON to the receiver? So the
>>>>> sender's got the regex overhead?
>>>>>
>>>>> Or will an rsyslog receiver easily be able to parse all the regex
>>>>> patterns with my volume of logging?
>>>>> Having the regex patterns parsed in one place would make for easier
>>>>> management. If necessary we can just throw more vCPUs and memory at
>>>>> the log server without needing to touch the web nodes.
>>>>>
>>>>
>>>> I suspect the load won't be too high, but making the clients do that will
>>>> scale a lot better and - especially since we don't expect the total load
>>>> to be high - nobody will feel that load if it's that distributed. And if
>>>> you add more web nodes, you won't have to touch anything. Not even adding
>>>> vCPUs and memory.
>>>>
>>>> Personally, I'd try the "centralized" method first, because it's easier
>>>> to get started. If all works smoothly, you can push the same(ish) config
>>>> to the web nodes. If you ever feel the need to do that :) By then,
>>>> configuring them might get easier because of natural evolutions of
>>>> packaging, testing and documentation.
>>>>
>>>> Best regards,
>>>> Radu
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>> DON'T LIKE THAT.
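
For reference, the query-parameter breakdown described at the top of this message (pull the request out of the log line, turn each query param into its own property, then route on a property value rather than a regexp) can be sketched outside rsyslog in a few lines of Python. This is only an illustration of the desired behaviour, not something rsyslog or mmnormalize is known to do; the function names, the request_path key, the destination labels, and the sample request line are all made up for the example.

```python
from urllib.parse import parse_qsl, urlsplit


def request_to_properties(request):
    """Split 'METHOD /path?query PROTOCOL' into a flat dict of properties."""
    method, target, *_ = request.split(" ")
    parts = urlsplit(target)
    # parse_qsl decodes percent-escapes; keep_blank_values keeps beacons
    # that are present but empty. Order and presence of keys do not matter.
    props = dict(parse_qsl(parts.query, keep_blank_values=True))
    # Fixed fields are set last so a query param cannot overwrite them.
    props["request_type"] = method
    props["request_path"] = parts.path
    return props


def route(props):
    """Pick a (hypothetical) destination by property value, not by regexp."""
    if props.get("rg_publisher_id") == "896":
        return "publisher_896"
    return "default"


# Hypothetical request line; the parameter values are illustrative only.
sample = ("GET /events?rg_publisher_id=896&rg_player_type=standard"
          "&rg_type=2.4.5%3Ainfo HTTP/1.1")
props = request_to_properties(sample)
print(props["rg_type"], route(props))
```

Because the beacons end up as ordinary dictionary keys, a missing beacon simply falls through to the default route instead of needing its own pattern.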

