Cut and paste broke on that last email. Trying again for easy reading.

set $!errorlevel = re_extract($msg, '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0, 1, 'N/A');
set $!session = re_extract($msg, '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*', 0, 1, 'N/A');
set $!appname = re_extract($msg, '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1, 'Unknown');
set $!appversion = re_extract($msg, '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'Unknown');
set $!appinstance = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'N/A');
set $!logtype = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*)[_-]*[a-zA-Z0-9_-]*[\\^]', 0, 1, 'Unknown');
set $!cleanmessage = re_extract($msg, '^.*[\\^](.*)', 0, 1, 'NoMatch');

-----Original Message-----
From: Boylan, James
Sent: Tuesday, October 22, 2013 7:18 AM
To: rsyslog-users
Subject: RE: [rsyslog] Large Scale Rsyslog deployment

Sure. I'm working on scheduling time to clean them up more. (I had already cleaned them up from the original ones that David saw a few months ago.)

set $!errorlevel = re_extract($msg, '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0, 1, 'N/A');
set $!session = re_extract($msg, '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*', 0, 1, 'N/A');
set $!appname = re_extract($msg, '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1, 'Unknown');
set $!appversion = re_extract($msg, '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'Unknown');
set $!appinstance = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'N/A');
set $!logtype = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*)[_-]*[a-zA-Z0-9_-]*[\\^]', 0, 1, 'Unknown');
set $!cleanmessage = re_extract($msg, '^.*[\\^](.*)', 0, 1, 'NoMatch');

All of the syslog messages we receive have a <app_name>|<app_version>|<instance_number>_<logtype>^ prefix prepended to the msg field. The errorlevel and session variables are pulled from within the actual log payload itself.

Due to how the message is structured, I'm not sure how much more I can condense the regex, which is why I'm definitely going to start looking into the C string generator modules.

-- James

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Wiebe
Sent: Tuesday, October 22, 2013 7:13 AM
To: rsyslog-users
Subject: Re: [rsyslog] Large Scale Rsyslog deployment

Can you share your regex? Maybe there are some simple things we can do there without going straight to C (not that I don't recommend that route - it will be the best route, but this might be quicker).

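As one example of the kind of "simple thing" that might help before going to C, here is a rough Python sketch of tightening the errorlevel pattern. It is only an approximation: Python's re module is not the regex library rsyslog uses, the sample line is made up, and the tightened pattern assumes the caret appears only once, as the prefix terminator. It shows that the two patterns agree on such a line, not how much time either one takes.

```python
import re

# Made-up sample line in the described layout:
# <app_name>|<app_version>|<instance_number>_<logtype>^<payload>
SAMPLE = "billing|1.4.2|03_access^1382440680.123|ERROR|web01|auth|x1|sess42~a|login failed"

# Python approximation of the posted errorlevel pattern. The leading ".*"
# first runs to the end of the line and then has to back up to a caret.
ORIGINAL = re.compile(r"^.*\^[0-9.-]+\|([A-Z]+)\|.*")

# Tightened variant (an assumption, not the poster's regex): "[^^]*" cannot
# run past the first caret, and the trailing ".*" is dropped because nothing
# after the captured field is needed.
TIGHTENED = re.compile(r"^[^^]*\^[0-9.-]+\|([A-Z]+)\|")


def extract_errorlevel(pattern, msg, default="N/A"):
    """Return capture group 1, or the default when the pattern does not match."""
    match = pattern.match(msg)
    return match.group(1) if match else default


if __name__ == "__main__":
    for name, pattern in (("original", ORIGINAL), ("tightened", TIGHTENED)):
        print(name, extract_errorlevel(pattern, SAMPLE))
```

If payloads can themselves contain carets, the two patterns may pick different anchor points, so that would need checking against real traffic first.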
On Tue, Oct 22, 2013 at 8:00 AM, Boylan, James <[email protected]> wrote:

> The initial design I'm looking at has 8 instances per server, which is
> about the maximum these servers can handle with the complex regex we're
> running. As David points out, the regex is the biggest choke point in
> the application when it comes down to it.
>
> I wonder if anyone has a link detailing how one might build out the
> parsing in a C module for Rsyslog. I'll do the footwork, but if
> someone has a link going over it at a high level it might save me some time.
>
> Thanks!
>
> -- James
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Wiebe
> Sent: Tuesday, October 22, 2013 6:54 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>
> Em, link fix: https://github.com/blackberry/hadoop-logdriver
>
>
> On Tue, Oct 22, 2013 at 7:53 AM, Aaron Wiebe <[email protected]> wrote:
>
> > On Tue, Oct 22, 2013 at 7:16 AM, Boylan, James <[email protected]> wrote:
> >
> >> I agree that not doing all of the expensive regex would be a better
> >> solution, and I'm actually in the process of making changes with
> >> our developers to address that, but for the short term I'm working
> >> with what we have on hand. My eventual goal is to just have them
> >> output in JSON. It saves a lot of time long term and works well
> >> with parsing the messages in both Elasticsearch and Hadoop.
> >
> > I previously built a 100k+ syslog infrastructure... per server. ;)
> > We used imptcp - and I know David's experience has been primarily
> > with UDP.
> > The difference from our side was that we wanted to know when we
> > dropped messages, so TCP provided that level of confidence - either
> > the message was dropped in rsyslog (which we could get from the
> > queue stats) or on the other side.
> >
> > On regex: the format of your regex itself will feed the compute
> > requirements quite significantly.
> > Simplify, use anchors, avoid hungry wildcards. If you can, move to a
> > straight string match.
> >
> > On instances: rsyslog will top out around 2-3 cores. Run 5-10
> > instances on the same machine using different ports if possible, on
> > modern hardware.
> >
> > On Hadoop ingest and Elasticsearch: Take a look at
> > http://github.com/blackberry/logdriver-hadoop - it might be of use
> > to you. Additionally, you may want to consider using Kafka and/or
> > Storm for ingest rather than rsyslog. That was the direction we were
> > heading.
> > (Sorry Rainer!)
> >
> > It doesn't sound like your volume is that high. You just need to
> > segment your ingest a bit. One machine should comfortably be able
> > to handle 100-200k messages per second - but the threading model (as
> > much as it's improved recently) still can't quite max out modern hardware.
> > Look at multiple instances on the same machine to see if you can't
> > bring the concurrency up.
> >
> > -Aaron
> >
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog?
Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
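
To illustrate the "move to a straight string match" suggestion from the thread, here is a minimal Python sketch that pulls the prefix fields out with plain string splitting instead of regex. It follows the <app_name>|<app_version>|<instance_number>_<logtype>^ layout described above and reuses the same default values, but it is not a drop-in replacement for the posted re_extract() calls: the function name and the sample line are hypothetical, and it assumes the first caret ends the prefix and the first underscore separates the instance from the logtype.

```python
def parse_prefixed_msg(msg):
    """Split '<app_name>|<app_version>|<instance>_<logtype>^<payload>' without regex."""
    fields = {
        "appname": "Unknown",
        "appversion": "Unknown",
        "appinstance": "N/A",
        "logtype": "Unknown",
        "cleanmessage": "NoMatch",
    }

    # Assumption: the first caret terminates the prefix.
    prefix, caret, payload = msg.partition("^")
    if not caret:
        return fields  # no prefix found; keep the defaults
    fields["cleanmessage"] = payload

    parts = prefix.split("|")
    if len(parts) != 3:
        return fields  # unexpected prefix shape; keep the remaining defaults

    appname, appversion, instance_and_type = parts
    # Assumption: the first underscore separates instance number and logtype.
    instance, _, logtype = instance_and_type.partition("_")

    # Empty fields fall back to the same defaults used in the thread.
    fields["appname"] = appname or "Unknown"
    fields["appversion"] = appversion or "Unknown"
    fields["appinstance"] = instance or "N/A"
    fields["logtype"] = logtype or "Unknown"
    return fields


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    print(parse_prefixed_msg("billing|1.4.2|03_access^1382440680.123|ERROR|rest of payload"))
```

The same split-on-delimiter logic is what a C parsing module would likely do as well; the Python version is only meant to make the steps easy to see.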

