Aaron - I can, but I'll have to do it a bit later when I have some available time to grab a snippet of logs.
-- James

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Wiebe
Sent: Tuesday, October 22, 2013 7:27 AM
To: rsyslog-users
Subject: Re: [rsyslog] Large Scale Rsyslog deployment

Ah, I see. Can you share a sample message (replacing anything that could be remotely sensitive)?

On Tue, Oct 22, 2013 at 8:18 AM, Boylan, James <[email protected]> wrote:

> Cut and paste broke on that last email. Trying again for easy reading.
>
> set $!errorlevel = re_extract($msg, '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0, 1, 'N/A');
> set $!session = re_extract($msg, '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*', 0, 1, 'N/A');
> set $!appname = re_extract($msg, '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1, 'Unknown');
> set $!appversion = re_extract($msg, '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'Unknown');
> set $!appinstance = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'N/A');
> set $!logtype = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*)[_-]*[a-zA-Z0-9_-]*[\\^]', 0, 1, 'Unknown');
> set $!cleanmessage = re_extract($msg, '^.*[\\^](.*)', 0, 1, 'NoMatch');
>
> -----Original Message-----
> From: Boylan, James
> Sent: Tuesday, October 22, 2013 7:18 AM
> To: rsyslog-users
> Subject: RE: [rsyslog] Large Scale Rsyslog deployment
>
> Sure. I'm working on scheduling time to clean them up more. (I already had cleaned them up from the original ones that David had seen a few months ago.)
>
> All of our syslog messages that we receive have <app_name>|<app_version>|<instance_number>_<logtype>^ prepended to the msg field. The errorlevel and session variables are pulled from within the actual log payload itself.
>
> Due to how the message is structured, I'm not sure how much more I can condense the regexes. That's why I'm definitely going to start looking into the C string generator modules.
>
> -- James
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Wiebe
> Sent: Tuesday, October 22, 2013 7:13 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>
> Can you share your regex? Maybe there are some simple things we can do there without going straight to C (not that I don't recommend that route - it will be the best route, but this might be quicker).
>
>
> On Tue, Oct 22, 2013 at 8:00 AM, Boylan, James <[email protected]> wrote:
>
> > The initial design I'm looking at has 8 instances per server, which is about the maximum these servers can handle with the complex regex we're running.
> > As David points out, the regex is the biggest choke point in the application when it comes down to it.
> >
> > I wonder if anyone has a link detailing how one might build out the parsing in a C module for Rsyslog. I'll do the footwork, but if someone has a link going over it at a high level, it might save me some time.
> >
> > Thanks!
> >
> > -- James
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Wiebe
> > Sent: Tuesday, October 22, 2013 6:54 AM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] Large Scale Rsyslog deployment
> >
> > Em, link fix: https://github.com/blackberry/hadoop-logdriver
> >
> >
> > On Tue, Oct 22, 2013 at 7:53 AM, Aaron Wiebe <[email protected]> wrote:
> >
> > > On Tue, Oct 22, 2013 at 7:16 AM, Boylan, James <[email protected]> wrote:
> > >
> > >>
> > >> I agree that not doing all of the expensive regex would be a better solution, and I'm actually in the process of making changes with our developers to address that, but for the short term I'm working with what we have on hand. My eventual goal is to just have them output in JSON. It saves a lot of time long term and works well with parsing the messages in both Elasticsearch and Hadoop.
> > >
> > >
> > > I previously built a 100k+ syslog infrastructure... per server. ;) We used imptcp - and I know David's experience has been primarily with UDP. The difference from our side was that we wanted to know when we dropped messages, so TCP provided that level of confidence - either the message was dropped in rsyslog (which we could get from the queue stats) or on the other side.
> > >
> > > On regex: the format of your regex itself will drive the compute requirements quite significantly. Simplify, use anchors, avoid greedy wildcards. If you can, move to a straight string match.
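As a rough illustration of the straight-string-match idea applied to the format James describes (`<app_name>|<app_version>|<instance_number>_<logtype>^` followed by a `|`-delimited payload), the Python sketch below splits each message once on the first `^` and then on `|`, instead of running seven separate regexes, several of which start with a greedy `^.*`. It is not rsyslog configuration or a C module. The function name `parse_message`, the sample message, the assumption that the instance number and log type are separated by the first `_` or `-`, and the payload field positions (inferred from the errorlevel and session patterns) are all hypothetical. Missing fields fall back to the same defaults the `re_extract()` calls use.

```python
# Hypothetical sketch: parse "<app_name>|<app_version>|<instance_number>_<logtype>^<payload>"
# with plain string splits instead of one regex per field. The payload field positions
# are inferred from the errorlevel/session regexes and are assumptions.
import re

# Assumed separator between instance number and log type.
_INSTANCE_SEP = re.compile(r"[-_]")


def parse_message(msg: str) -> dict:
    # Same fallback values as the re_extract() calls in the thread.
    out = {
        "appname": "Unknown",
        "appversion": "Unknown",
        "appinstance": "N/A",
        "logtype": "Unknown",
        "errorlevel": "N/A",
        "session": "N/A",
        "cleanmessage": "NoMatch",
    }
    prefix, caret, payload = msg.partition("^")  # splits at the FIRST '^'
    if not caret:
        return out  # no prefix delimiter: keep every default

    out["cleanmessage"] = payload

    # Prefix: app_name | app_version | instance_logtype (only used if all three exist)
    head = prefix.split("|", 2)
    if len(head) == 3:
        out["appname"], out["appversion"] = head[0], head[1]
        inst = _INSTANCE_SEP.split(head[2], maxsplit=1)
        out["appinstance"] = inst[0]
        if len(inst) == 2:
            out["logtype"] = inst[1]

    # Payload (assumed): number | LEVEL | f2 | f3 | f4 | session[~suffix] | ...
    body = payload.split("|")
    if len(body) >= 3:
        out["errorlevel"] = body[1]
    if len(body) >= 7:
        out["session"] = body[5].split("~", 1)[0]  # text before the first '~'
    return out


if __name__ == "__main__":
    # Made-up sample message, for illustration only.
    sample = "webapp|2.1.0|3_access^1382443200.123|ERROR|web01|worker-7|abc|sess42~x|request failed"
    for key, value in parse_message(sample).items():
        print(f"{key}: {value}")
```

This is not a drop-in equivalent, and two differences are worth checking against real messages. The original `'^.*[\\^](.*)'` pattern is greedy, so `cleanmessage` receives the text after the last `^`, whereas the sketch splits at the first one; the results differ whenever a payload itself contains `^`. Also, because the `appinstance` pattern's capture class `[A-Za-z0-9._]+` includes `_`, it appears to capture `3_access` rather than `3` when the separator is an underscore (the `logtype` pattern excludes `_` from the instance part). Finally, the sketch skips the character-class validation that the regexes perform.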
> > >
> > > On instances: rsyslog will top out around 2-3 cores. Run 5-10 instances on the same machine using different ports if possible, on modern hardware.
> > >
> > > On Hadoop ingest and Elasticsearch: Take a look at http://github.com/blackberry/logdriver-hadoop - it might be of use to you. Additionally, you may want to consider using Kafka and/or Storm for ingest rather than rsyslog. That was the direction we were heading. (Sorry Rainer!)
> > >
> > > It doesn't sound like your volume is that high. You just need to segment your ingest a bit. One machine should comfortably be able to handle 100-200k messages per second - but the threading model (as much as it's improved recently) still can't quite max out modern hardware. Look at multiple instances on the same machine to see if you can't bring the concurrency up.
> > >
> > > -Aaron
> > >
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

