Ah, I see. Can you share a sample message (replacing anything that could be remotely sensitive)?
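While waiting on a sample: going only by the prefix layout described in the quoted message below (<app_name>|<app_version>|<instance_number>_<logtype>^), here is a rough Python sketch of what "straight string match" parsing might look like in place of the regex. It is not rsyslog configuration. The function name parse_prefix and the sample line are made up for illustration, and the sketch assumes that the first '^' ends the prefix and that a single '_' separates instance number and log type, which may not hold for your real messages.

```python
def parse_prefix(msg):
    """Split '<app_name>|<app_version>|<instance_number>_<logtype>^payload'.

    Returns a dict of fields, or None if the prefix does not have that shape.
    """
    # Assumption: everything before the first '^' is the prefix.
    # (The greedy cleanmessage regex in the thread would cut at the last '^'.)
    header, caret, payload = msg.partition("^")
    if not caret:
        return None

    parts = header.split("|")
    if len(parts) != 3:
        return None
    appname, appversion, instance_logtype = parts

    # Assumption: a single '_' separates instance number and log type.
    appinstance, _, logtype = instance_logtype.partition("_")

    return {
        "appname": appname,
        "appversion": appversion,
        "appinstance": appinstance,
        "logtype": logtype or "Unknown",
        "cleanmessage": payload,
    }


# Hypothetical sample line, not taken from real logs.
sample = "myapp|1.4.2|03_access^request handled"
print(parse_prefix(sample))
```

Whether something this simple is enough depends on what the real messages look like, hence the request for a sample.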

On Tue, Oct 22, 2013 at 8:18 AM, Boylan, James <[email protected]> wrote:

>
> Cut and paste broke on that last email. Trying again for easy reading.
>
> set $!errorlevel = re_extract($msg, '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0, 1, 'N/A');
> set $!session = re_extract($msg, '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*', 0, 1, 'N/A');
> set $!appname = re_extract($msg, '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1, 'Unknown');
> set $!appversion = re_extract($msg, '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'Unknown');
> set $!appinstance = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'N/A');
> set $!logtype = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*)[_-]*[a-zA-Z0-9_-]*[\\^]', 0, 1, 'Unknown');
> set $!cleanmessage = re_extract($msg, '^.*[\\^](.*)', 0, 1, 'NoMatch');
>
>
>
> -----Original Message-----
> From: Boylan, James
> Sent: Tuesday, October 22, 2013 7:18 AM
> To: rsyslog-users
> Subject: RE: [rsyslog] Large Scale Rsyslog deployment
>
> Sure. I'm working on scheduling time to clean them up more. (I already had
> cleaned them up from the original ones that David had seen a few months
> ago.)
>
> set $!errorlevel = re_extract($msg, '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0,
> 1, 'N/A'); set $!session = re_extract($msg,
> '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*',
> 0, 1, 'N/A'); set $!appname = re_extract($msg,
> '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1, 'Unknown'); set $!appversion =
> re_extract($msg,
> '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_-]*[\\^]',
> 0, 1, 'Unknown'); set $!appinstance = re_extract($msg,
> '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_-]*[\\^]',
> 0, 1, 'N/A'); set $!logtype = re_extract($msg,
> '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*)[_-]*[a-zA-Z0-9_-]*[\\^]',
> 0, 1, 'Unknown'); set $!cleanmessage = re_extract($msg, '^.*[\\^](.*)', 0,
> 1, 'NoMatch');
>
> All of our syslog messages that we receive have a
> <app_name>|<app_version>|<instance_number>_<logtype>^ prepended to the start
> of the msg field. The errorlevel and session variables are pulled from
> within the actual log payload itself.
>
> Due to how the message is structured I'm not sure how much more I can
> condense the regex down. Which is why I'm definitely going to start looking
> into the C string generator modules.
>
> -- James
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Wiebe
> Sent: Tuesday, October 22, 2013 7:13 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>
> Can you share your regex? Maybe there are some simple things we can do
> there without going straight to C (not that I don't recommend that route -
> it will be the best route, but this might be quicker).
>
>
> On Tue, Oct 22, 2013 at 8:00 AM, Boylan, James <[email protected]> wrote:
>
> > The initial design I'm looking at has 8 instances per server. Which is
> > about the maximum these servers can handle with the complex regex we're
> > running.
> > As David points out, the regex is the biggest choke point in
> > the application when it comes down to it.
> >
> > I wonder if anyone has a link detailing how one might build out the
> > parsing in a C module for Rsyslog. I'll do the footwork, but if
> > someone has a link going over it at a high level it might save me some
> > time.
> >
> > Thanks!
> >
> > -- James
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Wiebe
> > Sent: Tuesday, October 22, 2013 6:54 AM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] Large Scale Rsyslog deployment
> >
> > Em, link fix: https://github.com/blackberry/hadoop-logdriver
> >
> >
> > On Tue, Oct 22, 2013 at 7:53 AM, Aaron Wiebe <[email protected]> wrote:
> >
> > > On Tue, Oct 22, 2013 at 7:16 AM, Boylan, James <[email protected]> wrote:
> > >
> > >>
> > >> I agree that not doing all of the expensive regex would be a better
> > >> solution, and I'm actually in the process of making changes with
> > >> our developers to address that, but for the short term I'm working
> > >> with what we have on hand. My eventual goal is to just have them
> > >> output in JSON. It saves a lot of time long term and works well
> > >> with parsing the messages in both Elasticsearch and Hadoop.
> > >
> > >
> > > I previously built a 100k+ syslog infrastructure... per server. ;)
> > > We used imptcp - and I know David's experience has been primarily
> > > with UDP.
> > > The difference from our side was that we wanted to know when we
> > > dropped messages, so tcp provided that level of confidence - either
> > > the message was dropped in rsyslog (which we could get from the
> > > queue stats) or on the other side.
> > >
> > > On regex: the format of your regex itself will feed the compute
> > > requirements quite significantly. Simplify, use anchors, avoid
> > > hungry wildcards. If you can, move to a straight string match.
> > >
> > > On instances: rsyslog will top out around 2-3 cores. Run 5-10
> > > instances on the same machine using different ports if possible, on
> > > modern hardware.
> > >
> > > On Hadoop ingest and Elasticsearch: Take a look at
> > > http://github.com/blackberry/logdriver-hadoop - it might be of use
> > > to you. Additionally, you may want to consider using Kafka and/or
> > > Storm for ingest rather than rsyslog. That was the direction we were
> > > heading.
> > > (Sorry Rainer!)
> > >
> > > It doesn't sound like your volume is that high. You just need to
> > > segment your ingest a bit. One machine should comfortably be able
> > > to handle 100-200k messages per second - but the threading model (as
> > > much as it's improved recently) still can't quite max out modern
> > > hardware.
> > > Look at multiple instances on the same machine to see if you can't
> > > bring the concurrency up.
> > >
> > > -Aaron
> > >
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
> > beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> > LIKE THAT.

