Aaron -

I can, but I'll have to do it a bit later when I have some available time to 
grab a snippet of logs.

-- James

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Aaron Wiebe
Sent: Tuesday, October 22, 2013 7:27 AM
To: rsyslog-users
Subject: Re: [rsyslog] Large Scale Rsyslog deployment

Ah, I see.  Can you share a sample message (replacing anything that could be 
remotely sensitive)?


On Tue, Oct 22, 2013 at 8:18 AM, Boylan, James <[email protected]> wrote:

>
> Cut and paste broke on that last email. Trying again for easy reading.
>
> set $!errorlevel = re_extract($msg, '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0, 1, 'N/A');
> set $!session = re_extract($msg, '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*', 0, 1, 'N/A');
> set $!appname = re_extract($msg, '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1, 'Unknown');
> set $!appversion = re_extract($msg, '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'Unknown');
> set $!appinstance = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_-]*[\\^]', 0, 1, 'N/A');
> set $!logtype = re_extract($msg, '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*)[_-]*[a-zA-Z0-9_-]*[\\^]', 0, 1, 'Unknown');
> set $!cleanmessage = re_extract($msg, '^.*[\\^](.*)', 0, 1, 'NoMatch');
>
>
>
> -----Original Message-----
> From: Boylan, James
> Sent: Tuesday, October 22, 2013 7:18 AM
> To: rsyslog-users
> Subject: RE: [rsyslog] Large Scale Rsyslog deployment
>
> Sure. I'm working on scheduling time to clean them up more. (I already 
> had cleaned them up from the original ones that David had seen a few 
> months ago.)
>
> set $!errorlevel = re_extract($msg, 
> '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0, 1, 'N/A'); set $!session  = 
> re_extract($msg, 
> '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-
> Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*',
> 0, 1, 'N/A'); set $!appname = re_extract($msg, 
> '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1, 'Unknown'); set $!appversion = 
> re_extract($msg, 
> '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_
> -]*[\\^]', 0, 1, 'Unknown'); set $!appinstance = re_extract($msg, 
> '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_
> -]*[\\^]', 0, 1, 'N/A'); set $!logtype = re_extract($msg, 
> '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*
> )[_-]*[a-zA-Z0-9_-]*[\\^]', 0, 1, 'Unknown'); set $!cleanmessage = 
> re_extract($msg, '^.*[\\^](.*)', 0, 1, 'NoMatch');
>
> All of the syslog messages that we receive have a 
> <app_name>|<app_version>|<instance_number>_<logtype>^ prefix prepended 
> to the msg field. The errorlevel and session variables are pulled from 
> within the actual log payload itself.
>
> Due to how the message is structured I'm not sure how much more I can 
> condense the regex down. Which is why I'm definitely going to start 
> looking into the C string generator modules.
>
> -- James
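
The prefix layout described above, <app_name>|<app_version>|<instance_number>_<logtype>^, looks regular enough that plain string splitting might stand in for several of the regular expressions. The Python sketch below only illustrates that logic; it is not rsyslog configuration. The function name parse_prefix and the sample message are hypothetical, since no real message was posted in the thread, and the sketch assumes the first caret ends the prefix (the greedy patterns above would use the last caret instead).

```python
def parse_prefix(msg):
    """Split '<app_name>|<app_version>|<instance_number>_<logtype>^<payload>'.

    Returns a dict of fields, or None when the message does not have
    that shape. Field names mirror the $! variables in the thread.
    """
    # Assumption: the first caret terminates the prefix.
    prefix, caret, payload = msg.partition("^")
    if not caret:
        return None

    parts = prefix.split("|")
    if len(parts) != 3:
        return None
    appname, appversion, instance_logtype = parts

    # Assumption: instance number and log type are separated by the
    # first underscore; a missing underscore leaves logtype empty.
    appinstance, _, logtype = instance_logtype.partition("_")

    return {
        "appname": appname,
        "appversion": appversion,
        "appinstance": appinstance,
        "logtype": logtype,
        "cleanmessage": payload,
    }


# Hypothetical sample message, invented for illustration only.
sample = "billing|1.4.2|03_access^12.5|ERROR|rest of payload"
fields = parse_prefix(sample)
```

Each field here costs one pass over the prefix rather than a separate regex evaluation per field, which is roughly the saving a custom C parser would aim for as well.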
>
> -----Original Message-----
> From: [email protected] [mailto:
> [email protected]] On Behalf Of Aaron Wiebe
> Sent: Tuesday, October 22, 2013 7:13 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>
> Can you share your regex?  Maybe there are some simple things we can do 
> there without going straight to C (not that I don't recommend that 
> route - it will be the best route, but this might be quicker).
>
>
> On Tue, Oct 22, 2013 at 8:00 AM, Boylan, James <[email protected]> wrote:
>
> > The initial design I'm looking at has 8 instances per server, which 
> > is about the maximum these servers can handle with the complex regex 
> > we're running. As David points out, the regex is the biggest choke 
> > point in the application when it comes down to it.
> >
> > I wonder if anyone has a link detailing how one might build out 
> > the parsing in a C module for Rsyslog. I'll do the footwork, but if 
> > someone has a link going over it at a high level it might save me 
> > some time.
> >
> > Thanks!
> >
> > -- James
> >
> > -----Original Message-----
> > From: [email protected] [mailto:
> > [email protected]] On Behalf Of Aaron Wiebe
> > Sent: Tuesday, October 22, 2013 6:54 AM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] Large Scale Rsyslog deployment
> >
> > Em, link fix:  https://github.com/blackberry/hadoop-logdriver
> >
> >
> > On Tue, Oct 22, 2013 at 7:53 AM, Aaron Wiebe <[email protected]> wrote:
> >
> > > On Tue, Oct 22, 2013 at 7:16 AM, Boylan, James <[email protected]> wrote:
> > >
> > >>
> > >> I agree that not doing all of the expensive regex would be a 
> > >> better solution, and I'm actually in the process of making 
> > >> changes with our developers to address that, but for the short 
> > >> term I'm working with what we have on hand. My eventual goal is 
> > >> to just have them output in JSON. It saves a lot of time long 
> > >> term and works well with parsing the messages in both Elasticsearch and 
> > >> Hadoop.
> > >
> > >
> > > > I previously built a 100k+ syslog infrastructure... per server.  ;) 
> > > > We used imptcp - and I know David's experience has been primarily 
> > > > with UDP.  The difference from our side was that we wanted to know 
> > > > when we dropped messages, so tcp provided that level of confidence - 
> > > > either the message was dropped in rsyslog (which we could get from 
> > > > the queue stats) or on the other side.
> > >
> > > On regex:  the format of your regex itself will feed the compute 
> > > requirements quite significantly.  Simplify, use anchors, avoid 
> > > hungry wildcards.  If you can, move to a straight string match.
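
As a rough illustration of the advice above, the Python sketch below compares the greedy cleanmessage pattern from the thread with a straight string match, and shows an anchored pattern with no trailing wildcard. It is a sketch of the idea only, not rsyslog configuration: the sample message is invented, the helper names are hypothetical, and the anchored appname pattern deliberately drops the original requirement that a caret appear later in the message.

```python
import re

# Hypothetical message; no real sample appears in the thread.
msg = "billing|1.4.2|03_access^12.5|ERROR|payload text"

# Pattern from the thread: the leading '.*' runs to the end of the
# line and then backtracks to the last caret.
greedy = re.compile(r"^.*[\^](.*)")


def clean_regex(m):
    """Extract the text after the last caret using the greedy regex."""
    found = greedy.match(m)
    return found.group(1) if found else "NoMatch"


def clean_string(m):
    """Same result for single-line input, using a plain string search."""
    _head, caret, tail = m.rpartition("^")
    return tail if caret else "NoMatch"


# Anchored at the start and ending at the first pipe, so the engine
# never has to scan the rest of the message.
appname_re = re.compile(r"^([A-Za-z0-9._-]+)\|")


def app_name(m):
    found = appname_re.match(m)
    return found.group(1) if found else "Unknown"
```

For single-line messages the two cleanmessage helpers agree, so the regex could be swapped for the string search without changing the extracted value.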
> > >
> > > > On instances:  rsyslog will top out around 2-3 cores.  Run 5-10 
> > > > instances on the same machine using different ports if possible, 
> > > > on modern hardware.
> > >
> > > > On Hadoop ingest and Elasticsearch:  Take a look at 
> > > > http://github.com/blackberry/logdriver-hadoop - it might be of use 
> > > > to you.  Additionally, you may want to consider using Kafka and/or 
> > > > Storm for ingest rather than rsyslog.  That was the direction we 
> > > > were heading.  (Sorry Rainer!)
> > >
> > > > It doesn't sound like your volume is that high.  You just need to 
> > > > segment your ingest a bit.  One machine should comfortably be able 
> > > > to handle 100-200k messages per second - but the threading model 
> > > > (as much as it's improved recently) still can't quite max out 
> > > > modern hardware.  Look at multiple instances on the same machine 
> > > > to see if you can't bring the concurrency up.
> > >
> > > -Aaron
> > >
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
> > This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
> > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you 
> > DON'T LIKE THAT.
