Ah, I see.  Can you share a sample message (replacing anything that could
be remotely sensitive)?


On Tue, Oct 22, 2013 at 8:18 AM, Boylan, James <[email protected]> wrote:

>
> Cut and paste broke on that last email. Trying again for easy reading.
>
> set $!errorlevel = re_extract($msg, '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0,
> 1, 'N/A');
> set $!session  = re_extract($msg,
> '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*',
> 0, 1, 'N/A');
> set $!appname = re_extract($msg, '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1,
> 'Unknown');
> set $!appversion = re_extract($msg,
> '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_-]*[\\^]',
> 0, 1, 'Unknown');
> set $!appinstance = re_extract($msg,
> '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_-]*[\\^]',
> 0, 1, 'N/A');
> set $!logtype = re_extract($msg,
> '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*)[_-]*[a-zA-Z0-9_-]*[\\^]',
> 0, 1, 'Unknown');
> set $!cleanmessage = re_extract($msg, '^.*[\\^](.*)', 0, 1, 'NoMatch');
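
On the "use anchors, avoid hungry wildcards" point from earlier in the thread, here is a minimal sketch of what that could look like for the errorlevel pattern. It is written in Python purely so it can be run and tested; re_extract() in rsyslog uses its own regex engine, so treat this as an illustration of the pattern shape and not as a measured speedup. The sample message is made up, and the sketch assumes the first '^' in the message is the one that ends the prefix.

```python
import re

# The thread's errorlevel pattern, roughly translated (the doubled
# backslashes of the rsyslog string literal become single ones).
# The leading ^.* first consumes the whole line, then backtracks
# until it finds a caret that fits the rest of the pattern.
GREEDY = re.compile(r'^.*[\^][0-9.-]+\|([A-Z]+)\|.*')

# Assumed alternative: walk forward to the first caret with a negated
# class, and drop the trailing .* since nothing after the level is used.
ANCHORED = re.compile(r'^[^^]*\^[0-9.-]+\|([A-Z]+)\|')


def extract_level(msg, default='N/A'):
    """Return the error level field, or the default when nothing matches."""
    m = ANCHORED.match(msg)
    return m.group(1) if m else default


# Hypothetical message; not taken from a real log.
sample = 'billing|2.4.1|03_access^2013.10.22|ERROR|host1|web|abc|sess42~x|text'
print(extract_level(sample))
print(extract_level('no caret in this line'))
```

Both patterns pick out the same field on this sample; the anchored one just gives the engine less room to backtrack. If a payload can contain a second '^', the two patterns may disagree, so that assumption is worth checking against real messages.
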
>
>
>
> -----Original Message-----
> From: Boylan, James
> Sent: Tuesday, October 22, 2013 7:18 AM
> To: rsyslog-users
> Subject: RE: [rsyslog] Large Scale Rsyslog deployment
>
> Sure. I'm working on scheduling time to clean them up more. (I already had
> cleaned them up from the original ones that David had seen a few months
> ago.)
>
> set $!errorlevel = re_extract($msg, '^.*[\\^][0-9.-]+\\|([A-Z]+)\\|.*', 0,
> 1, 'N/A'); set $!session  = re_extract($msg,
> '^.*[\\^][0-9.-]+\\|[A-Z]+\\|[a-zA-Z0-9.-]+\\|[a-zA-Z0-9._-]+\\|[a-zA-Z0-9]*\\|([a-zA-Z0-9._-]*)[~]*[a-zA-Z0-9._-]*\\|.*',
> 0, 1, 'N/A'); set $!appname = re_extract($msg,
> '^([A-Za-z0-9._-]+)\\|.*[\\^]', 0, 1, 'Unknown'); set $!appversion =
> re_extract($msg,
> '^[A-Za-z0-9._-]+\\|([A-Za-z0-9._-]+)\\|[A-Za-z0-9._]+[-_]*[A-Za-z0-9_-]*[\\^]',
> 0, 1, 'Unknown'); set $!appinstance = re_extract($msg,
> '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|([A-Za-z0-9._]+)[-_]*[A-Za-z0-9_-]*[\\^]',
> 0, 1, 'N/A'); set $!logtype = re_extract($msg,
> '^[A-Za-z0-9._-]+\\|[A-Za-z0-9._-]+\\|[A-Za-z0-9.]+[-_]*([A-Za-z0-9_]*)[_-]*[a-zA-Z0-9_-]*[\\^]',
> 0, 1, 'Unknown'); set $!cleanmessage = re_extract($msg, '^.*[\\^](.*)', 0,
> 1, 'NoMatch');
>
> All of the syslog messages that we receive have a
> <app_name>|<app_version>|<instance_number>_<logtype>^ prefix prepended to
> the msg field. The errorlevel and session variables are pulled from
> within the actual log payload itself.
>
> Given how the message is structured, I'm not sure how much further I can
> condense the regex, which is why I'm definitely going to start looking
> into the C string generator modules.
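
Along the lines of the "move to a straight string match" suggestion: since the prefix is delimited by fixed characters, a rough sketch of pulling it apart with plain string splitting might look like the following. It is in Python only so it is easy to run; parse_prefix and the sample message are hypothetical, and the sketch assumes the first '^' ends the prefix and that a single '_' separates the instance number from the log type.

```python
def parse_prefix(msg):
    """Split '<app_name>|<app_version>|<instance_number>_<logtype>^payload'.

    Returns a dict of fields, or None when the prefix is not in that shape.
    """
    # Everything before the first caret is the prefix, the rest is payload.
    head, caret, payload = msg.partition('^')
    if not caret:
        return None

    parts = head.split('|')
    if len(parts) != 3:
        return None
    appname, appversion, tail = parts

    # Assumes a single '_' separator, as in the description; the regexes
    # in the thread also tolerate '-' at this position.
    appinstance, _, logtype = tail.partition('_')

    return {
        'appname': appname,
        'appversion': appversion,
        'appinstance': appinstance,
        'logtype': logtype,
        'cleanmessage': payload,
    }


# Hypothetical message; not taken from a real log.
fields = parse_prefix('billing|2.4.1|03_access^2013.10.22|ERROR|rest of payload')
print(fields)
```

One pass over the prefix yields all four prefix fields plus the clean message, where the regex version scans the message once per field. The same splitting logic should carry over to a C module fairly directly.
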
>
> -- James
>
> -----Original Message-----
> From: [email protected] [mailto:
> [email protected]] On Behalf Of Aaron Wiebe
> Sent: Tuesday, October 22, 2013 7:13 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>
> Can you share your regex?  Maybe there are some simple things we can do
> there without going straight to C (not that I don't recommend that route -
> it will be the best route, but this might be quicker).
>
>
> On Tue, Oct 22, 2013 at 8:00 AM, Boylan, James <[email protected]> wrote:
>
> > The initial design I'm looking at has 8 instances per server, which is
> > about the maximum these servers can handle with the complex regex we're
> > running. As David points out, the regex is the biggest choke point in
> > the application when it comes down to it.
> >
> > I wonder if anyone has a link detailing how one might build out the
> > parsing in a C module for rsyslog. I'll do the footwork, but if
> > someone has a link going over it at a high level it might save me some
> > time.
> >
> > Thanks!
> >
> > -- James
> >
> > -----Original Message-----
> > From: [email protected] [mailto:
> > [email protected]] On Behalf Of Aaron Wiebe
> > Sent: Tuesday, October 22, 2013 6:54 AM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] Large Scale Rsyslog deployment
> >
> > Em, link fix:  https://github.com/blackberry/hadoop-logdriver
> >
> >
> > On Tue, Oct 22, 2013 at 7:53 AM, Aaron Wiebe <[email protected]> wrote:
> >
> > > On Tue, Oct 22, 2013 at 7:16 AM, Boylan, James <[email protected]> wrote:
> > >
> > >>
> > >> I agree that not doing all of the expensive regex would be a better
> > >> solution, and I'm actually in the process of making changes with
> > >> our developers to address that, but for the short term I'm working
> > >> with what we have on hand. My eventual goal is to just have them
> > >> output in JSON. It saves a lot of time long term and works well
> > >> with parsing the messages in both Elasticsearch and Hadoop.
> > >
> > >
> > > I previously built a 100k+ syslog infrastructure... per server.  ;)
> > > We used imptcp - and I know David's experience has been primarily
> > > with UDP.  The difference from our side was that we wanted to know
> > > when we dropped messages, so TCP provided that level of confidence -
> > > either the message was dropped in rsyslog (which we could get from
> > > the queue stats) or on the other side.
> > >
> > > On regex:  the format of your regex itself will feed the compute
> > > requirements quite significantly.  Simplify, use anchors, avoid
> > > hungry wildcards.  If you can, move to a straight string match.
> > >
> > > On instances:  rsyslog will top out around 2-3 cores.  Run 5-10
> > > instances on the same machine using different ports if possible, on
> > > modern hardware.
> > >
> > > On Hadoop ingest and Elasticsearch:  Take a look at
> > > http://github.com/blackberry/logdriver-hadoop - it might be of use
> > > to you.  Additionally, you may want to consider using Kafka and/or
> > > Storm for ingest rather than rsyslog.  That was the direction we were
> > > heading.  (Sorry Rainer!)
> > >
> > > It doesn't sound like your volume is that high.  You just need to
> > > segment your ingest a bit.  One machine should comfortably be able
> > > to handle 100-200k messages per second - but the threading model (as
> > > much as it's improved recently) still can't quite max out modern
> > > hardware.  Look at multiple instances on the same machine to see if
> > > you can't bring the concurrency up.
> > >
> > > -Aaron
> > >
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
> > This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
> > beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> > LIKE THAT.
> >
>
