Can you share your regex? Maybe there are some simple things we can do there without going straight to C (not that I don't recommend that route - it will be the best route, but this might be quicker).
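
To make the "simple things" concrete, here is a minimal sketch of the changes Aaron suggests in the quoted text below: anchor the pattern, avoid "hungry" (greedy) wildcards, or drop the regex for a straight string match. It is Python rather than rsyslog configuration, and Python's re engine is not the one rsyslog uses, so it only illustrates the idea. The log line, the field names and the three parse_* helpers are all made up for illustration, since the actual regex has not been posted.

```python
import re

# Hypothetical log line; the real message format in this thread is unknown.
LINE = "2013-10-22T08:00:01 host1 app[123]: user=alice action=login status=ok"

# Unanchored pattern with greedy wildcards: each ".*" first grabs as much as
# it can and then backtracks until the rest of the pattern fits.
GREEDY = re.compile(r".*user=(.*) action=(.*) status=(.*)")

# Anchored pattern with narrow character classes: each field stops at the
# next space, so there is far less for the engine to retry.
ANCHORED = re.compile(r"^\S+ \S+ \S+ user=(\S+) action=(\S+) status=(\S+)$")


def parse_greedy(line):
    """Parse with the greedy, unanchored pattern."""
    m = GREEDY.match(line)
    return m.groups() if m else None


def parse_anchored(line):
    """Parse with the anchored pattern."""
    m = ANCHORED.match(line)
    return m.groups() if m else None


def parse_plain(line):
    """Parse with straight string operations, no regex at all."""
    start = line.find(" user=")
    if start < 0:
        return None
    fields = {}
    for token in line[start + 1:].split(" "):
        key, sep, value = token.partition("=")
        if sep:  # keep only key=value tokens
            fields[key] = value
    try:
        return fields["user"], fields["action"], fields["status"]
    except KeyError:
        return None


if __name__ == "__main__":
    for parser in (parse_greedy, parse_anchored, parse_plain):
        print(parser.__name__, parser(LINE))
```

All three helpers return the same fields for the sample line; the difference is how much work the engine does per message. Timing them (for example with timeit) against real messages would show whether the simpler forms are enough before writing a C parser.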

On Tue, Oct 22, 2013 at 8:00 AM, Boylan, James <[email protected]> wrote:

> The initial design I'm looking at has 8 instances per server, which is
> about the maximum these servers can handle with the complex regex we're
> running. As David points out, the regex is the biggest choke point in the
> application when it comes down to it.
>
> I wonder if anyone has a link detailing how one might build out the
> parsing in a C module for Rsyslog. I'll do the footwork, but if someone has
> a link going over it at a high level it might save me some time.
>
> Thanks!
>
> -- James
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Aaron Wiebe
> Sent: Tuesday, October 22, 2013 6:54 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>
> Em, link fix: https://github.com/blackberry/hadoop-logdriver
>
> On Tue, Oct 22, 2013 at 7:53 AM, Aaron Wiebe <[email protected]> wrote:
>
> > On Tue, Oct 22, 2013 at 7:16 AM, Boylan, James <[email protected]> wrote:
> >
> >> I agree that not doing all of the expensive regex would be a better
> >> solution, and I'm actually in the process of making changes with our
> >> developers to address that, but for the short term I'm working with
> >> what we have on hand. My eventual goal is to just have them output in
> >> JSON. It saves a lot of time long term and works well with parsing
> >> the messages in both Elasticsearch and Hadoop.
> >
> > I previously built a 100k+ syslog infrastructure... per server. ;)
> > We used imptcp - and I know David's experience has been primarily with
> > UDP. The difference from our side was that we wanted to know when we
> > dropped messages, so tcp provided that level of confidence - either
> > the message was dropped in rsyslog (which we could get from the queue
> > stats) or on the other side.
> >
> > On regex: the format of your regex itself will feed the compute
> > requirements quite significantly. Simplify, use anchors, avoid hungry
> > wildcards. If you can, move to a straight string match.
> >
> > On instances: rsyslog will top out around 2-3 cores. Run 5-10
> > instances on the same machine using different ports if possible, on
> > modern hardware.
> >
> > On Hadoop ingest and Elastic search: Take a look at
> > http://github.com/blackberry/logdriver-hadoop - it might be of use to
> > you. Additionally, you may want to consider using Kafka and/or Storm
> > for ingest rather than rsyslog. That was the direction we were heading.
> > (Sorry Rainer!)
> >
> > It doesn't sound like your volume is that high. You just need to
> > segment your ingest a bit. One machine should comfortably be able to
> > handle 100-200k messages per second - but the threading model (as much
> > as it's improved recently) still can't quite max out modern hardware.
> > Look at multiple instances on the same machine to see if you can't
> > bring the concurrency up.
> >
> > -Aaron
> >
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.

