The initial design I'm looking at has 8 instances per server, which is about 
the maximum these servers can handle with the complex regex we're running. As 
David points out, the regex is ultimately the biggest choke point in the 
application.
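
For anyone following along, here is a rough Python sketch of the regex advice 
Aaron gives in the quoted message below (use anchors, avoid greedy wildcards, 
move to a plain string match where possible). The log line, field names, and 
function names are all made up for illustration, and Python's re module is not 
the regex engine rsyslog uses, so treat this as a picture of the idea rather 
than our actual rules.

```python
import re

# Hypothetical log line; the real message format is not shown in this thread.
LINE = "2013-10-22T06:54:00 host1 app[123]: user=james action=login status=ok"

# Unanchored pattern with greedy wildcards. It still matches, but the engine
# has to consume the whole line and backtrack to find each literal.
LOOSE = re.compile(r".*user=(.*) action=(.*) status=(.*)")

# Anchored at both ends, with \S+ instead of .* so each field stops at the
# next space and there is far less room for backtracking.
TIGHT = re.compile(
    r"^(\S+) (\S+) (\S+): user=(\S+) action=(\S+) status=(\S+)$"
)


def parse_tight(line):
    """Return the fields as a dict, or None if the line does not match."""
    m = TIGHT.match(line)
    if m is None:
        return None
    timestamp, host, tag, user, action, status = m.groups()
    return {
        "timestamp": timestamp,
        "host": host,
        "tag": tag,
        "user": user,
        "action": action,
        "status": status,
    }


def is_login(line):
    """Plain substring test: no regex engine involved at all."""
    return " action=login " in line


if __name__ == "__main__":
    print(LOOSE.match(LINE).groups())
    print(parse_tight(LINE))
    print(is_login(LINE))
```

If a rule only needs to know whether a fixed token is present, the substring 
test in is_login is the cheapest of the three options.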

Does anyone have a link that explains how one might build the parsing out as 
a C module for Rsyslog? I'll do the footwork, but a high-level overview might 
save me some time.

Thanks!

-- James

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Aaron Wiebe
Sent: Tuesday, October 22, 2013 6:54 AM
To: rsyslog-users
Subject: Re: [rsyslog] Large Scale Rsyslog deployment

Em, link fix:  https://github.com/blackberry/hadoop-logdriver


On Tue, Oct 22, 2013 at 7:53 AM, Aaron Wiebe <[email protected]> wrote:

> On Tue, Oct 22, 2013 at 7:16 AM, Boylan, James <[email protected]> wrote:
>
>>
>> I agree that not doing all of the expensive regex would be a better 
>> solution, and I'm actually in the process of making changes with our 
>> developers to address that, but for the short term I'm working with 
>> what we have on hand. My eventual goal is to just have them output in 
>> JSON. It saves a lot of time long term and works well with parsing 
>> the messages in both Elasticsearch and Hadoop.
>
>
> I previously built a 100k+ syslog infrastructure... per server.  ;)  
> We used imptcp - and I know David's experience has been primarily with UDP.
>  The difference from our side was that we wanted to know when we 
> dropped messages, so tcp provided that level of confidence - either 
> the message was dropped in rsyslog (which we could get from the queue 
> stats) or on the other side.
>
> On regex:  the format of your regex itself will feed the compute 
> requirements quite significantly.  Simplify, use anchors, avoid hungry 
> wildcards.  If you can, move to a straight string match.
>
> On instances:  rsyslog will top out around 2-3 cores.  Run 5-10 
> instances on the same machine using different ports if possible, on modern 
> hardware.
>
> On Hadoop ingest and Elasticsearch:  Take a look at 
> http://github.com/blackberry/logdriver-hadoop - it might be of use to 
> you.  Additionally, you may want to consider using Kafka and/or Storm 
> for ingest rather than rsyslog.  That was the direction we were heading.
>  (Sorry Rainer!)
>
> It doesn't sound like your volume is that high.  You just need to 
> segment your ingest a bit.  One machine should comfortably be able to 
> handle 100-200k messages per second - but the threading model (as much 
> as it's improved recently) still can't quite max out modern hardware.  
> Look at multiple instances on the same machine to see if you can't 
> bring the concurrency up.
>
> -Aaron
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is 
a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our 
control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.