Great. Thanks, David!

I'll definitely read this over.

-- James

-----Original Message-----
From: David Lang [mailto:[email protected]] 
Sent: Tuesday, October 22, 2013 7:36 AM
To: rsyslog-users
Cc: Boylan, James
Subject: Re: [rsyslog] Large Scale Rsyslog deployment

Attached is the article I wrote for ;login: magazine earlier this year on 
designing enterprise logging.

David Lang

On Tue, 22 Oct 2013, Boylan, James wrote:

> Oh! My apologies. A bit of a miscommunication there.
>
> That is exactly what I'm looking at doing. The current layout, with two 
> central logging servers handling pretty much everything, is painful and 
> inefficient. In the new design, the first layer, the relay layer, will be 
> the one that updates the message with the FQDN of the sending server and 
> then passes it down the line. A rough explanation of my plan is below.
>
> Sending servers -
> These are all the clients and devices that send logs to be collected. 
> They send to the relay layer.
>
> Relay Layer -
> These servers collect all of the inbound logs from all of the front-end 
> devices. Aside from traffic management (send these messages here, and 
> here, and here), this layer will only update the message payload to force 
> the FQDN into it. Its destinations are an Rsyslog layer for parsing the 
> messages into JSON for Elasticsearch, a collector for Hadoop, and the 
> archive servers.
>
> Archive Servers -
> These servers build out a log directory structure based on the name of the 
> application, the server it came from, the date/hour, and the type of log. 
> They use this dynafile structure to save the logs to disk for archiving 
> onto tape.
>
> Elasticsearch Layer -
> This layer is topped with a set of Rsyslog instances that receive the 
> traffic, parse it into JSON, and then submit it into Elasticsearch.
>
> Hadoop Collector -
> A custom collector built for collecting log chunks and submitting them 
> into Hadoop.
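A minimal Python sketch of what the relay tier in this plan does per message: force an FQDN into the hostname once, then hand the same message to each back-end pool. It only models the logic; in rsyslog itself this would be a template plus several forwarding actions. The domain, the pool names, and the `SyslogMessage` shape are hypothetical, and short names are qualified by appending a fixed domain rather than by a DNS lookup.

```python
# Hypothetical model of the relay tier: qualify the hostname, then fan out.
# DEFAULT_DOMAIN, DESTINATIONS and SyslogMessage are illustrative assumptions.
from dataclasses import dataclass, replace

DEFAULT_DOMAIN = "example.com"  # assumed site domain
DESTINATIONS = ("archive", "elasticsearch", "hadoop")  # assumed pool names


@dataclass(frozen=True)
class SyslogMessage:
    hostname: str
    app_name: str
    msg: str


def to_fqdn(hostname: str, domain: str = DEFAULT_DOMAIN) -> str:
    """Append the site domain to short names; leave dotted names alone."""
    host = hostname.rstrip(".").lower()
    return host if "." in host else f"{host}.{domain}"


def relay(message: SyslogMessage) -> dict[str, SyslogMessage]:
    """Rewrite the hostname once, then give every pool the same message."""
    fixed = replace(message, hostname=to_fqdn(message.hostname))
    return {dest: fixed for dest in DESTINATIONS}


if __name__ == "__main__":
    out = relay(SyslogMessage("WEB01", "nginx", "GET / 200"))
    for dest, m in out.items():
        print(dest, m.hostname)
```

Doing the rewrite once at the relay means none of the three back-end pools has to repeat it. Note that dotted names, including IPv4 literals, pass through unchanged in this sketch.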
>
> My primary goal was to get an idea of the server counts people are using to 
> build out infrastructure of this size, to make sure I'm not overdoing the 
> hardware we need for our use cases.
>
> -- James
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of David Lang
> Sent: Tuesday, October 22, 2013 6:39 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>
> On Tue, 22 Oct 2013, Boylan, James wrote:
>
>> Unfortunately, due to the way things are output into the log line, 
>> because of decisions made before I took over the logging 
>> infrastructure, I'm forced to use the regex for now. This is 
>> compounded by the fact that I can't yet replace the front-end 
>> syslog-ng instances with Rsyslog, due to a custom management 
>> application that is wrapped around them.
>
> I'm not suggesting that you change the sending machine, I'm suggesting that 
> you do this on the first-tier relay boxes.
>
> It's pretty hard to replace the syslog daemon on a Cisco router, for 
> example :-)
>
> I'm saying that the variables that you are currently extracting on the 
> central server could be extracted on the relay servers, and then stored as 
> variables in the JSON message sent to the central server. At that point the 
> central server has access to the results of the regex without having to 
> perform the regex itself.
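To illustrate the suggestion above, here is a small Python sketch, assuming a made-up legacy line format: the relay runs the regex once and ships the results as JSON fields alongside the original text, so the central tier only does a lookup. The format, the field names, and both function names are hypothetical; in rsyslog the relay side would roughly correspond to storing the extracted values in message variables and forwarding them with a JSON template.

```python
# Hypothetical two-tier split: regex on the relay, plain lookup centrally.
import json
import re

# Made-up legacy format: "<app>[<pid>]: <LEVEL> <text>"
LEGACY = re.compile(
    r"^(?P<app>[\w.-]+)\[(?P<pid>\d+)\]: (?P<level>[A-Z]+) (?P<text>.*)$"
)


def relay_encode(host: str, raw: str) -> str:
    """Relay tier: run the regex once and attach the results as JSON."""
    m = LEGACY.match(raw)
    fields = m.groupdict() if m else {}
    return json.dumps({"host": host, "msg": raw, "fields": fields})


def central_app_name(payload: str) -> str:
    """Central tier: read the pre-extracted field; no regex is evaluated."""
    return json.loads(payload)["fields"].get("app", "unknown")


if __name__ == "__main__":
    payload = relay_encode("web01.example.com",
                           "billing[4242]: ERROR card declined")
    print(payload)
    print(central_app_name(payload))
```

Keeping the original text in `msg` means nothing is lost if a line does not match the expected format; the central tier just sees an empty `fields` object.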
>
>> I agree that not doing all of the expensive regex would be a better 
>> solution, and I'm actually in the process of making changes with our 
>> developers to address that, but for the short term I'm working with 
>> what we have on hand. My eventual goal is to just have them output in 
>> JSON. It saves a lot of time long term and works well with parsing 
>> the messages in both Elasticsearch and Hadoop.
>
> One point I'm trying to make is this: while it takes a regex to extract 
> this in the current template language, is it something that a bit of C code 
> could easily find? I have a vague memory of an earlier thread describing 
> what sounds like your configuration, with lots of complex regex patterns for 
> filenames and paths. If I am remembering correctly, you were really trying 
> to extract specific fields, but in ways that the template config language 
> didn't easily support.
> However, for known fields like the ones I'm remembering, the C code to look 
> in a string and find the correct substring is not that hard (and is much 
> faster than a general regex engine). Getting a custom string generation (sm) 
> module written, so that you would load the module and then just use the 
> template name rather than crafting the template via the regexes and 
> variables, could be significantly faster.
>
> It would mean coding some of your logic in C, which is not as nice to change 
> as a template, but I'll bet the performance gains would be drastic.
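The idea of replacing a general regex with direct string scanning can be sketched in a few lines. This is Python for brevity, whereas an rsyslog string generation module would be written in C, and it assumes a hypothetical space-separated `key=value` layout with an `app=` field. Both functions return the same result; the point is only that a known, fixed layout can be handled with plain substring searches. Any speed difference should be measured in C rather than inferred from this sketch.

```python
# Regex vs. direct scan for a known field in a hypothetical "key=value" line.
import re
from typing import Optional

# "app=" at the start of the line or after a space; value runs to next space.
APP_RE = re.compile(r"(?:^| )app=([^ ]*)")


def app_via_regex(msg: str) -> Optional[str]:
    m = APP_RE.search(msg)
    return m.group(1) if m else None


def app_via_scan(msg: str) -> Optional[str]:
    """Same result as app_via_regex, using only substring searches."""
    key = "app="
    if msg.startswith(key):
        start = len(key)
    else:
        pos = msg.find(" " + key)
        if pos < 0:
            return None
        start = pos + 1 + len(key)
    end = msg.find(" ", start)
    return msg[start:] if end < 0 else msg[start:end]


if __name__ == "__main__":
    line = "level=ERROR app=billing user=42"
    print(app_via_regex(line), app_via_scan(line))
```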
>
> David Lang
>
>> -- James
>>
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of David Lang
>> Sent: Tuesday, October 22, 2013 6:11 AM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>>
>> On Tue, 22 Oct 2013, David Lang wrote:
>>
>>> If you are doing a lot of complicated regex work to create 
>>> strings, strongly consider writing (or asking Adiscon to write) a 
>>> custom string generation module. It's very likely that a bit of C 
>>> code can do things far more efficiently than what you have to do in 
>>> the template configuration to create the paths.
>>>
>>> I'm strongly in favor of pushing message cleanups out to the 
>>> first-tier relay systems, you probably have more of them, and as 
>>> such you can allocate more CPU to cleanup work.
>>>
>>> When you use JSON in your relay layer, you can add additional 
>>> metadata to the log message without confusing the final destination 
>>> (things like the real source IP of the log message, whether it is 
>>> dev/QA/prod/DR, which business unit it is for so alerts can go to 
>>> the right people, etc.).
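As an illustration of relay-side enrichment, here is a minimal Python sketch that wraps each message in JSON and adds environment and business-unit fields looked up from the sender's real source IP. The subnets, environment names, and business units are made-up examples; a real deployment would load this mapping from its own inventory.

```python
# Relay-side enrichment sketch: tag each message with site metadata.
# SITE_MAP contents are hypothetical examples, not real subnets.
import ipaddress
import json

SITE_MAP = [
    (ipaddress.ip_network("10.1.0.0/16"), {"env": "prod", "unit": "payments"}),
    (ipaddress.ip_network("10.2.0.0/16"), {"env": "qa", "unit": "payments"}),
    (ipaddress.ip_network("10.9.0.0/16"), {"env": "dr", "unit": "infra"}),
]


def enrich(source_ip: str, raw_msg: str) -> str:
    """Wrap raw_msg in JSON with env/unit looked up by source subnet."""
    addr = ipaddress.ip_address(source_ip)
    meta = {"env": "unknown", "unit": "unknown"}
    for net, info in SITE_MAP:
        if addr in net:
            meta = dict(info)
            break
    return json.dumps({"source_ip": source_ip, **meta, "msg": raw_msg})


if __name__ == "__main__":
    print(enrich("10.2.3.4", "disk full"))
```

Because the extra data travels as separate JSON keys, destinations that only care about `msg` can ignore it.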
>>>
>>> Re: TCP vs. UDP: if you are just going over a local switch, UDP is 
>>> very reliable; the more potential chokepoints in the path, the more 
>>> valuable TCP becomes. RELP really becomes needed when the path 
>>> becomes long and there are a lot of messages in flight, or when you 
>>> end up with connections that have a relatively high probability of 
>>> silently failing (WAN links, or firewalls that can time out 
>>> connections or fail over/restart and lose track of existing 
>>> connections, are great examples).
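The rule of thumb above can be written down as a small decision function. This Python sketch is only an encoding of the reasoning in this message; the `Path` attributes and the three-chokepoint threshold for a "long" path are assumptions, not rsyslog settings.

```python
# Encodes the UDP / TCP / RELP rule of thumb from this message.
from dataclasses import dataclass
from enum import Enum

LONG_PATH_CHOKEPOINTS = 3  # assumed threshold for a "long" path


class Transport(Enum):
    UDP = "udp"
    TCP = "tcp"
    RELP = "relp"


@dataclass(frozen=True)
class Path:
    local_switch_only: bool   # sender and receiver share one local switch
    chokepoints: int          # potential chokepoints along the path
    many_in_flight: bool      # lots of messages in flight at once
    may_fail_silently: bool   # WAN link, or a firewall that can drop state


def choose_transport(p: Path) -> Transport:
    if p.may_fail_silently:
        # Plain TCP can lose in-flight messages here without anyone noticing.
        return Transport.RELP
    if p.chokepoints >= LONG_PATH_CHOKEPOINTS and p.many_in_flight:
        return Transport.RELP
    if p.local_switch_only:
        return Transport.UDP
    # More chokepoints in the path make TCP more valuable than UDP.
    return Transport.TCP


if __name__ == "__main__":
    lan = Path(local_switch_only=True, chokepoints=0,
               many_in_flight=False, may_fail_silently=False)
    wan = Path(local_switch_only=False, chokepoints=1,
               many_in_flight=False, may_fail_silently=True)
    print(choose_transport(lan), choose_transport(wan))
```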
>>
>> By the way, metadata could be the application name or other extracted 
>> information that you then use directly in your templates on the central 
>> servers without having to parse it out.
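For example, if the relay has already attached the application name, host, timestamp, and log type, the archive tier can build its dynafile-style path directly from those fields, with no regex at all. A minimal Python sketch, assuming hypothetical JSON field names (`app`, `host`, `timestamp` in epoch seconds, `log_type`), a made-up archive root, and the application/server/date/hour/type layout described earlier in the thread:

```python
# Build an archive path from relay-supplied metadata; no parsing needed.
# Field names and ARCHIVE_ROOT are illustrative assumptions.
import json
from datetime import datetime, timezone
from pathlib import PurePosixPath

ARCHIVE_ROOT = PurePosixPath("/var/log/archive")  # hypothetical root


def _safe(part: str) -> str:
    """Keep a field value from escaping its directory level."""
    cleaned = part.replace("/", "_").replace("..", "_")
    return cleaned or "unknown"


def archive_path(record_json: str) -> PurePosixPath:
    rec = json.loads(record_json)
    ts = datetime.fromtimestamp(rec["timestamp"], tz=timezone.utc)
    return (ARCHIVE_ROOT / _safe(rec["app"]) / _safe(rec["host"])
            / ts.strftime("%Y-%m-%d") / ts.strftime("%H")
            / f"{_safe(rec['log_type'])}.log")


if __name__ == "__main__":
    rec = {"timestamp": 1382425200, "app": "billing",
           "host": "web01.example.com", "log_type": "access"}
    print(archive_path(json.dumps(rec)))
```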
>>
>> David Lang
>>
>>> David Lang
>>>
>>> On Tue, 22 Oct 2013, Boylan, James wrote:
>>>
>>>> The performance problems are the expected ones. Our current 
>>>> environment layout has two large log archiving servers using rather 
>>>> complex regex for generating the dynafile name. Added to that, we 
>>>> are looking at adding output to elasticsearch, but to do so we need 
>>>> to use an even more complex set of regex to build the JSON output.
>>>> So you can imagine the negative impact to the environment when handling 
>>>> that many messages per second.
>>>>
>>>> I've just finished rewriting the config in the new format and with 
>>>> the patch for re_extract that Rainer sent I've implemented all of 
>>>> the regex in local variables since there was a lot of duplicate 
>>>> regex comparisons happening. That's helped a lot.
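The gain from moving the regexes into local variables comes from evaluating each pattern once per message and reusing the result everywhere it is needed. A small Python model of that idea, with hypothetical patterns and outputs (an archive path and a JSON-style document); it does not reproduce rsyslog's `re_extract` or its config syntax.

```python
# Evaluate each regex at most once per message, then reuse the result.
# PATTERNS and the two outputs are hypothetical examples.
import re

PATTERNS = {
    "app": re.compile(r"app=(\S+)"),
    "logtype": re.compile(r"type=(\S+)"),
}


class Extracted:
    """Per-message cache so a pattern never runs twice on the same line."""

    def __init__(self, msg: str) -> None:
        self.msg = msg
        self._cache: dict[str, str] = {}
        self.regex_runs = 0  # counts actual regex evaluations

    def get(self, name: str) -> str:
        if name not in self._cache:
            self.regex_runs += 1
            m = PATTERNS[name].search(self.msg)
            self._cache[name] = m.group(1) if m else "unknown"
        return self._cache[name]


def build_outputs(msg: str) -> tuple[str, dict[str, str], int]:
    x = Extracted(msg)
    # Both outputs need the same fields; each regex still runs only once.
    path = f"/archive/{x.get('app')}/{x.get('logtype')}.log"
    doc = {"app": x.get("app"), "type": x.get("logtype")}
    return path, doc, x.regex_runs


if __name__ == "__main__":
    print(build_outputs("app=billing type=access user=42"))
```

Without the cache, the same two patterns would be evaluated twice here (once per output); with more templates sharing fields, the savings grow accordingly.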
>>>>
>>>> I'm looking at breaking it out so the relay layer only acts as a 
>>>> traffic manager and rebuilds the Rsyslog message to force an FQDN 
>>>> into the server name field. Then it will be passed to the specific 
>>>> server pools for handling archiving, Elasticsearch, and Hadoop.
>>>>
>>>> All of this is done with TCP, as we needed to make sure that packets 
>>>> reached the destination as reliably as possible without adding too 
>>>> much overhead. (I'm still considering RELP for the transmission 
>>>> between the relay layer and the back-end services, but I haven't made 
>>>> a decision yet.)
>>>>
>>>> -- James
>>>>
>>>> ----- Reply message -----
>>>> From: "David Lang" <[email protected]>
>>>> To: "rsyslog-users" <[email protected]>
>>>> Subject: [rsyslog] Large Scale Rsyslog deployment
>>>> Date: Tue, Oct 22, 2013 5:03 am
>>>>
>>>> On Tue, 22 Oct 2013, Boylan, James wrote:
>>>>
>>>>> I know there are several individuals on the list who manage 
>>>>> large-scale Rsyslog environments handling 70k to 100k+ messages per second.
>>>>>
>>>>> I was wondering if they could share roughly the number of Rsyslog 
>>>>> instances running on their relay layer. I'm hoping to get 
>>>>> confirmation on the numbers I'm looking at. Comments on whether they 
>>>>> are using UDP or TCP would be helpful as well.
>>>>
>>>> I am vastly overprovisioned on the relay layer, as I put in relays 
>>>> per environment. I use UDP from the application servers to the 
>>>> relay, and then UDP if it's relaying on a local subnet (multi-homed 
>>>> relay systems in one datacenter) or RELP if it's relaying to a remote 
>>>> network (especially if it goes over a WAN).
>>>>
>>>> In my older, all-local setup I put 6 pairs of relays in one 
>>>> datacenter. In the newer, larger environment I started with 3 pairs 
>>>> of relays per datacenter, but expect that more will be needed 
>>>> eventually (I am no longer at the company, so I won't be building 
>>>> out that system further). I have all of these relay to one pair of 
>>>> core relay boxes that then distribute the logs to the different 
>>>> analysis boxes.
>>>>
>>>>
>>>> It depends a lot on what you are doing on the relay boxes. If you 
>>>> are just relaying messages without modifying them, you should be 
>>>> able to get up to gig-E wire speed on a single box (although we 
>>>> recently found a bottleneck that led to imudp being made 
>>>> multithreaded to handle that load).
>>>>
>>>> If you are modifying the messages, doing encryption, etc., you may 
>>>> run into performance limitations sooner.
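To connect the "gig-E wire speed" figure to message rates like the 70k to 100k+ per second mentioned at the start of the thread, here is a back-of-envelope Python sketch. The average message size, framing overhead, and usable fraction of the link are assumptions to replace with your own measurements, and, as noted above, modifying or encrypting messages can make a box CPU-bound well before it fills the link.

```python
# Back-of-envelope relay sizing. All inputs are assumptions to be measured.
import math

GIGE_BITS_PER_SEC = 1_000_000_000


def msgs_per_sec_at_line_rate(avg_msg_bytes: int, overhead_bytes: int = 0,
                              usable_fraction: float = 1.0) -> float:
    """Messages/s that fit on one gigabit link at the given utilisation."""
    bits_per_msg = (avg_msg_bytes + overhead_bytes) * 8
    return GIGE_BITS_PER_SEC * usable_fraction / bits_per_msg


def pairs_needed(target_rate: float, per_box_rate: float) -> int:
    """Relay pairs needed if one box of each pair must carry the pair's load."""
    return math.ceil(target_rate / per_box_rate)


if __name__ == "__main__":
    per_box = msgs_per_sec_at_line_rate(500, usable_fraction=0.5)  # assumed
    print(f"{per_box:,.0f} msg/s per box, "
          f"{pairs_needed(100_000, per_box)} pair(s)")
```

At an assumed 500 bytes per message, a saturated gigabit link carries about 250,000 messages per second (10^9 / (500 * 8)); the per-box rate actually achieved with regex rewriting may be much lower, and that measured figure is what should drive the server count.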
>>>>
>>>> If you are running into performance problems, we'd be interested in 
>>>> hearing details and trying to address them.
>>>>
>>>> David Lang
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad 
>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you 
>>>> DON'T LIKE THAT.
>>>>
>>>
>>
>