Great. Thanks David! I'll definitely read this over.
-- James

-----Original Message-----
From: David Lang [mailto:[email protected]]
Sent: Tuesday, October 22, 2013 7:36 AM
To: rsyslog-users
Cc: Boylan, James
Subject: Re: [rsyslog] Large Scale Rsyslog deployment

attached is the article I wrote for ;login magazine earlier this year on the
topic of designing enterprise logging.

David Lang

On Tue, 22 Oct 2013, Boylan, James wrote:

> Oh! My apologies. A bit of a miscommunication there.
>
> That is exactly what I'm looking at doing. The current layout with two
> single central logging servers handling pretty much everything is painful and
> not efficient. The first relay layer in the new design will be the
> one that updates the message with the FQDN of the sending server and then
> passes it down the line. A rough explanation of my plan is below.
>
> Sending servers -
> These are all the clients and devices that are sending logs to be
> collected. These go to the relay layer.
>
> Relay Layer -
> These servers collect all of the inbound logs from all of the front end
> devices. Aside from traffic management (send these messages here, and
> here, and here), this layer will only update the message payload to force the
> FQDN into it. Its destinations are an Rsyslog layer for parsing the messages
> into JSON for Elasticsearch, a collector for Hadoop, and the Archive Servers.
>
> Archive Servers -
> These servers build out a log directory structure based on the name of the
> application, the server it came from, the date/hour, and the type of log. They
> use this dynafile structure to save the logs to disk for archiving onto tape.
>
> Elasticsearch Layer -
> This layer is topped with a set of Rsyslog instances that receive the
> traffic, parse it into JSON and then submit it into Elasticsearch.
>
> Hadoop Collector -
> A custom collector built for collecting log chunks and submitting them into
> Hadoop.
>
> My primary goal was to get an idea of the server counts people were using to
> build out this size of infrastructure to make sure I wasn't overdoing the
> kind of hardware we are needing based on the use cases.
>
> -- James
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of David Lang
> Sent: Tuesday, October 22, 2013 6:39 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>
> On Tue, 22 Oct 2013, Boylan, James wrote:
>
>> Unfortunately, due to the nature of how things are output into the log
>> line because of decisions made prior to me taking over the logging
>> infrastructure, I'm forced to use the regex for now. This is
>> compounded by the fact that I can't yet replace the front end
>> syslog-ng instances with Rsyslog due to a custom management application that
>> is wrapped around it.
>
> I'm not suggesting that you change the sending machine, I'm suggesting that
> you do this on the first-tier relay boxes.
>
> it's pretty hard to replace the syslog daemon on a cisco router for
> example :-)
>
> I'm saying that the variables that you are currently extracting on the
> central server could be extracted on the relay servers, and then stored as
> variables in the JSON message sent to the central server. At that point the
> central server has access to the results of the regex without having to
> perform the regex itself.
>
>> I agree that not doing all of the expensive regex would be a better
>> solution, and I'm actually in the process of making changes with our
>> developers to address that, but for the short term I'm working with
>> what we have on hand. My eventual goal is to just have them output in
>> JSON. It saves a lot of time long term and works well with parsing
>> the messages in both Elasticsearch and Hadoop.
>
> one point that I'm trying to make is that while it takes a regex to extract
> in the current template language, is this something that a bit of C code
> could easily find? I have a vague memory of an earlier thread that sounds
> like your configuration with lots of complex regex patterns for filenames and
> paths. If I am remembering things correctly, you were really trying to
> extract specific fields, but in ways that the template config language didn't
> easily support.
> However for known fields like I'm remembering, the C code to look in a string
> and find the correct substring is not that hard (and is much faster than a
> general regex engine), so getting a custom string generation (sm) module
> written so that you would load the module, then just use the template name
> rather than crafting the template via the regexes and variables could be
> significantly faster.
>
> It would mean coding some of your logic in C, which is not as nice to change
> as a template, but I'll bet the performance gains would be drastic.
>
> David Lang
>
>> -- James
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of David Lang
>> Sent: Tuesday, October 22, 2013 6:11 AM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] Large Scale Rsyslog deployment
>>
>> On Tue, 22 Oct 2013, David Lang wrote:
>>
>>> If you are doing a lot of complicated regex stuff for creating
>>> strings, strongly consider writing (or asking Adiscon to write) a
>>> custom string generation module. It's very likely that a bit of C
>>> code can do things far more efficiently than what you have to do in
>>> the template configuration to create the paths.
>>>
>>> I'm strongly in favor of pushing message cleanups out to the
>>> first-tier relay systems, you probably have more of them, and as
>>> such you can allocate more CPU to cleanup work.
>>>
>>> when you use JSON in your relay layer you can add additional
>>> metadata to the log message without confusing the final destination
>>> (things like the real source IP of the log message, is this
>>> dev/QA/prod/DR, what business unit is this for so alerts can go to
>>> the right people, etc)
>>>
>>> re: TCP vs UDP, if you are just going over a local switch, UDP is
>>> very reliable, the more potential chokepoints in the path, the more
>>> valuable TCP becomes. RELP really becomes needed when the path
>>> becomes long and there are a lot of messages in flight, or you end
>>> up with connections with a relatively high probability of silently
>>> failing (WAN links, or firewalls that can time out connections or
>>> failover/restart and lose track of existing connections are great
>>> examples)
>>
>> by the way, metadata could be application name or other extracted
>> information that you then use directly in your templates on the central
>> servers without having to parse it out.
>>
>> David Lang
>>
>>> David Lang
>>>
>>> On Tue, 22 Oct 2013, Boylan, James wrote:
>>>
>>>> The performance problems are the expected ones. Our current
>>>> environment layout has two large log archiving servers using rather
>>>> complex regex for generating the dynafile name. Added to that, we
>>>> are looking at adding output to elasticsearch, but to do so we need
>>>> to use an even more complex set of regex to build the JSON output.
>>>> So you can imagine the negative impact to the environment when handling
>>>> that many messages per second.
>>>>
>>>> I've just finished rewriting the config in the new format and with
>>>> the patch for re_extract that Rainer sent I've implemented all of
>>>> the regex in local variables since there was a lot of duplicate
>>>> regex comparisons happening. That's helped a lot.
>>>>
>>>> I'm looking at breaking it out so the relay layer only acts as
>>>> traffic manager and rebuilds the Rsyslog message to force a FQDN
>>>> into the server name field. Then it will be passed to the specific
>>>> server pools for handling archiving, elasticsearch and hadoop.
>>>>
>>>> All of this is done with TCP as we needed to make sure that packets
>>>> reached the destination as reliably as possible without adding too much
>>>> overhead.
>>>> (I'm still considering RELP on the transmission portion between the
>>>> relay layer and the back end services, but I haven't made a
>>>> decision yet.)
>>>>
>>>> -- James
>>>>
>>>> ----- Reply message -----
>>>> From: "David Lang" <[email protected]>
>>>> To: "rsyslog-users" <[email protected]>
>>>> Subject: [rsyslog] Large Scale Rsyslog deployment
>>>> Date: Tue, Oct 22, 2013 5:03 am
>>>>
>>>>
>>>>
>>>> On Tue, 22 Oct 2013, Boylan, James wrote:
>>>>
>>>>> I know there are several individuals on the list that manage a
>>>>> large scale Rsyslog environment handling 70k to 100k+ messages per second.
>>>>>
>>>>> I was wondering if they could share roughly the number of Rsyslog
>>>>> instances running on their relay layer. I'm hoping to get
>>>>> confirmation on the numbers I'm looking at. Commenting on if they
>>>>> are using UDP versus TCP would be helpful as well.
>>>>
>>>> I am vastly overprovisioned on the relay layer as I put in relays
>>>> per environment.
>>>> I use UDP from the application servers to the relay and then UDP if it's
>>>> relaying on a local subnet (multi-homed relay systems in one datacenter)
>>>> or RELP if it's relaying to a remote network (especially if it goes over
>>>> a WAN)
>>>>
>>>> in my older, all-local setup I put 6 pairs of relays in one
>>>> datacenter, in the newer, larger environment I started with 3 pairs
>>>> of relays per datacenter, but expect that more will be needed
>>>> eventually (I am no longer at the company, so I won't be building
>>>> out that system further). I have all of these relay to one pair of
>>>> core relay boxes that then distribute the logs to the different
>>>> analysis boxes.
>>>>
>>>>
>>>> It depends a lot on what you are doing on the relay boxes. If you
>>>> are just relaying messages without modifying them, you should be
>>>> able to get up to gig-E wire speed on a single box (although we
>>>> recently found a bottleneck that prompted making imudp
>>>> multithreaded to handle that load)
>>>>
>>>> if you are modifying the messages, doing encryption, etc. you may
>>>> find that you run into performance limitations sooner.
>>>>
>>>> If you are running into performance problems, we'd be interested in
>>>> hearing details and trying to address them.
>>>>
>>>> David Lang
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
>>>> you DON'T LIKE THAT.
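
David's point in the thread is that known fields can be located with a little C instead of a general regex engine, and the results used to build the archive path. Below is a minimal sketch of that idea only. It assumes a hypothetical `key=value` message layout (for example `app=` and `type=`) and a made-up directory order. The function names `extract_field` and `build_archive_path` are illustrative, and this is not rsyslog's string generation module interface.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Copy the value that follows `key` (e.g. "app=") in `line` into `out`.
 * The value ends at the first space or at the end of the line.
 * Returns the value length, or -1 if the key is missing or `out` is too small.
 * The "key=value" layout is a hypothetical format chosen for illustration.
 */
int extract_field(const char *line, const char *key,
                  char *out, size_t out_size)
{
    size_t key_len = strlen(key);
    const char *p = line;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept the match only at the start of the line or after a space,
         * so that "app=" does not match inside "webapp=". */
        if (p == line || p[-1] == ' ') {
            const char *value = p + key_len;
            size_t len = strcspn(value, " ");
            if (len + 1 > out_size)
                return -1;
            memcpy(out, value, len);
            out[len] = '\0';
            return (int)len;
        }
        p += key_len;
    }
    return -1;
}

/*
 * Build "<base>/<app>/<host>/<YYYY-MM-DD>/<HH>/<type>.log" into `out`.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 * The directory order and the ".log" suffix are assumptions.
 */
int build_archive_path(char *out, size_t out_size, const char *base,
                       const char *app, const char *host,
                       const struct tm *when, const char *type)
{
    int n = snprintf(out, out_size, "%s/%s/%s/%04d-%02d-%02d/%02d/%s.log",
                     base, app, host,
                     when->tm_year + 1900, when->tm_mon + 1, when->tm_mday,
                     when->tm_hour, type);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A caller would run `extract_field` once per field on the message and pass the results, together with the hostname and timestamp, to `build_archive_path`. That is one pass of plain substring searches per message rather than several regex evaluations. A real module would also need to reject values containing `/` or `..` before using them in a path.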

