Unfortunately, due to how things are written into the log line, the result of decisions made before I took over the logging infrastructure, I'm forced to use the regex for now. This is compounded by the fact that I can't yet replace the front-end syslog-ng instances with Rsyslog because of a custom management application that is wrapped around them.
I agree that avoiding all of the expensive regex would be a better solution, and I'm actually in the process of making changes with our developers to address that, but for the short term I'm working with what we have on hand. My eventual goal is to just have them output in JSON. It saves a lot of time long term and works well with parsing the messages in both Elasticsearch and Hadoop.

-- James

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of David Lang
Sent: Tuesday, October 22, 2013 6:11 AM
To: rsyslog-users
Subject: Re: [rsyslog] Large Scale Rsyslog deployment

On Tue, 22 Oct 2013, David Lang wrote:

> If you are doing a lot of complicated regex stuff for creating
> strings, strongly consider writing (or asking Adiscon to write) a
> custom string generation module. It's very likely that a bit of C code
> can do things far more efficiently than what you have to do in the
> template configuration to create the paths.
>
> I'm strongly in favor of pushing message cleanups out to the
> first-tier relay systems, you probably have more of them, and as such
> you can allocate more CPU to cleanup work.
>
> when you use JSON in your relay layer you can add additional metadata
> to the log message without confusing the final destination (things
> like the real source IP of the log message, is this dev/QA/prod/DR,
> what business unit is this for so alerts can go to the right people,
> etc)
>
> re: TCP vs UDP, if you are just going over a local switch, UDP is very
> reliable, the more potential chokepoints in the path, the more
> valuable TCP becomes.
> RELP really becomes needed when the path becomes
> long and there are a lot of messages in flight, or you end up with
> connections with a relatively high probability of silently failing
> (WAN links, or firewalls that can time out connections or
> failover/restart and lose track of existing connections are great
> examples)

by the way, metadata could be application name or other extracted information that you then use directly in your templates on the central servers without having to parse it out.

David Lang

> David Lang
>
> On Tue, 22 Oct 2013, Boylan, James wrote:
>
>> The performance problems are the expected ones. Our current
>> environment layout has two large log archiving servers using rather
>> complex regex for generating the dynafile name. Added to that, we are
>> looking at adding output to elasticsearch, but to do so we need to
>> use an even more complex set of regex to build the JSON output. So
>> you can imagine the negative impact to the environment when handling
>> that many messages per second.
>>
>> I've just finished rewriting the config in the new format and with
>> the patch for re_extract that Rainer sent I've implemented all of the
>> regex in local variables since there were a lot of duplicate regex
>> comparisons happening. That's helped a lot.
>>
>> I'm looking at breaking it out so the relay layer only acts as a
>> traffic manager and rebuilding the Rsyslog message to force a FQDN
>> into the server name field. Then it will be passed to the specific
>> server pools for handling archiving, elasticsearch and hadoop.
>>
>> All of this is done with TCP as we needed to make sure that packets
>> reached the destination as reliably as possible without adding too
>> much overhead.
>> (I'm still considering relp on the transmission portion between the
>> relay layer and the back end services, but I haven't made a decision
>> yet.)
>>
>> -- James
>>
>> ----- Reply message -----
>> From: "David Lang" <[email protected]>
>> To: "rsyslog-users" <[email protected]>
>> Subject: [rsyslog] Large Scale Rsyslog deployment
>> Date: Tue, Oct 22, 2013 5:03 am
>>
>> On Tue, 22 Oct 2013, Boylan, James wrote:
>>
>>> I know there are several individuals on the list that manage
>>> large-scale Rsyslog environments handling 70k to 100k+ messages per
>>> second.
>>>
>>> I was wondering if they could share roughly the number of Rsyslog
>>> instances running on their relay layer. I'm hoping to get
>>> confirmation on the numbers I'm looking at. Commenting on whether
>>> they are using UDP versus TCP would be helpful as well.
>>
>> I am vastly overprovisioned on the relay layer as I put in relays per
>> environment. I use UDP from the application servers to the relay and
>> then UDP if it's relaying on a local subnet (multi-homed relay
>> systems in one datacenter) or RELP if it's relaying to a remote
>> network (especially if it goes over a WAN)
>>
>> in my older, all-local setup I put 6 pairs of relays in one
>> datacenter, in the newer, larger environment I started with 3 pairs
>> of relays per datacenter, but expect that more will be needed
>> eventually (I am no longer at the company, so I won't be building out
>> that system further). I have all of these relay to one pair of core
>> relay boxes that then distribute the logs to the different analysis
>> boxes.
>>
>> It depends a lot on what you are doing on the relay boxes. If you are
>> just relaying messages without modifying them, you should be able to
>> get up to gig-E wire speed on a single box (although we recently
>> found a bottleneck that led to imudp being made multithreaded to
>> handle that load)
>>
>> if you are modifying the messages, doing encryption, etc you may find
>> that you run into performance limitations sooner.
>>
>> If you are running into performance problems, we'd be interested in
>> hearing details and trying to address them.
>>
>> David Lang
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
>> POST if you DON'T LIKE THAT.
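
For readers of this thread, here is a rough sketch (in Python, not rsyslog configuration) of the trade-off discussed above: pulling routing fields out of a free-form line with a regex versus reading them from a message the application already emitted as JSON. Everything specific in it is an assumption made for illustration: the legacy line format, the field names app/env/text, the host name, and the archive path layout are all hypothetical and do not come from the thread. The regex is matched once per message and the result reused, in the spirit of keeping extracted values in local variables.

```python
import json
import re

# Hypothetical legacy line format; the real format is not shown in the thread.
LEGACY_RE = re.compile(r"^app=(?P<app>[\w-]+) env=(?P<env>\w+) (?P<text>.*)$")


def fields_from_legacy(line: str) -> dict:
    """Run the regex once per message and reuse the extracted fields."""
    match = LEGACY_RE.match(line)
    if match is None:
        # Fall back to placeholder values when the line does not match.
        return {"app": "unknown", "env": "unknown", "text": line}
    return match.groupdict()


def fields_from_json(line: str) -> dict:
    """Read the same fields from a message that was emitted as JSON."""
    data = json.loads(line)
    return {
        "app": data.get("app", "unknown"),
        "env": data.get("env", "unknown"),
        "text": data.get("text", ""),
    }


def archive_path(host: str, fields: dict) -> str:
    """Build a dynafile-style path from already extracted fields (made-up layout)."""
    return f"/var/log/archive/{fields['env']}/{host}/{fields['app']}.log"


legacy = "app=billing env=prod payment accepted"
structured = json.dumps({"app": "billing", "env": "prod", "text": "payment accepted"})

print(archive_path("web01.example.com", fields_from_legacy(legacy)))
print(archive_path("web01.example.com", fields_from_json(structured)))
```

Both calls produce the same path; the difference is that the JSON variant needs no pattern matching on the central servers, which is the cost the thread is trying to avoid at high message rates.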

