> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> [email protected]
> Sent: Monday, June 07, 2010 4:35 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] discussion request: performance enhancement for imtcp
>
> On Mon, 7 Jun 2010, Rainer Gerhards wrote:
>
> > I have now shifted my focus to enhancing multi-core utilization with
> > imtcp. So far, we have a single epoll loop (or select, if epoll is not
> > supported), which obviously limits concurrency in some environments. I
> > intend to remove that limit, or at least push the actual value much
> > further.
> >
> > There are a couple of things to think about. There is one relatively
> > simple approach, but whether it works depends pretty much on how
> > things are deployed in practice.
> >
> > So I need your help. I would appreciate it if you could read
> >
> > http://blog.gerhards.net/2010/06/further-improving-tcp-input-performance.html
> >
> > and share your comments. I have created a blog post because that makes
> > it easier for me to keep the text as a reference. While the post is
> > not exactly short, it is also not exhaustive, so please don't feel
> > discouraged from reading it just because my thoughts are on a web site
> > rather than included inline.
>
> For some reason I can't comment on the blog post, so I'll do it here.
Sorry, I forgot to mention that comments should go to the mailing list ;)
A Chinese spammer recently rendered the blogger comment function useless,
especially as blogger is dumb enough to force you to delete each of the
(vast number of) spam comments manually. I did one round of deletion, but
gave up after an hour or so...

> I'm surprised to see this as a problem (especially as my experience has
> been that the bottlenecks are on the output side, not the input side).
>
> The data is serialized as it arrives over the wire (at least if you
> have a single ethernet port in use), and with epoll I would expect a
> single thread to have no problem pulling the data from the network
> stack and putting it somewhere.

At least this is a problem reported to me by some high-performance sites.
They had in common that the actual rule processing was very, very simple,
like a *.* filter and plain write-to-file actions. These are *extremely
fast* (if you disagree, please do so on list, I would be very interested
in that). BUT I need to mention that this was in v4, unfortunately not in
v5. That means the event handling was done by select(), and given
select's poor performance on larger connection sets, that may be the
culprit. HOWEVER, I gathered from the reports that the CPUs were NOT
saturated (and the message rate lower) when the listeners ran inside a
single instance, but the CPUs were saturated (and the message rate
higher) when a couple of rsyslog instances ran. The only explanation I
have for this is that the single instance did not manage to pull the data
out of the operating system buffers fast enough. I also failed to ask
whether the machines had multiple NICs, which would further explain the
effect seen. I myself unfortunately seem to have an insufficient lab
environment to reproduce this effect, which makes it a bit hard for me to
judge.

> I think that more research needs to be done on what is eating up the
> time in your test cases.
> If it's DNS lookups, they can be disabled (and/or a name cache can be
> created, as we have discussed before).

For the cases I have seen, that was hardly an issue -- many messages were
sent over each connection, and for TCP the DNS lookup is only done during
connection setup. Still, it is a good reminder to finish that part of the
code (a full DNS cache).

> It may be that the parsing that's being done is what's taking the time
> here, so I would consider something like the following:
>
> one thread to pull the data from the wire and dispatch it to N worker
> threads that would parse the message and put the result into the main
> queue.

a) In v5 (and parts of v4), parsing is no longer done on the input side;
it already runs via a worker pool (the main queue's worker pool, to be
precise).

b) This architecture requires more context switches, something I would
really like to avoid. I guess it would even lead to far worse performance
in the single-listener case.

> Even late last year, with UDP messages I was able to saturate a Gig-E
> network with packets and receive them with <25% of a single CPU. I
> would not expect TCP to have noticeably more overhead.

I fully agree -- TCP definitely has far less overhead. Just consider that
I need to do one API call for each message with UDP, while I can receive
hundreds of messages with a single API call in the case of TCP (depending
on receive buffer and message size).

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

