Greetings,

>> hmm, this could be locking overhead as well. One thing that you did early
>> in v5 (I don't think it made it into v4) was to allow the UDP receiver to
>> insert multiple messages into the queue at once. That made a huge
>> difference.
>
> No, I think that was something I did to both versions. At some time, I did
> optimizations to both v4 and v5, things like reducing copies, reducing malloc
> calls and so on. I am pretty sure submission batching was among them.
I agree with David, actually. While multiple TCP threads on the input side would certainly help, I believe locking overhead is the more likely culprit behind the inability to fully utilize a multi-core machine with a single instance of rsyslog. In my experience, the input thread was relatively busy, but it wasn't hitting a CPU bottleneck.

Reducing latency around queuing and context switching is probably the best place to spend time if the goal is improved performance. The earlier investigations into lockless queues, combined with some batching, may help address this. As it stands, I don't regularly see specific threads hitting CPU bottlenecks (assuming top -H is accurate). Also, if queues and context switching are the problem, dividing the work further within imtcp may actually make it worse.

That said, I'm not against reducing possible bottlenecks to get into the 1-10 gig input levels (at which point this would probably become an issue), but I think the queues should be examined more closely first.

-Aaron
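To make the batching point concrete, here is a toy sketch of why submitting messages in batches cuts down on lock traffic. This is not rsyslog code: `CountingQueue`, `submit_one_by_one`, and `submit_batched` are hypothetical names, and the batch size of 64 is arbitrary. It only counts how often the queue lock is taken; it does not measure contention or context switches.

```python
import threading
from collections import deque


class CountingQueue:
    """Toy message queue (hypothetical) that counts enqueue lock acquisitions."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def enqueue(self, msg):
        # One lock round trip per message.
        with self._lock:
            self.lock_acquisitions += 1
            self._items.append(msg)

    def enqueue_batch(self, msgs):
        # One lock round trip for the whole batch.
        with self._lock:
            self.lock_acquisitions += 1
            self._items.extend(msgs)

    def __len__(self):
        with self._lock:
            return len(self._items)


def submit_one_by_one(queue, msgs):
    """Input thread hands each message to the queue individually."""
    for msg in msgs:
        queue.enqueue(msg)


def submit_batched(queue, msgs, batch_size):
    """Input thread collects messages and hands them over in chunks."""
    for start in range(0, len(msgs), batch_size):
        queue.enqueue_batch(msgs[start:start + batch_size])


messages = [f"msg {i}" for i in range(1000)]

single = CountingQueue()
submit_one_by_one(single, messages)

batched = CountingQueue()
submit_batched(batched, messages, batch_size=64)

print(single.lock_acquisitions, batched.lock_acquisitions)  # prints "1000 16"
```

Both queues end up holding the same 1000 messages, but the batched path takes the lock 16 times instead of 1000. Whether that translates into a real throughput gain depends on how contended the lock is in practice.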

