It seems mostly idle(off cpu). Have you seen sched:off-cpu backtraces and backtrace-wise latency?
On Wed, Mar 23, 2016 at 12:51 AM, singh.janmejay <[email protected]> wrote: > You meant the first one was with "commented out" version right? (you > said uncommenting edge_relay, so double-checking). > > On Wed, Mar 23, 2016 at 12:38 AM, David Lang <[email protected]> wrote: >> here is vmstat 1 running >> >> # vmstat 1 >> procs -----------memory---------- ---swap-- -----io---- -system-- >> ------cpu----- >> r b swpd free buff cache si so bi bo in cs us sy id >> wa st >> 2 0 62280 2363804 584 27280376 0 1 21 596 0 1 46 3 >> 51 0 0 >> 1 0 62280 2349476 584 27280612 0 0 0 87042 979 891 4 0 >> 95 0 0 >> 1 0 62280 2330488 584 27280908 0 0 0 0 388 258 4 0 >> 96 0 0 >> 1 0 62280 2314744 584 27281148 0 0 0 0 396 255 4 0 >> 96 0 0 >> 1 0 62280 2297736 584 27281388 0 0 0 0 376 245 4 0 >> 96 0 0 >> 2 0 62280 2546260 584 27245272 0 0 0 4 639 568 6 1 >> 94 0 0 >> 2 0 62280 2464300 584 27245928 0 0 0 1716 936 1146 8 0 >> 91 0 0 >> 2 0 62280 2394452 584 27246444 0 0 0 8 687 333 8 0 >> 92 0 0 >> >> >> here is vmstat 1 uncommenting edge_relay >> >> # vmstat 1 >> procs -----------memory---------- ---swap-- -----io---- -system-- >> ------cpu----- >> r b swpd free buff cache si so bi bo in cs us sy id >> wa st >> 1 0 62280 3198672 584 26770624 0 1 21 596 0 1 46 3 >> 51 0 0 >> 0 0 62280 3198904 584 26770664 0 0 0 0 151 247 0 0 >> 100 0 0 >> 0 0 62280 3200332 584 26770664 0 0 0 0 128 233 0 0 >> 100 0 0 >> 0 0 62280 3202244 584 26770664 0 0 0 4 172 315 0 0 >> 100 0 0 >> 0 0 62280 3203132 584 26770664 0 0 0 0 132 218 0 0 >> 100 0 0 >> 0 0 62280 3203912 584 26770668 0 0 0 8 203 347 0 0 >> 100 0 0 >> >> >> things just are not running with it uncommented. >> >> I'm going to send you my config directly, it is a very heavy config. >> >> David Lang >> >> >> >> On Wed, 23 Mar 2016, singh.janmejay wrote: >> >>> On my cluster, each node (those 66 nodes) have 4 threads for ruleset >>> (it does everything, lognorm based parsing, lookups using loopup >>> table, re_extract, string concat, several if-else based conditions and >>> queues up for egress using omkafka and I generally see 3 being >>> utilized). This setup runs at ~37k msg/s (2.5M / 66 = 37k). >>> >>> 300k/Min is still 5k/s which should be doable on a single-thd (unless >>> your ruleset is much heavier than mine). Im thinking for a >>> equivalently complex ruleset 37k / 4 = 9k should be achievable. >>> >>> I'll try to reproduce this. Can you share out the exact commit-id you >>> are working with? >>> >>> On Tue, Mar 22, 2016 at 8:38 PM, David Lang <[email protected]> wrote: >>>> >>>> On Tue, 22 Mar 2016, singh.janmejay wrote: >>>> >>>>> On Tue, Mar 22, 2016 at 5:01 AM, David Lang <[email protected]> wrote: >>>>>> >>>>>> >>>>>> I added this to my configs >>>>>> >>>>>> # grep -B 1 -A 1 dyn_ /etc/rsyslog.conf >>>>>> # custom stats to track in rsyslog >>>>>> dyn_stats(name="msgs_per_host" resettable="on" maxCardinality="3000" >>>>>> unusedMetricLife="1200") >>>>>> dyn_stats(name="msgs_per_edge_relay" resettable="on" >>>>>> maxCardinality="3000" >>>>>> unusedMetricLife="1200") >>>>>> dyn_stats(name="msgs_per_core_relay" resettable="on" >>>>>> maxCardinality="3000" >>>>>> unusedMetricLife="1200") >>>>>> dyn_stats(name="msgs_per_program" resettable="on" maxCardinality="3000" >>>>>> unusedMetricLife="1200") >>>>>> dyn_stats(name="msgs_per_tag" resettable="on" maxCardinality="3000" >>>>>> unusedMetricLife="1200") >>>>>> >>>>>> -- >>>>>> /var/log/sources-messages;sources >>>>>> set $.inc = dyn_inc("msgs_per_host", $hostname); >>>>>> set $.inc = dyn_inc("msgs_per_program", $programname); >>>>>> # if $!trusted!edge!relay != "" then { >>>>>> # set $.inc = dyn_inc("msgs_per_edge_relay", $!trusted!edge!relay); >>>>>> # } >>>>>> # if $!trusted!core!relay != "" then { >>>>>> # set $.inc = dyn_inc("msgs_per_edge_relay", $!trusted!core!relay); >>>>>> # } >>>>>> -- >>>>>> } >>>>>> #set $.inc = dyn_inc("msgs_per_tag", $.custommessage); >>>>>> /var/log/messages-tags;manual >>>>>> >>>>>> if I uncomment any of the lines that refer to $! or $. variables, >>>>>> rsyslog >>>>>> basically stops processing messages (a few messages seem to be >>>>>> processed >>>>>> in >>>>>> spurts, bit a lot of processing never happens) >>>>>> >>>>>> any chance that this has problems with locking around user defined >>>>>> variables? >>>>> >>>>> >>>>> >>>>> Not sure, I run it in production (66 6-cpu(HT) haswell VMs handling an >>>>> aggregate of ~2.5 M-msg / sec (~1.5kB each)) and haven't seen such >>>>> jitters, but then I don't use 3 level deep keys either and in my case, >>>>> the feature is backported to 8.12 branch), so still uses json-c (not >>>>> libfastjson). I do use it with user-defined variables too, just not >>>>> nested as deep. >>>>> >>>>> We don't treat variables differently after dereferencing them in the >>>>> top-most layer handling rainerscript fn-calls, so $hostname and >>>>> $!x!y!z, as far as any rainerscript fn invocation is considered, are >>>>> the same (performance-wise) except for the top-level dereference >>>>> (outside of fn-impl-body). >>>>> >>>>> Some more info may come handy in reproducing it: >>>>> - What is the message throughput (in msg/s) and what is the avg msg >>>>> size? >>>> >>>> >>>> >>>> with these commented out as above, ~50k messages/min normal, ~200K >>>> message/min if I'm replaying queued logs (box saturated, see the other >>>> thread where 8.17 final seems to have slowed down from ~300K messages/min >>>> that I was getting with a 8.16+ git) >>>> >>>>> - Do you see messages being written to sources-messages file, while >>>>> they don't get written to messages-tags file? >>>> >>>> >>>> >>>> Yes, but not all messages. I have an alarm that goes off if >>>> sources-messages >>>> isn't written to in two 35 second checks (70 seconds total), that alarm >>>> never fired. But there would be a burst of messages show up in >>>> sources-messages and then 20 or so seconds before another burst would >>>> show >>>> up. >>>> >>>>> - Do you see a significant jump in voluntary context switches(say per >>>>> 10 seconds or even per second) when you uncomment those lines? How big >>>>> is the jump? >>>>> - How does the observation change if you replace it with this kind of >>>>> variable access pattern: >>>>> >>>>> set $.x = $!trusted!edge!relay; >>>>> if $.x != "" then { >>>>> set $.inc = dyn_inc("msgs_per_edge_relay", $.x); >>>>> } >>>> >>>> >>>> >>>> will check. >>>> >>>>> - Is this stock 8.17 build? >>>> >>>> >>>> >>>> stock config, but against git as of yesterday. I needed to get liblognorm >>>> 2.0 and the current libjsonfast >>>> >>>>> - Is this x86-64 Linux? What distro (and version)? >>>> >>>> >>>> >>>> ubuntu trusty (14.04) 64 bit. >>>> >>>>>> >>>>>> Also, the documentation doesn't say why we are setting $.inc to the >>>>>> result >>>>>> of the command, what does $.inc contain after the command is run? from >>>>>> the >>>>>> documentation, it looks like it contains an error code (at least the >>>>>> way >>>>>> it's used is similar to error code returns in other languages), but I >>>>>> don't >>>>>> see what the errors are in the doc page. >>>>> >>>>> >>>>> >>>>> Yes, it is the error-code. It has value 0 when successful, an non-zero >>>>> when it fails to increment (it may fail due to maxCardinality being >>>>> hit or due to a contended for lock etc (it doesn't wait for the lock, >>>>> it uses try-lock and doesn't increment the counter if it can't get a >>>>> shared-lock over the bucket). >>>> >>>> >>>> >>>> While I do have multiple threads running on the main queue, the places I >>>> am >>>> calling this are on action queues with only one worker, so the only lock >>>> conflict should be when the data is reported. >>>> >>>>> These are rsyslog error-codes. It does an error pass-thru. >>>>> >>>>> I'll add this to documentation (didn't know this was missed). >>>> >>>> >>>> >>>> Thanks. >>>> >>>> Sorry I didn't get a chance to run this pre-release, this is exactly why >>>> I >>>> try to use features like this pre-release to spot the things that the >>>> person >>>> developign them misses because they are too close to the problem. :-) >>>> >>>> >>>> David Lang >>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com/professional-services/ >>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >>>> of >>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T >>>> LIKE THAT. >>> >>> >>> >>> >>> >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com/professional-services/ >> What's up with rsyslog? Follow https://twitter.com/rgerhards >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of >> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T >> LIKE THAT. > > > > -- > Regards, > Janmejay > http://codehunk.wordpress.com -- Regards, Janmejay http://codehunk.wordpress.com _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

