Yep, it looked like that. Its interesting though, its 100% off CPU (very unique).
On Wed, Mar 23, 2016 at 1:02 AM, David Lang <[email protected]> wrote: > Yes, the first one is with the user variables commented out, exactly what I > pasted in from the grep. > > the second is enabling the dyn_inc for edge relays. > > David Lang > > On Wed, 23 Mar 2016, singh.janmejay wrote: > >> Date: Wed, 23 Mar 2016 00:51:18 +0530 >> From: singh.janmejay <[email protected]> >> Reply-To: rsyslog-users <[email protected]> >> To: rsyslog-users <[email protected]> >> Subject: Re: [rsyslog] dynastats problems with user defined variables? >> >> You meant the first one was with "commented out" version right? (you >> said uncommenting edge_relay, so double-checking). >> >> On Wed, Mar 23, 2016 at 12:38 AM, David Lang <[email protected]> wrote: >>> >>> here is vmstat 1 running >>> >>> # vmstat 1 >>> procs -----------memory---------- ---swap-- -----io---- -system-- >>> ------cpu----- >>> r b swpd free buff cache si so bi bo in cs us sy >>> id >>> wa st >>> 2 0 62280 2363804 584 27280376 0 1 21 596 0 1 46 >>> 3 >>> 51 0 0 >>> 1 0 62280 2349476 584 27280612 0 0 0 87042 979 891 4 >>> 0 >>> 95 0 0 >>> 1 0 62280 2330488 584 27280908 0 0 0 0 388 258 4 >>> 0 >>> 96 0 0 >>> 1 0 62280 2314744 584 27281148 0 0 0 0 396 255 4 >>> 0 >>> 96 0 0 >>> 1 0 62280 2297736 584 27281388 0 0 0 0 376 245 4 >>> 0 >>> 96 0 0 >>> 2 0 62280 2546260 584 27245272 0 0 0 4 639 568 6 >>> 1 >>> 94 0 0 >>> 2 0 62280 2464300 584 27245928 0 0 0 1716 936 1146 8 >>> 0 >>> 91 0 0 >>> 2 0 62280 2394452 584 27246444 0 0 0 8 687 333 8 >>> 0 >>> 92 0 0 >>> >>> >>> here is vmstat 1 uncommenting edge_relay >>> >>> # vmstat 1 >>> procs -----------memory---------- ---swap-- -----io---- -system-- >>> ------cpu----- >>> r b swpd free buff cache si so bi bo in cs us sy >>> id >>> wa st >>> 1 0 62280 3198672 584 26770624 0 1 21 596 0 1 46 >>> 3 >>> 51 0 0 >>> 0 0 62280 3198904 584 26770664 0 0 0 0 151 247 0 >>> 0 >>> 100 0 0 >>> 0 0 62280 3200332 584 26770664 0 0 0 0 128 233 0 >>> 0 >>> 100 0 0 >>> 0 0 62280 3202244 584 26770664 0 0 0 4 172 315 0 >>> 0 >>> 100 0 0 >>> 0 0 62280 3203132 584 26770664 0 0 0 0 132 218 0 >>> 0 >>> 100 0 0 >>> 0 0 62280 3203912 584 26770668 0 0 0 8 203 347 0 >>> 0 >>> 100 0 0 >>> >>> >>> things just are not running with it uncommented. >>> >>> I'm going to send you my config directly, it is a very heavy config. >>> >>> David Lang >>> >>> >>> >>> On Wed, 23 Mar 2016, singh.janmejay wrote: >>> >>>> On my cluster, each node (those 66 nodes) have 4 threads for ruleset >>>> (it does everything, lognorm based parsing, lookups using loopup >>>> table, re_extract, string concat, several if-else based conditions and >>>> queues up for egress using omkafka and I generally see 3 being >>>> utilized). This setup runs at ~37k msg/s (2.5M / 66 = 37k). >>>> >>>> 300k/Min is still 5k/s which should be doable on a single-thd (unless >>>> your ruleset is much heavier than mine). Im thinking for a >>>> equivalently complex ruleset 37k / 4 = 9k should be achievable. >>>> >>>> I'll try to reproduce this. Can you share out the exact commit-id you >>>> are working with? >>>> >>>> On Tue, Mar 22, 2016 at 8:38 PM, David Lang <[email protected]> wrote: >>>>> >>>>> >>>>> On Tue, 22 Mar 2016, singh.janmejay wrote: >>>>> >>>>>> On Tue, Mar 22, 2016 at 5:01 AM, David Lang <[email protected]> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> I added this to my configs >>>>>>> >>>>>>> # grep -B 1 -A 1 dyn_ /etc/rsyslog.conf >>>>>>> # custom stats to track in rsyslog >>>>>>> dyn_stats(name="msgs_per_host" resettable="on" maxCardinality="3000" >>>>>>> unusedMetricLife="1200") >>>>>>> dyn_stats(name="msgs_per_edge_relay" resettable="on" >>>>>>> maxCardinality="3000" >>>>>>> unusedMetricLife="1200") >>>>>>> dyn_stats(name="msgs_per_core_relay" resettable="on" >>>>>>> maxCardinality="3000" >>>>>>> unusedMetricLife="1200") >>>>>>> dyn_stats(name="msgs_per_program" resettable="on" >>>>>>> maxCardinality="3000" >>>>>>> unusedMetricLife="1200") >>>>>>> dyn_stats(name="msgs_per_tag" resettable="on" maxCardinality="3000" >>>>>>> unusedMetricLife="1200") >>>>>>> >>>>>>> -- >>>>>>> /var/log/sources-messages;sources >>>>>>> set $.inc = dyn_inc("msgs_per_host", $hostname); >>>>>>> set $.inc = dyn_inc("msgs_per_program", $programname); >>>>>>> # if $!trusted!edge!relay != "" then { >>>>>>> # set $.inc = dyn_inc("msgs_per_edge_relay", >>>>>>> $!trusted!edge!relay); >>>>>>> # } >>>>>>> # if $!trusted!core!relay != "" then { >>>>>>> # set $.inc = dyn_inc("msgs_per_edge_relay", >>>>>>> $!trusted!core!relay); >>>>>>> # } >>>>>>> -- >>>>>>> } >>>>>>> #set $.inc = dyn_inc("msgs_per_tag", $.custommessage); >>>>>>> /var/log/messages-tags;manual >>>>>>> >>>>>>> if I uncomment any of the lines that refer to $! or $. variables, >>>>>>> rsyslog >>>>>>> basically stops processing messages (a few messages seem to be >>>>>>> processed >>>>>>> in >>>>>>> spurts, bit a lot of processing never happens) >>>>>>> >>>>>>> any chance that this has problems with locking around user defined >>>>>>> variables? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Not sure, I run it in production (66 6-cpu(HT) haswell VMs handling an >>>>>> aggregate of ~2.5 M-msg / sec (~1.5kB each)) and haven't seen such >>>>>> jitters, but then I don't use 3 level deep keys either and in my case, >>>>>> the feature is backported to 8.12 branch), so still uses json-c (not >>>>>> libfastjson). I do use it with user-defined variables too, just not >>>>>> nested as deep. >>>>>> >>>>>> We don't treat variables differently after dereferencing them in the >>>>>> top-most layer handling rainerscript fn-calls, so $hostname and >>>>>> $!x!y!z, as far as any rainerscript fn invocation is considered, are >>>>>> the same (performance-wise) except for the top-level dereference >>>>>> (outside of fn-impl-body). >>>>>> >>>>>> Some more info may come handy in reproducing it: >>>>>> - What is the message throughput (in msg/s) and what is the avg msg >>>>>> size? >>>>> >>>>> >>>>> >>>>> >>>>> with these commented out as above, ~50k messages/min normal, ~200K >>>>> message/min if I'm replaying queued logs (box saturated, see the other >>>>> thread where 8.17 final seems to have slowed down from ~300K >>>>> messages/min >>>>> that I was getting with a 8.16+ git) >>>>> >>>>>> - Do you see messages being written to sources-messages file, while >>>>>> they don't get written to messages-tags file? >>>>> >>>>> >>>>> >>>>> >>>>> Yes, but not all messages. I have an alarm that goes off if >>>>> sources-messages >>>>> isn't written to in two 35 second checks (70 seconds total), that alarm >>>>> never fired. But there would be a burst of messages show up in >>>>> sources-messages and then 20 or so seconds before another burst would >>>>> show >>>>> up. >>>>> >>>>>> - Do you see a significant jump in voluntary context switches(say per >>>>>> 10 seconds or even per second) when you uncomment those lines? How big >>>>>> is the jump? >>>>>> - How does the observation change if you replace it with this kind of >>>>>> variable access pattern: >>>>>> >>>>>> set $.x = $!trusted!edge!relay; >>>>>> if $.x != "" then { >>>>>> set $.inc = dyn_inc("msgs_per_edge_relay", $.x); >>>>>> } >>>>> >>>>> >>>>> >>>>> >>>>> will check. >>>>> >>>>>> - Is this stock 8.17 build? >>>>> >>>>> >>>>> >>>>> >>>>> stock config, but against git as of yesterday. I needed to get >>>>> liblognorm >>>>> 2.0 and the current libjsonfast >>>>> >>>>>> - Is this x86-64 Linux? What distro (and version)? >>>>> >>>>> >>>>> >>>>> >>>>> ubuntu trusty (14.04) 64 bit. >>>>> >>>>>>> >>>>>>> Also, the documentation doesn't say why we are setting $.inc to the >>>>>>> result >>>>>>> of the command, what does $.inc contain after the command is run? >>>>>>> from >>>>>>> the >>>>>>> documentation, it looks like it contains an error code (at least the >>>>>>> way >>>>>>> it's used is similar to error code returns in other languages), but I >>>>>>> don't >>>>>>> see what the errors are in the doc page. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Yes, it is the error-code. It has value 0 when successful, an non-zero >>>>>> when it fails to increment (it may fail due to maxCardinality being >>>>>> hit or due to a contended for lock etc (it doesn't wait for the lock, >>>>>> it uses try-lock and doesn't increment the counter if it can't get a >>>>>> shared-lock over the bucket). >>>>> >>>>> >>>>> >>>>> >>>>> While I do have multiple threads running on the main queue, the places >>>>> I >>>>> am >>>>> calling this are on action queues with only one worker, so the only >>>>> lock >>>>> conflict should be when the data is reported. >>>>> >>>>>> These are rsyslog error-codes. It does an error pass-thru. >>>>>> >>>>>> I'll add this to documentation (didn't know this was missed). >>>>> >>>>> >>>>> >>>>> >>>>> Thanks. >>>>> >>>>> Sorry I didn't get a chance to run this pre-release, this is exactly >>>>> why >>>>> I >>>>> try to use features like this pre-release to spot the things that the >>>>> person >>>>> developign them misses because they are too close to the problem. :-) >>>>> >>>>> >>>>> David Lang >>>>> _______________________________________________ >>>>> rsyslog mailing list >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>> http://www.rsyslog.com/professional-services/ >>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>> myriad >>>>> of >>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you >>>>> DON'T >>>>> LIKE THAT. >>>> >>>> >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com/professional-services/ >>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad >>> of >>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T >>> LIKE THAT. >> >> >> >> >> > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T > LIKE THAT. -- Regards, Janmejay http://codehunk.wordpress.com _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

