It seems mostly idle(off cpu). Have you seen sched:off-cpu backtraces
and backtrace-wise latency?

On Wed, Mar 23, 2016 at 12:51 AM, singh.janmejay
<[email protected]> wrote:
> You meant the first one was with "commented out" version right? (you
> said uncommenting edge_relay, so double-checking).
>
> On Wed, Mar 23, 2016 at 12:38 AM, David Lang <[email protected]> wrote:
>> here is vmstat 1 running
>>
>> # vmstat 1
>> procs -----------memory---------- ---swap-- -----io---- -system--
>> ------cpu-----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
>> wa st
>>  2  0  62280 2363804    584 27280376    0    1    21   596    0    1 46  3
>> 51  0 0
>>  1  0  62280 2349476    584 27280612    0    0     0 87042  979  891  4  0
>> 95  0 0
>>  1  0  62280 2330488    584 27280908    0    0     0     0  388  258  4  0
>> 96  0 0
>>  1  0  62280 2314744    584 27281148    0    0     0     0  396  255  4  0
>> 96  0 0
>>  1  0  62280 2297736    584 27281388    0    0     0     0  376  245  4  0
>> 96  0 0
>>  2  0  62280 2546260    584 27245272    0    0     0     4  639  568  6  1
>> 94  0 0
>>  2  0  62280 2464300    584 27245928    0    0     0  1716  936 1146  8  0
>> 91  0 0
>>  2  0  62280 2394452    584 27246444    0    0     0     8  687  333  8  0
>> 92  0 0
>>
>>
>> here is vmstat 1 uncommenting edge_relay
>>
>> # vmstat 1
>> procs -----------memory---------- ---swap-- -----io---- -system--
>> ------cpu-----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
>> wa st
>>  1  0  62280 3198672    584 26770624    0    1    21   596    0    1 46  3
>> 51  0 0
>>  0  0  62280 3198904    584 26770664    0    0     0     0  151  247  0  0
>> 100 0  0
>>  0  0  62280 3200332    584 26770664    0    0     0     0  128  233  0  0
>> 100 0  0
>>  0  0  62280 3202244    584 26770664    0    0     0     4  172  315  0  0
>> 100 0  0
>>  0  0  62280 3203132    584 26770664    0    0     0     0  132  218  0  0
>> 100 0  0
>>  0  0  62280 3203912    584 26770668    0    0     0     8  203  347  0  0
>> 100 0  0
>>
>>
>> things just are not running with it uncommented.
>>
>> I'm going to send you my config directly, it is a very heavy config.
>>
>> David Lang
>>
>>
>>
>> On Wed, 23 Mar 2016, singh.janmejay wrote:
>>
>>> On my cluster, each node (those 66 nodes) have 4 threads for ruleset
>>> (it does everything, lognorm based parsing, lookups using loopup
>>> table, re_extract, string concat, several if-else based conditions and
>>> queues up for egress using omkafka and I generally see 3 being
>>> utilized). This setup runs at ~37k msg/s (2.5M / 66 = 37k).
>>>
>>> 300k/Min is still 5k/s which should be doable on a single-thd (unless
>>> your ruleset is much heavier than mine). Im thinking for a
>>> equivalently complex ruleset 37k / 4  = 9k should be achievable.
>>>
>>> I'll try to reproduce this. Can you share out the exact commit-id you
>>> are working with?
>>>
>>> On Tue, Mar 22, 2016 at 8:38 PM, David Lang <[email protected]> wrote:
>>>>
>>>> On Tue, 22 Mar 2016, singh.janmejay wrote:
>>>>
>>>>> On Tue, Mar 22, 2016 at 5:01 AM, David Lang <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> I added this to my configs
>>>>>>
>>>>>> # grep -B 1 -A 1 dyn_ /etc/rsyslog.conf
>>>>>> # custom stats to track in rsyslog
>>>>>> dyn_stats(name="msgs_per_host" resettable="on" maxCardinality="3000"
>>>>>> unusedMetricLife="1200")
>>>>>> dyn_stats(name="msgs_per_edge_relay" resettable="on"
>>>>>> maxCardinality="3000"
>>>>>> unusedMetricLife="1200")
>>>>>> dyn_stats(name="msgs_per_core_relay" resettable="on"
>>>>>> maxCardinality="3000"
>>>>>> unusedMetricLife="1200")
>>>>>> dyn_stats(name="msgs_per_program" resettable="on" maxCardinality="3000"
>>>>>> unusedMetricLife="1200")
>>>>>> dyn_stats(name="msgs_per_tag" resettable="on" maxCardinality="3000"
>>>>>> unusedMetricLife="1200")
>>>>>>
>>>>>> --
>>>>>>   /var/log/sources-messages;sources
>>>>>>   set $.inc = dyn_inc("msgs_per_host", $hostname);
>>>>>>   set $.inc = dyn_inc("msgs_per_program", $programname);
>>>>>> #  if $!trusted!edge!relay != "" then {
>>>>>> #    set $.inc = dyn_inc("msgs_per_edge_relay", $!trusted!edge!relay);
>>>>>> #  }
>>>>>> #  if $!trusted!core!relay != "" then {
>>>>>> #    set $.inc = dyn_inc("msgs_per_edge_relay", $!trusted!core!relay);
>>>>>> #  }
>>>>>> --
>>>>>>     }
>>>>>>     #set $.inc = dyn_inc("msgs_per_tag", $.custommessage);
>>>>>>     /var/log/messages-tags;manual
>>>>>>
>>>>>> if I uncomment any of the lines that refer to $! or $. variables,
>>>>>> rsyslog
>>>>>> basically stops processing messages (a few messages seem to be
>>>>>> processed
>>>>>> in
>>>>>> spurts, bit a lot of processing never happens)
>>>>>>
>>>>>> any chance that this has problems with locking around user defined
>>>>>> variables?
>>>>>
>>>>>
>>>>>
>>>>> Not sure, I run it in production (66 6-cpu(HT) haswell VMs handling an
>>>>> aggregate of ~2.5 M-msg / sec (~1.5kB each)) and haven't seen such
>>>>> jitters, but then I don't use 3 level deep keys either and in my case,
>>>>> the feature is backported to 8.12 branch), so still uses json-c (not
>>>>> libfastjson). I do use it with user-defined variables too, just not
>>>>> nested as deep.
>>>>>
>>>>> We don't treat variables differently after dereferencing them in the
>>>>> top-most layer handling rainerscript fn-calls, so $hostname and
>>>>> $!x!y!z, as far as any rainerscript fn invocation is considered, are
>>>>> the same (performance-wise) except for the top-level dereference
>>>>> (outside of fn-impl-body).
>>>>>
>>>>> Some more info may come handy in reproducing it:
>>>>> - What is the message throughput (in msg/s) and what is the avg msg
>>>>> size?
>>>>
>>>>
>>>>
>>>> with these commented out as above, ~50k messages/min normal, ~200K
>>>> message/min if I'm replaying queued logs (box saturated, see the other
>>>> thread where 8.17 final seems to have slowed down from ~300K messages/min
>>>> that I was getting with a 8.16+ git)
>>>>
>>>>> - Do you see messages being written to sources-messages file, while
>>>>> they don't get written to messages-tags file?
>>>>
>>>>
>>>>
>>>> Yes, but not all messages. I have an alarm that goes off if
>>>> sources-messages
>>>> isn't written to in two 35 second checks (70 seconds total), that alarm
>>>> never fired. But there would be a burst of messages show up in
>>>> sources-messages and then 20 or so seconds before another burst would
>>>> show
>>>> up.
>>>>
>>>>> - Do you see a significant jump in voluntary context switches(say per
>>>>> 10 seconds or even per second) when you uncomment those lines? How big
>>>>> is the jump?
>>>>> - How does the observation change if you replace it with this kind of
>>>>> variable access pattern:
>>>>>
>>>>> set $.x = $!trusted!edge!relay;
>>>>> if $.x != "" then {
>>>>>  set $.inc = dyn_inc("msgs_per_edge_relay", $.x);
>>>>> }
>>>>
>>>>
>>>>
>>>> will check.
>>>>
>>>>> - Is this stock 8.17 build?
>>>>
>>>>
>>>>
>>>> stock config, but against git as of yesterday. I needed to get liblognorm
>>>> 2.0 and the current libjsonfast
>>>>
>>>>> - Is this x86-64 Linux? What distro (and version)?
>>>>
>>>>
>>>>
>>>> ubuntu trusty (14.04) 64 bit.
>>>>
>>>>>>
>>>>>> Also, the documentation doesn't say why we are setting $.inc to the
>>>>>> result
>>>>>> of the command, what does $.inc contain after the command is run? from
>>>>>> the
>>>>>> documentation, it looks like it contains an error code (at least the
>>>>>> way
>>>>>> it's used is similar to error code returns in other languages), but I
>>>>>> don't
>>>>>> see what the errors are in the doc page.
>>>>>
>>>>>
>>>>>
>>>>> Yes, it is the error-code. It has value 0 when successful, an non-zero
>>>>> when it fails to increment (it may fail due to maxCardinality being
>>>>> hit or due to a contended for lock etc (it doesn't wait for the lock,
>>>>> it uses try-lock and doesn't increment the counter if it can't get a
>>>>> shared-lock over the bucket).
>>>>
>>>>
>>>>
>>>> While I do have multiple threads running on the main queue, the places I
>>>> am
>>>> calling this are on action queues with only one worker, so the only lock
>>>> conflict should be when the data is reported.
>>>>
>>>>> These are rsyslog error-codes. It does an error pass-thru.
>>>>>
>>>>> I'll add this to documentation (didn't know this was missed).
>>>>
>>>>
>>>>
>>>> Thanks.
>>>>
>>>> Sorry I didn't get a chance to run this pre-release, this is exactly why
>>>> I
>>>> try to use features like this pre-release to spot the things that the
>>>> person
>>>> developign them misses because they are too close to the problem. :-)
>>>>
>>>>
>>>> David Lang
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>> of
>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>>>> LIKE THAT.
>>>
>>>
>>>
>>>
>>>
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>> LIKE THAT.
>
>
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com



-- 
Regards,
Janmejay
http://codehunk.wordpress.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to