On Tue, 22 Mar 2016, singh.janmejay wrote:
On Tue, Mar 22, 2016 at 5:01 AM, David Lang <[email protected]> wrote:
I added this to my configs
# grep -B 1 -A 1 dyn_ /etc/rsyslog.conf
# custom stats to track in rsyslog
dyn_stats(name="msgs_per_host" resettable="on" maxCardinality="3000"
unusedMetricLife="1200")
dyn_stats(name="msgs_per_edge_relay" resettable="on" maxCardinality="3000"
unusedMetricLife="1200")
dyn_stats(name="msgs_per_core_relay" resettable="on" maxCardinality="3000"
unusedMetricLife="1200")
dyn_stats(name="msgs_per_program" resettable="on" maxCardinality="3000"
unusedMetricLife="1200")
dyn_stats(name="msgs_per_tag" resettable="on" maxCardinality="3000"
unusedMetricLife="1200")
--
/var/log/sources-messages;sources
set $.inc = dyn_inc("msgs_per_host", $hostname);
set $.inc = dyn_inc("msgs_per_program", $programname);
# if $!trusted!edge!relay != "" then {
# set $.inc = dyn_inc("msgs_per_edge_relay", $!trusted!edge!relay);
# }
# if $!trusted!core!relay != "" then {
# set $.inc = dyn_inc("msgs_per_edge_relay", $!trusted!core!relay);
# }
--
}
#set $.inc = dyn_inc("msgs_per_tag", $.custommessage);
/var/log/messages-tags;manual
if I uncomment any of the lines that refer to $! or $. variables, rsyslog
basically stops processing messages (a few messages seem to be processed in
spurts, bit a lot of processing never happens)
any chance that this has problems with locking around user defined
variables?
Not sure, I run it in production (66 6-cpu(HT) haswell VMs handling an
aggregate of ~2.5 M-msg / sec (~1.5kB each)) and haven't seen such
jitters, but then I don't use 3 level deep keys either and in my case,
the feature is backported to 8.12 branch), so still uses json-c (not
libfastjson). I do use it with user-defined variables too, just not
nested as deep.
We don't treat variables differently after dereferencing them in the
top-most layer handling rainerscript fn-calls, so $hostname and
$!x!y!z, as far as any rainerscript fn invocation is considered, are
the same (performance-wise) except for the top-level dereference
(outside of fn-impl-body).
Some more info may come handy in reproducing it:
- What is the message throughput (in msg/s) and what is the avg msg size?
with these commented out as above, ~50k messages/min normal, ~200K message/min
if I'm replaying queued logs (box saturated, see the other thread where 8.17
final seems to have slowed down from ~300K messages/min that I was getting with
a 8.16+ git)
- Do you see messages being written to sources-messages file, while
they don't get written to messages-tags file?
Yes, but not all messages. I have an alarm that goes off if sources-messages
isn't written to in two 35 second checks (70 seconds total), that alarm never
fired. But there would be a burst of messages show up in sources-messages and
then 20 or so seconds before another burst would show up.
- Do you see a significant jump in voluntary context switches(say per
10 seconds or even per second) when you uncomment those lines? How big
is the jump?
- How does the observation change if you replace it with this kind of
variable access pattern:
set $.x = $!trusted!edge!relay;
if $.x != "" then {
set $.inc = dyn_inc("msgs_per_edge_relay", $.x);
}
will check.
- Is this stock 8.17 build?
stock config, but against git as of yesterday. I needed to get liblognorm 2.0
and the current libjsonfast
- Is this x86-64 Linux? What distro (and version)?
ubuntu trusty (14.04) 64 bit.
Also, the documentation doesn't say why we are setting $.inc to the result
of the command, what does $.inc contain after the command is run? from the
documentation, it looks like it contains an error code (at least the way
it's used is similar to error code returns in other languages), but I don't
see what the errors are in the doc page.
Yes, it is the error-code. It has value 0 when successful, an non-zero
when it fails to increment (it may fail due to maxCardinality being
hit or due to a contended for lock etc (it doesn't wait for the lock,
it uses try-lock and doesn't increment the counter if it can't get a
shared-lock over the bucket).
While I do have multiple threads running on the main queue, the places I am
calling this are on action queues with only one worker, so the only lock
conflict should be when the data is reported.
These are rsyslog error-codes. It does an error pass-thru.
I'll add this to documentation (didn't know this was missed).
Thanks.
Sorry I didn't get a chance to run this pre-release, this is exactly why I try
to use features like this pre-release to spot the things that the person
developign them misses because they are too close to the problem. :-)
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.