It is possible to use global variables (that would require some
enhancements, such as table support), but it would be very inefficient
compared to this approach. For instance, a dedicated implementation
can choose its own data structure, which makes the solution much more
efficient.

Here it is possible to increment counters locklessly in most cases, so
the overhead is much lower than with global variables.

Recycling exists precisely to let this lockless mechanism work. It
basically says that metric names seen in the last hour stay tracked.
If we stopped tracking a name as soon as we didn't see an increment
(between two reporting runs of impstats), it would lead to unnecessary
churn when low values are common or load is not uniform over time.

Implementing it on top of global variables not only carries a very
high performance penalty (prohibitive for high-throughput scenarios),
it also exposes too much complexity to the user (who would have to
worry about reset etc.).

I don't plan to have a scheduler in this implementation. The
GetAllStatsLines call will purge the tree instead of resetting it at
that interval. It's basically a balance between freeing up memory
occupied by stale metric names vs. performance (lockless handling of
increments). So it will be governed by the impstats schedule. Maybe I
should change the name to something better (the equivalent of
purge_known_keys_after_they_have_been_reported_N_times).
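
Roughly, the idea of purging from the reporting call could be sketched
like this. It is a simplified, single-threaded illustration under my
own assumptions: it uses a linked list instead of a tree, and dyn_ns_t,
dyn_ns_inc, dyn_ns_count and dyn_ns_report are hypothetical names, not
the real implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: one known metric name and its count. */
typedef struct dyn_entry {
    char *name;
    uint64_t value;
    struct dyn_entry *next;
} dyn_entry_t;

/* Hypothetical namespace: known names plus purge bookkeeping. */
typedef struct {
    dyn_entry_t *head;
    unsigned reports_since_purge;
    unsigned purge_after_reports;  /* the "N" in the proposed name */
} dyn_ns_t;

/* Increment the counter for name, creating it on first sight. */
static int dyn_ns_inc(dyn_ns_t *ns, const char *name) {
    assert(ns != NULL && name != NULL);
    for (dyn_entry_t *e = ns->head; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            e->value++;
            return 0;
        }
    }
    dyn_entry_t *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    size_t len = strlen(name) + 1;
    e->name = malloc(len);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->name, name, len);
    e->value = 1;
    e->next = ns->head;
    ns->head = e;
    return 0;
}

/* Number of metric names currently tracked. */
static size_t dyn_ns_count(const dyn_ns_t *ns) {
    size_t n = 0;
    for (const dyn_entry_t *e = ns->head; e != NULL; e = e->next)
        n++;
    return n;
}

/* One reporting run: emit every counter (emit may be NULL), reset it,
 * and after every N-th run drop all known names. No separate scheduler
 * is involved; the caller's reporting schedule drives the purge. */
static size_t dyn_ns_report(dyn_ns_t *ns,
                            void (*emit)(const char *, uint64_t)) {
    size_t reported = 0;
    for (dyn_entry_t *e = ns->head; e != NULL; e = e->next) {
        if (emit != NULL)
            emit(e->name, e->value);
        e->value = 0;
        reported++;
    }
    if (++ns->reports_since_purge >= ns->purge_after_reports) {
        dyn_entry_t *e = ns->head;
        while (e != NULL) {
            dyn_entry_t *next = e->next;
            free(e->name);
            free(e);
            e = next;
        }
        ns->head = NULL;
        ns->reports_since_purge = 0;
    }
    return reported;
}
```

With purge_after_reports set to 2, names survive the first reporting
run and are dropped at the end of the second, so stale names cost
memory for at most N reporting intervals.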


On Tue, Oct 6, 2015 at 4:30 PM, David Lang <[email protected]> wrote:
> On Tue, 6 Oct 2015, singh.janmejay wrote:
>
>> Hi,
>>
>> I am working on support for stats with dynamic names. This comes in
>> handy in situations where the metric name depends on the value of a
>> certain attribute of the message.
>>
>> Say, for a central log-aggregation service, it's valuable to know what
>> the inbound message-count distribution is across application clusters
>> that send logs to it, or for a shared server, it's valuable to know
>> what the log-volume generation is across users etc.
>>
>> I'm thinking of using a function-like interface to support this. It
>> may look similar to this:
>>
>> ====================
>> dyn_stats("user_msg_count")
>>
>> ...
>>
>> ruleset(...) {
>> ...
>> dyn_inc("user_msg_count", $.user)
>> ...
>> }
>> ====================
>>
>> dyn_stats signature looks like:
>> dyn_stats(<name_space>, <resettable: default=true>, <max_cardinality:
>> default=10k>, <recycle_metric_names_after: default=1hr>)
>>
>> dyn_inc signature looks like:
>> dyn_inc(<name_space>, <metric_name>)
>>
>>
>> Reporting would work similar to static-metric via impstats. Mapping:
>> statsobj_s.name = name_space
>> statsobj_s.origin = "dyn"
>> ctr_s.name = "foo" (say $.user had value foo)
>>
>>
>> Thoughts / suggestions?
>
>
> how is this different/better than global variables? (although we may need to
> implement some functions, atomic inc/dec copy+clear) If you have pstats
> output in json format, you can even piggyback on its schedule to output the
> data.
>
>
> things like stats can very easily end up being expensive in terms of locking
> (something global variables already have figured out), and it sounds like
> you are proposing adding a scheduler of some sort to output the data.
>
> variables should not need to be 'recycled', either they contain data or they
> don't. If they contain data, you need to keep the data until you do
> something with it, if they don't, you don't have to track them.
>
>
> I am actually doing this sort of thing external to rsyslog in SEC
>
> I have a template in rsyslog that contains hostname, fromhost-ip,
> programname and I output it via improg to SEC. SEC accumulates counters and
> has scheduled outputs to files.
>
> before I started using SEC for this, I used the same template to output to a
> file and then for reports, used cut + sort + uniq -c to extract the data I
> need. When the files only contain the significant data, this is actually not
> bad to do, even at higher volumes.
>
> David Lang
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> LIKE THAT.



-- 
Regards,
Janmejay
http://codehunk.wordpress.com