Re: [rsyslog] RFC: dynamic-stats support

Rainer Gerhards Tue, 06 Oct 2015 09:18:58 -0700

2015-10-06 18:04 GMT+02:00 singh.janmejay <[email protected]>:
> Rainer,
>
> I see this as something completely outside the scope of variables.
> Building a stats collector over variables is possible, but then we are
> talking about a general-purpose language which allows building
> such complex things. This increases the scope of RainerScript, and with
> larger scope comes complexity. I feel this is in line with the other
> Lua discussion, where you emphasized that RainerScript should not
> become a fully general-purpose language?
>
> E.g. creating an atomic-increment function for variables requires that
> we educate users about what can and can't be done if the atomic-increment
> function is used anywhere on a variable, and what relationship they can
> expect it to have with other atomically incremented variables (which gets
> into the memory model).


Maybe I just feel overwhelmed at the moment with keeping track of
everything that is going on. How about this: we can merge it BUT flag
it as experimental. If all works out well, I am free starting early
next year to have a deep look at the overall design and tie
together all those loose ends. I suspect that I would like to change
a couple of things in the interest of tying it all together well (like
I currently do in liblognorm).

But if I need to carry all this legacy, that's really a burden (e.g.
liblognorm now contains the full v1 code as a copy, which means it
also needs to be somewhat maintained). I want to avoid this. As long
as we document this as an *interim* solution that is not necessarily
here to stay and as such "use at your own risk and it will probably
break next year", I am sufficiently happy with that. We just must be
aware that things may really break and there is a big chance they
actually will. And I don't want to hear about potential vuln or
compatibility issues or whatever when this code is changed/removed.
Also keep in mind that I probably need to totally revamp the
variable system, as json-c has many problematic parts for our use
(which I learned when digging deep with liblognorm). So I *know* that
there are big changes coming up next year!

And, full ack: I want to limit the scope of RainerScript. Arrays were a
good example of why it may be a bad idea to go too boldly forward
without thinking about the big picture.

Rainer
>
>
>
> On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards
> <[email protected]> wrote:
>> I can't fully dig into this, but I think we must *very carefully*
>> evaluate the overall design. Some time ago we introduced arrays for
>> the limited liblognorm use case, and it hurts us every now and then
>> when folks want to use arrays for other use cases. It probably
>> makes sense to re-think how the variable engine etc. behaves before
>> adding more functionality, and to make sure that everything works smoothly
>> in all use cases. While anything else may take care of some use
>> cases, I fear we may get too fragmented. At least this is what I
>> learned in the past months' discussions.
>>
>> Anyone else?
>>
>> Rainer
>>
>> 2015-10-06 17:10 GMT+02:00 singh.janmejay <[email protected]>:
>>> It is possible to use global-variables (it'll require some
>>> enhancements, table-support etc), but it'll be very inefficient
>>> compared to this approach. For instance, choice of data-structure etc
>>> allows making the solution a lot more efficient.
>>>
>>> Here it's possible to locklessly increment counters in most cases, so
>>> its overhead is a lot lower than that of global-variables.
>>>
>>> Recycle is precisely to allow this lockless mechanism to work. It's
>>> basically saying it'll track metric-names it has seen in the last 1 hour.
>>> If we kill tracking of a name as soon as we don't see an increment
>>> (between 2 reporting runs of impstats), it'll lead to unnecessary
>>> churn when low values are common or load is not uniform in time.
>>>
>>> Implementing it on top of global-variables not only has a very high
>>> performance penalty (it'll be prohibitive for high-throughput
>>> scenarios), it also exposes too much complexity to the user (where
>>> the user has to worry about reset etc.).
>>>
>>> I don't plan to have a scheduler in this implementation. The
>>> GetAllStatsLines call will purge the tree instead of reset at that
>>> interval. It's basically a balance between freeing up memory occupied
>>> by stale metric-names vs. performance (lockless handling of
>>> increment). So it will be governed by the impstats schedule. Maybe I
>>> should change the name to a better one (the equivalent of
>>> purge_known_keys_after_they_have_been_reported_N_times).
>>>
>>>
>>> On Tue, Oct 6, 2015 at 4:30 PM, David Lang <[email protected]> wrote:
>>>> On Tue, 6 Oct 2015, singh.janmejay wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am working on support for stats with dynamic names. This comes in
>>>>> handy in situations where the metric-name depends on the value of a
>>>>> certain attribute of the message.
>>>>>
>>>>> Say, for a central log-aggregation service, it's valuable to know what
>>>>> the inbound message-count distribution is across application-clusters
>>>>> that send logs to it, or for a shared-server, it's valuable to know
>>>>> what the log-volume generation is across users etc.
>>>>>
>>>>> I'm thinking of using a function-like interface to support this. It may
>>>>> look similar to this:
>>>>>
>>>>> ====================
>>>>> dyn_stats("user_msg_count")
>>>>>
>>>>> ...
>>>>>
>>>>> ruleset(...) {
>>>>> ...
>>>>> dyn_inc("user_msg_count", $.user)
>>>>> ...
>>>>> }
>>>>> ====================
>>>>>
>>>>> dyn_stats signature looks like:
>>>>> dyn_stats(<name_space>, <resettable: default=true>, <max_cardinality:
>>>>> default=10k>, <recycle_metric_names_after: default=1hr>)
>>>>>
>>>>> dyn_inc signature looks like:
>>>>> dyn_inc(<name_space>, <metric_name>)
>>>>>
>>>>>
>>>>> Reporting would work similarly to static metrics via impstats. Mapping:
>>>>> statsobj_s.name = name_space
>>>>> statsobj_s.origin = "dyn"
>>>>> ctr_s.name = "foo" (say $.user had value foo)
>>>>>
>>>>>
>>>>> Thoughts / suggestions?
>>>>
>>>>
>>>> how is this different/better than global variables? (although we may
>>>> need to implement some functions, atomic inc/dec, copy+clear) If you
>>>> have pstats output in json format, you can even piggyback on its
>>>> schedule to output the data.
>>>>
>>>>
>>>> things like stats can very easily end up being expensive in terms of
>>>> locking (something global variables already have figured out), and it
>>>> sounds like you are proposing adding a scheduler of some sort to
>>>> output the data.
>>>>
>>>> variables should not need to be 'recycled', either they contain data
>>>> or they don't. If they contain data, you need to keep the data until
>>>> you do something with it, if they don't, you don't have to track them.
>>>>
>>>>
>>>> I am actually doing this sort of thing external to rsyslog in SEC
>>>>
>>>> I have a template in rsyslog that contains hostname, fromhost-ip,
>>>> programname and I output it via improg to SEC. SEC accumulates counters and
>>>> has scheduled outputs to files.
>>>>
>>>> before I started using SEC for this, I used the same template to
>>>> output to a file and then for reports, used cut + sort + uniq -c to
>>>> extract the data I need. When the files only contain the significant
>>>> data, this is actually not bad to do, even at higher volumes.
>>>>
>>>> David Lang
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>>>> LIKE THAT.
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Janmejay
>>> http://codehunk.wordpress.com
>
>
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com

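The lockless increment and the purge-after-N-reports idea discussed in
this thread could look roughly like the C sketch below. This is not the
rsyslog implementation: every name in it (dyn_ns_t, dyn_ctr_t,
sketch_dyn_inc, sketch_report, MAX_CARDINALITY, NAME_LEN, the output line
format) is hypothetical. It also assumes that creating a new metric name
and running a report are serialized by the caller, so only the increment
of an already known counter is lock-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_CARDINALITY 8   /* stand-in for the max_cardinality argument */
#define NAME_LEN 32         /* assumed upper bound on metric-name length */

typedef struct {
    char name[NAME_LEN];         /* metric name, e.g. the value of $.user */
    atomic_uint_fast64_t value;  /* incremented without taking a lock */
    unsigned idle_reports;       /* consecutive reports with no increment */
    int used;                    /* slot currently tracks a name */
} dyn_ctr_t;

typedef struct {
    const char *name_space;      /* e.g. "user_msg_count" */
    unsigned purge_after;        /* purge a name after this many idle reports */
    dyn_ctr_t ctrs[MAX_CARDINALITY];
} dyn_ns_t;

static dyn_ctr_t *dyn_find(dyn_ns_t *ns, const char *metric) {
    for (int i = 0; i < MAX_CARDINALITY; i++)
        if (ns->ctrs[i].used && strcmp(ns->ctrs[i].name, metric) == 0)
            return &ns->ctrs[i];
    return NULL;
}

/* Increment a dynamically named counter.
 * Returns 0 on success, -1 if the name is too long or the table is full. */
int sketch_dyn_inc(dyn_ns_t *ns, const char *metric) {
    assert(ns != NULL && metric != NULL);
    if (strlen(metric) >= NAME_LEN)
        return -1;
    dyn_ctr_t *c = dyn_find(ns, metric);
    if (c == NULL) {
        /* Slow path: claim a free slot. A real implementation would
         * have to protect this part against concurrent writers. */
        for (int i = 0; i < MAX_CARDINALITY && c == NULL; i++)
            if (!ns->ctrs[i].used)
                c = &ns->ctrs[i];
        if (c == NULL)
            return -1;           /* max cardinality reached */
        snprintf(c->name, NAME_LEN, "%s", metric);
        atomic_init(&c->value, 0);
        c->idle_reports = 0;
        c->used = 1;
    }
    /* Fast path: a single atomic add, no lock. */
    atomic_fetch_add_explicit(&c->value, 1, memory_order_relaxed);
    return 0;
}

/* One reporting run: print and reset every tracked counter, then purge
 * names that have been idle for purge_after consecutive runs.
 * Returns the number of lines written. */
int sketch_report(dyn_ns_t *ns, FILE *out) {
    int lines = 0;
    for (int i = 0; i < MAX_CARDINALITY; i++) {
        dyn_ctr_t *c = &ns->ctrs[i];
        if (!c->used)
            continue;
        uint_fast64_t v =
            atomic_exchange_explicit(&c->value, 0, memory_order_relaxed);
        fprintf(out, "%s: origin=dyn %s=%llu\n",
                ns->name_space, c->name, (unsigned long long)v);
        lines++;
        if (v != 0)
            c->idle_reports = 0;
        else if (++c->idle_reports >= ns->purge_after)
            c->used = 0;         /* stale name: free the slot */
    }
    return lines;
}
```

With purge_after set to 2, a name that stops receiving increments is
still reported (with value 0) in the next two runs and then disappears,
which avoids the churn of dropping and re-creating a name every time it
has one quiet interval.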