Re: [rsyslog] RFC: dynamic-stats support

singh.janmejay Tue, 06 Oct 2015 09:38:19 -0700

Sure, sounds good.

The ability to gather stats with a dynamic key is important. I am
willing to even help with a rewrite if at some point we feel it's best
implemented in a different way in light of the variable-implementation
changes.


On Tue, Oct 6, 2015 at 9:48 PM, Rainer Gerhards
<[email protected]> wrote:
> 2015-10-06 18:04 GMT+02:00 singh.janmejay <[email protected]>:
>> Rainer,
>>
>> I see this as something completely outside the scope of variables.
>> Building a stats collector over variables is possible, but then we are
>> talking about a general-purpose language which allows building
>> such complex things. This increases the scope of RainerScript, and with
>> larger scope comes complexity. I feel this is in line with the other
>> Lua discussion where you emphasized that RainerScript should not
>> become a fully general-purpose language?
>>
>> E.g., creating an atomic-increment function for variables requires that
>> we educate users about what can and can't be done if the atomic-increment
>> function is used anywhere on a variable, and what relationship they can
>> expect it to have with other atomically incremented variables (which gets
>> into the memory model).
>
> Maybe I just feel overwhelmed at the moment with keeping track of
> everything that is going on. How about this: we can merge it BUT flag
> it as experimental. If all works out well, I am free starting early
> next year to have a deep look at the overall design and tie
> together all those loose ends. I suspect that I would like to change
> a couple of things in the interest of tying it all well together (like
> I currently do in liblognorm).
>
> But if I need to carry all this legacy, that's really a burden (e.g.
> liblognorm now contains the full v1 code as a copy, which means it
> also needs to be somewhat maintained). I want to avoid this. As long
> as we document this as an *interim* solution that is not necessarily
> here to stay and as such "use at your own risk and it will probably
> break next year" I am sufficiently happy with that. We just must be
> aware that things may really break and there is a big chance they
> actually will. And I don't want to hear about potential vulnerability or
> compatibility issues or whatever when this code is changed/removed.
> Also keep in mind that I probably need to totally revamp the
> variable system, as json-c has many problematic parts for our use
> (what I learned when digging deep with liblognorm). So I *know* that
> there are big changes coming up next year!
>
> And, full ack: I want to limit the scope of RainerScript. Arrays were a
> good example of why it may be a bad idea to go too boldly forward
> without thinking about the big picture.
>
> Rainer
>>
>>
>>
>> On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards
>> <[email protected]> wrote:
>>> I can't fully dig into this, but I think we must *very carefully*
>>> evaluate the overall design. Some time ago we introduced arrays for
>>> the limited liblognorm use case, and it hurts us every now and then
>>> when folks want to use arrays for other use cases. It probably
>>> makes sense to re-think how the variable engine etc. behaves before
>>> adding more functionality, and to make sure that everything works smoothly
>>> in all use cases. While anything else may take care of some use
>>> cases, I fear we may get too fragmented. At least this is what I
>>> learned in the past months' discussions.
>>>
>>> Anyone else?
>>>
>>> Rainer
>>>
>>> 2015-10-06 17:10 GMT+02:00 singh.janmejay <[email protected]>:
>>>> It is possible to use global variables (it'll require some
>>>> enhancements, table support etc.), but it'll be very inefficient
>>>> compared to this approach. For instance, the choice of data structure
>>>> allows making the solution a lot more efficient.
>>>>
>>>> Here it's possible to locklessly increment counters in most cases, so
>>>> its overhead is a lot lower than that of global variables.
>>>>
>>>> Recycle is precisely to allow this lockless mechanism to work. It's
>>>> basically saying it'll track metric names it has seen in the last 1 hour.
>>>> If we kill tracking of a name as soon as we don't see an increment
>>>> (between 2 reporting runs of impstats), it'll lead to unnecessary
>>>> churn when low values are common or load is not uniform in time.
>>>>
>>>> Implementing it on top of global variables not only has a very high
>>>> performance penalty (it'll be prohibitive for high-throughput
>>>> scenarios), it also exposes too much complexity to the user (where
>>>> the user has to worry about reset etc.).
>>>>
>>>> I don't plan to have a scheduler in this implementation. The
>>>> GetAllStatsLines call will purge the tree instead of resetting it at that
>>>> interval. It's basically a balance between freeing up memory occupied
>>>> by stale metric names vs. performance (lockless handling of
>>>> increment). So it will be governed by the impstats schedule. Maybe I
>>>> should change the name to a better one (the equivalent of
>>>> purge_known_keys_after_they_have_been_reported_N_times).
>>>>
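
A minimal Python sketch of the purge-on-report idea described above, not rsyslog's actual implementation. It models only the bookkeeping (counting, resetting, and dropping names that stay idle across reporting runs); it is single-threaded and does not attempt the lockless increment itself. The class name `DynStatsBucket`, the `purge_after_idle_reports` knob, and the `report()` method (standing in for the GetAllStatsLines call) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DynStatsBucket:
    """Hypothetical model of one dynamic-stats namespace (not rsyslog code)."""

    purge_after_idle_reports: int = 2  # assumed knob: idle reporting runs before purge
    counts: dict = field(default_factory=dict)  # metric name -> count since last report
    idle_reports: dict = field(default_factory=dict)  # metric name -> consecutive idle runs

    def inc(self, metric_name: str) -> None:
        # Hot path: only the counter is touched; no purging happens here.
        self.counts[metric_name] = self.counts.get(metric_name, 0) + 1
        self.idle_reports[metric_name] = 0

    def report(self) -> dict:
        """Stand-in for the reporting call: emit, reset, and purge stale names."""
        snapshot = dict(self.counts)
        for name, value in snapshot.items():
            if value == 0:
                self.idle_reports[name] += 1
            if self.idle_reports[name] >= self.purge_after_idle_reports:
                # Stale name: free its memory during the reporting pass.
                del self.counts[name]
                del self.idle_reports[name]
            else:
                self.counts[name] = 0  # counter restarts for the next interval
        return snapshot
```

Because names are only dropped inside `report()`, a metric that goes quiet for a single interval keeps its slot, which is the churn-avoidance trade-off discussed above: memory held by stale names is released a few reporting runs later rather than immediately.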
>>>>
>>>> On Tue, Oct 6, 2015 at 4:30 PM, David Lang <[email protected]> wrote:
>>>>> On Tue, 6 Oct 2015, singh.janmejay wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am working on support for stats with dynamic names. This comes in handy
>>>>>> in situations where the metric name depends on the value of a certain
>>>>>> attribute of the message.
>>>>>>
>>>>>> Say, for a central log-aggregation service, it's valuable to know the
>>>>>> inbound message-count distribution across the application clusters that
>>>>>> send logs to it, or for a shared server, it's valuable to know the
>>>>>> log volume generated per user, etc.
>>>>>>
>>>>>> I'm thinking of using a function-like interface to support this. It may
>>>>>> look similar to this:
>>>>>>
>>>>>> ====================
>>>>>> dyn_stats("user_msg_count")
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> ruleset(...) {
>>>>>> ...
>>>>>> dyn_inc("user_msg_count", $.user)
>>>>>> ...
>>>>>> }
>>>>>> ====================
>>>>>>
>>>>>> dyn_stats signature looks like:
>>>>>> dyn_stats(<name_space>, <resettable: default=true>, <max_cardinality:
>>>>>> default=10k>, <recycle_metric_names_after: default=1hr>)
>>>>>>
>>>>>> dyn_inc signature looks like:
>>>>>> dyn_inc(<name_space>, <metric_name>)
>>>>>>
>>>>>>
>>>>>> Reporting would work similarly to static metrics, via impstats. Mapping:
>>>>>> statsobj_s.name = name_space
>>>>>> statsobj_s.origin = "dyn"
>>>>>> ctr_s.name = "foo" (say $.user had value foo)
>>>>>>
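
To make the proposed interface concrete, here is a rough Python model of `dyn_stats`, `dyn_inc`, and the reporting mapping. It is an illustration under assumptions rather than rsyslog code: the module-level registry, the `report()` helper, and the choice to reject new names once `max_cardinality` is reached are all hypothetical, and `recycle_metric_names_after` is left out.

```python
# Hypothetical registry: name_space -> settings and counters (illustration only).
_namespaces = {}


def dyn_stats(name_space, resettable=True, max_cardinality=10_000):
    """Declare a namespace; the defaults mirror the proposed signature."""
    _namespaces[name_space] = {
        "resettable": resettable,
        "max_cardinality": max_cardinality,
        "counters": {},
    }


def dyn_inc(name_space, metric_name):
    """Increment a counter; return False if the cardinality cap rejects a new name."""
    ns = _namespaces[name_space]
    counters = ns["counters"]
    if metric_name not in counters and len(counters) >= ns["max_cardinality"]:
        return False  # assumed behavior: refuse new names once the cap is hit
    counters[metric_name] = counters.get(metric_name, 0) + 1
    return True


def report(name_space):
    """Render one record using the mapping from the proposal."""
    ns = _namespaces[name_space]
    record = {
        "name": name_space,  # statsobj_s.name
        "origin": "dyn",  # statsobj_s.origin
        "counters": dict(ns["counters"]),  # ctr_s.name -> value
    }
    if ns["resettable"]:
        ns["counters"] = {name: 0 for name in ns["counters"]}
    return record
```

With `dyn_stats("user_msg_count")` declared, each `dyn_inc("user_msg_count", user)` call bumps the counter named after that user, and the cap bounds how many distinct names a misbehaving key can create.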
>>>>>>
>>>>>> Thoughts / suggestions?
>>>>>
>>>>>
>>>>> How is this different/better than global variables? (Although we may need
>>>>> to implement some functions: atomic inc/dec, copy+clear.) If you have pstats
>>>>> output in JSON format, you can even piggyback on its schedule to output
>>>>> the data.
>>>>>
>>>>>
>>>>> Things like stats can very easily end up being expensive in terms of
>>>>> locking (something global variables already have figured out), and it
>>>>> sounds like you are proposing adding a scheduler of some sort to output
>>>>> the data.
>>>>>
>>>>> Variables should not need to be 'recycled': either they contain data or
>>>>> they don't. If they contain data, you need to keep the data until you do
>>>>> something with it; if they don't, you don't have to track them.
>>>>>
>>>>>
>>>>> I am actually doing this sort of thing external to rsyslog, in SEC.
>>>>>
>>>>> I have a template in rsyslog that contains hostname, fromhost-ip, and
>>>>> programname, and I output it via omprog to SEC. SEC accumulates counters
>>>>> and has scheduled outputs to files.
>>>>>
>>>>> Before I started using SEC for this, I used the same template to output
>>>>> to a file and then, for reports, used cut + sort + uniq -c to extract the
>>>>> data I need. When the files only contain the significant data, this is
>>>>> actually not bad to do, even at higher volumes.
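
For comparison, the cut + sort + uniq -c report described above can be approximated in a few lines of Python. This is a sketch assuming space-separated lines of the form `<hostname> <fromhost-ip> <programname>`; the function name `count_field` and the sample data are made up for illustration.

```python
from collections import Counter


def count_field(lines, field_index, sep=" "):
    """Count occurrences of one field, roughly like `cut | sort | uniq -c`."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if field_index < len(fields):  # skip lines that lack the field
            counts[fields[field_index]] += 1
    return counts


# Assumed line layout: "<hostname> <fromhost-ip> <programname>"
sample = [
    "web1 10.0.0.1 nginx\n",
    "web1 10.0.0.1 sshd\n",
    "db1 10.0.0.2 postgres\n",
]
per_host = count_field(sample, 0)
```

In practice `lines` could be an open file object, so the counting streams over the file without loading it all into memory.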
>>>>>
>>>>> David Lang
>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com/professional-services/
>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad 
>>>>> of
>>>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>>>>> LIKE THAT.
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Janmejay
>>>> http://codehunk.wordpress.com
>>
>>
>>
>> --
>> Regards,
>> Janmejay
>> http://codehunk.wordpress.com



-- 
Regards,
Janmejay
http://codehunk.wordpress.com
