Re: [rsyslog] RFC: dynamic-stats support

singh.janmejay Tue, 06 Oct 2015 09:06:08 -0700

Rainer,

I see this as something completely outside the scope of variables.
Building a stats collector over variables is possible, but then we are
talking about a general-purpose language which allows building such
complex things. This increases the scope of Rainerscript, and with
larger scope comes complexity. I feel this is in line with the other
Lua discussion, where you emphasized that Rainerscript should not
become a fully general-purpose language.


E.g. creating an atomic-increment function for variables requires that
we educate users about what can and can't be done once the
atomic-increment function is used anywhere on a variable, and what
relationship they can expect it to have with other atomically
incremented variables (which gets into the memory model).
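
To make the intended behaviour concrete, here is a toy Python model of
one dyn_stats namespace as proposed further down in this thread. It is
only a sketch, not rsyslog code. The class DynStats, its methods and the
purge_after_idle_runs parameter are hypothetical names. Rejecting new
names once max_cardinality is reached, and counting idle reporting runs
instead of wall-clock time, are assumptions made for illustration. It is
single-threaded, so it says nothing about the lockless part.

```python
class DynStats:
    """Toy model of one dyn_stats namespace (hypothetical, not rsyslog code)."""

    def __init__(self, name_space, resettable=True, max_cardinality=10_000,
                 purge_after_idle_runs=2):
        self.name_space = name_space
        self.resettable = resettable
        self.max_cardinality = max_cardinality
        self.purge_after_idle_runs = purge_after_idle_runs
        self.counters = {}    # metric_name -> current count
        self.idle_runs = {}   # metric_name -> reporting runs without an increment
        self.touched = set()  # names incremented since the last reporting run

    def inc(self, metric_name):
        """Model of dyn_inc(name_space, metric_name).

        Returns False when a new name would exceed max_cardinality
        (assumed behaviour; the proposal does not specify it).
        """
        if metric_name not in self.counters:
            if len(self.counters) >= self.max_cardinality:
                return False
            self.counters[metric_name] = 0
            self.idle_runs[metric_name] = 0
        self.counters[metric_name] += 1
        self.touched.add(metric_name)
        return True

    def report(self):
        """Model of one reporting run: snapshot, then reset or purge each name."""
        snapshot = dict(self.counters)
        for name in list(self.counters):
            if name in self.touched:
                self.idle_runs[name] = 0
            else:
                self.idle_runs[name] += 1
            if self.idle_runs[name] >= self.purge_after_idle_runs:
                # Stale name: stop tracking it to free memory.
                del self.counters[name]
                del self.idle_runs[name]
            elif self.resettable:
                self.counters[name] = 0
        self.touched.clear()
        return snapshot
```

A caller would create DynStats("user_msg_count"), call inc() with the
value of $.user for each message, and call report() once per reporting
interval. A name that sees no increment for purge_after_idle_runs
consecutive reports is dropped, which frees a slot under
max_cardinality.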



On Tue, Oct 6, 2015 at 8:49 PM, Rainer Gerhards
<[email protected]> wrote:
> I can't fully dig into this, but I think we must *very carefully*
> evaluate the overall design. Some time ago we introduced arrays for
> the limited liblognorm use case, and it hurts us every now and then
> when folks want to use arrays for other use cases. It probably
> makes sense to re-think how the variable engine etc. behaves before
> adding more functionality, and to make sure that everything works
> smoothly in all use cases. While anything else may take care of some
> use cases, I fear we may get too fragmented. At least this is what I
> learned in the past months' discussions.
>
> Anyone else?
>
> Rainer
>
> 2015-10-06 17:10 GMT+02:00 singh.janmejay <[email protected]>:
>> It is possible to use global-variables (it'll require some
>> enhancements, table-support etc.), but it'll be very inefficient
>> compared to this approach. For instance, the choice of data-structure
>> etc. allows making the solution a lot more efficient.
>>
>> Here it's possible to locklessly increment counters in most cases, so
>> its overhead is a lot lower than that of global-variables.
>>
>> Recycle is precisely to allow this lockless mechanism to work. It's
>> basically saying that it'll track metric-names it has seen in the
>> last 1 hour. If we kill tracking of a name as soon as we don't see an
>> increment (between 2 reporting runs of impstats), it'll lead to
>> unnecessary churn when low values are common or load is not uniform
>> in time.
>>
>> Implementing it on top of global-variables not only has a very high
>> performance-penalty (it'll be prohibitive for high-throughput
>> scenarios), it also exposes too much complexity to the user (where
>> the user has to worry about reset etc.).
>>
>> I don't plan to have a scheduler in this implementation. The
>> GetAllStatsLines call will purge the tree instead of resetting it at
>> that interval. It's basically a balance between freeing up memory
>> occupied by stale metric-names vs. performance (lockless handling of
>> increment). So it will be governed by the impstats schedule. Maybe I
>> should change the name to a better one (equivalent of
>> purge_known_keys_after_they_have_been_reported_N_times).
>>
>>
>> On Tue, Oct 6, 2015 at 4:30 PM, David Lang <[email protected]> wrote:
>>> On Tue, 6 Oct 2015, singh.janmejay wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on support for stats with dynamic-name. This comes in
>>>> handy in situations where the metric-name depends on the value of a
>>>> certain attribute of the message.
>>>>
>>>> Say, for a central log-aggregation service, it's valuable to know the
>>>> inbound message-count distribution across application-clusters that
>>>> send logs to it, or for a shared-server, it's valuable to know the
>>>> log-volume generated across users etc.
>>>>
>>>> I'm thinking of using a function-like interface to support this. It
>>>> may look similar to this:
>>>>
>>>> ====================
>>>> dyn_stats("user_msg_count")
>>>>
>>>> ...
>>>>
>>>> ruleset(...) {
>>>> ...
>>>> dyn_inc("user_msg_count", $.user)
>>>> ...
>>>> }
>>>> ====================
>>>>
>>>> dyn_stats signature looks like:
>>>> dyn_stats(<name_space>, <resettable: default=true>, <max_cardinality:
>>>> default=10k>, <recycle_metric_names_after: default=1hr>)
>>>>
>>>> dyn_inc signature looks like:
>>>> dyn_inc(<name_space>, <metric_name>)
>>>>
>>>>
>>>> Reporting would work similarly to static metrics, via impstats. Mapping:
>>>> statsobj_s.name = name_space
>>>> statsobj_s.origin = "dyn"
>>>> ctr_s.name = "foo" (say $.user had value foo)
>>>>
>>>>
>>>> Thoughts / suggestions?
>>>
>>>
>>> how is this different/better than global variables? (although we may need to
>>> implement some functions: atomic inc/dec, copy+clear) If you have pstats
>>> output in json format, you can even piggyback on its schedule to output the
>>> data.
>>>
>>>
>>> things like stats can very easily end up being expensive in terms of locking
>>> (something global variables already have figured out), and it sounds like
>>> you are proposing adding a scheduler of some sort to output the data.
>>>
>>> variables should not need to be 'recycled', either they contain data or they
>>> don't. If they contain data, you need to keep the data until you do
>>> something with it, if they don't, you don't have to track them.
>>>
>>>
>>> I am actually doing this sort of thing external to rsyslog in SEC
>>>
>>> I have a template in rsyslog that contains hostname, fromhost-ip,
>>> programname and I output it via improg to SEC. SEC accumulates counters and
>>> has scheduled outputs to files.
>>>
>>> before I started using SEC for this, I used the same template to output to a
>>> file and then for reports, used cut + sort + uniq -c to extract the data I
>>> need. When the files only contain the significant data, this is actually not
>>> bad to do, even at higher volumes.
>>>
>>> David Lang
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com/professional-services/
>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
>>> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
>>> LIKE THAT.
>>
>>
>>
>> --
>> Regards,
>> Janmejay
>> http://codehunk.wordpress.com



-- 
Regards,
Janmejay
http://codehunk.wordpress.com
