On Fri, 16 Jan 2015, Dave Caplinger wrote:

On Jan 16, 2015, at 12:25 PM, David Lang <[email protected]> wrote:

On Fri, 16 Jan 2015, Dave Caplinger wrote:

...  It would be handy if we could optionally turn off [out-of-order delivery]
for an output queue in order to deliver the queued messages in-order
even if there is an additional disk write penalty to pay (for longer).

One issue is that disk queues are very slow compared to memory queues, so if you
force all messages to be written to the queue while you are also pulling messages
from it, this may slow you down so much that you never catch up. I think there is
room for improvement here, but that would be pretty major surgery.

I understand; I would want to test things to really understand the performance penalty, but there are mitigating factors for some common cases as well. For example, the filesystem buffer cache can speed up reading data that was recently written to disk: if your outage was short enough that you are not "too far" behind, the data is still in RAM, so you don't have to pay physical IOPS to retrieve it.

The filesystem actions are the super expensive parts, even if things are cached in RAM. There are also fsyncs that take place to make the data safe, and they force disk IOPS.
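As a rough illustration of why the fsyncs matter, here is a minimal Python sketch that times small appends to a temporary file with and without an fsync after each write. This is not how rsyslog's disk queue is implemented; the helper name `timed_appends`, the record size, and the record count are all made up for the example, and the numbers you get will depend heavily on the disk and filesystem.

```python
import os
import tempfile
import time


def timed_appends(n, payload, sync):
    """Append `payload` n times to a temp file; fsync after each write if `sync`.

    Returns (elapsed_seconds, final_file_size_in_bytes).
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, payload)
            if sync:
                # Ask the OS to push the data to stable storage before continuing.
                os.fsync(fd)
        elapsed = time.perf_counter() - start
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed, size


if __name__ == "__main__":
    record = b"x" * 256  # hypothetical log record size
    cached, _ = timed_appends(100, record, sync=False)
    synced, _ = timed_appends(100, record, sync=True)
    print(f"no fsync: {cached:.6f}s   fsync per write: {synced:.6f}s")
```

Without the fsync the writes usually land in the page cache and return quickly; with it, each write waits on the storage device, which is the cost being described above.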

These factors are why I was thinking maybe the penalty isn't really as large as I initially thought, for some cases at least. However, the fact that you indicate having this option would be "major surgery" to Rsyslog is dissuading me from wanting to bother going down this path.

Having an option to change the order probably isn't that bad (Rainer will have to weigh in), but changing the disk queue itself to be more efficient would be a pretty large job, and it would take a lot of care to avoid reliability problems.

To clarify, I'm not looking for *guaranteed* delivery order, just "generally in order." We do perform event correlation, but in some cases it's within time windows. So as you described: A followed by B followed by C, all within T time. Having some variation around a moving "now" pointer in time is fine; the events still wind up within the same window of width (T +/- some small variation). It's when logs arrive *significantly* out of sequence that you wind up having to manage state for multiple T-width windows for the same scenario, and it means you can't really be confident that you're done with a certain time window (you can be perpetually waiting for the last event in the chain).
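To make the windowing problem concrete, here is a minimal Python sketch of that kind of correlation. It is only an illustration under assumptions of my own, not our actual correlation engine: the `Correlator` class, the event kinds "A", "B", "C", and the `slack` parameter are all hypothetical. It keeps a moving "now" pointer (the newest message timestamp seen) and drops any open chain once "now" has moved more than T plus the slack past the chain's start.

```python
from dataclasses import dataclass, field


@dataclass
class Correlator:
    window: float                  # T: maximum seconds from A to C
    slack: float = 0.0             # tolerated variation around "now"
    now: float = float("-inf")     # newest message timestamp seen so far
    open_chains: list = field(default_factory=list)  # items: [start_ts, next_index]
    matches: list = field(default_factory=list)      # start_ts of completed chains

    SEQUENCE = ("A", "B", "C")     # hypothetical event kinds, in required order

    def feed(self, ts, kind):
        self.now = max(self.now, ts)
        # Give up on chains whose window has closed relative to "now".
        self.open_chains = [
            c for c in self.open_chains
            if self.now - c[0] <= self.window + self.slack
        ]
        if kind == self.SEQUENCE[0]:
            self.open_chains.append([ts, 1])
            return
        for chain in self.open_chains:
            start, idx = chain
            if self.SEQUENCE[idx] == kind and 0 <= ts - start <= self.window:
                chain[1] += 1
                if chain[1] == len(self.SEQUENCE):
                    self.matches.append(start)
                    self.open_chains.remove(chain)
                break


def run(events, window, slack=0.0):
    """Feed (timestamp, kind) pairs in arrival order; return matched chain starts."""
    c = Correlator(window, slack)
    for ts, kind in events:
        c.feed(ts, kind)
    return c.matches


in_order = run([(0, "A"), (1, "B"), (2, "C")], window=10)
slightly_late = run([(0, "A"), (1, "B"), (12, "X"), (2, "C")], window=10, slack=5)
very_late = run([(100, "A"), (0, "A"), (1, "B"), (2, "C")], window=10, slack=5)
```

In this sketch `in_order` and `slightly_late` both find the chain that starts at timestamp 0, while `very_late` finds nothing, because "now" had already moved to 100 before the old events arrived. The only way to catch that case would be a much larger slack, which means holding many more windows open at once.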

Something to think about here: what do you use as a time reference (both for 'now' and for the log message you are processing)? Do you use the current time on the system doing the processing, or the timestamps in the messages?

Using the system time can cause some false positive alerts when logs are catching up (as you have events that happened over a wide timeframe delivered over a short timeframe), but you don't have to deal (much) with time going backwards.

Using the timestamp in the log message gets interesting, as you have to deal with machines' local clocks drifting, being in different timezones, or just plain being wrong. And as you say, how do you know when an event is really 'too old' so that you can stop tracking it? (What if a redundant box goes down over a long weekend? Do you really want to keep the correlations open for days in case it has 'interesting' combinations of events that it will finish delivering when it's fixed?)

I tend to favor using the log processing system's time. It's much easier to watch that one box and make sure its time is correct than it is to make sure every sender's clock is correct.
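The tradeoff between the two time references can be sketched in a few lines of Python. This is a toy example with made-up numbers, and the `alerts` helper and the "5 events within 60 seconds" rule are hypothetical: five events that really happened an hour apart are delivered within half a second during catch-up.

```python
from collections import deque


def alerts(times, window, threshold):
    """Return True if `threshold` events ever fall within `window` seconds.

    Assumes `times` is non-decreasing (no handling of time going backwards).
    """
    recent = deque()
    for t in times:
        recent.append(t)
        # Discard events that have fallen out of the sliding window.
        while recent and t - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


# (message_timestamp, arrival_time) pairs: one event per hour at the source,
# all delivered 0.1 s apart once the backlog drains.
backlog = [(i * 3600.0, 100000.0 + i * 0.1) for i in range(5)]

by_message_time = alerts([m for m, _ in backlog], window=60, threshold=5)
by_arrival_time = alerts([a for _, a in backlog], window=60, threshold=5)
```

Here `by_arrival_time` is True (the false positive during catch-up) and `by_message_time` is False. The message-timestamp version only works this cleanly because the toy timestamps are sorted and trustworthy, which is exactly what you can't count on in practice.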

David Lang

It's certainly an edge case; normally connectivity interruptions are either "very brief" (absorbed by in-memory queue), or "short" (absorbed by DAQ for a few minutes/hours depending on log volume). But if they are very long, the time difference between the oldest and newest logs (which are being delivered in roughly alternating batches during the DAQ burn-down) can be quite large, like "yesterday, now, yesterday, now, yesterday..."
