2015-01-16 23:39 GMT+01:00 Dave Caplinger <[email protected]>:

> On Jan 16, 2015, at 2:51 PM, David Lang <[email protected]> wrote:
> >
> > On Fri, 16 Jan 2015, Dave Caplinger wrote:
> >
> >> ... filesystem buffer can help speed reading data previously written to
> >> disk if your outage was short enough to not get "too far" behind,
> because the
> >> data is still actually in RAM so you don't actually have to pay
> physical IOPS
> >> to touch the disk to retrieve it.
> >
> > the filesystem actions are the super expensive parts, even if things are
> cached
> > to ram. There are also fsyncs that take place to make the data safe, and
> they
> > force disk IOPS
>
> I agree the write path is certainly expensive (and more so by frequent
> fsyncs), but when you come back 'n' minutes later to read it (and it's
> still in the filesystem buffer), I only meant that it's much quicker than
> having to actually seek and read from disk again.  So you're not paying the
> penalty twice in this case.
>
> >> ... time windows ...
> >
> > something to think about here, what do you use as a time reference (both
> for
> > 'now' and for the log message you are processing), do you use the
> current time
> > on the system doing the processing, or the timestamps in the messages.
>
> A combination of receive time at the collector closest to the source
> (which we can control the clocks on) along with current time at the system
> doing the processing.  Lies the source device told about it's time are kept
> as-is but not believed...
>

A couple of things: in some ancient version of rsyslog (v2? v3? - I don't
really remember), DA queues worked the way proposed here: when DA mode was
entered, the in-memory queue was shut down and queuing went through the
disk only, until the disk queue was empty again. Then the disk queue was
shut down and the memory queue was used again.

The typical experience in practice was: as soon as the disk queue was used,
the system rarely returned to in-memory mode, because the disk queue is so
much slower. In fact, the system more or less became unusably slow (at
least too slow for the intended purpose). This happened especially when
going to disk was caused by traffic spikes. For low-volume traffic this was
not a real issue; but then, you could just have used a disk queue in the
first place.

The system was also very complex and had a number of robustness problems.
Especially the races when switching between disk and memory mode were a can
of worms. Just think of the case where a message arrives at exactly the
instant you thought you could shut down disk mode; there were many subtle
issues around this and similar cases.
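The shutdown race is the classic check-then-act problem; a much-simplified, hypothetical illustration (the names are invented, and a simulated unlucky interleaving stands in for real thread scheduling):

```python
class RacyDAQueue:
    """Illustrates the check-then-act race on leaving DA mode: the
    consumer observes an empty disk queue and decides to switch back
    to memory mode, but a producer enqueues to disk between the check
    and the switch, stranding that message."""

    def __init__(self):
        self.disk = []
        self.da_mode = True

    def consumer_check(self):
        # Step 1 of the consumer: observe that the disk queue is empty.
        return len(self.disk) == 0

    def consumer_switch(self):
        # Step 2: act on the (possibly stale) observation.
        self.da_mode = False

    def producer_enqueue(self, msg):
        if self.da_mode:
            self.disk.append(msg)


# Simulated unlucky interleaving:
q = RacyDAQueue()
assert q.consumer_check()         # consumer: "disk queue is empty"
q.producer_enqueue("late msg")    # producer sneaks a message onto disk
q.consumer_switch()               # consumer shuts down disk mode anyway
# Result: "late msg" sits in a disk queue nobody reads any more.
assert q.da_mode is False and q.disk == ["late msg"]
```

Guarding every such window with mutexes and re-checks is what made the transition code so large and fragile.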

With the first performance enhancement project, we realized that strict
message order wasn't achievable in any case, so it made only very limited
sense to try very hard to "preserve" ordering. See section 7 of [1] for
more details. With that, we changed DA queue operation to what it is today.
The end result was vastly better performance (even when just using the
disk, due to fewer mutex locks and checks), greatly reduced complexity
(IIRC roughly 40% of the queue code, the most complex part of it, was
devoted to handling mode transitions), greater robustness, and greatly
enhanced practical usability of DA queues.

So it doesn't make any sense at all to change the current system back to
the pre-2009 state of affairs.

Does that mean it is impossible to change the way queues work? Of course
not. In fact, I have wanted to do that for quite some time. But it is a
*very* big project. At the least, it requires a full redesign and rewrite
of the queue subsystem. It would need to work much more like the OS virtual
memory system, which would also remove mode transitions, because those
would simply become cache misses. IMO that would also dramatically speed up
disk queues. But please let's not discuss how exactly such a system would
need to be designed, because I know I can't implement it in the foreseeable
future, so this would just be a waste of time. The changes to the queue
system would also require changes to the way batching works. All in all, I
would expect that roughly 50% of the core engine would need to be
redesigned and rewritten. My gut estimate is that this is a three to six
month full-time job (the original queue system took roughly 2+ months of
extremely hard work).

Every now and then we tried to find a sponsor (anyone up for it?), but this
didn't work out. Adiscon is not willing to fund that work. We even thought
about doing a commercial queue extension, so that we could spread the cost
over multiple customers, but that's currently a no-go due to licensing. End
result: I don't see it happening any time soon. The core issue is still on
my internal todo list, and I try to sneak in parts of the rewrite at every
opportunity. But that's a very slow process, and it still means we need one
big time slot to rewrite the core queue system.

If one really thinks this through, one may also come to the conclusion that
this is not time well spent. Most of it can be achieved with current
rsyslog simply by configuring the system with gigantic swap space and
letting rsyslog use "insanely" large amounts of "main" memory for its
in-memory queue. The OS would then do exactly what a new queue system would
do otherwise. The only potential problem would be system shutdown, where
persisting unprocessed items to disk could take quite (too) long.
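As a rough illustration of that configuration idea (the parameter names come from rsyslog's main_queue() object; the sizes are made-up placeholders that must be tuned to the RAM and swap actually available on your system):

```
# Hypothetical sizing -- adjust to the RAM + swap actually available.
main_queue(
    queue.type="LinkedList"          # pure in-memory queue
    queue.size="50000000"            # "insanely" large: let the OS page it out
    queue.filename="mainq"           # needed so unprocessed items can be
    queue.saveonshutdown="on"        #   persisted to disk at shutdown
    queue.timeoutshutdown="100000"   # ms; shutdown persistence can take long
)
```

The trade-off mentioned above applies: with millions of unprocessed items in (swapped-out) memory, the saveonshutdown pass at system shutdown may exceed any reasonable timeout.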

Hope that clarifies,
Rainer

[1] http://www.gerhards.net/download/LinuxKongress2010rsyslog.pdf
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
