Hi David,

>Whatever you do with your logging system, at some point it is going to break 

An interesting premise, and looking at the options in front of me I would be 
inclined to agree (at least to some extent), but I don't think it "has" to
be that way.

>As a result, the application really needs to double log.

Absolutely, at the moment I'm logging to local storage and to a remote central
server. I don't like the idea of local logging; there are many issues with
data storage and persistence with regard to automatic (re-)provisioning, and
I would far rather double log to two network targets .. however .. (!)

First issue with this is signing. If you don't have a local (signed) copy of
your logs but instead have two (remote) copies, the problem is proving that
logging information that has passed over a network connection hasn't been
tampered with in transit. (Not necessarily a technical issue, more of a
'convince a court' type issue.)

Second issue: if you end up with two remote logs that differ, how do you prove
which is authoritative? And moreover, given they differ, how do you prove the
system itself is trustworthy? Which leads on to triple logging etc etc.

[insert more issues here ...]

I guess 'my' ideal solution would be (ok, so this is an off-the-cuff design):

o Sign logs as they happen [stream/sliding window etc]
o Hold local copies for buffering / backlogging only
o Double log the data to two remotes
o Independently double log the signing data to two (other) remotes
o Transit should ensure no data loss
o Storage should ensure retries / retransmits in the event of any sort of failure
o A comprehensive tool for 'proving' the integrity of a specific message
o System should gracefully shut-down [with no data loss] @ 99% disk usage

Maybe a 'pie in the sky' solution, and I may have missed out a lot, but if
something appeared with this spec I'd be more than a little inclined to try
it ... :-)
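The "sign logs as they happen [stream/sliding window]" bullet above could look
something like a simple hash chain, sketched below. All names here are my own
invention and the shared HMAC key is purely illustrative; a real deployment
would use proper syslog signing (e.g. RFC 5848) rather than a pre-shared key:

```python
import hashlib
import hmac

SECRET = b"demo-key"  # assumption: a pre-shared signing key, for illustration only

def chain_sign(records, prev_digest=b"\x00" * 32):
    """Sign a stream of log records as a hash chain: each record's MAC
    covers the record plus the previous MAC, so tampering with or deleting
    any record breaks verification from that point onward."""
    signed = []
    for rec in records:
        digest = hmac.new(SECRET, prev_digest + rec.encode(), hashlib.sha256).digest()
        signed.append((rec, digest.hex()))
        prev_digest = digest
    return signed

def verify(signed, prev_digest=b"\x00" * 32):
    """Walk the chain and report whether every record still matches its MAC."""
    for rec, hexdigest in signed:
        digest = hmac.new(SECRET, prev_digest + rec.encode(), hashlib.sha256).digest()
        if digest.hex() != hexdigest:
            return False
        prev_digest = digest
    return True
```

If the signing data goes to two *other* remotes, as in the spec above, either
log copy can be checked against either signature copy, and a mismatch pins
down the first tampered record rather than just saying "the logs differ".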

As far as disk flushing/speed is concerned, I'm happy to call that a hardware
issue. You can have real speed and data integrity, but it needs a device with
a battery-backed RAM cache for writes [things are heading in that direction,
RAPID mode on Samsung SSDs for example], which is a platform issue and not my
problem .. ;-)

Regards,
Gareth.

----- Original Message -----
From: "David Lang" <[email protected]>
To: "rsyslog-users" <[email protected]>
Sent: Tuesday, 9 September, 2014 8:16:58 PM
Subject: Re: [rsyslog] Logging to central server / data loss ....

On Tue, 9 Sep 2014, Gareth Bult wrote:

> Hi Rainer,
>
> Many thanks for looking, I appreciate you're busy.
>
> If it looked trivial I might've tried to patch it, but it "looks" like
> it's pulling from the queue and then running the send plugins, so my initial
> impression is that various bits of code need reordering - which is too much
> for me. I would guess it needs to be peeking the queue and only de-queueing
> once all the output modules have been satisfied ..
>
> It's interesting how things develop, back in "the good-old-days" central
> logging was useful for spotting problems without sshing to lots of boxes,
> and some data loss / the use of udp was quite acceptable. Today however,
> people seem to be using it for collecting 'important' information where 100%
> accuracy and log signing are critical .. a paradigm shift in "use-case"
> really ...

been there, done that, and found that people didn't really want what they 
claimed they wanted :-)

Whatever you do with your logging system, at some point it is going to break 
(disk fills up, fails, etc)

A question that you have to ask your users/management is "what do you want to 
happen when a log cannot be written?" If the answer is that they would rather 
have the application fail and present the user with an error than to take an 
action that's not logged, then they are potentially a candidate for what I call 
"Audit grade logging". Keep in mind that the application includes login and ssh 
if you do this to all logs.

When you shift to using Audit Grade logging, things slow down a LOT, something 
on the order of 1000x. I was doing benchmarking of this a few years ago, and 
with a high-end PCI SSD drive, I was able to get between 2-8K logs/sec 
(depending on filesystem, ext3 being 2K) compared to 400K logs/sec on the same 
system with a simple SATA drive for normal logging.
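The slowdown comes from forcing every record to stable storage before
continuing. A minimal sketch of that "audit grade" write path (a hypothetical
helper, not rsyslog code):

```python
import os

def audit_write(path, records):
    """Append records with an fsync after each one, so that no
    acknowledged record can be lost to a crash. The per-record fsync
    is exactly where the ~1000x slowdown over buffered logging comes
    from: each write waits on the storage device, not the page cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o640)
    try:
        for rec in records:
            os.write(fd, rec.encode() + b"\n")
            os.fsync(fd)  # block until the record is on stable storage
    finally:
        os.close(fd)
```

Normal logging skips the fsync (or batches it), which is why a plain SATA
drive can be orders of magnitude faster for the non-audit case.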

Also think about failure modes of the application. If it logs before it takes 
the action, then something may happen before the action is taken and the log is 
telling you that something happened that didn't.

If the application takes an action and then logs it, it may take the action and 
then die before sending out the log.

As a result, the application really needs to double log.

First, log "I intend to take action X", then take the action and log "I 
succeeded/failed to take action X". You then need to watch for the first 
message without the second and investigate whether the action did or did not 
take place in those cases.
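That intent/outcome pairing might be sketched like this (the wrapper and the
txid scheme are my own invention, purely illustrative):

```python
import logging
import uuid

log = logging.getLogger("audit")

def double_logged(action, *args):
    """Log intent before the action and the outcome after it, sharing a
    transaction id so a reconciler can pair the two messages. An 'intend'
    with no matching 'succeeded'/'failed' marks an action whose fate has
    to be investigated by other means."""
    txid = uuid.uuid4().hex
    log.info("txid=%s intend action=%s", txid, action.__name__)
    try:
        result = action(*args)
    except Exception:
        log.error("txid=%s failed action=%s", txid, action.__name__)
        raise
    log.info("txid=%s succeeded action=%s", txid, action.__name__)
    return result
```

The reconciliation step (finding intents with no outcome) is the part that
actually catches the "died between action and log" failure mode.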

If you are still wanting to pursue this, we can talk more and get into more 
details about what this requires.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.