(Note: This message discusses some implementation details of the new
syslog on a Unix system, but only insofar as they relate to protocol
design issues.)
The general model that we've been using for discussing the new syslog
protocol is the transfer of logs from one machine to another. In this
case, one syslog daemon is the sender, another is the receiver, and
the protocol is used to communicate between them. We can assume that
both daemons have a message queue, so that if the receiver is
unavailable, the sender will be able to queue the messages without
interrupting anything else it's doing. This problem is therefore
reduced to an easily-solved "hot potato" algorithm like SMTP, where a
message is only deleted from a queue when it's ACK'ed by the next hop,
at which point it's safely in the next queue.
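To make that concrete, here's a minimal sketch in C of the "hot
potato" loop, using an in-memory queue and a stubbed-out next-hop
send; a real syslogd would use an on-disk queue and the actual wire
protocol, so treat this as illustration only:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct msg {
        char text[256];
        struct msg *next;
    };

    static struct msg *queue_head = NULL;

    /* Stub: pretend to forward to the next hop and wait for its ACK.
     * Returns true only once the receiver has queued the message. */
    static bool send_and_wait_for_ack(const struct msg *m)
    {
        printf("forwarding: %s\n", m->text);
        return true;    /* assume the next hop ACK'ed */
    }

    static void flush_queue(void)
    {
        while (queue_head != NULL) {
            if (!send_and_wait_for_ack(queue_head))
                break;              /* receiver down: keep it queued */
            struct msg *acked = queue_head;
            queue_head = acked->next;   /* only now is removal safe */
            free(acked);
        }
    }

    int main(void)
    {
        struct msg *m = malloc(sizeof(*m));
        strcpy(m->text, "test message");
        m->next = NULL;
        queue_head = m;
        flush_queue();
        return 0;
    }

The invariant is the same as SMTP's: at every moment, at least one
queue holds a copy of the message.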
However, we also need to address the issue of how processes initially
submit log messages to their local syslog daemon. In this case, the
receiver is still a syslog daemon, but the sender is an application
process which doesn't have a queue and can't be expected to retry
transmission or handle asynchronous notification of submission
failure. The API would likely be implemented as a library call which
accepts a timeout parameter and blocks until either the syslog server
ACKs the message or the timeout expires.
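As a rough illustration, such a call might look like the following.
The socket path, the one-byte ACK, and the name syslog_submit() are
all made up for the example, not a proposed API:

    #include <poll.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    #define SYSLOG_SOCK "/var/run/new-syslog.sock"  /* hypothetical */

    /* Block until the daemon ACKs the message or the timeout
     * expires.  Returns 0 on ACK, -1 on error or timeout. */
    int syslog_submit(const char *msg, int timeout_ms)
    {
        struct sockaddr_un sa;
        struct pollfd pfd;
        char ack;
        int fd, ret = -1;

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        memset(&sa, 0, sizeof(sa));
        sa.sun_family = AF_UNIX;
        strncpy(sa.sun_path, SYSLOG_SOCK, sizeof(sa.sun_path) - 1);

        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
            goto out;
        if (write(fd, msg, strlen(msg)) < 0)
            goto out;

        pfd.fd = fd;
        pfd.events = POLLIN;
        if (poll(&pfd, 1, timeout_ms) == 1 &&
            recv(fd, &ack, 1, 0) == 1)
            ret = 0;               /* daemon ACK'ed the message */
    out:
        close(fd);
        return ret;
    }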
Under a low system load, this mechanism should work fine because there
are plenty of system resources to field these requests. But in a
high-load situation, the response time for the syslog daemon to ACK
the message can grow quite rapidly. If the process submitting the
message is a network-based data provider, blocking on log submission
will cause the congestion problem to get much worse, because the
request queue will keep piling up at the same rate while the daemons
are blocking on syslog instead of fielding more requests.
If the process is something like a webserver, it might choose to
submit best-effort messages, as the current syslog does, so that it
can go on serving webpages. OTOH, if someone is trying to break into
the machine, /bin/login would probably want to use the
guaranteed-delivery capabilities so that the security information
would not get lost.
What this boils down to is that the application programmer is in the
best position to determine whether or not guaranteed delivery is the
right thing to do, and the protocol needs to support both.
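One way to surface that choice is a per-message delivery mode in the
library API. The sketch below builds on the hypothetical
syslog_submit() call from earlier; again, the names and timeout
values are just for illustration:

    #include <stdio.h>

    enum delivery { BEST_EFFORT, GUARANTEED };

    int syslog_submit(const char *msg, int timeout_ms);  /* see above */

    void log_msg(enum delivery mode, const char *msg)
    {
        if (mode == BEST_EFFORT) {
            /* Fire and forget: a webserver under load would rather
             * drop a log line than stop serving pages. */
            (void)syslog_submit(msg, 0);
        } else {
            /* Block until ACK'ed: /bin/login can afford to wait so
             * that security events are not silently lost. */
            if (syslog_submit(msg, 30000) < 0)
                fprintf(stderr, "log delivery failed: %s\n", msg);
        }
    }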
Another possible difference between syslogd-syslogd transfers and
process-syslogd transfers is the fsync() interval. Since the
receiving syslogd cannot send an ACK until it's fsync()'ed its queue
file(s), we will probably want to implement a mechanism whereby the
server can accept a number of messages before fsync()'ing, and then
ACK the last one to tell the client that all of the messages up to
that point have been received. This obviously doesn't work for
initial message submission, because an application process needs an
ACK immediately in order to stop blocking in the library call.
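On the receiving side, the batching might look something like this;
BATCH_SIZE and the helper routines are assumptions for the sake of
the sketch, not part of any proposal:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BATCH_SIZE 32   /* assumed tunable */

    /* Append one message to the queue file; real code would frame
     * records properly and check for short writes. */
    static void append_to_queue(int fd, const char *msg)
    {
        (void)write(fd, msg, strlen(msg));
        (void)write(fd, "\n", 1);
    }

    /* Stub: ACK one sequence number back to the sending syslogd. */
    static void send_ack(unsigned long seq)
    {
        printf("ACK %lu\n", seq);
    }

    /* Called once per received message.  fsync() and ACK only every
     * BATCH_SIZE messages; the one ACK covers the whole batch. */
    void handle_message(int queue_fd, const char *msg, unsigned long seq)
    {
        static int pending = 0;

        append_to_queue(queue_fd, msg);
        if (++pending >= BATCH_SIZE) {
            fsync(queue_fd);  /* queued messages now on stable storage */
            send_ack(seq);    /* cumulative ACK for messages <= seq */
            pending = 0;
        }
    }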
The solution is to include some sort of negotiation in the protocol
which allows the sender to tell the receiver whether it requires an
ACK immediately or whether the ACK can be deferred. We might implement this as a
simple flag in the message which says "give me an ACK immediately", or
we might include a more complex negotiation phase during initial
handshaking which determines the maximum number of messages/bytes/time
which can go by before an ACK is received.
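To make the first option concrete, the flag could be a single bit in
a small per-message header, along these lines (a purely illustrative
layout, not a proposed wire format):

    #include <stdint.h>

    #define FLAG_ACK_NOW 0x01   /* "give me an ACK immediately" */

    struct log_hdr {
        uint32_t seq;    /* sequence number, for cumulative ACKs */
        uint16_t len;    /* length of the message body that follows */
        uint8_t  flags;  /* FLAG_ACK_NOW, etc. */
        uint8_t  pad;    /* reserved */
    };

A blocking library call would set FLAG_ACK_NOW on every message it
submits, while a relaying syslogd would leave it clear and let the
negotiated batch limits govern when ACKs come back.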
Comments...?
--
Mark D. Roth <[EMAIL PROTECTED]>
http://www.feep.net/~roth/