2014-07-13 16:35 GMT+08:00 Sean Conner <[email protected]>: > It was thus said that the Great Justin Lee once stated: >> 2014-07-13 3:31 GMT+08:00 David Lang <[email protected]>: >> > On Sat, 12 Jul 2014, Justin Lee wrote: >> > >> >> Hello List, >> >> >> >> I noticed that local messages are passed by socket rather than >> >> directly written to log files. The evidence is as follows: >> > >> > >> > Yes, this is how syslog works. >> > >> > >> >> Since log mechanism is an important infrastructure of many programs, >> >> it is supposed to be as fast as possible. So do we have some >> >> configurations or settings in rsyslog which make local messages >> >> written directly into log files without passing through socket? >> > >> > >> > actually, writing to disk can be slower than writing to a socket, so it's >> > not clear that it's a performance win to write to disk directly. >> >> I think that writing to disk is supposed to be the final step of >> rsyslogd in processing local messages. After all, we couldn't see the >> messages in log files if rsyslogd didn't write to disk at all, right? >> But in addition to disk writing, use of socket by syslog to transfer >> local log messages introduces extra CPU time or memory to do the >> following things: > > A program (and for now, let's just assume a program under Unix to keep > things simple) has a few options for logging: > > 1. open a file and log stuff to that file. Quick and simple. Well, not so > simple actually. Because of buffering (at the C stdio layer), if a program > were to suddenly die, some portion of the logged messages would be lost. > You can minimize this by disallowing C stdio buffering, so a sudden crash of > the program won't lose logs, but this could slow down the program > (buffering was invented to speed things up). Also, just because you do > > fh = open("logfile.txt",O_WRONLY); > write(fh,"Thar she blows\n",15); > close(fh); > > doesn't mean that the message "Thar she blows" will actually be stored on > spinning metal disks. After the close happens the computer looses all power > (hey, we were planning on installing UPSes) then again, log messages lost, > due to buffering in the kernel (and Linux does some agressive file > buffering). Sure, you can add a call to fsync() after the write, but, if > the operating system bothers to honor the semantics, then you really loose > performance of the program. > > A secondary problem with this is log rotation, the ability to stop recording > in one file (so it can be archived if need be, or just deleted) and start > recoding in a new file. You have a few solutions here: the program itself > can log rotate, but now the log rotation is either hard coded into the > program; there's a configuration option or a command line option (or both), > or it responds to SIGINT---all require more code, more testing, and more > places for things to go boom.
Right, log rotation may be hard coded into the user program and I believe log rotation is also hard coded into our rsyslog daemon program. But there is yet another place to put log rotation code - the library. Put log rotation code in libraries may be another choice. Library is the best place to put shared code and/or functionalities IMO. It should also save you from unnecessary testing since everyone calls the library so only the library itself need to be tested. > > Also, if you want to send the logs elsewhere, you'll need an external > program to copy the logs over or, to sit there, reading the log file and > sending the logs are they're appended (with a potential delay if C stdio > buffering is in use). > > 2. send the log to another process to deal with the information. This is > where syslog() lives and yes, the current implementation uses local (Unix) > sockets to transfer the data, but really, your options are limited here: > > a) sockets (local or loopback or whatever) > b) named pipes > c) message queues > d) shared memory > > The first three all require the kernel to copy the data from the sending > process to the receiving process; the last one can skip the copying bit, but > there are nasty synchronizations issues that would need to be worked out > that would probably obliviate any savings of copying the data. Don't know why... but every time when talking about IPCs most people tend to select a) b) c) but not shared memory. I don't see how hard the synchronizations issues can be to implement a first-in-first-out queue (which is similar to socket buffer if I understand correctly) in shared memory. Unix socket scheme can not save the time due to scheduling latency, because it needs a separate process to listen to the socket. By contrast, share memory scheme doesn't need such daemon/server process. As I have mentioned previously, we cannot guarantee that the daemon process gains CPU time immediately (since there could be other processes cut in, according to scheduling policy) right after a user program has sent out a log message to the socket. Therefore the latency of logging a message is high. As far as I know, only making use of shared memory or moving rsyslog to the kernel can avoid the situation. It is quite similar to the issues which D-Bus and X Window are facing. Both of them have daemon/server process as their central facility to handle all requests from local client programs. And both of them are suffering from being low efficiency. The D-Bus team was trying to move D-Bus to Linux kernel. And there is a project known as Wayland display protocol which is under development with potential to replace X Window in the future. Wayland project tries to offload works form server process to the client processes and let client processes access graphics directly (DRI, Direct Rendering Infrastructure). Which is very similar to let log client programs write to log files directly without passing through rsyslog daemon. > > You will also need to work out how the client and server establish a > connection. For a sockets and named pipes, it's easy---the client just > establishes a connection with a well-known named resource (in the case of > syslog(), it's typically "/dev/log"). It's less clear how one would do that > in the case of message queues and shared memory (which, in my experience, > tend to be used where a master process creates children processes and use > message queues/shared memory to communicate with, rather than with > independent processes). > > So it comes down to sockets or named pipes. syslog() works with a datagram > based local socket, which does mean a system call, but only until either the > data is copied out, or, if the socket buffer is full, just returns (I > think---it's a datagram and the semantics of a datagram are typically "best > effort") so there's no potential blocking of the client. Even if the server > isn't running, the packets will just be tossed into the bit bucket. With a > named pipe, there does exist the possiblilty of blocking the client (and a > definite possibility of blocking if the logging server isn't running). > > Now, since you already have another process to receive the logs, it's not > that big of a deal to have it forward the logs elsewhere, and if you are > doing it over a local socket, that typically means the logic is already > there to transmit it over a network socket with very little work. > > -spc > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ > What's up with rsyslog? Follow https://twitter.com/rgerhards > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of > sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T > LIKE THAT. _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

