On Sun, 13 Jul 2014, Justin Lee wrote:
2014-07-13 3:31 GMT+08:00 David Lang <[email protected]>:
On Sat, 12 Jul 2014, Justin Lee wrote:
Hello List,
I noticed that local messages are passed by socket rather than
directly written to log files. The evidence is as follows:
Yes, this is how syslog works.
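
For readers following along, here is a rough sketch of what "passed by socket" means. It is not rsyslog's code: a socketpair stands in for the real syslog socket (usually /dev/log), the helper name format_syslog is made up for illustration, and the message format is simplified (the real syslog() call also adds a timestamp).

```python
import socket

def format_syslog(facility: int, severity: int, tag: str, text: str) -> bytes:
    """Build a simplified syslog datagram (hypothetical helper, no timestamp)."""
    pri = facility * 8 + severity  # priority value = facility * 8 + severity
    return f"<{pri}>{tag}: {text}".encode("utf-8")

# A connected pair of Unix datagram sockets stands in for the application
# side and the daemon side of the local syslog socket.
receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

# Application side: hand the message to the kernel and carry on.
sender.send(format_syslog(1, 6, "myapp", "hello"))  # facility user, severity info

# Daemon side: pick the message up whenever it gets scheduled.
received = receiver.recv(4096)

sender.close()
receiver.close()
print(received)
```

The application never touches a log file here; what happens to the message afterwards is entirely up to whoever reads the other end of the socket.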
Since log mechanism is an important infrastructure of many programs,
it is supposed to be as fast as possible. So do we have some
configurations or settings in rsyslog which make local messages
written directly into log files without passing through socket?
actually, writing to disk can be slower than writing to a socket, so it's
not clear that it's a performance win to write to disk directly.
I think that writing to disk is supposed to be the final step of
rsyslogd in processing local messages. After all, we couldn't see the
messages in log files if rsyslogd didn't write to disk at all, right?
no, local messages may get written to disk files, they may get sent to other
programs for analysis, they may get sent remotely to another server, they may
get thrown away as not being important.
Actually writing them to a local file is the least useful thing that you can do
with logs. It's the "Hello World" view of logging.
But in addition to disk writing, use of socket by syslog to transfer
local log messages introduces extra CPU time or memory to do the
following things:
1. To handle socket related stuff, such as copying log messages in and
out of socket buffer. Buffer copying introduces overhead.
2. We need a thread or daemon process (rsyslogd) to listen to and
receive log messages from the socket. It increases the memory
footprint of our log infrastructure because of the daemon process
itself.
3. At least one context switch is required to transfer a log message
from user program to rsyslog daemon, because user program and the
syslog are in two different processes. Most context switches flush the
CPU pipeline and hence hurt efficiency.
4. Including a daemon process in mediating log affairs makes message
logging dominated by kernel's scheduling policy and number of running
processes in the system. Because CPU resources and CPU time are shared
by all the processes running in the system, the more running processes
in the system the less CPU time a single process (rsyslogd in our
case) can get to accomplish its job.
5. Because all running processes in the system share the same CPU, the
kernel must execute each process in turn (let each process get a
chance to be executed) such that CPU time is divided and distributed
among all running processes. The order of process execution is
determined by kernel's scheduling policy. The execution order could be
like the steps below:
(1) a user program writes a log message to the syslog socket.
(2) the kernel decides to execute two other processes in the
system first before executing the rsyslog daemon since they have
higher priority or due to the scheduling policy.
that assumes that you don't give the rsyslog process a high enough priority,
which is a configuration error.
(3) the kernel then finally executes the rsyslog daemon so that the log
message sent by user program at step (1) is received by the daemon
from syslog socket.
As more and more processes run in the system, the number of processes
executed at step (2) can grow. So the time a log message takes to
propagate from the user program to the rsyslog daemon becomes longer
and longer, and the latency of transferring a log message becomes
higher and higher. I think this kills the performance the most.
but in practice it doesn't, transferring log is cheap, processing the logs,
formatting them, and writing them to disk is actually more expensive.
one thing that you are missing is that disks are very inefficient at doing small
writes. you can write somewhere around 1000 logs to disk in less time than you
can write two logs independently.
Your memory and CPU are virtually unlimited compared to your disk I/O capacity.
David Lang
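
A small sketch of the small-writes point, for illustration only. It assumes that "writing a log independently" means forcing each message to disk with its own fsync, which is my reading rather than something stated in the thread. The function names and the message count of 100 are made up, and no timings are claimed since they depend heavily on the disk.

```python
import os
import tempfile

def write_each(path, lines):
    """Write every message separately and force each one to disk."""
    with open(path, "wb") as f:
        for line in lines:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # one disk sync per message
    return len(lines)  # number of syncs issued

def write_batched(path, lines):
    """Join all messages and force them to disk in one go."""
    with open(path, "wb") as f:
        f.write(b"".join(lines))
        f.flush()
        os.fsync(f.fileno())  # one disk sync for the whole batch
    return 1  # number of syncs issued

with tempfile.TemporaryDirectory() as tmp:
    logs = [f"message {i}\n".encode() for i in range(100)]
    each_path = os.path.join(tmp, "each.log")
    batched_path = os.path.join(tmp, "batched.log")
    syncs_each = write_each(each_path, logs)
    syncs_batched = write_batched(batched_path, logs)
    with open(each_path, "rb") as fa, open(batched_path, "rb") as fb:
        same_content = fa.read() == fb.read()

print(syncs_each, syncs_batched, same_content)
```

Both files end up with the same bytes, but the batched version asks the disk to commit once instead of once per message. A daemon sitting behind a socket is in a position to do that batching; an application writing its own log line by line usually is not.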
6. Log messages may need to stay in the socket buffer for a longer time due
to the high latency, which implies that a bigger socket buffer is required,
which in turn implies a greater memory footprint. The case gets worse when
system load is high, which increases the risk of socket buffer overflow
and message loss.
then you have the problem of what happens to the log. If you just write to
disk, you throw away a tremendous amount of flexibility in determining what
happens to the log.
If you send the log through the syslog infrastructure, there are a lot of
things that you can end up doing with it.
You can:
combine the logs from different programs into one file
split the logs from one program into multiple files
filter the logs
send the logs to a remote machine
put the logs into a database
etc.
so having your program write directly to disk is less work overall for the
system to do, but it can be slower for your application, and it's far less
flexible.
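
To make the flexibility point concrete, here is a toy routing function covering the options listed above. It is only a sketch: it is not rsyslog configuration or rsyslog code, and the Message type, the route function, the program name "mailer", the host "loghost" and the file names are all invented for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Message:
    program: str
    severity: int  # 0 = emergency ... 7 = debug
    text: str

def route(msg: Message) -> List[str]:
    """Decide where a message goes (hypothetical rules, not rsyslog's)."""
    if msg.severity == 7:
        return []  # filter: throw away debug messages
    destinations = ["file:combined.log"]  # combine all programs into one file
    if msg.program == "mailer":
        destinations.append("file:mailer.log")  # split one program out
    if msg.severity <= 3:
        destinations.append("remote:loghost")  # forward errors to another machine
    return destinations

print(route(Message("mailer", 3, "disk full")))
print(route(Message("cron", 6, "job done")))
print(route(Message("cron", 7, "tick")))
```

None of these decisions has to be known to the program that produced the message. If the application wrote straight to its own file, changing any of them would mean changing the application.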
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
LIKE THAT.
Justin Lee