Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On 12/14/2013 05:12 PM, Zbigniew Jędrzejewski-Szmek wrote:
> On Fri, Dec 13, 2013 at 10:16:16PM +0100, Karol Lewandowski wrote:
>> One of the problems I see, though, is that no matter how deep I make
>> the queue (`max_dgram_qlen') I still see the process sleeping on
>> send() way earlier than the configured queue depth would suggest.
> Are you sure that the sysctl is set early enough, before the
> listening socket is created?

I'm doing most of my testing in an nspawn-able container, so it's OK
here. I will take special precautions to make sure it's set early
enough on our handheld, though. Thanks for the hint!

Karol
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On 12/14/2013 04:47 AM, Lennart Poettering wrote:
> On Fri, 13.12.13 22:16, Karol Lewandowski (lmc...@gmail.com) wrote:
>> On Fri, Dec 13, 2013 at 03:45:36PM +0100, Lennart Poettering wrote:
>>> On Fri, 13.12.13 12:46, Karol Lewandowski (k.lewando...@samsung.com) wrote:
>> One of the problems I see, though, is that no matter how deep I make
>> the queue (`max_dgram_qlen') I still see the process sleeping on
>> send() way earlier than the configured queue depth would suggest.
> It would be interesting to find out why this happens. I mean, there
> are three parameters here I could think of that matter: the qlen,
> SO_SNDBUF on the sender, and SO_RCVBUF on the receiver (though the
> latter two might actually change the same value on AF_UNIX? or maybe
> one of the latter two is a NOP on AF_UNIX?). If any of them reaches
> the limit then the sender will necessarily have to block. (SO_SNDBUF
> and SO_RCVBUF can also be controlled via /proc/sys/net/core/rmem* and
> ../wmem*... For testing purposes it might be easier to play around
> with these and set them to ludicrously high values...)

That's it. While the journal code tries to set the buffer size to 8MB
via the SO_SNDBUF/SO_RCVBUF options, the kernel limits these to
wmem_max/rmem_max. On the machines I've tested, the respective values
are quite small - around 150-200kB each.

Increasing these did reduce context switches considerably -
preliminary tests show that I can now queue thousands of messages
(~5k) without problems. I will test this thoroughly in the next few
days.

I do wonder what the rationale behind such low limits is...

Thanks a lot!
Karol
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Mon, 16.12.13 15:36, Karol Lewandowski (k.lewando...@samsung.com) wrote:
> That's it. While the journal code tries to set the buffer size to 8MB
> via the SO_SNDBUF/SO_RCVBUF options, the kernel limits these to
> wmem_max/rmem_max. On the machines I've tested, the respective values
> are quite small - around 150-200kB each.

Hmm, so on journald's side we actually use SO_RCVBUFFORCE to override
that kernel limit. If I understood you correctly, though, then
SO_SNDBUF on the sending side is the issue here, not SO_RCVBUF on the
receiving side.

We could certainly update src/journal/journal-send.c to also use
SO_SNDBUFFORCE on the client side, but that would leave unprivileged
clients and traditional /dev/log clients in the cold, since
SO_SNDBUFFORCE requires privileges, and the client side for /dev/log
lives in glibc, not in systemd.

> Increasing these did reduce context switches considerably -
> preliminary tests show that I can now queue thousands of messages
> (~5k) without problems. I will test this thoroughly in the next few
> days.
>
> I do wonder what the rationale behind such low limits is...

Well, usually the logic is to keep things conservative until you
notice that this creates issues.

To fix this properly and comprehensively, I'd really like to see three
changes in the kernel:

- Introduce a pair of SO_QLEN and SO_QLENFORCE sockopts so that we can
  set the qlen per-socket
- Make the defaults for rmem/wmem configurable at kernel compile time
- Increase the kernel's defaults

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Mon, 16.12.13 16:54, Lennart Poettering (lenn...@poettering.net) wrote:
> Hmm, so on journald's side we actually use SO_RCVBUFFORCE to override
> that kernel limit. If I understood you correctly, though, then
> SO_SNDBUF on the sending side is the issue here, not SO_RCVBUF on the
> receiving side.
>
> We could certainly update src/journal/journal-send.c to also use
> SO_SNDBUFFORCE on the client side, but that would leave unprivileged
> clients and traditional /dev/log clients in the cold, since
> SO_SNDBUFFORCE requires privileges, and the client side for /dev/log
> lives in glibc, not in systemd.

I made such a change in git now. But again, this is only a very
partial solution, as it only covers native clients with privileges. It
does not cover unprivileged clients or traditional syslog() clients...
Also, it cannot influence the qlen. But I fear the rest of this really
needs to be fixed in the kernel (and glibc); we cannot do much about
this from the systemd side...

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Sat, Dec 14, 2013 at 8:12 AM, Zbigniew Jędrzejewski-Szmek
<zbys...@in.waw.pl> wrote:
> Are you sure that the sysctl is set early enough, before the
> listening socket is created?

Perhaps this is why your suggestion [1] for the journal bootup issue
didn't help.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=71483#c10
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Fri, Dec 13, 2013 at 10:16:16PM +0100, Karol Lewandowski wrote:
> One of the problems I see, though, is that no matter how deep I make
> the queue (`max_dgram_qlen') I still see the process sleeping on
> send() way earlier than the configured queue depth would suggest.

Are you sure that the sysctl is set early enough, before the listening
socket is created?

Zbyszek
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On 12/12/2013 07:05 PM, Lennart Poettering wrote:
> On Thu, 12.12.13 17:42, Karol Lewandowski (lmc...@gmail.com) wrote:
>>> Hmm, but this wouldn't be any different from old syslog, right? I
>>> mean, old syslog also uses sendv() and recv() on AF_UNIX SOCK_DGRAM
>>> sockets...
>> That's true; however, we aren't migrating from syslog, but from
>> Android Logger. Android provides /dev/log_{main,radio,...} device
>> nodes which are simple in-kernel circular buffers apps can write to
>> at any time. This was invented when /dev/kmsg wasn't writable and
>> serves more-or-less the same purpose. From 2.6.29 it's in the
>> mainline kernel under drivers/staging/android/logger.c
> But if the socket layer is really that much slower than Android
> logging, then that's kinda sad, and the kernel socket stuff should
> probably be fixed?

It's not that much slower if we take throughput as the only parameter:

  packet size [B] | Android [kB/s] | DGRAM [kB/s]
  ----------------+----------------+-------------
               64 |         110640 |        84827
              128 |         212602 |       166866
              256 |         423880 |       328440
              512 |         815094 |       644499
             1024 |        1543326 |      1246914
             2048 |        2751674 |      2359654
             4096 |        2821260 |      4145395

However, if we take context switches into account, the picture starts
to look different:

                            | Android |  DGRAM
  --------------------------+---------+--------
  context switches          |    1818 | 580443
  (involuntary + voluntary) |         |
  system time [seconds]     |  149.97 | 118.62
  user time [seconds]       |    4.03 |   7.45

(Columns contain cumulative data: client + server.)

Attached raw results and test program.

>> This is why I decided to try to make logging completely async and
>> see if/what changes.
> Well, if AF_UNIX is slow, then we really should try to fix that in
> the kernel instead of bypassing it in userspace... Whether we rely on
> the socket buffer of the AF_UNIX socket or put together our own
> buffer in /dev/shm shouldn't be much of a difference if they both
> have the same size and can take the same number of entries...

I do agree. Truth is I have looked at the linux/net/ code once but I
didn't grok it well. I guess it's time to take a second look. ;)

> Here's another option: extend journald to use kdbus as an additional
> transport. This is something we want to do anyway since the kdbus
> transport will attach the metadata we need without race to each
> packet. Given that kdbus ultimately is just a way to write into an
> mmaped tmpfs that some other process owns, this should not be much
> worse than the android logger in performance.

I doubt it would help, as the other side would still be woken up on
_every_ message, right?

Cheers, Karol

Client: /usr/bin/time --verbose ./logger-client

  Average stats: packet size = Bytes/s
    64 =  113295768 Bytes/s =  110640 kB/s
   128 =  217704976 Bytes/s =  212602 kB/s
   256 =  434053344 Bytes/s =  423880 kB/s
   512 =  834656704 Bytes/s =  815094 kB/s
  1024 = 1580365824 Bytes/s = 1543326 kB/s
  2048 = 2817714432 Bytes/s = 2751674 kB/s
  4096 = 2888970749 Bytes/s = 2821260 kB/s

  Command being timed: ./logger-client
  User time (seconds): 2.20
  System time (seconds): 74.80
  Percent of CPU this job got: 100%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 1:17.00
  Average shared text size (kbytes): 0
  Average unshared data size (kbytes): 0
  Average stack size (kbytes): 0
  Average total size (kbytes): 0
  Maximum resident set size (kbytes): 624
  Average resident set size (kbytes): 0
  Major (requiring I/O) page faults: 0
  Minor (reclaiming a frame) page faults: 204
  Voluntary context switches: 1064
  Involuntary context switches: 368
  Swaps: 0
  File system inputs: 0
  File system outputs: 0
  Socket messages sent: 0
  Socket messages received: 0
  Signals delivered: 0
  Page size (bytes): 4096
  Exit status: 0

Server: /usr/bin/time --verbose ./logger-server

  Command being timed: ./logger-server
  User time (seconds): 1.83
  System time (seconds): 75.17
  Percent of CPU this job got: 30%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 4:08.72
  Average shared text size (kbytes): 0
  Average unshared data size (kbytes): 0
  Average stack size (kbytes): 0
  Average total size (kbytes): 0
  Maximum resident set size (kbytes): 480
  Average resident set size (kbytes): 0
  Major (requiring I/O) page faults: 0
  Minor (reclaiming a frame) page faults: 167
  Voluntary context switches: 242
  Involuntary context switches: 144
  Swaps: 0
  File system inputs: 0
  File system outputs: 0
  Socket messages sent: 0
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Fri, 13.12.13 12:46, Karol Lewandowski (k.lewando...@samsung.com) wrote:
> From 2.6.29 it's in the mainline kernel under
> drivers/staging/android/logger.c
>> But if the socket layer is really that much slower than Android
>> logging, then that's kinda sad, and the kernel socket stuff should
>> probably be fixed?
> It's not that much slower if we take throughput as the only
> parameter:
>
>   packet size [B] | Android [kB/s] | DGRAM [kB/s]
>                64 |         110640 |        84827
>               128 |         212602 |       166866
>               256 |         423880 |       328440
>               512 |         815094 |       644499
>              1024 |        1543326 |      1246914
>              2048 |        2751674 |      2359654
>              4096 |        2821260 |      4145395
>
> However, if we take context switches into account, the picture starts
> to look different:
>
>                             | Android |  DGRAM
>   context switches          |    1818 | 580443
>   (involuntary + voluntary) |         |
>   system time [seconds]     |  149.97 | 118.62
>   user time [seconds]       |    4.03 |   7.45
>
> (Columns contain cumulative data: client + server.) Attached raw
> results and test program.

Well, are you suggesting that the AF_UNIX/SOCK_DGRAM code actually
hands off the timeslice to the other side as soon as it queued
something in? That would be news to me. The kernel does context
switches when the timeslice is over, not earlier, afaik, and you'll
get that anyway...

> This is why I decided to try to make logging completely async and
> see if/what changes.
>> Well, if AF_UNIX is slow, then we really should try to fix that in
>> the kernel instead of bypassing it in userspace... Whether we rely
>> on the socket buffer of the AF_UNIX socket or put together our own
>> buffer in /dev/shm shouldn't be much of a difference if they both
>> have the same size and can take the same number of entries...
> I do agree. Truth is I have looked at the linux/net/ code once but I
> didn't grok it well. I guess it's time to take a second look. ;)

Here's another option: extend journald to use kdbus as an additional
transport. This is something we want to do anyway since the kdbus
transport will attach the metadata we need without race to each
packet. Given that kdbus ultimately is just a way to write into an
mmaped tmpfs that some other process owns, this should not be much
worse than the android logger in performance.

> I doubt it would help, as the other side would still be woken up on
> _every_ message, right?

Well, it gets woken up if it's waiting for that. But the kernel will
only give CPU time to it when the sender's timeslice is over...
There's very little difference to mmap... I mean, you need to tell the
other side that it should look in the buffer; how do you want to do
that otherwise?

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Fri, Dec 13, 2013 at 03:45:36PM +0100, Lennart Poettering wrote:
> On Fri, 13.12.13 12:46, Karol Lewandowski (k.lewando...@samsung.com) wrote:
> Well, are you suggesting that the AF_UNIX/SOCK_DGRAM code actually
> hands off the timeslice to the other side as soon as it queued
> something in?

If by other side you mean the receiving one, then no - QNX seems to do
that, but it isn't the case here. What I'm trying to say here is that
the kernel puts the process doing send(2) to sleep when (a) the queue
fills up and (b) the fd is a blocking one (otherwise we just get
EAGAIN). That's expected, I presume.

One of the problems I see, though, is that no matter how deep I make
the queue (`max_dgram_qlen') I still see the process sleeping on
send() way earlier than the configured queue depth would suggest.

> That would be news to me. The kernel does context switches when the
> timeslice is over, not earlier, afaik, and you'll get that anyway...

If you take a look at Results.sock_dgram from my previous mail, you
will find this for sock-client:

  Voluntary context switches: 577696
  Involuntary context switches: 633

Please correct me if I'm wrong, but I think that an involuntary
context switch happens when the process' timeslice is over, whereas a
voluntary one is caused by a syscall, probably doing some form of IO
(read/send/select/etc.)

If I make sock-server sleep(5) just before poll and strace the client,
I can clearly see that it blocks on send(). After the receiving side
picks a packet up, the client is able to send another one. This is
what I think is happening; please let me know if this sounds like
utter nonsense to you. :)

>>> Here's another option: extend journald to use kdbus as an
>>> additional transport. This is something we want to do anyway since
>>> the kdbus transport will attach the metadata we need without race
>>> to each packet. Given that kdbus ultimately is just a way to write
>>> into an mmaped tmpfs that some other process owns, this should not
>>> be much worse than the android logger in performance.
>> I doubt it would help, as the other side would still be woken up on
>> _every_ message, right?
> Well, it gets woken up if it's waiting for that. But the kernel will
> only give CPU time to it when the sender's timeslice is over...

I'm either doing something terribly wrong, or the timeslice can end
due to the call to send(2) and friends.

> There's very little difference to mmap... I mean, you need to tell
> the other side that it should look in the buffer; how do you want to
> do that otherwise?

I don't. I just want the buffer to be huge enough not to cause the
client to block, effectively waiting for the receiving side to pick
stuff up.

Cheers, Karol
[systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
Folks,

We are trying to apply journald as the default (and only) logging
system for our ARM-based handhelds. There is one major problem, though
- performance.

We have found that journald can consume a considerable amount of CPU
time under moderate load (10-20% of CPU for, say, 100-300 msg/s). We
have profiled journald and there are a few places that could benefit
from _some_ optimizations, but I have come to the conclusion that it
isn't the journald code that is the real bottleneck - it is, or seems
to be, the synchronous nature of send()/recv(). In the default
configuration each send() can cause a context switch, which is
especially costly on ARM.

This is why I decided to try to make logging completely async and see
if/what changes. The following patchset (based on top of f20c84c15f3)
implements pickup logic in journald [patch 1] and unconditionally
enables it in the client api [patch 2]. In short - instead of sending
DGRAMs I drop files to /dev/shm which are then picked up by journald
at specified intervals. (Precisely, I have (ab)used the mechanism used
to pass an fd via DGRAM when a message is too big.) Patch 3 contains a
standalone test program, which floods journald with a specified number
of messages per second.

On my development (x86) machine I made the following test with good
old time(1):

  # send 2000 messages (each of size 100 bytes) to journal per second
  # for ~5mins
  ./journal-feeder 2000 100 &
  sleep $((60*5 - 10))
  kill $!
Unpatched systemd:
==================

  /usr/bin/time --verbose ./systemd-journald &
  sleep $((60*5))
  kill $(pidof systemd-journald)

  User time (seconds): 16.87
  System time (seconds): 39.38
  Percent of CPU this job got: 18%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 5:00.06
  Major (requiring I/O) page faults: 30
  Minor (reclaiming a frame) page faults: 1061740
  Voluntary context switches: 496757
  Involuntary context switches: 497
  File system inputs: 0
  File system outputs: 50592

Deferred pickup every 5 secs (logs passed via /dev/shm/FILE rather
than DGRAMs):
=============

  /usr/bin/time --verbose ./systemd-journald &
  sleep $((60*5))
  kill $(pidof systemd-journald)

  User time (seconds): 8.27
  System time (seconds): 8.40
  Percent of CPU this job got: 5%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 5:00.36
  Major (requiring I/O) page faults: 180
  Minor (reclaiming a frame) page faults: 38976
  Voluntary context switches: 157
  Involuntary context switches: 1933
  File system inputs: 8
  File system outputs: 307312

Deferred pickup every 1 sec (logs passed via /dev/shm/FILE rather than
DGRAMs):
========

  /usr/bin/time --verbose ./systemd-journald &
  sleep $((60*5))
  kill $(pidof systemd-journald)

  User time (seconds): 9.54
  System time (seconds): 9.56
  Percent of CPU this job got: 6%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 5:00.15
  Major (requiring I/O) page faults: 180
  Minor (reclaiming a frame) page faults: 39034
  Voluntary context switches: 382
  Involuntary context switches: 2359
  File system inputs: 0
  File system outputs: 307768

This is quite a naive test, as far more changes when we pass an fd
instead of a DGRAM (ucred is NULL, which means we won't gather
information from /proc at all). However, we verified that the cost of
this is rather minor and not that important in this test.

Of course, I'm not proposing to include it as is (unconditionally
enabled for all messages), but I would just like to hear your opinion
about this idea. If I read these numbers correctly, there seems to be
a considerable benefit in terms of CPU time.
Something like that wouldn't be useful for all messages, but I think
that the gazillions of debug messages that our apps produce could be
sent in a non-synchronous and low-overhead manner.

RFC. Thanks in advance!

Karol Lewandowski (3):
  journald: Add deferred log processing logic
  journal: Try to pass log via /dev/shm to avoid waking up journal
  journal-feeder: trivial test program to flood journald

 journal-feeder.c                 |  64 +++
 src/journal/journal-send.c       |  39 ++
 src/journal/journald-gperf.gperf |   2 +
 src/journal/journald-server.c    | 107 +++
 src/journal/journald-server.h    |   5 ++
 src/journal/journald.conf        |   2 +
 6 files changed, 219 insertions(+)
 create mode 100644 journal-feeder.c

--
1.8.4.rc3
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Thu, 12.12.13 14:12, Karol Lewandowski (k.lewando...@samsung.com) wrote:
> We are trying to apply journald as the default (and only) logging
> system for our ARM-based handhelds. There is one major problem,
> though - performance. [...]
> We have profiled journald and there are a few places that could
> benefit from _some_ optimizations, but I have come to the conclusion
> that it isn't the journald code that is the real bottleneck - it is,
> or seems to be, the synchronous nature of send()/recv(). In the
> default configuration each send() can cause a context switch, which
> is especially costly on ARM.

Hmm, but this wouldn't be any different from old syslog, right? I
mean, old syslog also uses sendv() and recv() on AF_UNIX SOCK_DGRAM
sockets...

Note that on Linux the number of datagrams you can queue in an
AF_UNIX/SOCK_DGRAM socket is very low (15). This is controlled via
/proc/sys/net/unix/max_dgram_qlen. When this limit is hit, the client
needs to wait for the server to finish processing. Unfortunately
there's only that system-global way to increase the qlen. For a long
time it has been on our wishlist to make this a per-socket tunable
with a new sockopt() that behaves similar to SO_SNDBUF and friends...

> This is why I decided to try to make logging completely async and
> see if/what changes.

Well, if AF_UNIX is slow, then we really should try to fix that in the
kernel instead of bypassing it in userspace... Whether we rely on the
socket buffer of the AF_UNIX socket or put together our own buffer in
/dev/shm shouldn't be much of a difference if they both have the same
size and can take the same number of entries...

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Thu, Dec 12, 2013 at 03:51:05PM +0100, Lennart Poettering wrote:
> On Thu, 12.12.13 14:12, Karol Lewandowski (k.lewando...@samsung.com) wrote:

(OP here from private email.)

> Hmm, but this wouldn't be any different from old syslog, right? I
> mean, old syslog also uses sendv() and recv() on AF_UNIX SOCK_DGRAM
> sockets...

That's true; however, we aren't migrating from syslog, but from
Android Logger. Android provides /dev/log_{main,radio,...} device
nodes which are simple in-kernel circular buffers apps can write to at
any time. This was invented when /dev/kmsg wasn't writable and serves
more-or-less the same purpose. From 2.6.29 it's in the mainline kernel
under drivers/staging/android/logger.c

> Note that on Linux the number of datagrams you can queue in an
> AF_UNIX/SOCK_DGRAM socket is very low (15). This is controlled via
> /proc/sys/net/unix/max_dgram_qlen. When this limit is hit, the client
> needs to wait for the server to finish processing. Unfortunately
> there's only that system-global way to increase the qlen. For a long
> time it has been on our wishlist to make this a per-socket tunable
> with a new sockopt() that behaves similar to SO_SNDBUF and friends...

We have played with this tunable and it didn't bring
considerable/measurable improvements, unfortunately.

>> This is why I decided to try to make logging completely async and
>> see if/what changes.
> Well, if AF_UNIX is slow, then we really should try to fix that in
> the kernel instead of bypassing it in userspace... Whether we rely on
> the socket buffer of the AF_UNIX socket or put together our own
> buffer in /dev/shm shouldn't be much of a difference if they both
> have the same size and can take the same number of entries...

I do agree. Truth is I have looked at the linux/net/ code once but I
didn't grok it well. I guess it's time to take a second look. ;)

Cheers, Karol
Re: [systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead
On Thu, 12.12.13 17:42, Karol Lewandowski (lmc...@gmail.com) wrote:
>> Hmm, but this wouldn't be any different from old syslog, right? I
>> mean, old syslog also uses sendv() and recv() on AF_UNIX SOCK_DGRAM
>> sockets...
> That's true; however, we aren't migrating from syslog, but from
> Android Logger. Android provides /dev/log_{main,radio,...} device
> nodes which are simple in-kernel circular buffers apps can write to
> at any time. This was invented when /dev/kmsg wasn't writable and
> serves more-or-less the same purpose. From 2.6.29 it's in the
> mainline kernel under drivers/staging/android/logger.c

But if the socket layer is really that much slower than Android
logging, then that's kinda sad, and the kernel socket stuff should
probably be fixed?

>>> This is why I decided to try to make logging completely async and
>>> see if/what changes.
>> Well, if AF_UNIX is slow, then we really should try to fix that in
>> the kernel instead of bypassing it in userspace... Whether we rely
>> on the socket buffer of the AF_UNIX socket or put together our own
>> buffer in /dev/shm shouldn't be much of a difference if they both
>> have the same size and can take the same number of entries...
> I do agree. Truth is I have looked at the linux/net/ code once but I
> didn't grok it well. I guess it's time to take a second look. ;)

Here's another option: extend journald to use kdbus as an additional
transport. This is something we want to do anyway since the kdbus
transport will attach the metadata we need without race to each
packet. Given that kdbus ultimately is just a way to write into an
mmaped tmpfs that some other process owns, this should not be much
worse than the android logger in performance.

Special care needs to be taken to make this work though, as we should
do this only on kdbus systems. On non-kdbus systems journald can never
talk to dbus, because dbus-daemon is a client of journald and things
would deadlock...

Lennart

--
Lennart Poettering, Red Hat