On Tue, 10.05.16 11:43, Ricardo Fraile (rfra...@idealista.com) wrote: > No, there aren't any new messages in that time. This example was > reproduced in a testing server without load, but in other server with > nearly 10Mb of logs each minute the behaviour is the same. > > Following the way to the root cause I fall into a high kworker I/O usage > each minute, some related with ext4 journal said the following command: > > while true ; do clear ; grep -n ^ /proc/$kworker_PID/stack | sort -rn | > cut -d: -f 2- ; sleep 1 ; done > > A search in google reported some similar issues with this filesystem, > but a kernel update from 3.16.7 to 4.5.0 didn't fix anything. > > Only removing the journal support in the ext4 log partition solved it, > but I still don't understand why if SyncIntervalSec sync each 5 minutes, > for example, the journal is so aggressive each minute in a server > without disk writes, apart of logs. > > Anyway, the problem is close to the filesystem side, I don't know if any > other options defined in journald.conf can mitigate it.
syncing and the file change time have little to do with each other. The journal uses memory mapped file access to the journal files. It will make normal memory writes to append new entries to the journal files. On linux, writing to mmap()ed files will not result in inotify/mtime updates. Since we use inotify for "journactl --follow" and similar logic we actually manually schedule ftruncate() calls as a way to trigger inotify/mtime updates. Now, as it turns out doing this frequently actually is bad for the IO load, hence we'll actually don't this all the time, but schedule it with a bit of lag, doing only few delayed calls instead of instant, frequent calls after appending log entries. Now, that's one side of the story: the mtime updates. The other side of the story is fsync()s: in order to keep loss of data minimal when the system is abruptly shut down we'll actually sync the files fully to disk in regular intervals. These intervals may be influenced via SyncIntervalSec=, but note that this timer actually is defined as "sync to disk at the latest after a log entry write". It hence gives the guarantee that whenever something is logged it will not take any longer than the specified time to actually safelly hit the disk. Syncs are also done in other conditions, for example immediately after any log message written with EMERGENCY/CRITIAL/ALERT log level. Long story short: mtime has little to do with SyncIntervalSec=, there's no direct connection and it#s really not the only reason why mtimes are bumped, and mtime updates will happen much later than the actual log event. If you want to figure out what the journal is doing, my recommendation would be to profile it with something like "prof", or alternatively just "strace"ing it. Lennart -- Lennart Poettering, Red Hat _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel