On Fri, 30.09.11 11:54, Paolo Bonzini (bonz...@gnu.org) wrote:

> When enabling readahead on my system (which has a 5400rpm hard drive),
> "systemd-analyze blame" output looks like this:
>
>   19507ms udev.service
>   18336ms fedora-storage-init.service
>   13254ms var-lock.mount
>   12960ms var-run.mount
>   12871ms media.mount
>
> This matches visual feedback from systemd's boot log (the
> "Starting..." and "Started..." messages on the console appeared quite
> slowly). "systemd-analyze plot" shows that the serialization point
> is udev.
>
> Basically, readahead-replay is starving udev and everything else running
> early in the boot process. udev cannot simply load the modules and
> programs it needs; rather, it has to wait for readahead to fetch them.
> The pack file shows things such as libX11, libgio and libglib very close
> to the beginning of the file, while kernel modules are more towards
> the end. The problem is that updated kernel modules are often installed
> months after the root partition was formatted, while large files might
> be installed at the beginning of the drive and stay there forever.
>
> The attached file adds a simple heuristic to readahead-collect: break the
> files into two groups, reading first the files that are not in /usr, and
> then those that are in /usr. This is far from perfect, as it may delay
> some files and still load others too early. It may delay some files
> because systemd will read from /usr/lib/binfmt.d early at startup (this
> sounds clearly wrong, since /usr may not even be mounted at that point!).
> Similarly, it will not delay loading GLib (which has to be in /lib because
> some programs in /sbin use it) even though in my case it is not needed.
>
> Still, it was enough to save 5 more seconds, bringing the total to 20.
> "systemd-analyze blame" was also more satisfying:
>
>    8154ms fedora-storage-init.service
>    7067ms udev.service
>    6064ms var-lock.mount
>    6057ms var-run.mount
>    6043ms media.mount
>
> A better heuristic, perhaps involving some kind of topological sort,
> would likely double the size of readahead-collect, so I went for the
> low-hanging fruit.
Hmpf. I can't say I am a particular fan of changes with hardcoded rules like this.

readahead currently strictly loads files in the order they are stored on disk (determined with FS_IOC_FIEMAP, on rotating media only), or, on SSDs, in the order they are used. Normally this should really do the right thing for you, unless a stream of late-used stuff for some reason ends up at the beginning of the disk. It would be interesting to figure out how the files are laid out on disk on your specific file system.

Note that systemd's readahead implementation is far from ideal. Other implementations go much further. For example, Ubuntu's ureadahead includes an ext3 parser and looks not only at the location of files on disk but also at the location of directories. This gives them a strategic advantage, but I am strictly against adding any knowledge of low-level file systems to our own systemd implementation, simply because I want to be able to maintain the code. (That said, I think ext4 actually supports FIEMAP on directories nowadays too, so I'd be happy to merge a patch for that, which should fix this problem.)

Also, at boot systemd opens quite a few of its small unit files before starting the readahead logic. It might make sense to spawn readahead earlier to cover those as well, but then it would become a special process, and it would probably be better not to have too many of those, especially given that the whole concept of readahead is primarily a way to deal with hardware that belongs more to yesteryear than to the future (i.e. rotating media).

I guess what I am trying to say is: there's a lot to optimize here, and before we add arbitrary rules like the one you suggest, I'd very much prefer to see other optimizations done and, most importantly, to figure out why exactly the simple rule "follow order on disk" doesn't work for you.
(Of course, in an ideal world we'd probably not have any readahead-replay process at all, but would simply reorder things on disk according to what we measured, which we actually do for btrfs.)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel