On Wed, 17.02.16 14:35, Avi Kivity (a...@scylladb.com) wrote:

> We are using systemd to supervise our NoSQL database and are generally
> happy.

Thank you for the feedback! We are always interested in good feedback
like yours.

> A few things will help even more:
>
> 1. log core dumps immediately rather than after the dump completes
>
> A database will often consume all memory on the machine; dumping 120GB can
> take a lot of time, especially if compression is enabled. As the situation
> is now, there is a period of time where it is impossible to know what is
> happening.
>
> (I saw that 229 improves core dumps, but did not see this
> specifically)

With 229 the coredump hook will collect a bit of information and then
pass things off (including the pipe the coredump is streamed in on) to
a mini service that then processes the crash, extracts the stacktrace
and writes it to disk. This means you should see the coredump
processing as a normal service in "systemctl" and "systemd-cgtop" and
similar tools. You should see normal logs about this service being
started now, and you can do resource management on it.

> 2. parallel compression of core dumps
>
> As well as consuming all of memory, we also consume all cpus. Once we dump
> core we may as well use those cores for compressing the huge dump.

We get the stuff via a pipe from the kernel. I am not sure whether gz
or lz4 can distribute work on multiple CPUs if the data is flowing in
strictly sequentially and there's no random access to the input
data. But if the compressors support that then we should definitely
make use of it!

> 3. watchdog during startup
>
> Sometimes we need to perform expensive operations during startup (log
> replay, rebuild from network replica) before we can start serving. Rather
> than configure a huge start timeout, I'd prefer to have the service report
> progress to systemd so that it knows that startup is still in
> progress.

Interesting. What precisely would you suggest this look like? I mean,
you say "report progress", does this mean you want a textual string
like "STATUS=" in sd_notify() – which you already have really?
Or do you mean behaviour like the existing "WATCHDOG=1" logic, i.e.
that start-up is aborted if the keep-alive messages are missing?

I think adding a WatchdogMode= setting that allows optional
configuration to require regular WATCHDOG=1 notifications even in the
start and stop phase of a service certainly makes sense, if that's
what you are asking for.

> Hope this is useful,

Yes, it is! Thanks!

Lennart

-- 
Lennart Poettering, Red Hat
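
To make point 3 a bit more concrete, here is a rough sketch of what "report progress during start-up" could look like from the service side. It speaks the notification protocol directly: one datagram sent to the AF_UNIX socket named in $NOTIFY_SOCKET. It sends both a "STATUS=" string and a "WATCHDOG=1" keep-alive after each unit of work. The replay_log() helper and its "segments" argument are made up for illustration. Whether the manager acts on WATCHDOG=1 before READY=1 depends on the systemd version and on a setting like the WatchdogMode= idea above, which is only a proposal in this thread.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send one notification datagram to the service manager.

    Returns False when NOTIFY_SOCKET is unset, i.e. when the process
    was not started by a manager that wants notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    addr = os.fsencode(path)
    # A leading "@" denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead.
    if addr.startswith(b"@"):
        addr = b"\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), addr)
    return True


def replay_log(segments):
    """Hypothetical start-up loop: report progress after each segment."""
    total = len(segments)
    for done, _segment in enumerate(segments, start=1):
        # ... replay one log segment here (placeholder for real work) ...
        sd_notify(f"STATUS=Replaying log, segment {done}/{total}\nWATCHDOG=1")
    # Start-up is finished: tell the manager the service is ready.
    sd_notify("READY=1\nSTATUS=Serving requests")
```

For the messages to be accepted the unit would presumably need Type=notify, and WatchdogSec= for the keep-alives to mean anything. Sending STATUS= and WATCHDOG=1 in one datagram keeps the human-readable progress and the keep-alive in step.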
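
On point 2, the open question was whether a compressor can use several CPUs when the input arrives strictly sequentially over a pipe. One way this can work in principle is to cut the stream into fixed-size blocks, compress the blocks independently on a worker pool, and write the results out in input order. The sketch below does that with gzip from the Python standard library, relying on the fact that concatenated gzip members form a valid gzip stream. This is not how systemd-coredump works, and the block size, worker count and function names are arbitrary choices for illustration; xz and lz4 would need their own framing.

```python
import gzip
from collections import deque
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per block; an arbitrary choice for this sketch


def read_chunks(stream, size=CHUNK_SIZE):
    """Yield consecutive blocks from a sequential, non-seekable stream."""
    while True:
        block = stream.read(size)
        if not block:
            return
        yield block


def compress_stream(src, dst, workers=4, chunk_size=CHUNK_SIZE):
    """Compress src into dst as a sequence of independent gzip members."""
    pending = deque()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for block in read_chunks(src, chunk_size):
            pending.append(pool.submit(gzip.compress, block))
            # Bound memory use: keep at most 2 * workers blocks in flight,
            # and always write the oldest block first to preserve order.
            if len(pending) >= 2 * workers:
                dst.write(pending.popleft().result())
        # Drain whatever is still being compressed.
        while pending:
            dst.write(pending.popleft().result())
```

The price is a somewhat worse compression ratio, since each block starts with an empty dictionary. Threads are sufficient here because CPython's zlib releases the interpreter lock while compressing.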