On 02/17/2016 03:56 PM, Zbigniew Jędrzejewski-Szmek wrote:
On Wed, Feb 17, 2016 at 02:35:55PM +0200, Avi Kivity wrote:
We are using systemd to supervise our NoSQL database and are
generally happy.

A few things will help even more:

1. log core dumps immediately rather than after the dump completes

A database will often consume all memory on the machine; dumping
120GB can take a lot of time, especially if compression is enabled.
As the situation is now, there is a period of time where it is
impossible to know what is happening.

(I saw that 229 improves core dumps, but did not see this specifically)
The coredump is logged afterwards because that's the only way to
include all information (including the compressed file name) in one
log message.

Maybe we can log two messages if the core is very large, or if we detect that it will take more than a couple of seconds to store it.
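Something like this is what I mean, just as a sketch (the helper names are made up, and the COREDUMP_* field names are only illustrative, not necessarily what systemd-coredump uses):

/* Sketch only: emit a journal entry as soon as the dump starts, and a
 * second one once the file has been written. */
#include <systemd/sd-journal.h>
#include <syslog.h>
#include <unistd.h>

static void log_dump_started(pid_t pid, const char *comm, unsigned long long bytes) {
        sd_journal_send("MESSAGE=Process %d (%s) dumped core, writing %llu bytes...",
                        (int) pid, comm, bytes,
                        "PRIORITY=%d", LOG_ERR,
                        "COREDUMP_PID=%d", (int) pid,   /* illustrative field name */
                        NULL);
}

static void log_dump_finished(pid_t pid, const char *comm, const char *filename) {
        sd_journal_send("MESSAGE=Core dump of %d (%s) stored as %s",
                        (int) pid, comm, filename,
                        "PRIORITY=%d", LOG_ERR,
                        "COREDUMP_FILENAME=%s", filename,   /* illustrative field name */
                        NULL);
}

The first entry would already tell an operator that the service died and a dump is being written; the second would add the file name once it is known.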

But there are two changes which might mitigate the problem:
- semi-recently we switched to lz4, which compresses significantly faster;
  have you tried that?

I haven't tried it yet, but consider that memory sizes are growing rapidly (e.g. byte-addressable non-volatile memory) and core counts are large; I don't think improvements in compression speed can keep up with this.


- recently the responsibility of writing core dumps was split out to
   a service. I'm not sure how that influences the time when the log
   message is written.

I'll try it out; it may take some time because I don't want to upgrade my large machines to F24 yet.

Btw, I hope that with this change the service is only restarted after the dump is complete, otherwise an OOM is likely.


2. parallel compression of core dumps

As well as consuming all of the memory, we also consume all of the CPUs. Once we dump core, we may as well use those cores to compress the huge dump.
This should be implemented in the compression library. The compressor does not seem to be threaded, but if it were, we would try to make use of that.
OTOH, single-threaded lz4 is able to produce ~500MB/s of compressed
output, so you'd need a really fast disk to go above that.

I happen to have a really fast disk, reaching 4X that, and this is common for database users.
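
To illustrate what I mean by using the idle cores, here is a toy sketch with the plain LZ4 block API, one thread per chunk; a real implementation would cap the number of threads and write a small per-chunk header (compressed size) so the result can be decompressed in order. Error handling omitted:

/* Toy sketch: compress the core image in fixed-size chunks, one thread
 * per chunk, using the plain LZ4 block API. */
#include <lz4.h>
#include <pthread.h>
#include <stdlib.h>

#define CHUNK_SIZE (64ULL * 1024 * 1024)   /* 64 MB per chunk, arbitrary */

struct chunk_job {
        const char *src;
        char *dst;
        int src_size;
        int dst_size;        /* filled in by the worker */
};

static void *compress_chunk(void *arg) {
        struct chunk_job *j = arg;
        j->dst_size = LZ4_compress_default(j->src, j->dst, j->src_size,
                                           LZ4_compressBound(j->src_size));
        return NULL;
}

static void compress_core_parallel(const char *core, size_t size) {
        size_t nchunks = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
        struct chunk_job *jobs = calloc(nchunks, sizeof *jobs);
        pthread_t *tids = calloc(nchunks, sizeof *tids);

        for (size_t i = 0; i < nchunks; i++) {
                size_t off = i * CHUNK_SIZE;
                size_t len = size - off < CHUNK_SIZE ? size - off : CHUNK_SIZE;
                jobs[i] = (struct chunk_job) {
                        .src = core + off,
                        .src_size = (int) len,
                        .dst = malloc(LZ4_compressBound((int) len)),
                };
                pthread_create(&tids[i], NULL, compress_chunk, &jobs[i]);
        }
        for (size_t i = 0; i < nchunks; i++)
                pthread_join(tids[i], NULL);
        /* ...write out jobs[i].dst / jobs[i].dst_size in order, then free... */
}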




3. watchdog during startup

Sometimes we need to perform expensive operations during startup
(log replay, rebuild from network replica) before we can start
serving. Rather than configure a huge start timeout, I'd prefer to
have the service report progress to systemd so that it knows that
startup is still in progress.
Zbyszek
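
For 3., what I have in mind on the service side is roughly this (a sketch for a Type=notify unit; STATUS= already works today, while the EXTEND_TIMEOUT_USEC= keep-alive is hypothetical, the kind of "startup is still making progress" message I am asking for):

/* Sketch of the service side of a startup-progress watchdog. */
#include <systemd/sd-daemon.h>

static void report_replay_progress(unsigned long long done, unsigned long long total) {
        sd_notifyf(0, "STATUS=log replay: %llu/%llu segments\n"
                      "EXTEND_TIMEOUT_USEC=30000000",   /* hypothetical keep-alive */
                   done, total);
}

static void report_ready(void) {
        sd_notify(0, "READY=1\n"
                     "STATUS=serving requests");
}

The service would call report_replay_progress() periodically from the replay or rebuild loop, then report_ready() once it can actually serve, instead of us configuring a huge start timeout up front.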
