Responding to several points below.

Tom Lane wrote:
> David's report isn't too clear: did the syslogger process actually run
> out of memory and exit of its own volition after an ENOMEM, or did it get
> killed by the dreaded OOM killer? In either case, it's unclear whether
> it was really using an excessive amount of memory. We have not heard
> reports suggesting a memory leak in the syslogger, but maybe there is
> one under unusual circumstances?
I'm running on illumos, where there is no OOM killer. I expect what happened is that the syslogger process attempted to allocate memory, failed because the system was low on memory, and explicitly exited. That's consistent with an exited process, no core file generated, and the "FATAL" "out of memory" entries in the log I posted. Of course, that doesn't mean the syslogger was using much memory. It means only that it attempted an allocation at a point where it could not handle failure.

Andres Freund wrote:
> I'm somewhat inclined to say that one has to live with this if the
> system is so resource constrainted that processes barely using memory
> get killed.

Again, there's no OOM killer here, so the process has the opportunity to respond differently. I can understand the database not functioning well when the system is out of memory, but this half-functioning, half-deadlocked state is quite disruptive. By comparison, if the syslogger had instead raised SIGABRT, that would not have been ideal, but it would have been significantly better for my use case: our process manager would have terminated all of the processes associated with the database and restarted it.

Tom Lane wrote:
> Hm, so that's another angle David didn't report on: is it possible that
> his workload could have resulted in a very large volume of incomplete
> in-progress log messages?

Yes. I mentioned in my Nov 6 mail that large log messages over a short period appear to have been a major contributing factor:

> Here are all of the entries in the PostgreSQL log from 23:19:12 until the top of
> the next hour:
> https://gist.githubusercontent.com/davepacheco/c5541bb464532075f2da761dd990a457/raw/2ba242055aca2fb374e9118045a830d08c590e0a/gistfile1.txt
...
> There are also many log entries for some very long SQL queries. I'm sure that
> contributed to this problem by filling up the pipe. I was able to extract the
> contents of the pipe while the system was hung, and it was more of these giant
> query strings.

Andres Freund wrote:
> Note that there's plenty of cases where you could run into this even
> without being unable to fork new processes. You'd e.g. could also run
> into this while logging the exit of some other subprocess or such,
> there's enough ereports in postmaster.

This is a critical point. As far as I can tell, all that's necessary for this deadlock to occur is:

- the syslogger is unable to make forward progress (e.g., because it segfaulted)
- enough other processes write to the pipe that it fills up before the postmaster can restart the syslogger
- the postmaster attempts to write to the log as part of any of its signal-handling operations

It seems to me that the same thing could happen if, for example, a bug in the syslogger caused it to segfault around the same time that the postmaster attempted to start an autovacuum worker, assuming there was also reasonably high log traffic. (There's a small standalone illustration of the blocking write in the postscript below.)

Thanks,
Dave
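P.S. For anyone who wants to see the blocking behavior in isolation, here is a minimal standalone C sketch. It's my own illustration of generic pipe semantics, not PostgreSQL code: it fills a pipe that nothing is reading (standing in for a stuck syslogger), and then one more blocking write() hangs forever, analogous to the postmaster getting stuck while trying to log.

    /*
     * Minimal sketch (illustration only, not PostgreSQL code): fill a pipe
     * that nothing reads, then show that one more blocking write() hangs.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        int  fds[2];
        int  flags;
        char buf[4096];

        if (pipe(fds) != 0) {
            perror("pipe");
            return 1;
        }

        /* Nothing ever reads fds[0]; this stands in for a stuck syslogger. */
        memset(buf, 'x', sizeof(buf));

        /* Fill the pipe with non-blocking writes so we can tell when it's full. */
        flags = fcntl(fds[1], F_GETFL);
        fcntl(fds[1], F_SETFL, flags | O_NONBLOCK);
        while (write(fds[1], buf, sizeof(buf)) > 0)
            ;
        printf("pipe is now full\n");

        /* Switch the write end back to blocking mode. */
        fcntl(fds[1], F_SETFL, flags);

        /*
         * With the pipe full and nobody draining it, this write never
         * returns -- analogous to the postmaster blocking on a write to
         * the logging pipe from its signal-handling path.
         */
        printf("attempting one more write (this will hang)...\n");
        write(fds[1], buf, sizeof(buf));
        printf("not reached\n");
        return 0;
    }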