On 6 June 2013 16:00, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> In the "Redesigning checkpoint_segments" thread, many people opined that
> there should be a hard limit on the amount of disk space used for WAL:
> http://www.postgresql.org/message-id/CA+TgmoaOkgZb5YsmQeMg8ZVqWMtR=6s4-ppd+6jiy4oq78i...@mail.gmail.com.
> I'm starting a new thread on that, because that's mostly orthogonal to
> redesigning checkpoint_segments.
> The current situation is that if you run out of disk space while writing
> WAL, you get a PANIC, and the server shuts down. That's awful. We can try to
> avoid that by checkpointing early enough, so that we can remove old WAL
> segments to make room for new ones before you run out, but unless we somehow
> throttle or stop new WAL insertions, it's always going to be possible to use
> up all disk space. A typical scenario where that happens is when
> archive_command fails for some reason; even a checkpoint can't remove old,
> unarchived segments in that case. But it can happen even without WAL
> archiving.
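
The segment-removal rule in the quoted text can be modelled in a few lines of C. This is a toy sketch, not PostgreSQL code: the names (WalSegment, segment_removable, count_removable) are hypothetical. It assumes a segment may be removed only once it lies wholly before the latest checkpoint's redo segment and, when archiving is on, archive_command has succeeded for it. A failing archive_command therefore pins segments no matter how often we checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a WAL segment: its number and whether
 * archive_command has reported success for it. */
typedef struct {
    uint64_t segno;
    bool     archived;
} WalSegment;

/* A segment is removable only if crash recovery no longer needs it
 * (it is older than the redo segment) and, with archiving enabled,
 * it has been archived. */
static bool segment_removable(const WalSegment *seg, uint64_t redo_segno,
                              bool archive_mode)
{
    assert(seg != NULL);
    if (seg->segno >= redo_segno)
        return false;               /* still needed for crash recovery */
    if (archive_mode && !seg->archived)
        return false;               /* archiving is failing: must keep it */
    return true;
}

/* Count how many of n segments a checkpoint could remove. */
static size_t count_removable(const WalSegment *segs, size_t n,
                              uint64_t redo_segno, bool archive_mode)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (segment_removable(&segs[i], redo_segno, archive_mode))
            count++;
    return count;
}
```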

I don't see that we need to prevent WAL insertions when the disk fills. We
still have the whole of wal_buffers to use up. When that is full, we
will prevent further WAL insertions because we will be holding
WALWriteLock while trying to write out WAL and clear more space. So the
rest of the system will lock up nicely, like we want, apart from
read-only transactions.
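
A toy model of that behaviour, to show why inserts stall rather than fail. It is a sketch under stated assumptions, not how xlog.c is structured: WalBuffer, wal_insert and wal_flush are hypothetical names. It assumes buffer space is freed only when a write to disk succeeds, so with the disk full, inserts keep succeeding until the buffer is exhausted and then must wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for wal_buffers: bytes inserted but not yet
 * written to disk. */
typedef struct {
    size_t capacity;
    size_t used;
} WalBuffer;

typedef enum { INSERT_OK, INSERT_MUST_WAIT } InsertResult;

/* Append a record if there is room; otherwise the inserter has to wait
 * behind the backend that is trying to write WAL out. */
static InsertResult wal_insert(WalBuffer *buf, size_t len)
{
    assert(len <= buf->capacity);
    if (buf->used + len > buf->capacity)
        return INSERT_MUST_WAIT;
    buf->used += len;
    return INSERT_OK;
}

/* Try to write the buffer out; on a full disk nothing is freed. */
static bool wal_flush(WalBuffer *buf, bool disk_full)
{
    if (disk_full)
        return false;
    buf->used = 0;
    return true;
}
```

Once the disk has space again, a successful flush empties the buffer and waiting inserts proceed, which is the "lock up, then carry on" behaviour described above.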

Instead of PANICing, we should simply signal the checkpointer to
perform a shutdown checkpoint. That normally requires a WAL insertion
to complete, but it seems easy enough to make that happen by simply
rewriting the control file, after which ALL WAL files are superfluous
for crash recovery and can be deleted. Once that checkpoint is
complete, we can begin deleting WAL files that are archived/replicated
and continue as normal. The previously failing WAL write can now be
retried and may succeed this time; if it does, we continue, and if
not, we PANIC.
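
The control flow I have in mind, as a rough sketch. Everything here is hypothetical: DiskState, try_write_segment, shutdown_checkpoint_via_control_file and write_wal_with_retry are illustrative names, and free space is counted in whole segments to keep it simple. The only point is the ordering: on the first ENOSPC, checkpoint via the control file and free archived/replicated WAL; retry once; PANIC only if the retry also fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical disk model, in units of WAL segments. */
typedef struct {
    int free_segments;        /* free space on the WAL partition */
    int removable_after_ckpt; /* archived/replicated segments a checkpoint frees */
} DiskState;

typedef enum { WRITE_OK, WRITE_PANIC } WriteOutcome;

/* Write one segment's worth of WAL; fails like ENOSPC when full. */
static bool try_write_segment(DiskState *d)
{
    if (d->free_segments <= 0)
        return false;
    d->free_segments--;
    return true;
}

/* Stand-in for "rewrite the control file as a shutdown checkpoint,
 * then delete WAL files that are archived/replicated". */
static void shutdown_checkpoint_via_control_file(DiskState *d)
{
    d->free_segments += d->removable_after_ckpt;
    d->removable_after_ckpt = 0;
}

static WriteOutcome write_wal_with_retry(DiskState *d)
{
    assert(d != NULL);
    if (try_write_segment(d))
        return WRITE_OK;
    /* First failure: free space instead of PANICing. */
    shutdown_checkpoint_via_control_file(d);
    if (try_write_segment(d))
        return WRITE_OK;      /* in-progress transactions carry on */
    return WRITE_PANIC;       /* second failure: PANIC as today */
}
```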

Note that this would not require in-progress transactions to be
aborted. They can continue normally once wal_buffers re-opens.

We don't really want anything too drastic, because if this situation
happens once, it may happen many times (I'm imagining a flaky network,
etc.). So we want the situation to recover quickly and easily, without
too many consequences.

The above appears to be a very minimal change from the existing code and
doesn't introduce lots of new points of breakage.

> I've seen a case, where it was even worse than a PANIC and shutdown. pg_xlog
> was on a separate partition that had nothing else on it. The partition
> filled up, and the system shut down with a PANIC. Because there was no space
> left, it could not even write the checkpoint after recovery, and thus
> refused to start up again. There was nothing else on the partition that you
> could delete to make space. The only recourse would've been to add more disk
> space to the partition (impossible), or manually delete an old WAL file that
> was not needed to recover from the latest checkpoint (scary). Fortunately
> this was a test system, so we just deleted everything.

Doing shutdown checkpoints via the control file would exactly solve
that issue. We already depend upon the readability of the control file
anyway, so this changes nothing. (And if you think it does, then we
can have multiple control files, or at least a backup control file at

We can make the shutdown checkpoint happen always at EOF of a WAL
segment, so at shutdown we don't need any WAL files to remain at all.
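
The arithmetic for ending a checkpoint at a segment boundary, as a small sketch. It assumes WAL positions are plain byte offsets and uses the 16 MB default segment size; next_segment_boundary and padding_to_boundary are hypothetical helper names, not existing functions.

```c
#include <assert.h>
#include <stdint.h>

/* Default WAL segment size; must be a power of two for the mask below. */
#define WAL_SEG_SIZE ((uint64_t)16 * 1024 * 1024)

/* Round a WAL byte position up to the next segment boundary
 * (a position already on a boundary is returned unchanged). */
static uint64_t next_segment_boundary(uint64_t lsn, uint64_t seg_size)
{
    assert(seg_size > 0 && (seg_size & (seg_size - 1)) == 0);
    return (lsn + seg_size - 1) & ~(seg_size - 1);
}

/* Bytes of padding needed so the checkpoint ends exactly at EOF of
 * the current segment. */
static uint64_t padding_to_boundary(uint64_t lsn, uint64_t seg_size)
{
    return next_segment_boundary(lsn, seg_size) - lsn;
}
```

For example, a position 100 bytes into the second segment needs `WAL_SEG_SIZE - 100` bytes of padding; after that, no existing segment contains WAL beyond the checkpoint.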

> So we need to somehow stop new WAL insertions from happening, before it's
> too late.

I don't think we do.

What might be sensible is to have checkpoints speed up as WAL volume
approaches a predefined limit, so that we minimise the delay caused
when wal_buffers locks up.
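
One possible pacing rule, purely as an illustration. The linear shape, the threshold, and the function name checkpoint_interval are all assumptions for the sketch, not a worked-out proposal: keep the normal checkpoint interval until WAL usage passes some fraction of the limit, then shrink the interval linearly toward a minimum as usage reaches the limit.

```c
#include <assert.h>

/* Hypothetical pacing rule: normal interval below `threshold` (a fraction
 * of the WAL limit), shrinking linearly to `min_s` as usage approaches
 * the limit. All quantities are caller-supplied example values. */
static double checkpoint_interval(double wal_used, double wal_limit,
                                  double normal_s, double min_s,
                                  double threshold)
{
    assert(wal_limit > 0 && threshold >= 0 && threshold < 1);
    double frac = wal_used / wal_limit;
    if (frac <= threshold)
        return normal_s;
    if (frac >= 1.0)
        return min_s;
    /* t runs from 0 at the threshold to 1 at the limit. */
    double t = (frac - threshold) / (1.0 - threshold);
    return normal_s - t * (normal_s - min_s);
}
```

With a 300 s normal interval, a 30 s minimum and a 0.5 threshold, 75% usage gives a 165 s interval.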

Not suggesting anything here for 9.4, since we're mid-CF.

 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)