Andres Freund <and...@2ndquadrant.com> writes:
> On 2014-01-21 18:59:13 -0500, Tom Lane wrote:
>> Another thing to think about is whether we couldn't put a hard limit on
>> WAL record size somehow.  Multi-megabyte WAL records are an abuse of the
>> design anyway, when you get right down to it.  So for example maybe we
>> could split up commit records, with most of the bulky information dumped
>> into separate records that appear before the "real commit".  This would
>> complicate replay --- in particular, if we abort the transaction after
>> writing a few such records, how does the replayer realize that it can
>> forget about those records?  But that sounds probably surmountable.

> I think removing the list of subtransactions from commit records would
> essentially require not truncating pg_subtrans after a restart
> anymore.

I'm not suggesting that we stop providing that information!  I'm just
saying that we perhaps don't need to store it all in one WAL record,
if instead we put the onus on WAL replay to be able to reconstruct what
it needs from a series of WAL records.

> We could relatively easily split off logging the dropped files from
> commit records and log them in groups afterwards; we already have
> several races that allow files to leak.

I was thinking the other way around: emit the subsidiary records before the
atomic commit or abort record, indeed before we've actually committed.
Part of the point is to reduce the risk that lack of WAL space would
prevent us from fully committing.  Also, writing those records afterwards
increases the risk of a post-commit failure, which is a bad thing.
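
As an illustration only, the write-side ordering could be sketched as
below. This is a toy model in Python, not PostgreSQL code: the Record
type, the emit_commit() helper and the MAX_ITEMS_PER_RECORD cap are all
hypothetical names invented for the sketch. The only point it makes is
that the bulky list is cut into bounded chunks that are all emitted
before the small commit record.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical cap on how many items one record may carry; a real limit
# would presumably be expressed in bytes.
MAX_ITEMS_PER_RECORD = 4


@dataclass(frozen=True)
class Record:
    kind: str                    # "subsidiary" or "commit"
    xid: int                     # transaction the record belongs to
    items: Tuple[int, ...] = ()  # bulky payload, e.g. subtransaction ids


def emit_commit(xid: int, subxids: List[int],
                max_items: int = MAX_ITEMS_PER_RECORD) -> List[Record]:
    """Return the records for one commit, in the order they are written.

    The bulky payload goes into bounded subsidiary records first; the
    small commit record comes last and is the only atomic commit point.
    """
    records = [
        Record("subsidiary", xid, tuple(subxids[i:i + max_items]))
        for i in range(0, len(subxids), max_items)
    ]
    records.append(Record("commit", xid))
    return records
```

With ten subtransaction ids and a cap of four, this produces three
subsidiary records followed by one commit record, so no single record
grows with the size of the transaction.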

Replay would then involve either accumulating the subsidiary records in
memory, or being willing to go back and re-read them when the real commit
or abort record is seen.
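
A minimal sketch of the first option (accumulating in memory), again as
a toy Python model rather than anything resembling the real replay
code. The (kind, xid, items) tuple layout and the replay() function are
assumptions made for the example. It also shows one possible answer to
the earlier question about aborts: an abort record, or simply reaching
the end of WAL without a commit, lets the replayer forget whatever it
had accumulated for that transaction.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (kind, xid, items), where kind is one of
# "subsidiary", "commit" or "abort".
LogRecord = Tuple[str, int, Tuple[int, ...]]


def replay(records: Iterable[LogRecord]) -> Dict[int, List[int]]:
    """Replay a record stream; return the payload of each committed xid."""
    pending: Dict[int, List[int]] = defaultdict(list)
    committed: Dict[int, List[int]] = {}

    for kind, xid, items in records:
        if kind == "subsidiary":
            # Not committed yet: just remember the payload.
            pending[xid].extend(items)
        elif kind == "commit":
            # The real commit: everything accumulated becomes effective.
            committed[xid] = pending.pop(xid, [])
        elif kind == "abort":
            # Forget whatever was accumulated for this transaction.
            pending.pop(xid, None)
        else:
            raise ValueError(f"unknown record kind: {kind!r}")

    # Anything still pending never reached its commit record (for
    # example, a crash before commit) and is dropped here.
    return committed
```

The memory cost of this variant is proportional to the subsidiary data
of all in-progress transactions, which is the tradeoff against going
back and re-reading the records.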

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
