On Mon, Apr 15, 2019 at 10:22 PM Bruce Momjian <br...@momjian.us> wrote: > > > I am thinking tools could retain modblock files along with WAL, could > > > pull full-page-writes from WAL, or from PGDATA. It avoids the need to > > > scan 16MB WAL files, and the WAL files and modblock files could be > > > expired independently. > > > > That is pretty much exactly what I was intending to propose. > > OK, good. Some of your wording was vague so I was unclear exactly what > you were suggesting.
Well, I guess the part that isn't like what I was suggesting is the idea that there should be exactly one modified block file per segment. The biggest problem with that idea is that a single WAL record can be split across two segments (or, in pathological cases, perhaps more). I think it makes sense to talk about the blocks modified by WAL between LSN A and LSN B, but it doesn't make much sense to talk about the block modified by the WAL in segment XYZ. You can make it kinda make sense by saying "the blocks modified by records *beginning in* segment XYZ" or alternatively "the blocks modified by records *ending in* segment XYZ", but that seems confusing to me. For example, suppose you decide on the first one -- 000000010000000100000068.modblock will contain all blocks modified by records that begin in 000000010000000100000068. Well, that means that to generate the 000000010000000100000068.modblock, you will need access to 000000010000000100000068 AND probably also 000000010000000100000069 and in rare cases perhaps 00000001000000010000006A or even later files. I think that's actually pretty confusing. It seems better to me to give the files names like ${TLI}.${STARTLSN}.${ENDLSN}.modblock, e.g. 00000001.0000000168000058.00000001687DBBB8.modblock, so that you can see exactly which *records* are covered by that segment. And I suspect it may also be a good idea to bunch up the records from several WAL files. Especially if you are using 16MB WAL files, collecting all of the block references from a single WAL file is going to produce a very small file. I suspect that the modified block files will end up being 100x smaller than the WAL itself, perhaps more, and I don't think anybody will appreciate us adding another PostgreSQL systems that spews out huge numbers of tiny little files. If, for example, somebody's got a big cluster that is churning out a WAL segment every second, they would probably still be happy to have a new modified block file only, say, every 10 seconds. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company