On Fri, Oct 25, 2013 at 7:57 AM, Andres Freund <and...@2ndquadrant.com> wrote:
>> However, I'm leery about the idea of using a relation fork for this.
>> I'm not sure whether that's what you had in mind, but it gives me the
>> willies. First, it adds distributed overhead to the system, as
>> previously discussed; and second, I think the accounting may be kind
>> of tricky, especially in the face of multiple rewrites. I'd be more
>> inclined to find a separate place to store the mappings. Note that,
>> AFAICS, there's no real need for the mapping file to be
>> block-structured, and I believe they'll be written first (with no
>> readers) and subsequently only read (with no further writes) and
>> eventually deleted.
>
> I was thinking of storing it along other data used during logical
> decoding and let decoding's cleanup clean up that data as well. All the
> information for that should be there.

That seems OK.

> There's one snag I currently can see, namely that we actually need to
> prevent that a formerly dropped relfilenode is getting reused. Not
> entirely sure what the best way for that is.

I'm not sure in detail, but it seems to me that this is all part of the
same picture. If you're tracking changed relfilenodes, you'd better
track dropped ones as well. Completely aside from this issue, what
keeps a relation from being dropped before we've decoded all of the
changes made to its data before the point at which it was dropped? (I
hope the answer isn't "nothing".)

>> One possible objection to this is that it would preclude decoding on a
>> standby, which seems like a likely enough thing to want to do. So
>> maybe it's best to WAL-log the changes to the mapping file so that the
>> standby can reconstruct it if needed.
>
> The mapping file probably can be one big wal record, so it should be
> easy enough to do.

It might be better to batch it, because if you rewrite a big relation,
and the record is really big, everyone else will be frozen out of
inserting WAL for as long as that colossal record is being written and
synced. If it's inserted in reasonably-sized chunks, the rest of the
system won't be starved as badly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
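
To make the batching suggestion concrete, here is a rough sketch in Python rather than backend C. It is not PostgreSQL code: the 8192-byte cap, the opaque `bytes` entries, and the `emit_record` callback (a stand-in for whatever would actually insert a WAL record) are all assumptions chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List

# Assumed cap on the payload of one record; not a PostgreSQL constant.
MAX_CHUNK_BYTES = 8192


def chunk_mapping(entries: Iterable[bytes],
                  max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[List[bytes]]:
    """Group mapping entries into batches of at most max_bytes each.

    An entry that is larger than max_bytes by itself is emitted alone
    rather than being split.
    """
    batch: List[bytes] = []
    size = 0
    for entry in entries:
        # Close the current batch if adding this entry would exceed the cap.
        if batch and size + len(entry) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(entry)
        size += len(entry)
    if batch:
        yield batch


def log_mapping(entries: Iterable[bytes],
                emit_record: Callable[[bytes], None],
                max_bytes: int = MAX_CHUNK_BYTES) -> int:
    """Emit one record per batch through emit_record; return the record count.

    emit_record is hypothetical: it stands in for the real WAL insertion,
    so other writers get a chance to insert between batches.
    """
    count = 0
    for batch in chunk_mapping(entries, max_bytes):
        emit_record(b"".join(batch))
        count += 1
    return count
```

With 1000 entries of 12 bytes each, this emits two records instead of one 12000-byte record. The point is only that each insertion stays bounded regardless of how large the rewritten relation is.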