On Wed, Nov 9, 2022 at 2:02 PM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> I don't think walsenders fetching segment from archive is totally
> stupid. With that feature, we can use fast and expensive but small
> storage for pg_wal, while avoiding replication from dying even in
> emergency.
It seems like a useful feature to have, at least as an option, and it saves a lot of work: failovers, expensive rebuilds of standbys/subscribers, manual interventions, etc.

If you're saying that even the walsenders serving logical replication subscribers would go fetch the removed WAL files from the archive location, it mandates enabling archiving on the subscribers. And we know that archiving is not cheap and has its own advantages and disadvantages, so the feature may or may not help.

If you're saying that only the walsenders serving streaming replication standbys would go fetch the removed WAL files from the archive location, it's easy to implement. However, it is not a complete feature and doesn't solve the problem for logical replication.

With the feature, it'll be something like 'you, as primary/publisher, archive the WAL files, and when you don't have them, you restore them'. It may not sound elegant, but it can solve the lost replication slots problem. On the other hand, the cost of restoring WAL files from the archive might further slow down replication, thus increasing the replication lag. One also needs to think about how many such WAL files are restored and kept, and whether they'll be kept in pg_wal or in some other directory. We would also need to decide how to handle a full disk, fetching too-old or too many WAL files for lagging replication slots, removal of unnecessary WAL files, etc. I'm not sure about other implications at this point in time.

Perhaps implementing this feature as a core/external extension by introducing segment_open() or other necessary hooks might be worth it. If it is implemented in some way, I think the scope of the replication slot invalidation/max_slot_wal_keep_size feature gets reduced, or the feature can be removed completely, no?

> However, supposing that WalSndSegmentOpen() fetches segments from
> archive as the fallback and that succeeds, the slot can survive
> missing WAL in pg_wal in the first place. So this patch doesn't seem
> to be needed for the purpose.
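For illustration only, the archive fallback in the quoted text (and the segment_open()-style hook mentioned above) might look roughly like the following standalone C sketch. It is not WalSndSegmentOpen() or any existing PostgreSQL API. The names open_wal_segment(), segment_restore_hook, restore_from_archive_dir() and archive_dir are hypothetical. A plain directory copy stands in for restore_command, and the sketch assumes the default 16 MB segment size.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumption: default 16 MB WAL segments, i.e. 256 segments per 4 GB xlogid. */
#define WAL_SEG_SIZE    (16ULL * 1024 * 1024)
#define SEGS_PER_XLOGID (0x100000000ULL / WAL_SEG_SIZE)

/* 24-hex-digit WAL file name: timeline, then the segment number split in two. */
static void wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    snprintf(buf, len, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / SEGS_PER_XLOGID),
             (unsigned) (segno % SEGS_PER_XLOGID));
}

/* Hypothetical hook: restore segment 'fname' into 'dest_path'; 0 on success. */
typedef int (*segment_restore_hook_type)(const char *fname, const char *dest_path);
static segment_restore_hook_type segment_restore_hook = NULL;

/* Hypothetical archive location used by the example hook below. */
static const char *archive_dir = NULL;

/* Example hook (stand-in for restore_command): copy the file from archive_dir. */
static int restore_from_archive_dir(const char *fname, const char *dest_path)
{
    char src[1024];
    char buf[8192];
    ssize_t n;
    int in, out, rc = 0;

    if (archive_dir == NULL)
        return -1;
    snprintf(src, sizeof(src), "%s/%s", archive_dir, fname);
    if ((in = open(src, O_RDONLY)) < 0)
        return -1;
    if ((out = open(dest_path, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0)
    {
        if (write(out, buf, (size_t) n) != n)
        {
            rc = -1;            /* short or failed write */
            break;
        }
    }
    if (n < 0)
        rc = -1;                /* read error */
    close(in);
    close(out);
    return rc;
}

/*
 * Hypothetical segment open routine: try pg_wal first; only if the segment
 * is missing there (ENOENT) and a hook is installed, restore it into a
 * separate directory and open it from there.
 */
static int open_wal_segment(const char *waldir, const char *restoredir,
                            uint32_t tli, uint64_t segno)
{
    char fname[32];
    char path[1024];
    int fd;

    wal_file_name(fname, sizeof(fname), tli, segno);
    snprintf(path, sizeof(path), "%s/%s", waldir, fname);
    fd = open(path, O_RDONLY);
    if (fd >= 0 || errno != ENOENT || segment_restore_hook == NULL)
        return fd;

    /* Segment already removed from pg_wal: fall back to the archive. */
    snprintf(path, sizeof(path), "%s/%s", restoredir, fname);
    if (segment_restore_hook(fname, path) != 0)
    {
        errno = ENOENT;         /* report it as still missing */
        return -1;
    }
    return open(path, O_RDONLY);
}
```

Restoring into a separate directory rather than pg_wal keeps restored segments out of the normal recycling logic. However, something still has to decide when to delete them, which is the disk-space and retention question raised above.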
That is a simple solution one can think of and provide for streaming replication standbys. However, is it worth implementing in core, given the concerns explained above?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com