I also am realizing belatedly that my solution of dropping the subscriber
probably won't work anyway, since I'd lose the changes on the publisher for
the duration of the upgrade. Maybe I could drop the subscription while
keeping the slot on the publisher, and then create a new subscription after
the upgrade using that slot and copy_data=False? Getting wonky.

On Fri, May 19, 2023 at 8:17 AM Mike Lissner <mliss...@michaeljaylissner.com>
wrote:

> Hi all,
>
> In AWS RDS, we are using logical replication between a postgresql 14
> publisher and a postgresql 10 subscriber. The subscriber is rather old, so
> yesterday I tried to update it using AWS's built in upgrade tool (which
> uses pg_upgrade behind the scenes).
>
> I did a pretty thorough test run before beginning, but the live run went
> pretty poorly. My process was:
>
> 1. Disable the subscription to pg10.
> 2. Run RDS's upgrade (which runs pg_upgrade).
> 3. Re-Enable the subscription to the newly upgraded server.
>
> The idea was that the publisher could still be live and collect changes,
> and then on step 3, those changes would flush to the newly upgraded server.
>
> When I hit step three, things went awry. From what I can tell, it seems
> like pg_upgrade might have wiped out the LSN location of the subscriber,
> because I was getting many messages in the logs saying:
>
> 2023-05-19 01:01:09 
> UTC:100.20.224.120(56536):django@courtlistener:[29669]:STATEMENT: 
> CREATE_REPLICATION_SLOT "pg_18278_sync_86449755_7234675743763347169" LOGICAL 
> pgoutput USE_SNAPSHOT2023-05-19 01:01:09 
> UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: replication 
> slot "pg_18278_sync_16561_7234675743763347169" does not exist2023-05-19 
> 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:STATEMENT: 
> DROP_REPLICATION_SLOT pg_18278_sync_16561_7234675743763347169 WAIT2023-05-19 
> 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: all 
> replication slots are in use2023-05-19 01:01:09 
> UTC:100.20.224.120(56550):django@courtlistener:[29670]:HINT: Free one or 
> increase max_replication_slots.
>
> I followed those instructions, and upped max_replication_slots to 200.
> That fixed that error, but then I had errors about COPY commands failing,
> and looking in the publisher I saw about 150 slots like:
>
> select * from pg_replication_slots ;
>                  slot_name                  |  plugin  | slot_type | datoid | 
>   database    | temporary | active | active_pid | xmin | catalog_xmin | 
> restart_lsn  | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase 
> --------------------------------------------+----------+-----------+--------+---------------+-----------+--------+------------+------+--------------+--------------+---------------------+------------+---------------+-----------
>  pg_18278_sync_86449408_7234675743763347169 | pgoutput | logical   |  16428 | 
> courtlistener | f         | t      |       6906 |      |    859962500 | 
> EA5/954A9F18 |                     | reserved   |               | f
>  pg_18278_sync_20492279_7234675743763347169 | pgoutput | logical   |  16428 | 
> courtlistener | f         | f      |            |      |    859962448 | 
> EA5/9548EDF0 | EA5/9548EE28        | reserved   |               | f
>  pg_18278_sync_16940_7234675743763347169    | pgoutput | logical   |  16428 | 
> courtlistener | f         | f      |            |      |    859962448 | 
> EA5/9548EE60 | EA5/9548EE98        | reserved   |               | f
>
>
> So this looks like it's trying to sync all of the existing tables all over
> again when I re-enabled the subscription.
>
> Does that make sense? In the future, I'll DROP the subscription and then
> create a new one with copy_data=False, but this was a real gotcha.
>
> Anybody know what's going on here?
>
> Thanks,
>
> Mike
>

Reply via email to