Re: Synchronizing slots from primary to standby

2022-01-21 Thread Hsu, John
   > I might be missing something but isn’t it okay even if the new primary
   > server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
   > restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
   > LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
   > transactions that were committed after the remote_lsn. So the
   > subscriber can resume logical replication with the new primary without
   > any data loss.

Maybe I'm misreading, but I thought the purpose of this was to make
sure that the logical subscriber does not have data that has not been
replicated to the new primary. The use-case I can think of would be
if synchronous_commit were enabled and a failover occurs. If
we didn't have this set, isn't it possible that the logical subscriber
has extra commits that aren't present on the newly promoted primary?

And sorry I accidentally started a new thread in my last reply. 
Re-pasting some of my previous questions/comments:

wait_for_standby_confirmation does not re-read standby_slot_names once it's
inside its loop, so a bad value can't be fixed with SIGHUP. Similarly,
synchronize_slot_names isn't updated once the worker is launched.

If a logical slot was dropped on the writer, should the worker drop logical 
slots that it was previously synchronizing but are no longer present? Or 
should we leave that to the user to manage? I'm trying to think why users 
would want to sync logical slots to a reader but not have them dropped 
as well once they're no longer present.

Is there a reason we're deciding to use one sync worker per database 
instead of one general worker that syncs across all the databases? 
I imagine I'm missing something obvious here. 

As for how standby_slot_names should be configured, I'd prefer flexibility 
similar to what we have for synchronous_standby_names, since that seems the 
most analogous. It'd provide flexibility for failovers, which I imagine is 
the most common use-case.

On 1/20/22, 9:34 PM, "Masahiko Sawada"  wrote:

On Wed, Dec 15, 2021 at 7:13 AM Peter Eisentraut
 wrote:
>
> On 31.10.21 11:08, Peter Eisentraut wrote:
> > I want to reactivate $subject.  I took Petr Jelinek's patch from [0],
> > rebased it, added a bit of testing.  It basically works, but as
> > mentioned in [0], there are various issues to work out.
> >
> > The idea is that the standby runs a background worker to periodically
> > fetch replication slot information from the primary.  On failover, a
> > logical subscriber would then ideally find up-to-date replication slots
> > on the new publisher and can just continue normally.
>
> > So, again, this isn't anywhere near ready, but there is already a lot
> > here to gather feedback about how it works, how it should work, how to
> > configure it, and how it fits into an overall replication and HA
> > architecture.
>
> The second,
> standby_slot_names, is set on the primary.  It holds back logical
> replication until the listed physical standbys have caught up.  That
> way, when failover is necessary, the promoted standby is not behind the
> logical replication consumers.

I might be missing something but isn’t it okay even if the new primary
server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
transactions that were committed after the remote_lsn. So the
subscriber can resume logical replication with the new primary without
any data loss.

The new primary should not be ahead of the subscribers because it
forwards the logical replication start LSN to the slot’s
confirm_flush_lsn in this case. But it cannot happen since the remote
LSN of the subscriber’s origin is always updated first, then the
confirm_flush_lsn of the slot on the primary is updated, and then the
confirm_flush_lsn of the slot on the standby is synchronized.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/





Synchronizing slots from primary to standby

2022-01-18 Thread Hsu, John
Hello,

The doc mentions that standby_slot_names is only to be used for waiting on 
physical slots. However, it seems like we calculate the flush_pos for logical 
slots as well in wait_for_standby_confirmation?
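If it's meant to be physical-only, I'd have expected the loop to skip logical
slots, roughly like this (a sketch against the slot loop in the v2 patch, not
a tested change):

    for (int i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

        if (!s->in_use)
            continue;

        /*
         * standby_slot_names is documented as naming physical standbys,
         * so ignore logical slots when computing the oldest flush position.
         */
        if (SlotIsLogical(s))
            continue;

        /* ... existing flush-position bookkeeping ... */
    }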

Re-posting some of my previous comment since it seems like it got lost...

In wait_for_standby_confirmation, should we move the standby_slot_names -> 
namelist parsing into the while (true) block so that if wrong values are set 
they can be fixed with a SIGHUP? Similarly, in slotsync.c, we never update 
synchronize_slot_names once the worker is launched. 

If a logical slot was dropped on the writer, should the worker drop logical 
slots that it was previously synchronizing but are no longer present? Or should 
we leave that to the user to manage? I'm trying to think why users would want 
to sync logical slots to a reader but not have them dropped if we're able to 
detect they're no longer present on the writer. Maybe if there were a use-case 
to set synchronize_slot_names one slot at a time, you wouldn't want other 
logical slots to be dropped, but dropping sounds like reasonable behavior for '*'?
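If we did decide the worker should drop them, I imagine something roughly like
this at the end of each sync cycle (just a sketch; remote_slots and
remote_slot_exists() are placeholders for however the worker tracks the slot
names it last fetched from the primary):

    List       *to_drop = NIL;
    ListCell   *lc;

    /*
     * Collect local logical slots that no longer exist on the primary.
     * Only collect names under the shared lock and drop afterwards,
     * since ReplicationSlotDrop() takes its own locks.
     */
    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
    for (int i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

        if (s->in_use && SlotIsLogical(s) &&
            !remote_slot_exists(remote_slots, NameStr(s->data.name)))
            to_drop = lappend(to_drop, pstrdup(NameStr(s->data.name)));
    }
    LWLockRelease(ReplicationSlotControlLock);

    foreach(lc, to_drop)
        ReplicationSlotDrop((const char *) lfirst(lc), true);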

Is there a reason we're deciding to use one sync worker per database instead 
of one general worker that syncs across all the databases? I imagine I'm 
missing something obvious here.

As for how standby_slot_names should be configured, I'd prefer flexibility 
similar to what we have for synchronous_standby_names, since that seems the 
most analogous. It'd provide flexibility for failovers, which I imagine is the 
most common use-case. 

John H

On 1/3/22, 5:49 AM, "Peter Eisentraut"  
wrote:

Here is an updated patch to fix some build failures.  No feature changes.

On 14.12.21 23:12, Peter Eisentraut wrote:
> On 31.10.21 11:08, Peter Eisentraut wrote:
>> I want to reactivate $subject.  I took Petr Jelinek's patch from [0],
>> rebased it, added a bit of testing.  It basically works, but as
>> mentioned in [0], there are various issues to work out.
>>
>> The idea is that the standby runs a background worker to periodically
>> fetch replication slot information from the primary.  On failover, a
>> logical subscriber would then ideally find up-to-date replication
>> slots on the new publisher and can just continue normally.
>
>> So, again, this isn't anywhere near ready, but there is already a lot
>> here to gather feedback about how it works, how it should work, how to
>> configure it, and how it fits into an overall replication and HA
>> architecture.
>
> Here is an updated patch.  The main changes are that I added two
> configuration parameters.  The first, synchronize_slot_names, is set on
> the physical standby to specify which slots to sync from the primary. By
> default, it is empty.  (This also fixes the recovery test failures that
> I had to disable in the previous patch version.)  The second,
> standby_slot_names, is set on the primary.  It holds back logical
> replication until the listed physical standbys have caught up.  That
> way, when failover is necessary, the promoted standby is not behind the
> logical replication consumers.
>
> In principle, this works now, I think.  I haven't made much progress in
> creating more test cases for this; that's something that needs more
> attention.
>
> It's worth pondering what the configuration language for
> standby_slot_names should be.  Right now, it's just a list of slots that
> all need to be caught up.  More complicated setups are conceivable.
> Maybe you have standbys S1 and S2 that are potential failover targets
> for logical replication consumers L1 and L2, and also standbys S3 and S4
> that are potential failover targets for logical replication consumers L3
> and L4.  Viewed like that, this setting could be a replication slot
> setting.  The setting might also have some relationship with
> synchronous_standby_names.  Like, if you have synchronous_standby_names
> set, then that's a pretty good indication that you also want some or all
> of those standbys in standby_slot_names.  (But note that one is slots
> and one is application names.)  So there are a variety of possibilities.



Re: Synchronizing slots from primary to standby

2021-12-15 Thread Hsu, John
Hello,

I started taking a brief look at the v2 patch, and it does appear to work for 
the basic case. The logical slot is synchronized across, and I can connect to 
the promoted standby and stream changes afterwards.

It's not clear to me what the correct behavior is when a logical slot that has 
been synced to the replica then gets deleted on the writer. Would we expect 
this to be propagated, or is it left up to the end-user to manage?

> +   rawname = pstrdup(standby_slot_names);
> +   SplitIdentifierString(rawname, ',', &namelist);
> +
> +   while (true)
> +   {
> +   int wait_slots_remaining;
> +   XLogRecPtr  oldest_flush_pos = InvalidXLogRecPtr;
> +   int rc;
> +
> +   wait_slots_remaining = list_length(namelist);
> +
> +   LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +   for (int i = 0; i < max_replication_slots; i++)
> +   {

Even though standby_slot_names is PGC_SIGHUP, we never reload/re-process the 
value. If we have a wrong entry in there, the backend becomes stuck until we 
re-establish the logical connection. Adding "postmaster/interrupt.h" with 
ConfigReloadPending / ProcessConfigFile does seem to work.
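Roughly what I tried was something like this (a sketch, not the exact change;
the loop is the one in wait_for_standby_confirmation):

    /* at the top of the file: */
    #include "postmaster/interrupt.h"   /* ConfigReloadPending */
    #include "utils/guc.h"              /* ProcessConfigFile */

    /* inside wait_for_standby_confirmation's loop: */
    while (true)
    {
        /* pick up a changed standby_slot_names after a SIGHUP */
        if (ConfigReloadPending)
        {
            ConfigReloadPending = false;
            ProcessConfigFile(PGC_SIGHUP);
        }

        /* rebuild the list from the (possibly changed) GUC value */
        rawname = pstrdup(standby_slot_names);
        SplitIdentifierString(rawname, ',', &namelist);

        /* ... existing wait logic ... */
    }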

Another thing I noticed is that once it starts waiting in this block, Ctrl+C 
doesn't seem to terminate the backend?

pg_recvlogical -d postgres -p 5432 --slot regression_slot --start -f -
..
^Cpg_recvlogical: error: unexpected termination of replication stream: 

The logical backend connection is still present:

ps aux | grep 51263
   hsuchen 51263 80.7  0.0 320180 14304 ?Rs   01:11   3:04 postgres: 
walsender hsuchen [local] START_REPLICATION

pstack 51263
#0  0x7ffee99e79a5 in clock_gettime ()
#1  0x7f8705e88246 in clock_gettime () from /lib64/libc.so.6
#2  0x0075f141 in WaitEventSetWait ()
#3  0x0075f565 in WaitLatch ()
#4  0x00720aea in ReorderBufferProcessTXN ()
#5  0x007142a6 in DecodeXactOp ()
#6  0x0071460f in LogicalDecodingProcessRecord ()

It can be terminated with a pg_terminate_backend though.

If we have a physical slot named foo on the standby, and a logical slot is 
then created on the writer with the same slot_name, it does error out on the 
replica; that also prevents other slots from being synchronized, which is 
probably fine.

2021-12-16 02:10:29.709 UTC [73788] LOG:  replication slot synchronization 
worker for database "postgres" has started
2021-12-16 02:10:29.713 UTC [73788] ERROR:  cannot use physical replication 
slot for logical decoding
2021-12-16 02:10:29.714 UTC [73037] DEBUG:  unregistering background worker 
"replication slot synchronization worker"

On 12/14/21, 2:26 PM, "Peter Eisentraut"  
wrote:

On 28.11.21 07:52, Bharath Rupireddy wrote:
> 1) Instead of a new LIST_SLOT command, can't we use
> READ_REPLICATION_SLOT (slight modifications needs to be done to make
> it support logical replication slots and to get more information from
> the subscriber).

I looked at that but didn't see an obvious way to consolidate them.
This is something we could look at again later.

> 2) How frequently is the new bg worker going to sync the slot info?
> How can it ensure that the latest information exists, say, when the
> subscriber is down/crashed before it picks up the latest slot
> information?

The interval is currently hardcoded, but could be a configuration
setting.  In the v2 patch, there is a new setting that orders physical
replication before logical so that the logical subscribers cannot get
ahead of the physical standby.

> 3) Instead of the subscriber pulling the slot info, why can't the
> publisher (via the walsender or a new bg worker maybe?) push the
> latest slot info? I'm not sure we want to add more functionality to
> the walsender, if yes, isn't it going to be much simpler?

This sounds like the failover slot feature, which was rejected.





Re: Physical slot restart_lsn advances incorrectly requiring restore from archive

2020-07-16 Thread Hsu, John
Hi Horiguchi-san,

I'll take a look at that thread and see if I can reproduce with the attached 
patch.
It seems like it would directly address this issue. Thanks for taking a look. 

Cheers,
John H

On Thu, Jul 16, 2020 at 11:00 AM Kyotaro Horiguchi  
wrote:
Hello, John.

At Fri, 10 Jul 2020 20:44:30 +, "Hsu, John"  wrote in 
> Hi hackers,
>  
> We believe we're seeing a problem with how a physical slot's restart_lsn is 
> advanced, leading to replicas needing to restore from the archive in order 
> for replication to resume. 
> The logs below are from reproductions against 10.13. I’m still working on 
> reproducing it for 12.3.
>  
> 1. A WAL record write spans two WAL segments.
> 2. The write to the first WAL segment is complete, but not to the second segment.
> 3. The write to the first WAL segment is acknowledged as flushed by the Postgres replica.
> 4. The primary restarts before the write to the second segment completes, which also means the complete WAL record was never written.
> 5. Crash recovery finishes at a record before the incomplete WAL write.
> 6. The replica now starts the slot at the next WAL segment, since the previous segment was already acknowledged as flushed.
...
> Redo finishes at 0/2BB0 even though the flush we received from
> the replica is already at 0/2C00.
> This is problematic because the replica reconnects to the slot
> telling it to start past the new redo point.

Yeah, that is a problem not only related to restart_lsn. The same
cause leads to another issue, an inconsistent archive, as discussed in
[1].

1: 
https://www.postgresql.org/message-id/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com

> The attached patch (against 10) attempts to address this by keeping
> track of the first flushLsn in the current segNo, and waiting until we
> receive one after that before updating. This prevents the WAL from
> rotating out of the primary, and a reboot of the replica will fix
> things instead of needing to restore from the archive.
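If I understand correctly, the approach amounts to roughly the following in
the walsender's reply handling (a simplified sketch against a recent tree,
not the attached patch; the statics are illustrative and flushPtr is the
flush position reported by the standby):

    /*
     * Remember the first flush position reported within a WAL segment and
     * only advance the physical slot once a later flush arrives, so an
     * acknowledgement landing exactly on a segment boundary cannot move
     * restart_lsn past an incomplete cross-segment record.
     */
    static XLogSegNo    last_segno = 0;
    static XLogRecPtr   first_flush_in_seg = InvalidXLogRecPtr;
    XLogSegNo           segno;

    XLByteToSeg(flushPtr, segno, wal_segment_size);

    if (segno != last_segno)
    {
        /* first ack seen for this segment: remember it, don't advance yet */
        last_segno = segno;
        first_flush_in_seg = flushPtr;
    }
    else if (flushPtr > first_flush_in_seg)
        PhysicalConfirmReceivedLocation(flushPtr);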

On the other hand, we can and should advance restart_lsn when we know
that the last record is complete. I think the patch in the thread [2]
would fix your issue. With that patch, the primary doesn't send a
continuation record at the end of a segment until the whole record is
flushed to the WAL file.

2: 
https://www.postgresql.org/message-id/20200625.153532.379700510444980240.horikyota.ntt%40gmail.com


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center