Re: Include sequence relation support in logical replication

Cary Huang Thu, 26 Mar 2020 15:34:21 -0700

Hi Andres



thanks for your reply and your patch review. Please see my comments below



>On 2020-03-24 16:19:21 -0700, Cary Huang wrote: 

>> I have shared a patch that allows sequence relation to be supported in 

>> logical replication via the decoding plugin ( test_decoding for 

>> example ); it does not support sequence relation in logical 

>> replication between a PG publisher and a PG subscriber via pgoutput 

>> plugin as it will require much more work. 

>

> Could you expand on "much more work"? Once decoding support is there, 

> that shouldn't be that much? 



By much more work, I meant more source files will need to be changed to have 
sequence replication 

supported between a PG subscriber and publisher using pgoutput plugin. About 10 
more source file changes.

Idea is similar though.




>> Sequence changes caused by other sequence-related SQL functions like 

>> setval() or ALTER SEQUENCE xxx, will always emit a WAL update, so 

>> replicating changes caused by these should not be a problem. 

>

> I think this really would need to handle at the very least setval to 

> make sense. 



yes, sure



>> For the replication to make sense, the patch actually disables the WAL 

>> update at every 32 nextval() calls, so every call to nextval() will 

>> emit a WAL update for proper replication. This is done by setting 

>> SEQ_LOG_VALS to 0 in sequence.c 

>

> Why is that needed? ISTM updating in increments of 32 is fine for 

> replication purposes? It's good imo, because sending out more granular 

> increments would increase the size of the WAL stream? 



yes, updating WAL at every 32 increment is good and have huge performance 
benefits according 

to Michael, but when it is replicated logically to subscribers, the sequence 
value they receive would not 

make much sense to them.

For example, 



If i have a Sequence called "seq" with current value = 100 and increment = 5. 
The nextval('seq') call will

return 105 to the client but it will write 260 to WAL record ( 100 + (5*32) ), 
because that is the value after 32

increments and internally it is also maintaining a "log_cnt" counter that 
tracks how many nextval() calls have been invoked

since the last WAL write, so it could kind of derive backwards to find the 
proper next value to return to client. 



But the subscriber for this sequence will receive a change of 260 instead of 
105, and it does not represent the current

sequence status. Subscriber is not able to derive backwards because it does not 
know the increment size in its schema.



Setting SEQ_LOG_VALS to 0 in sequence.c basically disables this 32 increment 
behavior and makes WAL update at every nextval() call

and therefore the subscriber to this sequence will receive the same value (105) 
as the client, as a cost of more WAL writes.



I would like to ask if you have some suggestions or ideas that can make 
subscriber receives the current value without the need to

disabling the 32 increment behavior?



>> diff --git a/contrib/test_decoding/test_decoding.c 
>> b/contrib/test_decoding/test_decoding.c 

>> index 93c948856e..7a7e572d6c 100644 

>> --- a/contrib/test_decoding/test_decoding.c 

>> +++ b/contrib/test_decoding/test_decoding.c 

>> @@ -466,6 +466,15 @@ pg_decode_change(LogicalDecodingContext *ctx, 
>> ReorderBufferTXN *txn, 

>>                                      &change->data.tp.oldtuple->tuple, 

>>                                      true); 

>>              break; 

>> +        case REORDER_BUFFER_CHANGE_SEQUENCE: 

>> +                    appendStringInfoString(ctx->out, " SEQUENCE:"); 

>> +                    if (change->data.sequence.newtuple == NULL) 

>> +                        appendStringInfoString(ctx->out, " 
>> (no-tuple-data)"); 

>> +                    else 

>> +                        tuple_to_stringinfo(ctx->out, tupdesc, 

>> +                                            
>> &change->data.sequence.newtuple->tuple, 

>> +                                            false); 

>> +                    break; 

>>          default: 

>>              Assert(false); 

>>      } 

>

> You should also add tests - the main purpose of contrib/test_decoding is 

> to facilitate writing those... 



thanks, I will add




>> +    ReorderBufferXidSetCatalogChanges(ctx->reorder, 
>> XLogRecGetXid(buf->record), buf->origptr); 

>

> Huh, why are you doing this? That's going to increase overhead of logical 

> decoding by many times? 



This is needed to allow snapshot to be built inside DecodeCommit() function. 
Regular changes caused by INSERT also has this 

function called so I assume it is needed to ensure proper decoding. Without 
this, a snapshot will not be built and the change 

transaction will not be logged




>> +                case REORDER_BUFFER_CHANGE_SEQUENCE: 

>> +                    Assert(snapshot_now); 

>> + 

>> +                    reloid = 
>> RelidByRelfilenode(change->data.sequence.relnode.spcNode, 

>> +                                                
>> change->data.sequence.relnode.relNode); 

>> + 

>> +                    if (reloid == InvalidOid && 

>> +                        change->data.sequence.newtuple == NULL) 

>> +                        goto change_done; 

>

> I don't think this path should be needed? There's afaict no valid ways 

> we should be able to end up here without a tuple? 



yeah you are right, I can remove the tuple check



>> +                    if (!RelationIsLogicallyLogged(relation)) 

>> +                        goto change_done; 

>

> Similarly, this seems superflous and should perhaps be an assertion? 



I think it should be ok to check a relation like this, because it also will 
check the persistence of the relation and whether

wal_level is set to 'logical'. It is commonly used in the regular INSERT cases 
so I thought it would make sense to use it

for sequence.



>> +                    /* user-triggered change */ 

>> +                    if (!IsToastRelation(relation)) 

>> +                    { 

>> +                        ReorderBufferToastReplace(rb, txn, relation, 
>> change); 

>> +                        rb->apply_change(rb, txn, relation, change); 

>> +                    } 

>> +                    break; 

>>              } 

>>          } 

>> 

>

> This doesn't make sense either. 



agreed, it should not be here.



>>  /* forward declaration */ 

>> @@ -149,6 +150,15 @@ typedef struct ReorderBufferChange 

>>              CommandId    cmax; 

>>              CommandId    combocid; 

>>          }            tuplecid; 

>> +        /* 

>> +         * Truncate data for REORDER_BUFFER_CHANGE_SEQUENCE representing 
>> one 

>> +         * set of relations to be truncated. 

>> +         */ 

>

> What? 



Will fix the comment




>> +        struct 

>> +        { 

>> +            RelFileNode relnode; 

>> +            ReorderBufferTupleBuf *newtuple; 

>> +        }            sequence; 

>>      }            data; 

>> 

>>      /* 

>

> I don't think we should expose sequence changes via their tuples - 

> that'll unnecessarily expose a lot of implementation details. 



Can you elaborate more on this? Sequence writes its tuple data in WAL and 
triggers a change

event to logical decoding logic. What else can I use to expose a sequence 
change?



Best




Cary Huang

-------------

HighGo Software Inc. (Canada)

mailto:[email protected]

http://www.highgo.ca








---- On Wed, 25 Mar 2020 12:27:28 -0700 Andres Freund <[email protected]> 
wrote ----


Hi, 
 
On 2020-03-24 16:19:21 -0700, Cary Huang wrote: 
> I have shared a patch that allows sequence relation to be supported in 
> logical replication via the decoding plugin ( test_decoding for 
> example ); it does not support sequence relation in logical 
> replication between a PG publisher and a PG subscriber via pgoutput 
> plugin as it will require much more work. 
 
Could you expand on "much more work"? Once decoding support is there, 
that shouldn't be that much? 
 
 
> Sequence changes caused by other sequence-related SQL functions like 
> setval() or ALTER SEQUENCE xxx, will always emit a WAL update, so 
> replicating changes caused by these should not be a problem. 
 
I think this really would need to handle at the very least setval to 
make sense. 
 
 
> For the replication to make sense, the patch actually disables the WAL 
> update at every 32 nextval() calls, so every call to nextval() will 
> emit a WAL update for proper replication. This is done by setting 
> SEQ_LOG_VALS to 0 in sequence.c 
 
Why is that needed? ISTM updating in increments of 32 is fine for 
replication purposes? It's good imo, because sending out more granular 
increments would increase the size of the WAL stream? 
 
 
 
> diff --git a/contrib/test_decoding/test_decoding.c 
> b/contrib/test_decoding/test_decoding.c 
> index 93c948856e..7a7e572d6c 100644 
> --- a/contrib/test_decoding/test_decoding.c 
> +++ b/contrib/test_decoding/test_decoding.c 
> @@ -466,6 +466,15 @@ pg_decode_change(LogicalDecodingContext *ctx, 
> ReorderBufferTXN *txn, 
>                                      &change->data.tp.oldtuple->tuple, 
>                                      true); 
>              break; 
> +        case REORDER_BUFFER_CHANGE_SEQUENCE: 
> +                    appendStringInfoString(ctx->out, " SEQUENCE:"); 
> +                    if (change->data.sequence.newtuple == NULL) 
> +                        appendStringInfoString(ctx->out, " 
> (no-tuple-data)"); 
> +                    else 
> +                        tuple_to_stringinfo(ctx->out, tupdesc, 
> +                                            
> &change->data.sequence.newtuple->tuple, 
> +                                            false); 
> +                    break; 
>          default: 
>              Assert(false); 
>      } 
 
You should also add tests - the main purpose of contrib/test_decoding is 
to facilitate writing those... 
 
 
> +    ReorderBufferXidSetCatalogChanges(ctx->reorder, 
> XLogRecGetXid(buf->record), buf->origptr); 
 
Huh, why are you doing this? That's going to increase overhead of logical 
decoding by many times? 
 
 
> +                case REORDER_BUFFER_CHANGE_SEQUENCE: 
> +                    Assert(snapshot_now); 
> + 
> +                    reloid = 
> RelidByRelfilenode(change->data.sequence.relnode.spcNode, 
> +                                                
> change->data.sequence.relnode.relNode); 
> + 
> +                    if (reloid == InvalidOid && 
> +                        change->data.sequence.newtuple == NULL) 
> +                        goto change_done; 
 
I don't think this path should be needed? There's afaict no valid ways 
we should be able to end up here without a tuple? 
 
 
> +                    if (!RelationIsLogicallyLogged(relation)) 
> +                        goto change_done; 
 
Similarly, this seems superflous and should perhaps be an assertion? 
 
> +                    /* user-triggered change */ 
> +                    if (!IsToastRelation(relation)) 
> +                    { 
> +                        ReorderBufferToastReplace(rb, txn, relation, 
> change); 
> +                        rb->apply_change(rb, txn, relation, change); 
> +                    } 
> +                    break; 
>              } 
>          } 
> 
 
This doesn't make sense either. 
 
 
 
> diff --git a/src/include/replication/reorderbuffer.h 
> b/src/include/replication/reorderbuffer.h 
> index 626ecf4dc9..cf3fd45c5f 100644 
> --- a/src/include/replication/reorderbuffer.h 
> +++ b/src/include/replication/reorderbuffer.h 
> @@ -62,7 +62,8 @@ enum ReorderBufferChangeType 
>      REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID, 
>      REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT, 
>      REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM, 
> -    REORDER_BUFFER_CHANGE_TRUNCATE 
> +    REORDER_BUFFER_CHANGE_TRUNCATE, 
> +    REORDER_BUFFER_CHANGE_SEQUENCE, 
>  }; 
> 
>  /* forward declaration */ 
> @@ -149,6 +150,15 @@ typedef struct ReorderBufferChange 
>              CommandId    cmax; 
>              CommandId    combocid; 
>          }            tuplecid; 
> +        /* 
> +         * Truncate data for REORDER_BUFFER_CHANGE_SEQUENCE representing one 
> +         * set of relations to be truncated. 
> +         */ 
 
What? 
 
> +        struct 
> +        { 
> +            RelFileNode relnode; 
> +            ReorderBufferTupleBuf *newtuple; 
> +        }            sequence; 
>      }            data; 
> 
>      /* 
 
I don't think we should expose sequence changes via their tuples - 
that'll unnecessarily expose a lot of implementation details. 
 
Greetings, 
 
Andres Freund

Re: Include sequence relation support in logical replication

Reply via email to