On Thu, Oct 31, 2024 at 3:16 AM vignesh C <vignes...@gmail.com> wrote: > Thanks for committing this patch, here is a rebased version of the > remaining patches. >
Hi Vignesh, thanks for the rebased patches. Here are my review comments for patch v1-0001. ====== Commit message. 1. The commit message text is stale, so needs some updates. For example, it is still saying "Generated column values are not currently replicated..." but that is not correct anymore since the recent push of the previous v46-0001 patch [1], which already implemented replication of generated columns when they are specified in a publication column list.. ====== doc/src/sgml/ddl.sgml 2. <para> - Generated columns are skipped for logical replication and cannot be - specified in a <command>CREATE PUBLICATION</command> column list. + Generated columns are allowed to be replicated during logical replication + according to the <command>CREATE PUBLICATION</command> option + <link linkend="sql-createpublication-params-with-publish-generated-columns"> + <literal>include_generated_columns</literal></link>. </para> This explanation is incomplete because generated columns can also be specified in a publication column list which has nothing to do with the new option. In fact, lack of mention about the column lists seems like an oversight which should have been in the previous patch [1]. I already posted another mail about this [2]. ====== doc/src/sgml/protocol.sgml 3. <para> - Next, one of the following submessages appears for each column: + Next, one of the following submessages appears for each column (except generated columns): Hmm. Now that generated column replication is supported is this change still required? ====== doc/src/sgml/ref/create_publication.sgml 4. + + <varlistentry id="sql-createpublication-params-with-publish-generated-columns"> + <term><literal>publish_generated_columns</literal> (<type>boolean</type>)</term> + <listitem> + <para> + Specifies whether the generated columns present in the tables + associated with the publication should be replicated. + The default is <literal>false</literal>. + </para> + </listitem> + </varlistentry> + I know that the subsequent DOCS patch V1-0002 will explain more about this, but as a stand-alone patch 0001 maybe you need to clarify that a publication column list will override this 'publish_generated_columns' parameter. ====== src/backend/catalog/pg_publication.c has_column_list_defined: 5. +/* + * Returns true if the relation has column list associated with the publication, + * false otherwise. + */ +bool +has_column_list_defined(Publication *pub, Oid relid) +{ + HeapTuple cftuple = NULL; + bool isnull = true; + + if (pub->alltables) + return false; + + cftuple = SearchSysCache2(PUBLICATIONRELMAP, + ObjectIdGetDatum(relid), + ObjectIdGetDatum(pub->oid)); + if (HeapTupleIsValid(cftuple)) + { + /* Lookup the column list attribute. */ + (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple, + Anum_pg_publication_rel_prattrs, + &isnull); + if (!isnull) + { + ReleaseSysCache(cftuple); + return true; + } + + ReleaseSysCache(cftuple); + } + + return false; +} + 5a. It might be tidier if you checked for !HeapTupleIsValid(cftuple) and do early return false, instead of needing an indented if-block. ~ 5b. The code can be rearranged and simplified -- you don't need multiple calls to ReleaseSysCache. SUGGESTION: /* Lookup the column list attribute. */ (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple, Anum_pg_publication_rel_prattrs, &isnull); ReleaseSysCache(cftuple); /* Was a column list found? */ return isnull ? false : true; ~~~ pub_getallcol_bitmapset: 6. +/* + * Return a column list bitmap for the specified table. + * + * Generated columns are included if pubgencols is true. + * + * If mcxt isn't NULL, build the bitmapset in that context. + */ +Bitmapset * +pub_getallcol_bitmapset(Relation relation, bool pubgencols, + MemoryContext mcxt) IIUC this is a BMS of the table columns to be published. The function comment seems confusing to me when it says "column list bitmap" because IIUC this function is not really anything to do with a publication "column list", which is an entirely different thing. ====== src/backend/replication/logical/proto.c 7. static void logicalrep_write_attrs(StringInfo out, Relation rel, - Bitmapset *columns); + Bitmapset *columns, bool pubgencols); static void logicalrep_write_tuple(StringInfo out, Relation rel, TupleTableSlot *slot, - bool binary, Bitmapset *columns); + bool binary, Bitmapset *columns, + bool pubgencols); The meaning of all these new 'pubgencols' are ambiguous. e.g. Are they (a) The value of the CREATE PUBLICATION 'publish_generate_columns' parameter, or does it mean (b) Just some generated column is being published (maybe via column list or maybe not). I think it means (a) but, if true, that could be made much more clear by changing all of these names to 'pubgencols_option' or something similar. Actually, now I have doubts about that also -- I think this might be magically assigned to false if no generated columns exist in the table. Anyway, please do whatever you can to disambiguate this. ~~~ logicalrep_should_publish_column: 8. The function comment is stale. It is still only talking about generated columns in column lists. SUGGESTION Note that generated columns can be published only when present in a publication column list, or (if there is no column list), when the publication parameter 'publish_generated_columns' is true. ~~~ 9. bool logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns, bool pubgencols) { if (att->attisdropped) return false; /* * Skip publishing generated columns if they are not included in the * column list or if the option is not specified. */ if (!columns && !pubgencols && att->attgenerated) return false; /* * Check if a column is covered by a column list. */ if (columns && !bms_is_member(att->attnum, columns)) return false; return true; } Same as mentioned before in my previous v46-0001 review comments, I feel that the conditionals of this function are over-complicated and that there are more 'return' points than necessary. The alternative code below looks simpler to me. SUGGESTION bool logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns, bool pubgencols_option) { if (att->attisdropped) return false; if (columns) { /* * Has a column list: * Publish only cols named in that list. */ return bms_is_member(att->attnum, columns); } else { /* * Has no column list: * Publish generated cols only if 'publish_generated_cols' is true. * Publish all non-generated cols. */ return att->attgenerated ? pubgencols_option : true; } } ====== src/backend/replication/pgoutput/pgoutput.c 10. + /* Include publishing generated columns */ + bool pubgencols; + There is similar ambiguity here with this field-name as was mentioned about for other 'pbgencols' function params. I had initially thought that this this just caries around same value as the publication option 'publish_generated_columns' but now (after looking at function check_and_init_gencol) I think that might not be the case because I saw it can be assigned false (overriding the publication option?). Anyway, the comment needs to be made much clearer about what is the true meaning of this field. Or, rename it if there is a better name. ~~~ 11. +static void send_relation_and_attrs(Relation relation, TransactionId xid, + LogicalDecodingContext *ctx, + RelationSyncEntry *relentry); Was there some reason to move this static? Maybe it is better just to change the existing static in-place rather than moving code around at the same time. ~~~ send_relation_and_attrs: 12. - if (!logicalrep_should_publish_column(att, columns)) + if (!logicalrep_should_publish_column(att, columns, relentry->pubgencols)) continue; It seemed a bit strange/inconsistent that 'columns' was assigned to a local var, but 'pubgencols' was not, given they are both fields of the same struct. Maybe removing this 'columns' var would be consistent with other code in this patch. ~~~ 13. check_and_init_gencol: nit - missing periods for comments. ~~~ 14. + /* There is no generated columns to be published * /There is no generated columns/There are no generated columns/ ~~~ 15. + foreach(lc, publications) + { + Publication *pub = lfirst(lc); AFAIK this can be re-written using a different macro to avoid needing the 'lc' var. ~~~ pgoutput_column_list_init: 16. + bool collistpubexist = false; This seemed like not a very good name, How about 'found_pub_with_collist'; ~~~ 17. bool pub_no_list = true; nit - Not caused by this patch, but it's closely related; In passing we should declare this variable at a lower scope, and rename it to 'isnull' which is more in keeping with the comments around it. ~~~ 18. + /* + * For non-column list publications—such as TABLE (without a column + * list), ALL TABLES, or ALL TABLES IN SCHEMA publications consider + * all columns of the table, including generated columns, based on the + * pubgencols option. + */ Some minor tweaks. SUGGESTION For non-column list publications — e.g. TABLE (without a column list), ALL TABLES, or ALL TABLES IN SCHEMA, we consider all columns of the table (including generated columns when 'publish_generated_columns' option is true). ~~~ 19. + Assert(pub->pubgencols == entry->pubgencols); + + /* + * Retrieve the columns if they haven't been prepared yet, or if + * there are multiple publications. + */ + if (!relcols && (list_length(publications) > 1)) + { + pgoutput_ensure_entry_cxt(data, entry); + relcols = pub_getallcol_bitmapset(relation, entry->pubgencols, + entry->entry_cxt); + } 19a. Is that Assert correct? I ask only because AFAICT I saw in previous function (check_and_init_gencol) there is code that might change entry->pubgencols = false; even if the 'publish_generated_columns' option is true but there were not generated columns found in the table. ~ 19b. The comment says "or if there are multiple publications" but the code says &&. Something seems wrong. ~~~ 20. + /* + * If no column list publications exit, columns will be selected later + * according to the generated columns option. + */ 20a. typo - /exit/exist/ ~ 20b. There is a GENERAL problem that applies for lots of comments of this patch (including this comment) because the new publication option is referred to inconsistently in many different ways: e.g. - the generated columns option. - if the option is not specified - publish_generated_columns option. - the pubgencols option - 'publish_generated_columns' option All these references should be made the same. My personal preference is the last one ('publish_generated_columns' option). ~~~ get_rel_sync_entry: 21. + /* Check whether to publish to generated columns. */ + check_and_init_gencol(data, rel_publications, entry); + typo in comment - "publishe to"? ====== src/include/catalog/pg_publication.h 22. extern Bitmapset *pub_collist_to_bitmapset(Bitmapset *columns, Datum pubcols, MemoryContext mcxt); +extern Bitmapset *pub_getallcol_bitmapset(Relation relation, bool pubgencols, + MemoryContext mcxt); Maybe a better name for this new function is 'pub_allcols_bitmapse'. That would then be consistent with the other BMS function which doesn't include the word "get". ====== Some of my suggested updates above are already implemented in the attached nitpicks diff. Please refer to it. ====== [1] push support for gencols in column list -- https://github.com/postgres/postgres/commit/745217a051a9341e8c577ea59a87665d331d4af0 [2] docs oversight in commit -- https://www.postgresql.org/message-id/CAHut%2BPtBVSxNDph-mHP_SE4%2BWw%2BxJ0SKhVupnx5uVhK3V_SDHw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c index a662a45..0d76674 100644 --- a/src/backend/catalog/pg_publication.c +++ b/src/backend/catalog/pg_publication.c @@ -241,22 +241,17 @@ has_column_list_defined(Publication *pub, Oid relid) cftuple = SearchSysCache2(PUBLICATIONRELMAP, ObjectIdGetDatum(relid), ObjectIdGetDatum(pub->oid)); - if (HeapTupleIsValid(cftuple)) - { - /* Lookup the column list attribute. */ - (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple, - Anum_pg_publication_rel_prattrs, - &isnull); - if (!isnull) - { - ReleaseSysCache(cftuple); - return true; - } + if (!HeapTupleIsValid(cftuple)) + return false; - ReleaseSysCache(cftuple); - } + /* Lookup the column list attribute. */ + (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple, + Anum_pg_publication_rel_prattrs, + &isnull); + ReleaseSysCache(cftuple); - return false; + /* Was a column list found for this relation? */ + return isnull ? false : true; } /* diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c index 62b79cf..e928d94 100644 --- a/src/backend/replication/logical/proto.c +++ b/src/backend/replication/logical/proto.c @@ -1251,30 +1251,34 @@ logicalrep_message_type(LogicalRepMsgType action) /* * Check if the column 'att' of a table should be published. * - * 'columns' represents the column list specified for that table in the - * publication. + * 'columns' represents the publication column list (if any) for that table. * - * Note that generated columns can be present only in 'columns' list. + * Note that generated columns can be published only when present in a + * publication column list, or (if there is no column list), when the + * publication parameter 'publish_generated_columns' it true. */ bool logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns, - bool pubgencols) + bool pubgencols_option) { if (att->attisdropped) return false; - /* - * Skip publishing generated columns if they are not included in the - * column list or if the option is not specified. - */ - if (!columns && !pubgencols && att->attgenerated) - return false; - - /* - * Check if a column is covered by a column list. - */ - if (columns && !bms_is_member(att->attnum, columns)) - return false; - - return true; + if (columns) + { + /* + * Has a column list: + * Publish only cols named in that list. + */ + return bms_is_member(att->attnum, columns); + } + else + { + /* + * Has no column list: + * Publish generated cols only if 'publish_generated_cols' is true. + * Publish all non-generated cols. + */ + return att->attgenerated ? pubgencols_option : true; + } } diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c index d94e120..6da57e2 100644 --- a/src/backend/replication/pgoutput/pgoutput.c +++ b/src/backend/replication/pgoutput/pgoutput.c @@ -1022,7 +1022,7 @@ check_and_init_gencol(PGOutputData *data, List *publications, ListCell *lc; bool first = true; - /* Check if there is any generated column present */ + /* Check if there is any generated column present. */ for (int i = 0; i < desc->natts; i++) { Form_pg_attribute att = TupleDescAttr(desc, i); @@ -1034,7 +1034,7 @@ check_and_init_gencol(PGOutputData *data, List *publications, } } - /* There is no generated columns to be published */ + /* There are no generated columns to be published. */ if (!gencolpresent) { entry->pubgencols = false; @@ -1080,7 +1080,7 @@ pgoutput_column_list_init(PGOutputData *data, List *publications, ListCell *lc; bool first = true; Relation relation = RelationIdGetRelation(entry->publish_as_relid); - bool collistpubexist = false; + bool found_pub_with_collist = false; Bitmapset *relcols = NULL; /* @@ -1111,8 +1111,6 @@ pgoutput_column_list_init(PGOutputData *data, List *publications, */ if (!pub->alltables) { - bool pub_no_list = true; - /* * Check for the presence of a column list in this publication. * @@ -1126,17 +1124,19 @@ pgoutput_column_list_init(PGOutputData *data, List *publications, if (HeapTupleIsValid(cftuple)) { + bool isnull; + /* Lookup the column list attribute. */ cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple, Anum_pg_publication_rel_prattrs, - &pub_no_list); + &isnull); /* Build the column list bitmap in the per-entry context. */ - if (!pub_no_list) /* when not null */ + if (!isnull) /* when not null */ { pgoutput_ensure_entry_cxt(data, entry); - collistpubexist = true; + found_pub_with_collist = true; cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt); } @@ -1146,10 +1146,10 @@ pgoutput_column_list_init(PGOutputData *data, List *publications, } /* - * For non-column list publications—such as TABLE (without a column - * list), ALL TABLES, or ALL TABLES IN SCHEMA publications consider - * all columns of the table, including generated columns, based on the - * pubgencols option. + * For non-column list publications — e.g. TABLE (without a column list), + * ALL TABLES, or ALL TABLES IN SCHEMA, we consider all columns of the + * table (including generated columns if 'publish_generated_columns' + * option is true). */ if (!cols) { @@ -1186,7 +1186,7 @@ pgoutput_column_list_init(PGOutputData *data, List *publications, * If no column list publications exit, columns will be selected later * according to the generated columns option. */ - if (!collistpubexist) + if (!found_pub_with_collist) entry->columns = NULL; RelationClose(relation);