On Mon, 3 Jun 2024 at 13:03, Shubham Khanna <khannashubham1...@gmail.com> wrote: > > On Thu, May 16, 2024 at 11:35 AM Peter Smith <smithpb2...@gmail.com> wrote: > > > > Here are some review comments for the patch v1-0001. > > > > ====== > > GENERAL > > > > G.1. Use consistent names > > > > It seems to add unnecessary complications by having different names > > for all the new options, fields and API parameters. > > > > e.g. sometimes 'include_generated_columns' > > e.g. sometimes 'publish_generated_columns' > > > > Won't it be better to just use identical names everywhere for > > everything? I don't mind which one you choose; I just felt you only > > need one name, not two. This comment overrides everything else in this > > post so whatever name you choose, make adjustments for all my other > > review comments as necessary. > > I have updated the name to 'include_generated_columns' everywhere in the > Patch. > > > ====== > > > > G.2. Is it possible to just use the existing bms? > > > > A very large part of this patch is adding more API parameters to > > delegate the 'publish_generated_columns' flag value down to when it is > > finally checked and used. e.g. > > > > The functions: > > - logicalrep_write_insert(), logicalrep_write_update(), > > logicalrep_write_delete() > > ... are delegating the new parameter 'publish_generated_column' down to: > > - logicalrep_write_tuple > > > > The functions: > > - logicalrep_write_rel() > > ... are delegating the new parameter 'publish_generated_column' down to: > > - logicalrep_write_attrs > > > > AFAICT in all these places the API is already passing a "Bitmapset > > *columns". I was wondering if it might be possible to modify the > > "Bitmapset *columns" BEFORE any of those functions get called so that > > the "columns" BMS either does or doesn't include generated cols (as > > appropriate according to the option). > > > > Well, it might not be so simple because there are some NULL BMS > > considerations also, but I think it would be worth investigating at > > least, because if indeed you can find some common place (somewhere > > like pgoutput_change()?) where the columns BMS can be filtered to > > remove bits for generated cols then it could mean none of those other > > patch API changes are needed at all -- then the patch would only be > > 1/2 the size. > > I will analyse and reply to this in the next version. > > > ====== > > Commit message > > > > 1. > > Now if include_generated_columns option is specified, the generated > > column information and generated column data also will be sent. > > > > Usage from pgoutput plugin: > > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, > > 'proto_version', '1', 'publication_names', 'pub1', > > 'include_generated_columns', 'true'); > > > > Usage from test_decoding plugin: > > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, > > 'include-xids', '0', 'skip-empty-xacts', '1', > > 'include_generated_columns', '1'); > > > > ~ > > > > I think there needs to be more background information given here. This > > commit message doesn't seem to describe anything about what is the > > problem and how this patch fixes it. It just jumps straight into > > giving usages of a 'include_generated_columns' option. > > > > It also doesn't say that this is an option that was newly *introduced* > > by the patch -- it refers to it as though the reader should already > > know about it. > > > > Furthermore, your hacker's post says "Currently it is not supported as > > a subscription option because table sync for the generated column is > > not possible as copy command does not support getting data for the > > generated column. If this feature is required we can remove this > > limitation from the copy command and then add it as a subscription > > option later." IMO that all seems like the kind of information that > > ought to also be mentioned in this commit message. > > I have updated the Commit message mentioning the suggested changes. > > > ====== > > contrib/test_decoding/sql/ddl.sql > > > > 2. > > +-- check include_generated_columns option with generated column > > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > > AS (a * 2) STORED); > > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > > 'include_generated_columns', '1'); > > +DROP TABLE gencoltable; > > + > > > > 2a. > > Perhaps you should include both option values to demonstrate the > > difference in behaviour: > > > > 'include_generated_columns', '0' > > 'include_generated_columns', '1' > > Added the other option values to demonstrate the difference in behaviour: > > > 2b. > > I think you maybe need to include more some test combinations where > > there is and isn't a COLUMN LIST, because I am not 100% sure I > > understand the current logic/expectations for all combinations. > > > > e.g. When the generated column is in a column list but > > 'publish_generated_columns' is false then what should happen? etc. > > Also if there are any special rules then those should be mentioned in > > the commit message. > > Test case is added and the same is mentioned in the documentation. > > > ====== > > src/backend/replication/logical/proto.c > > > > 3. > > For all the API changes the new parameter name should be plural. > > > > /publish_generated_column/publish_generated_columns/ > > Updated the name to 'include_generated_columns' > > > 4. logical_rep_write_tuple: > > > > - if (att->attisdropped || att->attgenerated) > > + if (att->attisdropped) > > continue; > > > > - if (!column_in_column_list(att->attnum, columns)) > > + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) > > + continue; > > + > > + if (att->attgenerated && !publish_generated_column) > > continue; > > That code seems confusing. Shouldn't the logic be exactly as also in > > logicalrep_write_attrs()? > > > > e.g. Shouldn't they both look like this: > > > > if (att->attisdropped) > > continue; > > > > if (att->attgenerated && !publish_generated_column) > > continue; > > > > if (!column_in_column_list(att->attnum, columns)) > > continue; > > Fixed. > > > ====== > > src/backend/replication/pgoutput/pgoutput.c > > > > 5. > > static void send_relation_and_attrs(Relation relation, TransactionId xid, > > LogicalDecodingContext *ctx, > > - Bitmapset *columns); > > + Bitmapset *columns, > > + bool publish_generated_column); > > > > Use plural. /publish_generated_column/publish_generated_columns/ > > Updated the name to 'include_generated_columns' > > > 6. parse_output_parameters > > > > bool origin_option_given = false; > > + bool generate_column_option_given = false; > > > > data->binary = false; > > data->streaming = LOGICALREP_STREAM_OFF; > > data->messages = false; > > data->two_phase = false; > > + data->publish_generated_column = false; > > > > I think the 1st var should be 'include_generated_columns_option_given' > > for consistency with the name of the actual option that was given. > > Updated the name to 'include_generated_columns_option_given' > > > ====== > > src/include/replication/logicalproto.h > > > > 7. > > (Same as a previous review comment) > > > > For all the API changes the new parameter name should be plural. > > > > /publish_generated_column/publish_generated_columns/ > > Updated the name to 'include_generated_columns' > > > ====== > > src/include/replication/pgoutput.h > > > > 8. > > bool publish_no_origin; > > + bool publish_generated_column; > > } PGOutputData; > > > > /publish_generated_column/publish_generated_columns/ > > Updated the name to 'include_generated_columns' > > The attached Patch contains the suggested changes.
Thanks for the updated patch, few comments: 1) The option name seems wrong here: In one place include_generated_column is specified and other place include_generated_columns is specified: + else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) && + strcmp(defel->defname, "include_generated_column") == 0) + { + if (IsSet(opts->specified_opts, SUBOPT_INCLUDE_GENERATED_COLUMN)) + errorConflictingDefElem(defel, pstate); + + opts->specified_opts |= SUBOPT_INCLUDE_GENERATED_COLUMN; + opts->include_generated_column = defGetBoolean(defel); + } diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index d453e224d9..e8ff752fd9 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH("binary", "connect", "copy_data", "create_slot", "disable_on_error", "enabled", "failover", "origin", "password_required", "run_as_owner", "slot_name", - "streaming", "synchronous_commit", "two_phase"); + "streaming", "synchronous_commit", "two_phase","include_generated_columns"); 2) This small data table need not have a primary key column as it will create an index and insertion will happen in the index too. +-- check include-generated-columns option with generated column +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); +INSERT INTO gencoltable (a) VALUES (1), (2), (3); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); 3) Please add a test case for this: + set to <literal>false</literal>. If the subscriber-side column is also a + generated column then this option has no effect; the replicated data will + be ignored and the subscriber column will be filled as normal with the + subscriber-side computed or default data. 4) You can use a new style of ereport to remove the brackets around errcode 4.a) + else if (!parse_bool(strVal(elem->arg), &data->include_generated_columns)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("could not parse value \"%s\" for parameter \"%s\"", + strVal(elem->arg), elem->defname))); 4.b) similarly here too: + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + /*- translator: both %s are strings of the form "option = value" */ + errmsg("%s and %s are mutually exclusive options", + "copy_data = true", "include_generated_column = true"))); 4.c) similarly here too: + if (include_generated_columns_option_given) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); 5) These variable names can be changed to keep it smaller, something like gencol or generatedcol or gencolumn, etc +++ b/src/include/catalog/pg_subscription.h @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW * slots) in the upstream database are enabled * to be synchronized to the standbys. */ + bool subincludegeneratedcolumn; /* True if generated columns must be published */ + #ifdef CATALOG_VARLEN /* variable-length fields start here */ /* Connection string to the publisher */ text subconninfo BKI_FORCE_NOT_NULL; @@ -157,6 +159,7 @@ typedef struct Subscription List *publications; /* List of publication names to subscribe to */ char *origin; /* Only publish data originating from the * specified origin */ + bool includegeneratedcolumn; /* publish generated column data */ } Subscription; Regards, Vignesh