On Thu, Aug 21, 2025, at 6:08 AM, Shubham Khanna wrote:
> Attachments:
> * v3-0001-Support-tables-via-pg_createsubscriber.patch
> * v3-0002-Support-WHERE-clause-and-COLUMN-list-in-table-arg.patch
+     <term><option>--table=<replaceable class="parameter">table</replaceable></option></term>
+     <listitem>
+      <para>
+       Adds one or more specific tables to the publication for the most recently
+       specified <option>--database</option>. This option can be given multiple
+       times to include additional tables.
+      </para>

I think it is a really bad idea to rely on the order of options to infer
which tables belong to which database. The current design around
publication and subscription names implies an order, but it does not
require the subscription name to be the very next option after the
publication name.

Your explanation from the initial email:

> The attached patch introduces a new '--table' option that can be
> specified after each '--database' argument. It allows users to
> selectively replicate specific tables within a database instead of
> defaulting to all tables. The syntax is like that used in 'vacuumdb'
> and supports multiple '--table' arguments per database, including
> optional column lists and row filters.

vacuumdb doesn't accept multiple --table options if you specify multiple
databases.

$ vacuumdb -a -t public.pgbench_branches -t public.pgbench_tellers -d db1 -d db2
vacuumdb: error: cannot vacuum all databases and a specific one at the same time

However, it accepts multiple --table options if you specify the --all
option. (Although I think it needs better error handling.)

$ vacuumdb -a -t public.pgbench_branches
vacuumdb: vacuuming database "contrib_regression"
vacuumdb: error: query failed: ERROR:  relation "public.pgbench_branches" does not exist
LINE 2:     VALUES ('public.pgbench_branches'::pg_catalog.regclass, NU...
                    ^
vacuumdb: detail: Query was: WITH listed_objects (object_oid, column_list) AS (
    VALUES ('public.pgbench_branches'::pg_catalog.regclass, NULL::pg_catalog.text)
)

Let's recap the initial goal: pg_createsubscriber creates a new logical
replica from a physical standby server. Your proposal extends the tool to
create a partial logical replica, but it doesn't mention what you would do
with the other part; that part is garbage after the conversion. I'm not
convinced that the current proposal is solid as-is.

+      <para>
+       The argument must be a fully qualified table name in one of the
+       following forms:
+       <itemizedlist>
+        <listitem><para><literal>schema.table</literal></para></listitem>
+        <listitem><para><literal>db.schema.table</literal></para></listitem>
+       </itemizedlist>
+       If the database name is provided, it must match the most recent
+       <option>--database</option> argument.
+      </para>

Why do you want to include the database name if you already specified it?

+      <para>
+       A table specification may also include an optional column list and/or
+       row filter:
+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>schema.table(col1, col2, ...)</literal> — publishes
+          only the specified columns.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>schema.table WHERE (predicate)</literal> — publishes
+          only rows that satisfy the given condition.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Both forms can be combined, e.g.
+          <literal>schema.table(col1, col2) WHERE (id > 100)</literal>.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </para>

This is a bad idea for some cases. Let's say your setup involves a filter
that keeps only 10% of the rows from a certain table. It is better to do a
manual setup. Besides that, expressions in options can open a can of worms.
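For comparison, a minimal sketch of the manual setup using the standard
publication syntax (the names pub_orders, sub_orders, public.orders, and
the column list and predicate are illustrative, not taken from the patch):

    -- On the source server: publish two columns and a row subset.
    CREATE PUBLICATION pub_orders
        FOR TABLE public.orders (order_id, status)
        WHERE (status = 'open');

    -- On the subscriber:
    CREATE SUBSCRIPTION sub_orders
        CONNECTION 'host=source dbname=db1'
        PUBLICATION pub_orders;

Done this way, the filter expression stays visible in SQL and nothing has
to be parsed out of command-line options.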
In the column list case, there might be corner cases: for example, if a
certain column carries a constraint and that column is not included in the
column list, the setup will fail, and there isn't a cheap way to detect
such cases.

It seems this proposal doesn't serve a general purpose. It copies a
*whole* cluster to use only a subset of tables. Your task with
pg_createsubscriber is more expensive than doing a manual logical
replication setup. If you have 500 tables and want to replicate only 400
of them, it doesn't seem productive to specify 400 -t options. There are
some cases, like a small set of big tables, where this feature makes
sense. However, I'm wondering if a post-conversion script should be used
to adjust your setup instead. There might be cases that involve only
dozens of tables, but my experience says those are rare. My expectation is
that this feature is useful for avoiding some specific tables; in that
case, the copy of the whole cluster is worthwhile.

--
Euler Taveira
EDB   https://www.enterprisedb.com/