Hi hackers, Currently, pg_createsubscriber supports converting streaming replication to logical replication for selected databases or all databases. However, there is no provision to replicate only a few selected tables. For such cases, users are forced to manually set up logical replication using individual SQL commands (CREATE PUBLICATION, CREATE SUBSCRIPTION, etc.), which can be time-consuming and error-prone. Extending pg_createsubscriber to support table-level replication would significantly improve the time taken to perform the setup. The attached patch introduces a new '--table' option that can be specified after each '--database' argument. It allows users to selectively replicate specific tables within a database instead of defaulting to all tables. The syntax is like that used in 'vacuumdb' and supports multiple '--table' arguments per database, including optional column lists and row filters. Example usage: ./pg_createsubscriber \ --database db1 \ --table 'public.t1' \ --table 'public.t2(a,b) WHERE a > 100' \ --database db2 \ --table 'public.t3'
I conducted tests comparing the patched pg_createsubscriber with standard logical replication under various scenarios to assess performance and flexibility. All test results represent the average of five runs. Scenario pg_createsubscriber Logical Replication Improvement Two databases (postgres and db1 each having 100 tables), replicate all 100 in postgres, 50 tables in db1 (100MB/table) total 15GB data 2m4.823s 7m23.294s 71.85% One DB, 100 tables, replicate 50 tables (200 MB/table) total 10GB data 2m47.703s 4m58.003s 43.73% One DB, 200 tables, replicate 100 tables (100 MB/table) total 10GB data 3m6.476s 4m35.130s 32.22% One DB, 100 tables, replicate 50 tables (100MB/table) total 5GB data 1m54.384s 2m23.719s 20.42% These results demonstrate that pg_createsubscriber consistently outperforms standard logical replication by 20.42% for 5GB data to 71.85% for 15GB data, the time taken reduces as the data increases. The attached test scripts were used for all experiments. Scenario 1 (Logical replication setup involving 50 tables across 2 databases, each containing 100 tables with 100 MB of data per table): pg_createsubscriber_setup_multi_db.sh was used for setup, followed by pg_createsubscriber_test_multi_db.sh to measure performance. For logical replication, the setup was done using logical_replication_setup_multi_db.sh, with performance measured via logical_replication_test_multi_db.sh. Scenario 2 and 3: The pg_createsubscriber_setup_single_db.sh (uncomment appropriate scenario mentioned in comments) script was used, with configuration changes specific to Scenario 2 and Scenario 3. In both cases, pg_createsubscriber_test_single_db.sh (uncomment appropriate scenario mentioned in comments) was used for measuring performance. Logical replication followed the same pattern, using logical_replication_setup_single_db.sh (uncomment appropriate scenario mentioned in comments) and logical_replication_test_single_db.sh (uncomment appropriate scenario mentioned in comments) for measurement. Scenario 4 (Logical replication setup on 50 tables from a database containing 100 tables, each with 100 MB of data): pg_createsubscriber_setup_single_db.sh (without modifications) was used for setup, and pg_createsubscriber_test_single_db.sh (without modifications) was used for performance measurement. Logical replication used logical_replication_setup_single_db.sh (without modifications) for setup and logical_replication_test_single_db.sh (without modifications) for measurement. Thoughts? Thanks and regards, Shubham Khanna.
v1-0001-Support-tables-via-pg_createsubscriber.patch
Description: Binary data
<<attachment: Scripts.zip>>