Hi,

On Thu, Aug 5, 2021 at 11:28 AM Vijaykumar Jain <vijaykumarjain.git...@gmail.com> wrote:
> On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil.db...@gmail.com> wrote:
>
>> Hi,
>>
>> Thank you for the suggestion.
>>
>> We tried dropping indexes and it worked faster compared to what we saw
>> earlier. We wanted to know if anybody has made any other changes that
>> help speed up the initial data load without dropping indexes.
>>

You could leverage pg_basebackup or pg_dump with parallel jobs for the
initial data load, taken from a standby (preferably with replication paused
if you use pg_dump; pg_basebackup should be straightforward either way) or
even from the primary. As you were able to drop indexes and make some
schema changes, I would assume that you can pause your app temporarily. If
that is the case, you may look into the simple steps I am posting here,
which demonstrate pg_dump/pg_restore instead. If you cannot pause the app,
then you could look into how to use pg_replication_origin_advance
<https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>

Step 1 : Pause the app.

Step 2 : Create the publication on the primary.

CREATE PUBLICATION <some_pub_name> FOR ALL TABLES;

Step 3 : Create the logical replication slot on the primary.

SELECT * FROM pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput');

Step 4 : Create the subscription, but do not enable it.

CREATE SUBSCRIPTION <some_sub_name>
CONNECTION 'host=<some_host> dbname=<some_db> user=postgres password=secret port=5432'
PUBLICATION <some_pub_name>
WITH (copy_data = false, create_slot = false, enabled = false,
      slot_name = '<some_slot_name>');

Step 5 : Initiate pg_dump. We can take a parallel dump for a faster restore.

$ pg_dump -d <some_db> -Fd -j 4 -n <some_schema> -f <some_unique_directory>

-- If it is several hundreds of GBs or TBs, you may rather utilize one of
your standbys, paused from replication using -> SELECT pg_wal_replay_pause();

Step 6 : You don't need to wait until pg_dump completes; you may start the
app.
-- Hopefully the app does not perform changes that impact the pg_dump or
get blocked by it.

Step 7 : Restore the dump if you used pg_dump.

$ pg_restore -d <some_db> -j <some_number_of_parallel_jobs> <some_directory>

Step 8 : Enable the subscription.

ALTER SUBSCRIPTION <some_sub_name> ENABLE;

If you have not stopped your app, then you must advance the LSN using
pg_replication_origin_advance
<https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>
(a minimal sketch follows below).

These are all hand-written steps drafted while composing this email, so
please test them on your end; some typos or adjustments are definitely
expected.
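To make Steps 5 and 7 concrete, this is what they could look like with
hypothetical names (database appdb, schema public, dump directory
/backups/appdb_dump); adjust to your environment:

$ pg_dump -d appdb -Fd -j 4 -n public -f /backups/appdb_dump
$ pg_restore -d appdb -j 4 /backups/appdb_dump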
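For the pg_replication_origin_advance route, here is a minimal sketch. The
origin name and LSN below are hypothetical, and the subscription must still
be disabled when you run it:

-- Step 3 on the primary returns the slot's consistent point, e.g.:
--   slot_name      | lsn
--  ----------------+------------
--   some_slot_name | 0/1A2B3C4D

-- On the subscriber, built-in logical replication names its origin
-- pg_<subscription oid>; look it up:
SELECT roname FROM pg_replication_origin;

-- Advance the origin (hypothetical name 'pg_16398') up to the LSN your
-- restored data already covers, so those changes are not applied twice
-- once the subscription is enabled:
SELECT pg_replication_origin_advance('pg_16398', '0/1A2B3C4D');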
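Once Step 8 is done, you can watch the subscription catch up using the
standard catalog views (nothing specific to these steps):

-- On the subscriber:
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;

-- On the primary, compare the slot position against the current WAL:
SELECT slot_name, active, confirmed_flush_lsn FROM pg_replication_slots;
SELECT pg_current_wal_lsn();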
> PS: I have not tested this at production-level loads; it was just some
> experimenting I did on my laptop.
>
> One option would be to use the pglogical extension (this was shared by
> Dharmendra in one of the previous mails, sharing the same), and then use
> the pglogical_create_subscriber CLI to create the initial copy via
> pg_basebackup and then carry on from there.
> I ran a test case similar to the one below in my local env, and it seems
> to work fine. Of course, I do not have a TB worth of load to test, but it
> looks promising, especially since they introduced it to the core.
> pglogical/010_pglogical_create_subscriber.pl at REL2_x_STABLE ·
> 2ndQuadrant/pglogical (github.com)
> <https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/t/010_pglogical_create_subscriber.pl>
> Once you attain some reasonable sync state, you can drop the pglogical
> extension and check if things continue fine.
>
> I have done something similar when upgrading from 9.6 to 11 using
> pglogical and then dropping the extension, and it was smooth; maybe you
> need to try this out and share if things work fine.
> and
> The 1-2-3 for PostgreSQL Logical Replication Using an RDS Snapshot -
> Percona Database Performance Blog
> <https://www.percona.com/blog/postgresql-logical-replication-using-an-rds-snapshot/>

--
Regards,
Avinash Vallarapu (Avi)
CEO, MigOps, Inc.