Hi,

On Thu, Aug 5, 2021 at 11:28 AM Vijaykumar Jain <
vijaykumarjain.git...@gmail.com> wrote:

> On Thu, 5 Aug 2021 at 10:27, Nikhil Shetty <nikhil.db...@gmail.com> wrote:
>
>> Hi,
>>
>> Thank you for the suggestion.
>>
>> We tried dropping indexes, and the load was faster than what we saw
>> earlier. We wanted to know if anybody has made any other changes that help
>> speed up the initial data load without dropping indexes.
>>
>>
You could leverage pg_basebackup, or pg_dump with parallel jobs, taken
from a Standby (preferably with replication paused if using pg_dump;
pg_basebackup should be straightforward either way) or even from the
Primary, for the purpose of the initial data load.
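
If you take the pg_basebackup route, a minimal sketch could look like the
following (host, user, and target directory are placeholders; note that the
parallel -j option applies to pg_dump, as pg_basebackup has no parallel mode):

$ pg_basebackup -h <some_standby_host> -U <some_replication_user> \
    -D <some_target_directory> -Fp -Xs -P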

As you are able to drop indexes and make some schema changes, I would
assume that you could pause your app temporarily. If that's the case,
you may look into the simple steps I am posting here, which demonstrate
pg_dump/pg_restore instead.

If you cannot pause the app, then you could look into using
pg_replication_origin_advance (an example follows Step 8 below)
<https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>


Step 1 : Pause the App.

Step 2 : Create a Publication on the Primary.
CREATE PUBLICATION <some_pub_name> FOR ALL TABLES;

Step 3 : Create a Logical Replication Slot on the Primary.
SELECT * FROM pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput');

Step 4 : Create the Subscription, but do not enable it yet.
CREATE SUBSCRIPTION <some_sub_name> CONNECTION
'host=<some_host> dbname=<some_db> user=postgres
password=secret port=5432' PUBLICATION <some_pub_name>
WITH (copy_data = false, create_slot = false, enabled = false,
slot_name = '<some_slot_name>');
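
If you expect to need pg_replication_origin_advance later (i.e., the app
keeps running), record the LSN returned by Step 3. With hypothetical names
and values, the output looks roughly like this:

SELECT * FROM pg_create_logical_replication_slot('<some_slot_name>', 'pgoutput');
    slot_name     |    lsn
------------------+------------
 <some_slot_name> | 0/1A2B3C4D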

Step 5 : Initiate pg_dump. We can take a parallel dump for a faster
restore.

$ pg_dump -d <some_db> -Fd -j 4 -n <some_schema> -f <some_unique_directory>
-- If it's several hundreds of GBs or TBs, you may rather utilize one of
your Standbys that has been paused from replication using -> SELECT
pg_wal_replay_pause();
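
If you do use a Standby for the dump, a minimal sketch of the pause/resume
sequence, run on that Standby, might be:

-- Before starting pg_dump on the Standby:
SELECT pg_wal_replay_pause();
SELECT pg_is_wal_replay_paused();  -- should now return true
-- After pg_dump completes:
SELECT pg_wal_replay_resume();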

Step 6 : You don't need to wait until pg_dump completes; you may start the
App.
-- Hopefully the app does not perform changes that impact the pg_dump or
get blocked by the pg_dump.

Step 7 : Restore the dump if you used pg_dump.
$ pg_restore -d <some_db> -j <some_number_of_parallel_jobs> <some_directory>

Step 8 : Enable the subscription.
ALTER SUBSCRIPTION <some_sub_name> ENABLE;
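
Once enabled, you can verify that the subscriber is catching up. Assuming
the placeholder names above, something like:

-- On the subscriber:
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;
-- On the primary:
SELECT slot_name, confirmed_flush_lsn FROM pg_replication_slots
WHERE slot_name = '<some_slot_name>';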

If you have not stopped your app, then you must advance the LSN using
pg_replication_origin_advance
<https://www.postgresql.org/docs/13/functions-admin.html#PG-REPLICATION-ORIGIN-ADVANCE>
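
A minimal sketch of that, to be run on the subscriber while the
subscription is still disabled (the origin name and LSN below are
hypothetical; the origin is typically named pg_<subscription oid>, and the
LSN is the one returned in Step 3):

SELECT roname FROM pg_replication_origin;
SELECT pg_replication_origin_advance('pg_16398', '0/1A2B3C4D');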


These are all hand-written steps drafted while composing this email, so
please test them on your end, as some typos or adjustments are definitely
expected.

> PS: I have not tested this with production-level loads; it was just an
> experiment I did on my laptop.
>
> one option would be to use the pglogical extension (this was shared by
> Dharmendra in one of the previous mails, sharing the same),
> and then use the pglogical_create_subscriber CLI to create the initial copy
> via pg_basebackup and then carry on from there.
> I ran a test case similar to the one below in my local env, and it seems to
> work fine. Of course, I do not have TBs worth of load to test, but it looks
> promising, especially since they introduced it to the core.
> pglogical/010_pglogical_create_subscriber.pl at REL2_x_STABLE ·
> 2ndQuadrant/pglogical (github.com)
> <https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/t/010_pglogical_create_subscriber.pl>
> Once you attain some reasonable sync state, you can drop the pglogical
> extension and check if things continue fine.
> I have done something similar when upgrading from 9.6 to 11 using
> pglogical and then dropping the extension, and it was smooth;
> maybe you need to try this out and share if things work fine.
> and
> The 1-2-3 for PostgreSQL Logical Replication Using an RDS Snapshot -
> Percona Database Performance Blog
> <https://www.percona.com/blog/postgresql-logical-replication-using-an-rds-snapshot/>
>
>
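
For reference, pglogical_create_subscriber is typically invoked along these
lines. I am writing this from memory of the pglogical 2.x README, so please
verify the exact option names there (all values below are placeholders):

$ pglogical_create_subscriber \
    -D <new_subscriber_datadir> \
    --subscriber_name=<some_subscriber> \
    --subscriber_dsn='host=<subscriber_host> dbname=<some_db> user=postgres' \
    --provider_dsn='host=<provider_host> dbname=<some_db> user=postgres' \
    --replication_sets=default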

-- 
Regards,
Avinash Vallarapu (Avi)
CEO,
MigOps, Inc.
