On 7/1/19 12:55 PM, Jamison, Kirk wrote:
> On Wednesday, June 26, 2019 6:10 PM(GMT+9), Adrien Nayrat wrote:
>> As far as I remember, you should see "relation" wait events (type lock) on
>> standby server. This is due to startup process acquiring AccessExclusiveLock
>> for the truncation and other backend waiting to acquire a lock to read the
>> table.
> 
> Hi Adrien, thank you for taking time to reply.
> 
> I understand that RelationTruncate() can block read-only queries on
> standby during redo. However, it's difficult for me to reproduce the 
> test case where I need to catch that wait for relation lock, because
> one has to execute SELECT within the few milliseconds of redoing the
> truncation of one table.

Yes, that why your test by measuring vacuum execution time is better as it is
more reproductible.

> 
> Instead, I just measured the whole recovery time, smgr_redo(),
> to show the recovery improvement compared to head. Please refer below.
> 
> [Recovery Test]
> I used the same stored functions and configurations in the previous email
> & created "test" db.
> 
> $ createdb test
> $ psql -d test
> 
> 1. [Primary] Create 10,000 relations.
>       test=# SELECT create_tables(10000);
> 
> 2. [P] Insert one row in each table.
>       test=# SELECT insert_tables(10000);
> 
> 3. [P] Delete row of each table.
>       test=# SELECT delfrom_tables(10000);
> 
> 4. [Standby] WAL application is stopped at Standby server.
>       test=# SELECT pg_wal_replay_pause();
> 
> 5. [P] VACUUM is executed at Primary side, and measure its execution time.    
>         
>       test=# \timing on
>       test=# VACUUM;
> 
>       Alternatively, you may use:
>       $ time psql -d test -c 'VACUUM;'
>       (Note: WAL has not replayed on standby because it's been paused.)
> 
> 6. [P] Wait until VACUUM has finished execution. Then, stop primary server. 
>       test=# pg_ctl stop -w
> 
> 7. [S] Resume WAL replay, then promote standby (failover).
> I used a shell script to execute recovery & promote standby server
> because it's kinda difficult to measure recovery time. Please refer to the 
> script below.
> - "SELECT pg_wal_replay_resume();" is executed and the WAL application is 
> resumed.
> - "pg_ctl promote" to promote standby.
> - The time difference of "select pg_is_in_recovery();" from "t" to "f" is 
> measured.
> 
> shell script:
> 
> PGDT=/path_to_storage_directory/
> 
> if [ "$1" = "resume" ]; then
>       psql -c "SELECT pg_wal_replay_resume();" test
>       date +%Y/%m/%d_%H:%M:%S.%3N
>       pg_ctl promote -D ${PGDT}
>       set +x
>       date +%Y/%m/%d_%H:%M:%S.%3N
>       while [ 1 ]
>       do
>               RS=`psql -Atc "select pg_is_in_recovery();" test`               
>               if [ ${RS} = "f" ]; then
>                       break
>               fi
>       done
>       date +%Y/%m/%d_%H:%M:%S.%3N
>       set -x
>       exit 0
> fi
> 
> 
> [Test Results]
> shared_buffers = 24GB
> 
> 1. HEAD
> (wal replay resumed)
> 2019/07/01_08:48:50.326
> server promoted
> 2019/07/01_08:49:50.482
> 2019/07/01_09:02:41.051
> 
>  Recovery Time:
>  13 min 50.725 s -> Time difference from WAL replay to complete recovery
>  12 min 50.569 s -> Time difference of "select pg_is_in_recovery();" from "t" 
> to "f"
> 
> 2. PATCH
> (wal replay resumed)
> 2019/07/01_07:34:26.766
> server promoted
> 2019/07/01_07:34:57.790
> 2019/07/01_07:34:57.809
> 
>  Recovery Time:       
>  31.043 s -> Time difference from WAL replay to complete recovery
>  00.019 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>  
> [Conclusion]
> The recovery time significantly improved compared to head
> from 13 minutes to 30 seconds.
> 
> Any thoughts?
> I'd really appreciate your comments/feedback about the patch and/or test.
> 
> 

Thanks for the time you spend on this test, it is a huge win!
Although creating 10k tables and deleting tuples is not a common use case, it is
still good to know how your patch performs.
I will try to look deeper in your patch, but my knowledge on postgres internal
are limited :)

-- 
Adrien


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to