[ADMIN] Some replication-related notes and questions

Scott Whitney Tue, 30 Apr 2013 12:43:39 -0700

We recently moved to PG 9.2.4 (from 8.4.4) to take advantage of replication, 
and I have to say it's pretty awesome.



I ran into some things that I was hoping someone could clarify. 


a) There appears to be no way to tell how "far behind" my standby servers are. 
That is, I can find a checkpoint with pg_controldata. I can find the 
sending/receiving WAL processes via ps or replay/receive_location(). However 
there seems to be no correlation to real-world times or dates as in "how many 
seconds/minutes/hours behind are my standby servers." 


b) This segues nicely into the archive_status/*.done files. The _only_ files in 
my archive_status directory on my standby servers are .done files. Will these 
clean up on their own? Can I stat the last one to know the maximum discrepancy 
between my master and slave? If they do NOT clean up on their own, is it safe 
to remove them? I couldn't find any information on what that directory 
specifically does or whether .done files are required. 


c) It would appear that a FULL vacuum can hose replication (in the case of a 
long-standing hanging transaction). Has anyone else run into this? 
Specifically, my standby was up and running and replicating. Sunday morning (I 
do full vacuums on Saturday due to an old bug in an old database) my standby 
was trying to replay a WAL file that didn't exist because I had hanging 
transactions that began before my oldest WAL segment. Since the replication was 
up and running on Saturday, I was wondering whether full vacuum could have been 
to blame for the servers getting out ot sync. 


Thanks, 
Scott Whitney

[ADMIN] Some replication-related notes and questions

Reply via email to