On Mon, Dec 15, 2014 at 2:18 PM, Patrick Krecker <patr...@judicata.com> wrote:
> On Mon, Dec 15, 2014 at 10:29 AM, Joseph Kregloh
> <jkreg...@sproutloud.com> wrote:
> >
> > On Mon, Dec 15, 2014 at 12:59 PM, Patrick Krecker <patr...@judicata.com>
> > wrote:
> >>
> >> On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh <jkreg...@sproutloud.com>
> >> wrote:
> >> > Hello,
> >> >
> >> > I have a master multi-slave streaming replication setup: one master
> >> > and two slaves. I need to do some maintenance on one of the slaves,
> >> > as one of the drives died, but there are some other weird things
> >> > going on in that array that I need to investigate. So I am expecting
> >> > the machine to be down at least two hours.
> >> >
> >> > I remember reading that if the master cannot connect to a slave, it
> >> > will hold the log files back from shipping. Is there any other way
> >> > to hold the files until the slave comes back online? Would it affect
> >> > both slaves, so that neither gets its files shipped over?
> >> >
> >> > The good thing is that the slave in question is not serving any
> >> > connections.
> >> >
> >> > From what I remember, emptying out the archive_command would pause
> >> > log shipping. Can the same be done by issuing a pg_stop_backup()?
> >> >
> >> > Thanks,
> >> > -Joseph Kregloh
> >>
> >> I think you will need to change your archive_command so it saves the
> >> WALs to a location reachable by both slaves and the master, and have
> >> both slaves pull from the same location. I don't think
> >> pg_stop_backup() is useful in this situation.
> >
> > Currently my archive_command calls a sh script which internally does
> > an rsync. It actually rsyncs to both slaves and then to a Barman
> > location. If I fail the archive_command, I'll have a problem, because
> > my primary slave serves read-only queries, so it might start serving
> > stale data.
> >
> > What I was thinking is shipping the log files that would go to the
> > second slave to another machine, or to a location on the master. Then,
> > once I am done with the maintenance, I'll move those files over to the
> > incoming folder. That should, hopefully, give the slave all the WAL
> > files it needs to catch up. Any thoughts against this?
>
> Seems OK as long as you have the disk space to support the accumulation
> of WALs (considering the situation where the downtime ends up much
> longer than anticipated).

Plenty of disk space to accumulate WAL files.

> When you say "I'll move those files over to the incoming folder," what
> do you mean? I think that restore_command should be used on the slave
> to retrieve the WALs from the archive location. Once the secondary has
> caught up, you can change the configuration back to the old setup and
> remove the accumulated WALs from the temporary location.

I will keep Postgres disabled so it doesn't start automatically. Once the
machine is up and running, I will move all of the "saved" WAL files into
the folder designated by the restore_command in my recovery.conf, which in
my case is the following line:

    restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'

Once all of the WAL files are there and the master is shipping WAL files
again, I will start Postgres on the machine and it will process the files
until it catches up.
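Concretely, the catch-up would be something like this (wal_hold is just an
example name for wherever the saved files get parked, and the data
directory path is a guess based on my archive path; adjust to taste):

    # On the slave, with Postgres still stopped. Move the WALs that were
    # held back during the outage into the directory restore_command reads:
    mv /usr/local/pgsql/wal_hold/* /usr/local/pgsql/archive/

    # Start Postgres; recovery.conf's restore_command replays each segment
    # from /usr/local/pgsql/archive until the slave catches up:
    pg_ctl -D /usr/local/pgsql/data start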
> >> The master will hold the logs as long as archive_command fails [1]. To
> >> the extent that archive_command involves connecting to the slave, then
> >> yes, Postgres will hold the WAL archives while the slave is down.
> >>
> >> There are (at least) two reasons that saving the archives to some
> >> other location is useful:
> >>
> >> 1) You don't risk running out of disk on the master due to batched-up
> >> WALs if a slave goes down.
> >>
> >> 2) The backup of logs can be used to aid in point-in-time recovery.
> >>
> >> [1] http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
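For reference, the archive script in question is roughly the following
(the script path and host names are placeholders, not the real ones):

    #!/bin/sh
    # Wired up in postgresql.conf as, e.g.:
    #   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
    WAL_PATH="$1"   # %p: path of the WAL segment, relative to the data dir
    WAL_FILE="$2"   # %f: the segment's file name

    rsync -a "$WAL_PATH" slave1:/usr/local/pgsql/archive/"$WAL_FILE" &&
    rsync -a "$WAL_PATH" slave2:/usr/local/pgsql/archive/"$WAL_FILE" &&
    rsync -a "$WAL_PATH" barman:/var/lib/barman/main/incoming/"$WAL_FILE"
    # A non-zero exit makes the master keep the segment and retry, which
    # is the hold-back behavior Patrick describes above.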