Scott,
Thank you for your insight. I do have some extra disk and network throughput to 
spare. However, my question is: can I run rsync while streaming replication is running?
A streaming replica is a physical copy of the master, so in principle it should work. 
My concern is the possible silent introduction of block corruption that would not be 
fixed by a block copy in the WAL files. I don't think such corruption should happen, 
and I have seen a few instances where running rsync seemed to work.
I'm curious whether anybody is aware of a situation where corruption is likely 
to occur.

Igor

> On Sep 27, 2017, at 12:48, Scott Mead <sco...@openscg.com> wrote:
> 
> 
> 
> On Wed, Sep 27, 2017 at 1:59 PM, Igor Polishchuk <ora4...@gmail.com> wrote:
> Sorry, here are the missing details, if it helps:
> Postgres 9.6.5 on CentOS 7.2.1511
> 
> > On Sep 27, 2017, at 10:56, Igor Polishchuk <ora4...@gmail.com> wrote:
> >
> > Hello,
> > I have a multi-terabyte streaming replica of a busy database. When I set it 
> > up, repetitive rsyncs take at least 6 hours each.
> > So, when I start the replica, it begins streaming, but it is many hours 
> > behind right from the start. It has been working for hours and cannot reach a 
> > consistent state, so the database does not open for queries. I have plenty of WAL 
> > files available in the master's pg_xlog, so the replica never uses archived 
> > logs.
> > A question:
> > Should I be able to run one more rsync from the master to my replica while 
> > it is streaming?
> > The idea is to overcome the throughput limit imposed by the single recovery 
> > process on the replica and let it catch up more quickly.
> > I remember doing this many years ago on Pg 8.4, and have also heard of other 
> > people doing it. In all cases, it seemed to work.
> > I'm just not sure there isn't a risk of introducing some hidden data 
> > corruption, which I might not notice for a while on such a huge database.
> > Any educated opinions on the subject here?
> 
> It really comes down to the amount of I/O (network and disk) your system can 
> handle while under load.  I've used 2 methods to do this in the past:
> 
> - http://moo.nac.uci.edu/~hjm/parsync/
> 
>   parsync (parallel rsync) is nice; it does all the hard work of 
> parallelizing rsync for you.  It's just a pain to get all the prereqs installed.
> 
> 
> - rsync --itemize-changes
>   Essentially, use this to get a list of files, manually split them up, and 
> fire up a number of rsyncs.  parsync does this for you, but if you can't get 
> it going for any reason, this works.
> 
> 
> The real trick: after your parallel rsync finishes, make sure you run one 
> final rsync to sync up any missed items.
> 
> Remember, it's all about I/O.  The more parallel threads you use, the harder 
> you'll beat up the disks / network on the master, which could impact 
> production.
> 
> Good luck
> 
> --Scott
> 
> >
> > Thank you
> > Igor Polishchuk
> 
> 
> 
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
> 
> 
> 
> -- 
> --
> Scott Mead
> Sr. Architect
> OpenSCG
> http://openscg.com
