Scott, thank you for your insight. I do have some spare disk and network throughput. However, my question is: can I run rsync while streaming replication is running? A streaming replica is a physical copy of the master, so why not? My concern is the possible silent introduction of block corruptions that would not be fixed by block copies in the WAL files. I believe such corruptions should not happen, and I have seen a few instances where running rsync seemed to work. I'm curious whether anybody is aware of a situation where corruption is likely to happen.
Igor

> On Sep 27, 2017, at 12:48, Scott Mead <sco...@openscg.com> wrote:
>
> On Wed, Sep 27, 2017 at 1:59 PM, Igor Polishchuk <ora4...@gmail.com> wrote:
> > Sorry, here are the missing details, if it helps:
> > Postgres 9.6.5 on CentOS 7.2.1511
> >
> > > On Sep 27, 2017, at 10:56, Igor Polishchuk <ora4...@gmail.com> wrote:
> > >
> > > Hello,
> > > I have a multi-terabyte streaming replica on a busy database. When I set it
> > > up, repetitive rsyncs take at least 6 hours each.
> > > So, when I start the replica, it begins streaming, but it is many hours
> > > behind right from the start. It works for hours and cannot reach a
> > > consistent state, so the database never opens for queries. I have plenty of
> > > WAL files available in the master's pg_xlog, so the replica never uses
> > > archived logs.
> > > A question:
> > > Should I be able to run one more rsync from the master to my replica while
> > > it is streaming?
> > > The idea is to overcome the throughput limit imposed by a single recovery
> > > process on the replica and allow it to catch up quicker.
> > > I remember doing this many years ago on Pg 8.4, and have also heard of
> > > other people doing it. In all cases, it seemed to work.
> > > I'm just not sure there is no high risk of introducing some hidden data
> > > corruption, which I may not notice for a while on such a huge database.
> > > Any educated opinions on the subject here?
>
> It really comes down to the amount of I/O (network and disk) your system can
> handle while under load. I've used 2 methods to do this in the past:
>
> - http://moo.nac.uci.edu/~hjm/parsync/
>
> parsync (parallel rsync) is nice; it does all the hard work of parallelizing
> rsync for you. It's just a pain to get all the prereqs installed.
>
> - rsync --itemize-changes
> Essentially, use this to get a list of files, manually split them out, and
> fire up a number of rsyncs.
> parsync does this for you, but if you can't get it going for any reason,
> this works.
>
> The real trick: after you do your parallel rsync, make sure that you run one
> final rsync to sync up any missed items.
>
> Remember, it's all about I/O. The more parallel threads you use, the harder
> you'll beat up the disks / network on the master, which could impact
> production.
>
> Good luck
>
> --Scott
>
> > Thank you
> > Igor Polishchuk
>
> --
> Scott Mead
> Sr. Architect
> OpenSCG
> http://openscg.com
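Scott's second method (list the files, split the list, run several rsyncs, then one final catch-up pass) can be sketched in shell. This is only an illustration, not anything from the thread: the host name, data directory path, and job count below are placeholder assumptions to adjust for your environment, and `split -n l/N` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: parallelize an rsync of a data directory by splitting the file list.
# MASTER, PGDATA, and NJOBS are hypothetical placeholders -- change them.
MASTER=master.example.com
PGDATA=/var/lib/pgsql/9.6/data
NJOBS=4

# 1. Build the list of files that would be transferred (dry run, names only).
rsync -a --dry-run --out-format='%n' \
      "$MASTER:$PGDATA/" "$PGDATA/" > /tmp/filelist

# 2. Split the list into NJOBS chunks without breaking lines (GNU split).
split -n l/$NJOBS /tmp/filelist /tmp/chunk.

# 3. Run one rsync per chunk in the background, then wait for all of them.
for chunk in /tmp/chunk.*; do
    rsync -a --files-from="$chunk" "$MASTER:$PGDATA/" "$PGDATA/" &
done
wait

# 4. Final single-threaded pass to pick up anything the parallel runs missed.
rsync -a "$MASTER:$PGDATA/" "$PGDATA/"
```

The split step is what bounds the per-process work: each background rsync only walks the paths in its chunk, so the disks and network on the master see NJOBS concurrent readers, which is exactly the I/O trade-off Scott warns about.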