First of all, thanks Fabio. I think I'm missing something. In the questions below I'm not talking about streaming replication, but rather about recovery:
1. When the secondary gets the WALs from the primary, it tries to replay them, correct?
2. By replaying, does it just go over the WAL records and apply them on the secondary?
3. Are all those changes kept in the secondary's shared_buffers, or are the changes made immediately to the data file blocks?
4. Does the secondary need a checkpoint in order to flush those changes to the data files and to reach a restart point?

So basically, if a checkpoint happened during the clone, the secondary should also perform a checkpoint when I recover the secondary, right?
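As a side note, a quick way to watch replay progress from the standby itself, using the standard 9.6 admin functions (nothing specific to this setup):

SELECT pg_is_in_recovery();              -- true while the standby is still replaying WAL
SELECT pg_last_xlog_replay_location();   -- last WAL location replayed so far
SELECT pg_last_xact_replay_timestamp();  -- commit time of the last replayed transaction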
On Tue, 28 May 2019 at 13:54, Fabio Pardi <f.pa...@portavita.eu> wrote:

> Hi Mariel,
>
> please keep the list posted. When you reply, use 'reply all'. That will
> maybe help others in the community, and you might also get more help from
> others.
>
> answers are inline here below
>
> On 28/05/2019 10:54, Mariel Cherkassky wrote:
> > I have pg 9.6, repmgr version 4.3.
> > I see in the logs that wal files are restored:
> > 2019-05-22 12:35:12 EEST 60942 LOG: restored log file "000000010000377B000000DB" from archive
> > that means that the restore_command worked, right?
>
> right
>
> > According to the docs:
> > "In standby mode, a restartpoint is also triggered if checkpoint_segments
> > log segments have been replayed since last restartpoint and at least one
> > checkpoint record has been replayed. Restartpoints can't be performed more
> > frequently than checkpoints in the master because restartpoints can only
> > be performed at checkpoint records"
> > so maybe I should decrease max_wal_size or even checkpoint_timeout to
> > force a restartpoint?
> > During this gap (standby clone), 6-7 WALs were generated on the primary.
>
> From what you posted earlier, you should in any case have hit a checkpoint
> every 30 minutes. (That was also the assumption in the previous messages.
> If that's not happening, then I would really investigate.)
>
> That is, if during your cloning only a few WAL files were generated, then
> it is not enough to trigger a checkpoint and you fall back to the 30 minutes.
>
> I would not be bothered if I were you, but you can always force a checkpoint
> on the master by issuing:
>
> CHECKPOINT ;
>
> At that stage, in the standby logs you will see the messages:
>
> restartpoint starting: ..
> restartpoint complete: ..
>
> regards,
>
> fabio pardi
>
> On Mon, 27 May 2019 at 14:04, Fabio Pardi <f.pa...@portavita.eu> wrote:
> >
> > If you did not even see these messages in your standby logs:
> >
> > restartpoint starting: xlog
> >
> > then it means that the checkpoint was never even started.
> >
> > In that case, I have no clue.
> >
> > Try to describe step by step how to reproduce the problem, together with
> > your setup and the version numbers of Postgres and repmgr, and I might be
> > able to help you further.
> >
> > regards,
> >
> > fabio pardi
> >
> > On 5/27/19 12:17 PM, Mariel Cherkassky wrote:
> > > standby_mode = 'on'
> > > primary_conninfo = 'host=X.X.X.X user=repmgr connect_timeout=10 '
> > > recovery_target_timeline = 'latest'
> > > primary_slot_name = repmgr_slot_1
> > > restore_command = 'rsync -avzhe ssh postgres@x.x.x.x:/var/lib/pgsql/archive/%f /var/lib/pgsql/archive/%f ; gunzip < /var/lib/pgsql/archive/%f > %p'
> > > archive_cleanup_command = '/usr/pgsql-9.6/bin/pg_archivecleanup /var/lib/pgsql/archive %r'
> > >
> > > On Mon, 27 May 2019 at 12:29, Fabio Pardi <f.pa...@portavita.eu> wrote:
> > >
> > > Hi Mariel,
> > >
> > > let's keep the list in cc...
> > >
> > > settings look ok.
> > >
> > > what's in the recovery.conf file then?
> > >
> > > regards,
> > >
> > > fabio pardi
> > >
> > > On 5/27/19 11:23 AM, Mariel Cherkassky wrote:
> > > > Hey,
> > > > the configuration is the same as on the primary:
> > > > max_wal_size = 2GB
> > > > min_wal_size = 1GB
> > > > wal_buffers = 16MB
> > > > checkpoint_completion_target = 0.9
> > > > checkpoint_timeout = 30min
> > > >
> > > > Regarding your question, I didn't see this message ("consistent
> > > > recovery state reached at"), I guess that's why the secondary isn't
> > > > available yet..
> > > >
> > > > Maybe I'm wrong, but what I understood from the documentation is that
> > > > a restart point is generated only after the secondary has had a
> > > > checkpoint, which means only after 30 minutes or after max_wal_size
> > > > is reached? But still, why won't the secondary reach a consistent
> > > > recovery state (does that require a restart point to be generated?)
> > > >
> > > > On Mon, 27 May 2019 at 12:12, Fabio Pardi <f.pa...@portavita.eu> wrote:
> > > >
> > > > Hi Mariel,
> > > >
> > > > if I'm not wrong, on the secondary you will see the messages you
> > > > mentioned when a checkpoint happens.
> > > >
> > > > What are checkpoint_timeout and max_wal_size on your standby?
> > > >
> > > > Did you ever see this in your standby log?
> > > >
> > > > "consistent recovery state reached at .."
> > > >
> > > > Maybe you can post the whole configuration of your standby for easier
> > > > debugging.
> > > >
> > > > regards,
> > > >
> > > > fabio pardi
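A side note on the two settings Fabio asks about above: per the 9.6 docs, the standby attempts a restartpoint when checkpoint_timeout elapses or when roughly max_wal_size of WAL has been replayed, but the restartpoint can only start at a checkpoint record received from the primary. A sketch of more aggressive values for the standby's postgresql.conf (illustrative values only, not a recommendation):

checkpoint_timeout = 5min   # attempt a restartpoint at least every 5 minutes
max_wal_size = 1GB          # ...or after roughly this much replayed WAL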
> > > > On 5/27/19 10:49 AM, Mariel Cherkassky wrote:
> > > > > Hey,
> > > > > PG 9.6, I have a standalone configured. I tried to start up a
> > > > > secondary and ran a standby clone (repmgr). The clone process took
> > > > > 3 hours, and during that time WALs were generated (mostly because
> > > > > of checkpoint_timeout). As a result, when I start the secondary I
> > > > > see that it keeps getting the WALs, but I don't see any messages
> > > > > that indicate that the secondary tried to replay them.
> > > > > Messages that I see:
> > > > >
> > > > > receiving incremental file list
> > > > > 000000010000377B000000DE
> > > > >
> > > > > sent 30 bytes received 4.11M bytes 8.22M bytes/sec
> > > > > total size is 4.15M speedup is 1.01
> > > > > 2019-05-22 12:48:10 EEST 60942 LOG: restored log file "000000010000377B000000DE" from archive
> > > > > 2019-05-22 12:48:11 EEST db63311 FATAL: the database system is starting up
> > > > > 2019-05-22 12:48:12 EEST db63313 FATAL: the database system is starting up
> > > > >
> > > > > I was hoping to see the following messages (taken from a different machine):
> > > > >
> > > > > 2019-05-27 01:15:37 EDT 7428 LOG: restartpoint starting: time
> > > > > 2019-05-27 01:16:18 EDT 7428 LOG: restartpoint complete: wrote 406 buffers (0.2%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=41.390 s, sync=0.001 s, total=41.582 s; sync files=128, longest=0.000 s, average=0.000 s; distance=2005 kB, estimate=2699 kB
> > > > > 2019-05-27 01:16:18 EDT 7428 LOG: recovery restart point at 4/D096C4F8
> > > > >
> > > > > My primary settings (WAL settings):
> > > > > wal_buffers = 16MB
> > > > > checkpoint_completion_target = 0.9
> > > > > checkpoint_timeout = 30min
> > > > >
> > > > > Any idea what can explain why the secondary doesn't replay the WALs?
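P.S. For completeness, a way to check from the shell how far the standby has come (the data directory path here is an assumption, based on the PGDG 9.6 layout used elsewhere in this thread):

/usr/pgsql-9.6/bin/pg_controldata /var/lib/pgsql/9.6/data

The relevant fields in the output are "Database cluster state" (should read "in archive recovery"), "Latest checkpoint location" (it advances each time a restartpoint completes), and "Minimum recovery ending location" (the point recovery must replay past before the standby is consistent).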