On 20 December 2012 12:45, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 17.12.2012 15:05, Thom Brown wrote:
>
>> I just set up 120 chained standbys, and for some reason I'm seeing these
>> errors:
>>
>> LOG: replication terminated by primary server
>> DETAIL: End of WAL reached on timeline 1
>> LOG: record with zero length at 0/301EC10
>> LOG: fetching timeline history file for timeline 2 from primary server
>> LOG: restarted WAL streaming at 0/3000000 on timeline 1
>> LOG: replication terminated by primary server
>> DETAIL: End of WAL reached on timeline 1
>> LOG: new target timeline is 2
>> LOG: restarted WAL streaming at 0/3000000 on timeline 2
>> LOG: replication terminated by primary server
>> DETAIL: End of WAL reached on timeline 2
>> FATAL: error reading result of streaming command: ERROR: requested WAL
>> segment 000000020000000000000003 has already been removed
>>
>> ERROR: requested WAL segment 000000020000000000000003 has already been
>> removed
>> LOG: started streaming WAL from primary at 0/3000000 on timeline 2
>> ERROR: requested WAL segment 000000020000000000000003 has already been
>> removed
>
> I just committed a patch that should make the "requested WAL segment
> 000000020000000000000003 has already been removed" errors go away. The
> trick was for walsenders to not switch to the new timeline until at least
> one record has been replayed on it. That closes the window where the
> walsender already considers the new timeline to be the latest, but the WAL
> file has not been created yet.

Now I'm getting this on all standbys after promoting the first standby in a chain.

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: record with zero length at 0/301EC10
LOG: fetching timeline history file for timeline 2 from primary server
LOG: restarted WAL streaming at 0/3000000 on timeline 1
FATAL: could not receive data from WAL stream:
LOG: new target timeline is 2
FATAL: could not connect to the primary server: FATAL: the database system is in recovery mode
LOG: started streaming WAL from primary at 0/3000000 on timeline 2
TRAP: FailedAssertion("!(((sentPtr) <= (SendRqstPtr)))", File: "walsender.c", Line: 1425)
LOG: server process (PID 19917) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted while in recovery at log time 2012-12-20 23:41:23 GMT
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG: entering standby mode
FATAL: the database system is in recovery mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/20000E8
LOG: database system is ready to accept read only connections
LOG: record with zero length at 0/301EC70
LOG: started streaming WAL from primary at 0/3000000 on timeline 2
LOG: unexpected EOF on standby connection

And if I restart the new primary, the first new standby connected to it shows:

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 2
FATAL: error reading result of streaming command: server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.
LOG: record with zero length at 0/301F1E0

However, all other standbys don't show any additional log output.

--
Thom