Re: [HACKERS] Switching timeline over streaming replication

Thom Brown Thu, 20 Dec 2012 15:51:45 -0800

On 20 December 2012 12:45, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:

> On 17.12.2012 15:05, Thom Brown wrote:
>
>> I just set up 120 chained standbys, and for some reason I'm seeing these
>> errors:
>>
>> LOG:  replication terminated by primary server
>> DETAIL:  End of WAL reached on timeline 1
>> LOG:  record with zero length at 0/301EC10
>> LOG:  fetching timeline history file for timeline 2 from primary server
>> LOG:  restarted WAL streaming at 0/3000000 on timeline 1
>> LOG:  replication terminated by primary server
>> DETAIL:  End of WAL reached on timeline 1
>> LOG:  new target timeline is 2
>> LOG:  restarted WAL streaming at 0/3000000 on timeline 2
>> LOG:  replication terminated by primary server
>> DETAIL:  End of WAL reached on timeline 2
>> FATAL:  error reading result of streaming command: ERROR:  requested WAL
>> segment 000000020000000000000003 has already been removed
>>
>> ERROR:  requested WAL segment 000000020000000000000003 has already been
>> removed
>> LOG:  started streaming WAL from primary at 0/3000000 on timeline 2
>> ERROR:  requested WAL segment 000000020000000000000003 has already been
>> removed
>>
>
> I just committed a patch that should make the "requested WAL segment
> 000000020000000000000003 has already been removed" errors go away. The
> trick was for walsenders to not switch to the new timeline until at least
> one record has been replayed on it. That closes the window where the
> walsender already considers the new timeline to be the latest, but the WAL
> file has not been created yet.
>
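
For anyone following along, here is a rough sketch of how the segment name in the error quoted above relates to the timeline and the streaming start position. It is not the server's code: the helper names are made up for illustration, and it assumes the default 16 MB WAL segment size and the usual 24-hex-digit file name layout (timeline, then the high and low parts of the segment number).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the default 16 MB WAL segment size. */
#define WAL_SEG_SIZE    UINT64_C(0x1000000)
/* Segments per 4 GB of WAL: 0x100000000 / 16 MB = 256. */
#define SEGS_PER_LOG_ID (UINT64_C(0x100000000) / WAL_SEG_SIZE)
/* A segment file name is three 8-digit hex fields. */
#define WAL_FNAME_LEN   24

/* Hypothetical helper: combine the two halves of an LSN printed as "hi/lo". */
static uint64_t lsn_from_parts(uint32_t hi, uint32_t lo)
{
    return ((uint64_t) hi << 32) | lo;
}

/* Hypothetical helper: write the segment file name for (tli, lsn) into buf. */
static void wal_segment_name(char *buf, size_t buflen, uint32_t tli, uint64_t lsn)
{
    uint64_t segno = lsn / WAL_SEG_SIZE;   /* which 16 MB segment holds this LSN */
    int n;

    assert(buflen > WAL_FNAME_LEN);        /* room for 24 digits plus the NUL */
    n = snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
                 tli,
                 (uint32_t) (segno / SEGS_PER_LOG_ID),
                 (uint32_t) (segno % SEGS_PER_LOG_ID));
    assert(n == WAL_FNAME_LEN);
    (void) n;
}

/* Hypothetical helper: read the timeline ID back out of a segment file name.
 * Returns 0 if the name does not have the expected length or format. */
static uint32_t wal_segment_timeline(const char *name)
{
    uint32_t tli = 0;

    if (strlen(name) != WAL_FNAME_LEN || sscanf(name, "%8" SCNx32, &tli) != 1)
        return 0;
    return tli;
}
```

Under those assumptions, timeline 2 and position 0/3000000 map to 000000020000000000000003, the file named in the error, while the same position on timeline 1 maps to 000000010000000000000003. That is consistent with the explanation above: a standby asking for that position on timeline 2 needs a file that only exists once something has been written on the new timeline.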

Now I'm getting this on all standbys after promoting the first standby in a
chain.

LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1
LOG:  record with zero length at 0/301EC10
LOG:  fetching timeline history file for timeline 2 from primary server
LOG:  restarted WAL streaming at 0/3000000 on timeline 1
FATAL:  could not receive data from WAL stream:
LOG:  new target timeline is 2
FATAL:  could not connect to the primary server: FATAL:  the database
system is in recovery mode

LOG:  started streaming WAL from primary at 0/3000000 on timeline 2
TRAP: FailedAssertion("!(((sentPtr) <= (SendRqstPtr)))", File:
"walsender.c", Line: 1425)
LOG:  server process (PID 19917) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted while in recovery at log time
2012-12-20 23:41:23 GMT
HINT:  If this has occurred more than once some data might be corrupted and
you might need to choose an earlier recovery target.
LOG:  entering standby mode
FATAL:  the database system is in recovery mode
LOG:  redo starts at 0/2000028
LOG:  consistent recovery state reached at 0/20000E8
LOG:  database system is ready to accept read only connections
LOG:  record with zero length at 0/301EC70
LOG:  started streaming WAL from primary at 0/3000000 on timeline 2
LOG:  unexpected EOF on standby connection

And if I restart the new primary, the first new standby connected to it
shows:

LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 2
FATAL:  error reading result of streaming command: server closed the
connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.

LOG:  record with zero length at 0/301F1E0

However, none of the other standbys show any additional log output.

-- 
Thom
