On 31.10.2010 23:31, Greg Smith wrote:
LOG: replication connection authorized: user=rep host= port=52571
FATAL: requested WAL segment 000000010000000000000000 has already been removed

Which is confusing because that file is certainly on the master still,
and hasn't even been considered archived yet much less removed:

[mas...@pyramid pg_log]$ ls -l $PGDATA/pg_xlog
-rw------- 1 master master 16777216 Oct 31 16:29 000000010000000000000000
drwx------ 2 master master 4096 Oct 4 12:28 archive_status
[mas...@pyramid pg_log]$ ls -l $PGDATA/pg_xlog/archive_status/
total 0

So why isn't SR handing that data over? Is there some weird unhandled
corner case this exposes, but that wasn't encountered by the systems the
tutorial was tried out on?

Yes, there is indeed a corner-case bug when you try to stream the very first WAL segment, with log==seg==0. We keep track of the last removed WAL segment, and before a piece of WAL is sent to the standby, walsender checks that the requested WAL segment is > the last removed. Before any WAL segments have been removed since postmaster startup, the last removed segment is initialized to 0/0, with the idea that 0/0 precedes any valid WAL segment. That's clearly not true, though: it does not precede the very first WAL segment after initdb, which is itself 0/0. So a request for that segment fails the check even though the file is still there.
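
To make the ambiguity concrete, here is a minimal standalone sketch of that kind of check. It is not the actual walsender code; the type and function names (SegAddr, last_removed, is_removed_buggy) are made up for illustration, and only the comparison logic follows the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL segment address (log/seg pair). */
typedef struct {
    uint32_t log;
    uint32_t seg;
} SegAddr;

/* "Nothing removed yet" is encoded as 0/0, as described above. */
static SegAddr last_removed = {0, 0};

/*
 * Returns true when the requested segment is <= the last removed one,
 * i.e. when the sender would refuse to stream it.
 */
static bool is_removed_buggy(SegAddr requested)
{
    if (requested.log != last_removed.log)
        return requested.log < last_removed.log;
    return requested.seg <= last_removed.seg;
}
```

With nothing removed yet, a request for 0/0 compares equal to the initial value and is rejected, while a request for 0/1 passes.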

It seems that we need to change the meaning of the last removed WAL segment variable to avoid the ambiguity of 0/0. Let's store (last removed)+1 in the global variable instead, so that 0/0 can only mean that nothing has been removed yet.
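
A rough sketch of what storing (last removed)+1 could look like, again as a standalone illustration rather than a patch. The names (WalSeg, first_kept, record_removal, is_removed) are hypothetical, and SEGS_PER_LOG is an assumed number of segments per log file, chosen only so the wraparound can be shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed number of segments per log file; for illustration only. */
#define SEGS_PER_LOG 255

/* Hypothetical stand-in for a WAL segment address (log/seg pair). */
typedef struct {
    uint32_t log;
    uint32_t seg;
} WalSeg;

/* Holds (last removed)+1. 0/0 now means "nothing has been removed". */
static WalSeg first_kept = {0, 0};

/* Advance to the following segment, rolling over into the next log. */
static WalSeg next_seg(WalSeg s)
{
    if (s.seg + 1 >= SEGS_PER_LOG) {
        s.log++;
        s.seg = 0;
    } else {
        s.seg++;
    }
    return s;
}

/* Called after a segment has been removed. */
static void record_removal(WalSeg removed)
{
    first_kept = next_seg(removed);
}

/* A segment counts as removed only if it is strictly before first_kept. */
static bool is_removed(WalSeg requested)
{
    if (requested.log != first_kept.log)
        return requested.log < first_kept.log;
    return requested.seg < first_kept.seg;
}
```

The comparison becomes strict (<), so before any removal no segment, including 0/0, is reported as removed.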

  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
