Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-02 Thread Heikki Linnakangas
On 02.11.2010 00:47, Tom Lane wrote: Greg Stark gsst...@mit.edu writes: On Mon, Nov 1, 2010 at 12:37 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Yes, indeed there is a corner-case bug when you try to stream the very first WAL segment, with log==seg==0. This smells

Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-01 Thread Heikki Linnakangas
On 01.11.2010 05:21, Robert Haas wrote: There seem to be two cases in the code that can generate that error. One, attempting to open the file returns ENOENT. Two, after the data has been read, the last-removed position returned by XLogGetLastRemoved precedes the data we think we just read,
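
A toy Python sketch of the two error paths described above, for readers following the thread. The function name, the (log, seg) tuples, the `<=` comparison, and the idea that "nothing removed yet" is stored as 0/0 are all assumptions made for illustration; this is not the PostgreSQL source. Under those assumptions it shows how a request for the very first segment, log==seg==0, could be reported as already removed.

```python
import errno

# Hypothetical message text, modeled on the FATAL line quoted in the thread.
REMOVED_MSG = "requested WAL segment has already been removed"


def check_segment(requested, last_removed, open_errno=None):
    """Return an error string if the request would be rejected, else None.

    requested and last_removed are (log, seg) tuples. last_removed == (0, 0)
    is ASSUMED to be the initial "nothing removed yet" value.
    """
    # Case one: opening the segment file failed with ENOENT.
    if open_errno == errno.ENOENT:
        return REMOVED_MSG
    # Case two: after the read, the segment is compared with the
    # last-removed position. ASSUMED comparison: at or before it.
    if requested <= last_removed:
        return REMOVED_MSG
    return None


# Nothing has been removed, yet the very first segment is rejected.
print(check_segment((0, 0), (0, 0)))
# Any later segment passes the same check.
print(check_segment((0, 1), (0, 0)))
```

If those assumptions hold, only segment 0/0 collides with the initial last-removed value, which would fit the report elsewhere in the thread that the problem appears only for the very first segment.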

Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-01 Thread Heikki Linnakangas
On 31.10.2010 23:31, Greg Smith wrote: LOG: replication connection authorized: user=rep host=127.0.0.1 port=52571 FATAL: requested WAL segment 0001 has already been removed Which is confusing because that file is certainly on the master still, and hasn't even been considered

Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-01 Thread Heikki Linnakangas
On 01.11.2010 09:37, Heikki Linnakangas wrote: On 31.10.2010 23:31, Greg Smith wrote: LOG: replication connection authorized: user=rep host=127.0.0.1 port=52571 FATAL: requested WAL segment 0001 has already been removed Which is confusing because that file is certainly on

Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-01 Thread Fujii Masao
On Mon, Nov 1, 2010 at 5:17 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Committed that. Thanks for the report, both of you. I'm not subscribed to pgsql-admin which is why I didn't see Matt's original report. Thanks! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE

Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-01 Thread Greg Smith
Heikki Linnakangas wrote: Yes, indeed there is a corner-case bug when you try to stream the very first WAL segment, with log==seg==0. I confirmed that the bug exists in only this case by taking my problem install and doing this: psql -d postgres -c "checkpoint; select pg_switch_xlog();" To

Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-01 Thread Greg Stark
On Mon, Nov 1, 2010 at 12:37 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Yes, indeed there is a corner-case bug when you try to stream the very first WAL segment, with log==seg==0. This smells very much like

Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-11-01 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: On Mon, Nov 1, 2010 at 12:37 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Yes, indeed there is a corner-case bug when you try to stream the very first WAL segment, with log==seg==0. This smells very much like

[HACKERS] SR fails to send existing WAL file after off-line copy

2010-10-31 Thread Greg Smith
Last week we got this report from Matt Chesler: http://archives.postgresql.org/pgsql-admin/2010-10/msg00221.php that he was getting errors when trying to do a simple binary replication test. The problem is that what appears to be a perfectly good WAL segment doesn't get streamed to the

Re: [HACKERS] SR fails to send existing WAL file after off-line copy

2010-10-31 Thread Robert Haas
On Sun, Oct 31, 2010 at 5:31 PM, Greg Smith g...@2ndquadrant.com wrote: Which is confusing because that file is certainly on the master still, and hasn't even been considered archived yet much less removed: [mas...@pyramid pg_log]$ ls -l $PGDATA/pg_xlog -rw--- 1 master master 16777216 Oct