On Mon, Feb 13, 2012 at 8:37 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> On 13.02.2012 01:04, Jeff Janes wrote:
>>
>> Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD.
>> I have no idea if I did it correctly, in particular if calling
>> GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side
>> effects that make that a bad thing to do. I'm not proposing it as the
>> real fix, I just wanted to get around this problem in order to do more
>> testing.
>
>
> Thanks. That's basically the right approach. Attached patch contains a
> cleaned up version of that.
>
>
>> It does get rid of the "there is no contrecord flag" errors, but
>> recover still does not work.
>>
>> Now the count of tuples in the table is always correct (I never
>> provoke a crash during the initial table load), but sometimes updates
>> to those tuples that were reported to have been committed are lost.
>>
>> This is more subtle, it does not happen on every crash.
>>
>> It seems that when recovery ends on "record with zero length at...",
>> that recovery is correct.
>>
>> But when it ends on "invalid magic number 0000 in log file.." then the
>> recovery is screwed up.
>
>
> Can you write a self-contained test case for that? I've been trying to
> reproduce that by running the regression tests and pgbench with a streaming
> replication standby, which should be pretty much the same as crash recovery.
> No luck this far.

I think I was able to reproduce the same problem that Jeff reported.
Here is the test case:

$ initdb -D data
$ pg_ctl -D data start
$ psql -c "create table t (i int); insert into t values(generate_series(1,10000)); delete from t"
$ pg_ctl -D data stop -m i
$ pg_ctl -D data start

Crash recovery emitted the following server log:

LOG:  database system was interrupted; last known up at 2012-02-14 02:07:01 JST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/179CC90
LOG:  invalid magic number 0000 in log file 0, segment 1, offset 8060928
LOG:  redo done at 0/17AD858
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

After recovery, the table "t" that I had created before the crash no
longer existed:

$ psql -c "select count(*) from t"
ERROR:  relation "t" does not exist

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
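
A note on reading the log above: the "invalid magic number" line gives a
log file, segment and offset, while the redo lines give a WAL location.
The two can be compared with a little arithmetic. The following is only a
sketch: lsn_from_log is a hypothetical helper (not a PostgreSQL tool), and
it assumes the default 16 MB WAL segment size.

```shell
#!/bin/sh
# Assumption: default 16 MB WAL segments; a non-default build would differ.
WAL_SEG_SIZE=$((16 * 1024 * 1024))

# lsn_from_log LOGID SEGMENT OFFSET
# Hypothetical helper: prints the position in the LOGID/HEXOFFSET form
# used by the "redo starts at" and "redo done at" messages.
lsn_from_log() {
    logid=$1
    seg=$2
    off=$3
    printf '%X/%X\n' "$logid" $(( seg * WAL_SEG_SIZE + off ))
}

# Values taken from the "invalid magic number" line in the log above.
lsn_from_log 0 1 8060928
```

For the values in the log this gives 0/17B0000. The offset 8060928 is an
exact multiple of 8192 (the default WAL block size, again an assumption
about the build), so the error was reported at the start of a WAL page,
shortly after the last replayed record at 0/17AD858.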