On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> On 13.02.2012 19:13, Fujii Masao wrote:
>>
>> On Mon, Feb 13, 2012 at 8:37 PM, Heikki Linnakangas
>> <heikki.linnakan...@enterprisedb.com> wrote:
>>>
>>> On 13.02.2012 01:04, Jeff Janes wrote:
>>>>
>>>> Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD.
>>>> I have no idea if I did it correctly, in particular if calling
>>>> GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side
>>>> effects that make that a bad thing to do. I'm not proposing it as the
>>>> real fix, I just wanted to get around this problem in order to do more
>>>> testing.
>>>
>>> Thanks. That's basically the right approach. Attached patch contains a
>>> cleaned up version of that.
>>>
>>>> It does get rid of the "there is no contrecord flag" errors, but
>>>> recover still does not work.
>>>>
>>>> Now the count of tuples in the table is always correct (I never
>>>> provoke a crash during the initial table load), but sometimes updates
>>>> to those tuples that were reported to have been committed are lost.
>>>>
>>>> This is more subtle, it does not happen on every crash.
>>>>
>>>> It seems that when recovery ends on "record with zero length at...",
>>>> that recovery is correct.
>>>>
>>>> But when it ends on "invalid magic number 0000 in log file.." then the
>>>> recovery is screwed up.
>>>
>>> Can you write a self-contained test case for that? I've been trying to
>>> reproduce that by running the regression tests and pgbench with a
>>> streaming replication standby, which should be pretty much the same as
>>> crash recovery. No luck this far.
>>
>> Probably I could reproduce the same problem as Jeff got. Here is the test
>> case:
>>
>> $ initdb -D data
>> $ pg_ctl -D data start
>> $ psql -c "create table t (i int); insert into t values(generate_series(1,10000)); delete from t"
>> $ pg_ctl -D data stop -m i
>> $ pg_ctl -D data start
>>
>> The crash recovery emitted the following server logs:
>>
>> LOG: database system was interrupted; last known up at 2012-02-14 02:07:01 JST
>> LOG: database system was not properly shut down; automatic recovery in progress
>> LOG: redo starts at 0/179CC90
>> LOG: invalid magic number 0000 in log file 0, segment 1, offset 8060928
>> LOG: redo done at 0/17AD858
>> LOG: database system is ready to accept connections
>> LOG: autovacuum launcher started
>>
>> After recovery, I could not see the table "t" which I created before:
>>
>> $ psql -c "select count(*) from t"
>> ERROR: relation "t" does not exist
>
> Are you still seeing this failure with the latest patch I posted
> (http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)?

Yes. Just to be safe, I again applied the latest patch to HEAD, compiled it
and tried the same test. Unfortunately I got the same failure again.

I ran configure with '--enable-debug' '--enable-cassert'
'CPPFLAGS=-DWAL_DEBUG', and make with the -j 2 option.

When I ran the test with wal_debug = on, I got the following assertion failure.

LOG: INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/197
STATEMENT: create table t (i int); insert into t values(generate_series(1,10000)); delete from t
LOG: INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap - insert: rel 1663/12277/16384; tid 0/198
STATEMENT: create table t (i int); insert into t values(generate_series(1,10000)); delete from t
TRAP: FailedAssertion("!(((bool) (((void*)(&(target->tid)) != ((void *)0)) && ((&(target->tid))->ip_posid != 0))))", File: "heapam.c", Line: 5578)
LOG: xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0
LOG: xlog bg flush request 0/17B4000; write 0/17B0000; flush 0/17B0000
LOG: server process (PID 16806) was terminated by signal 6: Abort trap

This might be related to the original problem which Jeff and I saw.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
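
A note for reading the recovery log in the quoted test case: the message "invalid magic number 0000 in log file 0, segment 1, offset 8060928" gives its position as a file/segment/offset triple, while "redo starts at" and "redo done at" use the X/Y notation. The following is only a minimal sketch of the conversion between the two, assuming the default 16 MB WAL segment size. segment_to_lsn is a hypothetical helper name, not a PostgreSQL tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): convert the
# "log file N, segment M, offset K" triple from a recovery log message
# into the X/Y notation used by the "redo starts at" / "redo done at" lines.
# Assumption: the server uses the default 16 MB WAL segment size.
XLOG_SEG_SIZE=$((16 * 1024 * 1024))

segment_to_lsn() {
    logid=$1    # "log file" number from the message
    seg=$2      # "segment" number from the message
    offset=$3   # byte offset within that segment
    # The low part of the position is the byte offset within the log file.
    printf '%X/%X\n' "$logid" $((seg * XLOG_SEG_SIZE + offset))
}

# Values taken from the crash recovery log in the test case.
segment_to_lsn 0 1 8060928
```

Under that assumption the offset corresponds to 0/17B0000, which lies past the reported "redo done at 0/17AD858".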