Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Heikki Linnakangas
On 18.08.2012 08:52, Amit kapila wrote: Tom Lane Sent: Saturday, August 18, 2012 7:16 AM so it merrily tries to compute a checksum on a gigabyte worth of data, and soon falls off the end of memory. In reality, inspection of the WAL file suggests that this is the end of valid data and what

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 18.08.2012 08:52, Amit kapila wrote: I think that missing check of total length has caused this problem. However now this check will be different. That check still exists, in ValidXLogRecordHeader(). However, we now allocate

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Andres Freund
On Monday, August 20, 2012 04:04:52 PM Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 18.08.2012 08:52, Amit kapila wrote: I think that missing check of total length has caused this problem. However now this check will be different. That check still

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Heikki Linnakangas
On 20.08.2012 17:04, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 18.08.2012 08:52, Amit kapila wrote: I think that missing check of total length has caused this problem. However now this check will be different. That check still exists, in

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 20.08.2012 17:04, Tom Lane wrote: Uh, no, you misread it. xl_tot_len is *zero* in this example. The problem is that RecordIsValid believes xl_len (and backup block size) even when it exceeds xl_tot_len. Ah yes, I see that
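
The failure mode described here (a payload length trusted even though it exceeds the record's total length) can be illustrated with a minimal sketch. Everything below is hypothetical and simplified: DemoRecord, demo_lengths_are_sane and demo_checksum are illustration-only names, not PostgreSQL's XLogRecord or RecordIsValid, and a plain byte sum stands in for the real CRC. The only point is the ordering: validate the lengths against each other and against the buffer before touching any payload bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record header; NOT PostgreSQL's XLogRecord. */
typedef struct DemoRecord
{
    uint32_t tot_len;   /* total record length, header included */
    uint32_t len;       /* length of the payload following the header */
} DemoRecord;

#define DEMO_HEADER_SIZE ((uint32_t) sizeof(DemoRecord))

/* Reject headers whose payload length cannot fit inside tot_len. */
static bool
demo_lengths_are_sane(const DemoRecord *rec)
{
    if (rec->tot_len < DEMO_HEADER_SIZE)
        return false;           /* covers the tot_len == 0 case */
    if (rec->len > rec->tot_len - DEMO_HEADER_SIZE)
        return false;           /* payload would overrun the record */
    return true;
}

/*
 * Checksum the payload only after every length has been validated.
 * A byte sum is used as a stand-in for a real CRC.
 */
static bool
demo_checksum(const unsigned char *buf, size_t buf_size, uint32_t *out)
{
    DemoRecord  hdr;
    uint32_t    sum = 0;

    if (buf_size < sizeof(hdr))
        return false;
    memcpy(&hdr, buf, sizeof(hdr));

    if (!demo_lengths_are_sane(&hdr) || hdr.tot_len > buf_size)
        return false;           /* never read past what we actually hold */

    for (uint32_t i = 0; i < hdr.len; i++)
        sum += buf[sizeof(hdr) + i];

    *out = sum;
    return true;
}
```

With a header of tot_len = 0 and a huge len, as in the example discussed in this thread, demo_checksum returns false before reading any payload instead of walking off the end of memory.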

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-20 Thread Heikki Linnakangas
On 20.08.2012 18:25, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: I was thinking that we might read gigabytes worth of bogus WAL into the memory buffer, if xl_tot_len is bogus and large, e.g 0x. But now that I look closer, the xlog record is validated

Re: [HACKERS] New WAL code dumps core trivially on replay of bad data

2012-08-17 Thread Amit kapila
Tom Lane Sent: Saturday, August 18, 2012 7:16 AM The startup process's stack trace is #0 0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15) at xlog.c:3713 3713 COMP_CRC32(crc, XLogRecGetData(record), len); (gdb) bt #0 0x26fd1c in RecordIsValid