On 20.08.2012 17:04, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
>> On 18.08.2012 08:52, Amit kapila wrote:
>>> I think that missing check of total length has caused this problem. However,
>>> now this check will be different.
>> That check still exists, in ValidXLogRecordHeader(). However, we now
>> allocate the buffer for the whole record before that check, based on
>> xl_tot_len, if the record header is split across pages. The theory in
>> allocating the buffer is that a bogus xl_tot_len field will cause the
>> malloc() to fail, returning NULL, and we treat that the same as a broken
>> header.
> Uh, no, you misread it. xl_tot_len is *zero* in this example. The
> problem is that RecordIsValid believes xl_len (and backup block size)
> even when it exceeds xl_tot_len.
Ah yes, I see that now. I think all we need then is a check for
xl_tot_len >= SizeOfXLogRecord.
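
To make that concrete, here's roughly the shape of the check I have in
mind. This is only a sketch with stand-in type and macro names, not a
patch against xlog.c; the real fields live in XLogRecord in xlog.h, and
the real SizeOfXLogRecord is defined differently:

#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the real definitions in xlog.h -- sketch only. */
typedef struct
{
    uint32_t    xl_tot_len;     /* total length of entire record */
    uint32_t    xl_len;         /* length of rmgr data, excl. header */
    /* ... CRC, xid, rmid, etc. omitted ... */
} SketchXLogRecord;

#define SKETCH_SIZE_OF_XLOG_RECORD  sizeof(SketchXLogRecord)

/*
 * xl_tot_len must at least cover the fixed header, and xl_len (plus
 * any backup blocks) must fit within xl_tot_len -- the second test is
 * the cross-check Tom's example shows is missing from RecordIsValid.
 */
static bool
record_lengths_sane(const SketchXLogRecord *record)
{
    if (record->xl_tot_len < SKETCH_SIZE_OF_XLOG_RECORD)
        return false;           /* can't even hold the header */
    if (record->xl_len > record->xl_tot_len - SKETCH_SIZE_OF_XLOG_RECORD)
        return false;           /* xl_len exceeds claimed total length */
    return true;
}

With xl_tot_len = 0, as in Tom's example, the first test already fails,
so we'd never get as far as believing the bogus xl_len.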
>> I think we need to delay the allocation of the record buffer. We need to
>> read and validate the whole record header first, like we did before,
>> before we trust xl_tot_len enough to call malloc() with it. I'll take a
>> shot at doing that.
> I don't believe this theory at all. Overcommit applies to writing on
> pages that were formerly shared with the parent process --- it should
> not have anything to do with malloc'ing new space. But anyway, this
> is not what happened in my example.
I was thinking that we might read gigabytes worth of bogus WAL into the
memory buffer, if xl_tot_len is bogus and large, e.g. 0xffffffff. But now
that I look closer, the xlog record is validated after reading the first
continuation page, so we should catch a bogus xl_tot_len value at that
point. And there is a cross-check with xl_rem_len on every continuation
page, too.
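
For the archives, that continuation-page cross-check looks roughly like
this. Again a sketch with stand-in names, not the actual ReadRecord()
code, which uses XLogContRecord from xlog_internal.h:

#include <stdbool.h>
#include <stdint.h>

/* Stand-in for XLogContRecord -- sketch only. */
typedef struct
{
    uint32_t    xl_rem_len;     /* total length of remaining record data */
} SketchXLogContRecord;

/*
 * On each continuation page, the amount of data the page claims is
 * still to come must agree with what the record header promised minus
 * what we have already assembled, so a garbage xl_tot_len fails on the
 * first continuation page at the latest.
 */
static bool
contrecord_sane(const SketchXLogContRecord *contrecord,
                uint32_t xl_tot_len,    /* from the record header */
                uint32_t gotlen)        /* bytes assembled so far */
{
    if (contrecord->xl_rem_len == 0)
        return false;
    return xl_tot_len == gotlen + contrecord->xl_rem_len;
}

So even a huge bogus xl_tot_len shouldn't cost us more than one extra
page read before we bail out.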
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com