On Fri, Jun 11, 2010 at 1:01 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
> We're talking about a corrupt record (incorrect CRC, incorrect backlink
> etc.), not errors within redo functions. During crash recovery, a corrupt
> record means you've reached end of WAL. In standby mode, when streaming WAL
> from master, that shouldn't happen, and it's not clear what to do if it
> does. PANIC is not a good idea, at least if the server uses hot standby,
> because that only makes the situation worse from availability point of view.
> So we log the error as a WARNING, and keep retrying. It's unlikely that the
> problem will just go away, but we keep retrying anyway in the hope that it
> does. However, it seems that we're too aggressive with the retries.

Right. The attached patch calms down the retries: if we find an invalid record while streaming WAL from the master, we sleep for 5 seconds (should this be shorter?) and then retry replaying the record at the same location where the invalid one was found.

Comments?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
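
For illustration only, the retry behavior described above can be modeled roughly as follows. This is a minimal Python sketch, not the attached patch and not PostgreSQL code: the names `replay_with_calm_retries`, `read_record`, and `max_attempts` are hypothetical, and the attempt limit exists only so the sketch terminates (the standby itself keeps retrying).

```python
import logging
import time

# Delay proposed in the message above; the right value is still an open question.
RETRY_DELAY_SECONDS = 5


def replay_with_calm_retries(read_record, location, max_attempts, sleep=time.sleep):
    """Retry reading the record at one fixed location, pausing between attempts.

    read_record(location) is a hypothetical callback that returns the record,
    or None when the record at that location is invalid (bad CRC, bad backlink).
    Returns (record, attempts_made); record is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        record = read_record(location)
        if record is not None:
            return record, attempt
        # Log as a WARNING rather than giving up, then wait before retrying
        # the same location instead of looping immediately.
        logging.warning("invalid record at %s (attempt %d)", location, attempt)
        if attempt < max_attempts:
            sleep(RETRY_DELAY_SECONDS)
    return None, max_attempts
```

The `sleep` parameter is injectable so the loop can be exercised without actually waiting; the point of the sketch is only that each failed read is followed by a pause and a retry at the same location.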
calm_down_retries_v1.patch
Description: Binary data