Re: [HACKERS] Behavior for crash recovery when it detects a corrupt WAL record

2012-10-10 Thread Heikki Linnakangas

On 10.10.2012 17:37, Amit Kapila wrote:

On Tuesday, October 09, 2012 7:38 PM Heikki Linnakangas wrote:

We rely on the CRC to detect end of WAL during recovery. If the
system crashes while the WAL is being flushed to disk, it's normal that
there's a corrupt (i.e. partially written) record at the end of the WAL.
This is a common technique used by pretty much every system with a
transaction log / journal.


Yeah. Can't we check whether the next page is valid? If it is, we could
infer that the current page is corrupt rather than the result of a
partial page write.


No. The OS or disk controller can flush the pages out-of-order, so on 
recovery, it's entirely possible that the next page is valid even if the 
previous one is not.
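
The point about out-of-order flushing can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 16-byte page size, the page layout (a CRC followed by the payload) and all function names are hypothetical, and zlib's CRC-32 stands in for the checksum PostgreSQL actually uses.

```python
import zlib

PAGE_PAYLOAD = 16  # hypothetical tiny page size, for the demo only


def make_page(payload: bytes) -> bytes:
    """Build a page: 4-byte CRC-32 of the payload, then the payload."""
    assert len(payload) == PAGE_PAYLOAD
    return zlib.crc32(payload).to_bytes(4, "little") + payload


def page_is_valid(page: bytes) -> bool:
    """Compare the stored CRC with the CRC of the bytes actually on disk."""
    stored = int.from_bytes(page[:4], "little")
    return stored == zlib.crc32(page[4:])


def torn(new_page: bytes, old_page: bytes, old_tail: int = 4) -> bytes:
    """Simulate a partial write: the last `old_tail` bytes still hold old data."""
    return new_page[:-old_tail] + old_page[-old_tail:]


# Disk contents before the writes (e.g. a recycled segment).
old_disk = [make_page(b"O" * PAGE_PAYLOAD) for _ in range(3)]
# Three new pages written in order 0, 1, 2.
new_pages = [make_page(bytes([ord("A") + i]) * PAGE_PAYLOAD) for i in range(3)]

# Crash: pages 0 and 2 reached disk completely, page 1 only partially.
disk_after_crash = [new_pages[0], torn(new_pages[1], old_disk[1]), new_pages[2]]

validity = [page_is_valid(p) for p in disk_after_crash]
print(validity)  # [True, False, True]
```

Nothing on this simulated disk was corrupted by a hardware fault, yet an invalid page is followed by a valid one, so a valid next page does not prove that the invalid page is anything other than an interrupted write.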


BTW, this means that the CRC on WAL records can *not* be used to detect 
random corruption of the WAL, because it will be confused with 
end-of-WAL. I don't think many people realize that. You will have to use 
a filesystem with checksums if you want to detect random bit errors etc. 
in the WAL. In crash recovery, anyway; in archive recovery or 
replication you can make more assumptions.
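
As a rough illustration of why the two cases cannot be told apart, here is a toy log reader that stops at the first record failing validation. The record layout (length, CRC, payload), the names `append_record` and `replay`, and the use of zlib's CRC-32 are all assumptions made for this sketch; it is not PostgreSQL's record format or its ReadRecord() logic.

```python
import struct
import zlib

# Hypothetical record header: payload length and CRC-32 of the payload.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one record (header + payload) to the in-memory log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list:
    """Return the payloads up to the first record that fails validation."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = bytes(log[start:start + length])
        # A zero length marks unused space in this sketch. A short or
        # mismatching payload is treated as end of log, whatever the cause.
        if length == 0 or len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        pos = start + length
    return records


log = bytearray()
for p in (b"commit 1", b"commit 2", b"commit 3"):
    append_record(log, p)

# Case 1: crash while the last record was being written (torn tail).
torn_tail = bytes(log[:-3])

# Case 2: a single bit flips inside the payload of the second record.
bit_flip = bytearray(log)
bit_flip[HEADER.size + len(b"commit 1") + HEADER.size] ^= 0x01

print(replay(torn_tail))        # [b'commit 1', b'commit 2']
print(replay(bytes(bit_flip)))  # [b'commit 1']
```

In both cases the reader simply returns a shorter prefix and reports no error. In the second case "commit 3" is intact on disk but is never replayed, which is the silent loss of committed data discussed in this thread.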


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Behavior for crash recovery when it detects a corrupt WAL record

2012-10-10 Thread Amit Kapila
On Tuesday, October 09, 2012 7:38 PM Heikki Linnakangas wrote:
 On 09.10.2012 16:42, Amit Kapila wrote:
  I have observed that during recovery, when the server detects a corrupt
  WAL record by CRC validation while applying WAL, it still proceeds.
 
  Basically, ReadRecord() returns NULL in such cases, which makes the
  behavior the same as if it had reached the end of WAL.
 
  After that, the server starts and the user can perform operations
  normally.
 
 Yeah. We rely on the CRC to detect end of WAL during recovery. If the
 system crashes while the WAL is being flushed to disk, it's normal that
 there's a corrupt (i.e. partially written) record at the end of the WAL.
 This is a common technique used by pretty much every system with a
 transaction log / journal.
 
 The other option would be to perform two fsyncs for every commit; one to
 flush the WAL to disk, and another to update some global pointer to
 point to the end of valid WAL (e.g. in pg_control).

Yeah. Can't we check whether the next page is valid? If it is, we could
infer that the current page is corrupt rather than the result of a partial
page write. This would not address the problem in all scenarios, though:
for example, we still could not tell whether there are more valid records
on the same page where the CRC problem was found.

In general, do you think it is worthwhile to give users such a feature? We
already have a CRC on WAL records, so it should be comparatively easy to
detect corruption.

With Regards,
Amit Kapila.






[HACKERS] Behavior for crash recovery when it detects a corrupt WAL record

2012-10-09 Thread Amit Kapila
I have observed that during recovery, when the server detects a corrupt
WAL record by CRC validation while applying WAL, it still proceeds.

Basically, ReadRecord() returns NULL in such cases, which makes the
behavior the same as if it had reached the end of WAL.

After that, the server starts and the user can perform operations normally.

However, it seems to me that this is a problem, as the user might lose some
committed data.

 

Is there any particular reason for this behavior?

 

With Regards,

Amit Kapila.



Re: [HACKERS] Behavior for crash recovery when it detects a corrupt WAL record

2012-10-09 Thread Heikki Linnakangas

On 09.10.2012 16:42, Amit Kapila wrote:

I have observed that during recovery, when the server detects a corrupt
WAL record by CRC validation while applying WAL, it still proceeds.

Basically, ReadRecord() returns NULL in such cases, which makes the
behavior the same as if it had reached the end of WAL.

After that, the server starts and the user can perform operations normally.


Yeah. We rely on the CRC to detect end of WAL during recovery. If the 
system crashes while the WAL is being flushed to disk, it's normal that 
there's a corrupt (i.e. partially written) record at the end of the WAL. 
This is a common technique used by pretty much every system with a 
transaction log / journal.


The other option would be to perform two fsyncs for every commit; one to 
flush the WAL to disk, and another to update some global pointer to 
point to the end of valid WAL (e.g. in pg_control).
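
The two-fsync alternative could look roughly like the sketch below. The file names, the 8-byte pointer format and the helper names `commit` and `valid_end` are hypothetical and chosen only for illustration; this is not how PostgreSQL or pg_control works, and the sketch ignores the possibility of a torn write of the pointer itself.

```python
import os
import struct
import tempfile


def commit(wal_fd: int, ctl_fd: int, record: bytes) -> int:
    """Append a record, then durably publish the new end-of-log offset."""
    os.lseek(wal_fd, 0, os.SEEK_END)
    os.write(wal_fd, record)
    os.fsync(wal_fd)  # fsync 1: make the record itself durable
    end = os.lseek(wal_fd, 0, os.SEEK_CUR)
    os.lseek(ctl_fd, 0, os.SEEK_SET)
    os.write(ctl_fd, struct.pack("<Q", end))
    os.fsync(ctl_fd)  # fsync 2: make the end-of-valid-log pointer durable
    return end


def valid_end(ctl_fd: int) -> int:
    """Read the published end-of-log offset (0 if nothing was published)."""
    os.lseek(ctl_fd, 0, os.SEEK_SET)
    data = os.read(ctl_fd, 8)
    return struct.unpack("<Q", data)[0] if len(data) == 8 else 0


with tempfile.TemporaryDirectory() as d:
    wal_fd = os.open(os.path.join(d, "wal"), os.O_RDWR | os.O_CREAT, 0o600)
    ctl_fd = os.open(os.path.join(d, "control"), os.O_RDWR | os.O_CREAT, 0o600)

    commit(wal_fd, ctl_fd, b"record-1")
    commit(wal_fd, ctl_fd, b"record-2")

    # Simulated crash in the middle of a third commit: some bytes reached
    # the log, but the pointer was never updated.
    os.write(wal_fd, b"torn")

    published_end = valid_end(ctl_fd)
    wal_size = os.fstat(wal_fd).st_size
    os.close(wal_fd)
    os.close(ctl_fd)

print(published_end, wal_size)  # 16 20
```

With an explicit pointer, recovery knows that everything before offset 16 must be valid, so a checksum failure there would indicate real corruption rather than end of log. The price is the second fsync on every commit, which is why the CRC-based approach is used instead.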


- Heikki

