On Mon, 2023-01-16 at 14:37 +0000, HECTOR INGERTO wrote:
> > The database relies on the data being consistent when it performs crash recovery.
> > Imagine that a checkpoint is running while you take your snapshot.  The checkpoint
> > syncs a data file with a new row to disk.  Then it writes a WAL record and updates
> > the control file.  Now imagine that the table with the new row is on a different
> > file system, and your snapshot captures the WAL and the control file, but not
> > the new row (it was still sitting in the kernel page cache when the snapshot was taken).
> > You end up with a lost row.
> > 
> > That is only one scenario.  Many other ways of corruption can happen.
>  
> Can we say then that the risk comes only from the possibility of a checkpoint running
> inside the time gap between the non-simultaneous snapshots?

Another case: a transaction COMMITs, and a slightly later transaction reads the data
and sets a hint bit.  If the snapshot of the file system with the data directory in it
is slightly later than the snapshot of the file system with "pg_wal", the COMMIT might
not be part of the snapshot, but the hint bit could be.

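Then the uncommitted data could be visible if you recover from the snapshot, because
a reader that finds the hint bit set trusts it and does not look up the transaction's
commit status again.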
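
To make the ordering problem concrete, here is a rough toy model in Python.  It is
not PostgreSQL code: the two "file systems" are plain Python objects, all function
names are made up for illustration, and commit status after recovery is assumed to
come only from the COMMIT records found in the snapshotted WAL, which is a
simplification.

```python
from copy import deepcopy


def take_inconsistent_snapshots():
    """Snapshot the WAL file system first and the data file system slightly later."""
    wal = []  # stands in for the file system holding "pg_wal"
    heap = {"row": {"xmin": 100, "hint_committed": False}}  # stands in for the data directory

    # Snapshot 1: the WAL file system, taken before transaction 100 commits.
    wal_snapshot = deepcopy(wal)

    # Transaction 100 COMMITs; its commit record goes to the WAL.
    wal.append(("COMMIT", 100))

    # A slightly later reader sees the commit and sets the hint bit on the row.
    heap["row"]["hint_committed"] = True

    # Snapshot 2: the data file system, taken after the hint bit was set.
    heap_snapshot = deepcopy(heap)
    return wal_snapshot, heap_snapshot


def recover_committed_xids(wal_snapshot):
    """Toy recovery: a transaction counts as committed only if its COMMIT record was captured."""
    return {xid for kind, xid in wal_snapshot if kind == "COMMIT"}


def row_visible(row, committed_xids):
    """A set hint bit is trusted; commit status is only checked when it is not set."""
    return row["hint_committed"] or row["xmin"] in committed_xids


wal_snap, heap_snap = take_inconsistent_snapshots()
committed = recover_committed_xids(wal_snap)
print(committed)                                  # set()
print(row_visible(heap_snap["row"], committed))   # True
```

In this model the restored WAL contains no COMMIT for transaction 100, yet the row
is reported as visible, purely because the hint bit made it into the later snapshot.
No checkpoint is involved at any point.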

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

