> On 16 Jan 2023, at 15:37, HECTOR INGERTO <hector_...@hotmail.com> wrote:
> 
> > The database relies on the data being consistent when it performs crash 
> > recovery.
> > Imagine that a checkpoint is running while you take your snapshot.  The 
> > checkpoint
> > syncs a data file with a new row to disk.  Then it writes a WAL record and 
> > updates
> > the control file.  Now imagine that the table with the new row is on a 
> > different
> > file system, and your snapshot captures the WAL and the control file, but 
> > not
> > the new row (it was still sitting in the kernel page cache when the 
> > snapshot was taken).
> > You end up with a lost row.
> > 
> > That is only one scenario.  Many other ways of corruption can happen.
>  
> Can we say then that the risk comes only from the possibility of a checkpoint 
> running inside the time gap between the non-simultaneous snapshots?

I recently followed a course on distributed algorithms and recognised one of 
the patterns here.

The problem boils down to distributed snapshotting, where the two ZFS 
filesystems act as processes that each initiate their own snapshot 
independently.

Without communicating with each other, or with the database, about which 
messages (in this case, the traffic between the database and each FS) are part 
of their snapshots as sent or received, there is a chance of lost messages: a 
'message' that neither snapshot knows was sent, or one that neither snapshot 
received.
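As a toy illustration of such a lost message (hypothetical names, not actual 
PostgreSQL or ZFS internals): two independently snapshotted filesystems, where 
the new row is still in the page cache on one of them while the WAL record 
referring to it is already durable on the other:

```python
from copy import deepcopy

class FS:
    """A toy filesystem: a snapshot captures only synced (on-disk) data."""
    def __init__(self):
        self.disk = {}    # durable state, visible to snapshots
        self.cache = {}   # dirty pages not yet synced

    def write(self, key, val):
        self.cache[key] = val

    def sync(self):
        self.disk.update(self.cache)
        self.cache.clear()

    def snapshot(self):
        return deepcopy(self.disk)

fs_data, fs_wal = FS(), FS()

# The new row is written but still sits in the (simulated) page cache,
# while the WAL record for it is already synced to the other filesystem.
fs_data.write("row42", "new row")
fs_wal.write("wal", "commit row42")
fs_wal.sync()

snap_wal = fs_wal.snapshot()    # captures the commit record...
snap_data = fs_data.snapshot()  # ...but not the row it refers to
fs_data.sync()                  # the row reaches disk only afterwards

# Restoring both snapshots replays a commit for a row that is missing.
assert snap_wal["wal"] == "commit row42"
assert "row42" not in snap_data
```

The 'message' here is the row on its way to durable storage: the WAL snapshot 
knows it was sent, but the data snapshot never received it.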

Algorithms like Tarry, Lai-Yang or the Echo algorithm solve this by adding 
communication between those processes about messages in transit.
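A minimal sketch of the marker-based idea behind this family of algorithms 
(this particular shape is closest to Chandy-Lamport, a sibling of those named 
above; all names and values here are illustrative): each process records its 
own state when it first sees a marker, and keeps recording incoming channel 
traffic until a marker has arrived on every channel, so messages in transit 
become part of the snapshot instead of being lost:

```python
from collections import deque

MARKER = object()   # the control message the processes add

class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.inbox = deque()     # (sender, message) pairs, FIFO channel
        self.peers = []
        self.recorded_state = None
        self.in_transit = []     # messages caught "on the wire"
        self.pending = None      # peers whose channel is still being recorded

    def send(self, peer, msg):
        peer.inbox.append((self, msg))

    def start_snapshot(self):
        self.recorded_state = self.state
        self.pending = set(self.peers)
        for p in self.peers:
            self.send(p, MARKER)

    def receive(self):
        sender, msg = self.inbox.popleft()
        if msg is MARKER:
            if self.pending is None:      # first marker: record own state
                self.start_snapshot()
            self.pending.discard(sender)  # channel from sender is now done
        else:
            self.state += msg
            if self.pending and sender in self.pending:
                self.in_transit.append(msg)  # message was in flight

p, q = Process("p", 10), Process("q", 5)
p.peers, q.peers = [q], [p]

q.send(p, 3)          # a message is in transit when the snapshot starts
p.start_snapshot()    # p records 10 and sends a marker to q
p.receive()           # p gets 3: applied to state AND recorded in transit
q.receive()           # q gets the marker: records 5, sends a marker back
p.receive()           # p gets q's marker: channel recording complete

# Recorded states plus in-transit messages equal the live global state.
recorded = p.recorded_state + q.recorded_state + sum(p.in_transit)
assert recorded == p.state + q.state == 18
```

Without the in-transit recording, the snapshot would total 15 while the live 
system holds 18: exactly the lost-row situation from the quoted scenario.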

Alban Hertroys
--
There is always an exception to always.
