On Sun, Jan 15, 2023 at 8:18 PM HECTOR INGERTO <hector_...@hotmail.com> wrote:
> Hello everybody, > > > > I’m using PostgreSQL on openZFS. I use ZFS snapshots as a backup + > hotspare method. > > > > From man zfs-snapshot: “Snapshots are taken atomically, so that all > snapshots correspond to the same moment in time.” So if a PSQL instance is > started from a zfs snapshot, it will start to replay the WAL from the last > checkpoint, in the same way it would do in a crash or power loss scenario. So > from my knowledge, ZFS snapshots can be used to rollback to a previous > point in time. Also, sending those snapshots to other computers will allow > you to have hotspares and remote backups. If I’m wrong here, I would > appreciate being told about it because I’m basing the whole question on > this premise. > > > > On the other hand, we have the tablespace PGSQL feature, which is great > because it allows “unimportant” big data to be written into cheap HDD and > frequently used data into fast NVMe. > > > > So far, so good. The problem is when both ideas are merged. Then, > snapshots from different pools are NOT atomical, snapshot on the HDD pool > isn’t going to be done at the same exact time as the one on the SSD pool, > and I don’t know enough about PGSQL internals to know how dangerous this > is. So here is where I would like to ask for your help with the following > questions: > > > > First of all, what kind of problem can this lead to? Are we talking about > potential whole DB corruption or only the loss of a few of the latest > transactions? > Silent data corruption. *not* just losing your latest transaction. > In second place, if I’m initializing a corrupted PGSQL instance because > ZFS snapshots are from different pools and slightly different times, am I > going to notice it somehow or is it going to fail silently? > Silent. You might notice at the application level. Might. > In third and last place, is there some way to quantify the amount of risk > taken when snapshotting a PGSQL instance spread across two (or more) > different pools? > > > "Don't do it". If you can't get atomic snapshots, don't do it, period. You can use them together with a regular online backup. That is pg_start_backup() // <snapshot multiple volumes> // pg_stop_backup() together with log archiving. That's a perfectly valid method. But you cannot and should not rely on snapshots alone. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>