Hi, On Sat, Apr 22, 2023 at 12:53:23PM -0400, Isaac Morland wrote: > > I have an application for this: creating various dev/test versions of data > from production. > > Start by restoring a copy of production from backup. Then successively > create several altered versions of the data and save them to a place where > developers can pick them up. For example, you might have one version which > has all data old than 1 year deleted, and another where 99% of the > students/customers/whatever are deleted. Anonymization could also be > applied. This would give you realistic (because it ultimately originates > from production) test data. > > This could be done by starting a non-committable transaction, making the > adjustments, then doing a pg_dump in the same transaction (using --snapshot > to allow it to see that transaction). Then rollback, and repeat for the > other versions. This saves repeatedly restoring the (probably very large) > production data each time.
There already are tools to handle those use cases. Looks for instance at https://github.com/mla/pg_sample to backup a consistent subset of the data, or https://github.com/rjuju/pg_anonymize to transparently pg_dump (or interactively query) anonymized data. Both tool also works when connected on a physical standby, while trying to update data before dumping them wouldn't.