Hello list,

I am storing dumps of a database (pg_dump custom format) in a de-duplicating backup server. Each dump is many terabytes in size, so de-duplication is very important. The de-duplication is based on rolling checksums, which makes it fairly flexible: it can compensate for blocks moving by some offset.

Unfortunately, after I did a pg_restore to a new server, I noticed that the
dumps from the new server are not being de-duplicated: all blocks are
considered new.

This means that the dump data has been significantly altered. The new dumps contain the same rows, but probably in a very different order. Could the row order have changed when pg_restore loaded the data with COPY FROM? I have no idea, but now that I think about it, many operations can change the row order, such as CLUSTER, VACUUM FULL etc., so the question still applies.
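
To illustrate why I suspect row order, here is a toy sketch in Python. It is not the backup server's actual algorithm: the window size, the boundary rule and the sample rows are all made up for illustration. It cuts chunks at content-defined boundaries using a rolling sum, then checks how many chunks of a new stream are already known.

```python
import hashlib
import random


def chunk_digests(data: bytes, window: int = 16, mask: int = 0x3F) -> set:
    """Cut data at content-defined boundaries; return the set of chunk digests."""
    digests = set()
    start = 0      # first byte of the chunk being built
    rolling = 0    # sum of the last `window` bytes
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]
        # Cut where the low bits of the window sum are all ones, so a boundary
        # is decided by the last `window` bytes rather than the absolute offset.
        if i + 1 - start >= window and (rolling & mask) == mask:
            digests.add(hashlib.sha256(data[start:i + 1]).hexdigest())
            start = i + 1
    if start < len(data):
        digests.add(hashlib.sha256(data[start:]).hexdigest())
    return digests


def known_fraction(stored: bytes, incoming: bytes) -> float:
    """Fraction of the incoming chunks that the stored data already contains."""
    old, new = chunk_digests(stored), chunk_digests(incoming)
    return len(old & new) / len(new)


# Made-up rows standing in for table data in a dump.
rows = [
    f"{i}\t{hashlib.sha256(str(i).encode()).hexdigest()[:32]}\n".encode()
    for i in range(2000)
]
original = b"".join(rows)
shifted = b"some extra header bytes\n" + original   # same rows, moved by an offset
reordered_rows = rows[:]
random.Random(0).shuffle(reordered_rows)
reordered = b"".join(reordered_rows)                # same rows, different order

print("shifted:  ", known_fraction(original, shifted))
print("reordered:", known_fraction(original, reordered))
```

In this toy setup the chunks span several rows, so I would expect the shifted stream to be almost fully recognized and the reordered one to look almost entirely new, which matches what I see on the backup server.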

A *logical* dump of data shouldn't be affected by on-disk order. Internal representation shouldn't affect the output.

This makes me wonder: Is there a way to COPY TO in primary-key order?

If that is possible, then pg_dump could make use of it.
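
The closest per-table workaround I can think of is COPY with a query instead of a table name, since a query can carry an ORDER BY. Below is a small Python sketch that only builds such a statement. The helper names, the table and the key columns are hypothetical, and as far as I know pg_dump has no option to do this by itself.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ordered_copy_sql(table: str, key_columns: list) -> str:
    """Build a COPY statement that emits rows sorted by the given key columns."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    order_by = ", ".join(quote_ident(col) for col in key_columns)
    return f"COPY (SELECT * FROM {quote_ident(table)} ORDER BY {order_by}) TO STDOUT"


# Hypothetical table "orders" with a two-column primary key.
print(ordered_copy_sql("orders", ["customer_id", "order_id"]))
```

I assume sorting would add noticeable cost on tables of this size, so this is only meant to show the kind of output ordering I am asking about.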


Thanks in advance,
Dimitris

