On Thu, Feb 23, 2023 at 6:04 AM <marekmosiew...@gmail.com> wrote:
> Hey,
>
> It depends on the scenario, but there are many use cases where a
> malicious data change by somebody with admin privileges could be a
> disaster. That is where data history could help. A basic solution
> would be a trigger that writes the previous version of a record to
> some other table. A trigger, however, can be disabled or removed (a
> crazy solution would be to provide permanent triggers and tables that
> can only ever be inserted into). Then there is also the possibility of
> modifying the tablespace directly on disk.
>
> But Postgres already has the ability to not overwrite records when two
> concurrent transactions modify data, in order to provide MVCC.
>
> So what about permanently non-vacuumable tables? Add some xid log
> tables where each record's hash is chained to the hash of the previous
> record. I think that would be a serious additional advantage for the
> best open source relational database.
>
> Best regards,
> Marek Mosiewicz
What you are describing sounds like the "system versioning" flavor of "temporal" tables. It's a part of the SQL Standard, but PostgreSQL has yet to implement it in core.

Basically, every row has a start_timestamp and end_timestamp field. Updating a row sets the end_timestamp of the old version and inserts a new one with a start_timestamp matching the end_timestamp of the previous row. Once a record has a non-null [1] end_timestamp, it is not possible to update that row via SQL. Regular SQL statements effectively have an "AND end_timestamp IS NULL" filter on them, so the old rows are not visible without specifically invoking temporal features to get point-in-time queries. (A rough home-rolled sketch of this is at the end of this message.)

At the implementation level, this probably means a table with two partitions: one for live rows, all having null end_timestamps, and one for archived rows, which is effectively append-only.

This strategy is common practice for chain-of-custody and auditing purposes, either as a feature of the RDBMS or home-rolled. I have also seen it used for developing forecasting models (e.g. "what would this model have told us to do if we had run it a year ago?").

A few years ago, I personally thought about implementing a hash-chain feature (also sketched at the end of this message), but my research at the time concluded that:

* Few customers were interested in going beyond what was required for regulatory compliance.
* Once compliant, any divergence from established procedures, even if it was an unambiguous improvement, only invited re-examination of it and adjacent procedures, and they would avoid that.
* They could get the same validation by comparing against a secured backup and out-of-band audit "logs" (most would call them "reports").
* They were of the opinion that if a bad actor got admin access, it was "game over" anyway.

The world may have changed since then, but even if there is now interest, I wonder if that isn't better implemented at the OS level rather than the RDBMS level.

[1] Some implementations don't use null; they use an end_timestamp set to a date implausibly far in the future (3999-12-31, for example), but the concept remains that once the column is set to a real timestamp, the row isn't visible to update statements.
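
For illustration, here is a minimal home-rolled sketch of the system-versioning idea described above. This is not the SQL Standard syntax and it is not anything in core; the table, view, and trigger names (account, account_current, account_versioning) are invented for the example:

    CREATE TABLE account (
        id          bigint      NOT NULL,
        balance     numeric     NOT NULL,
        valid_from  timestamptz NOT NULL DEFAULT now(),
        valid_to    timestamptz,                -- NULL = current version
        PRIMARY KEY (id, valid_from)
    );

    -- Ordinary access goes through a view that only shows live rows,
    -- i.e. the implicit "AND end_timestamp IS NULL" filter.
    CREATE VIEW account_current AS
        SELECT id, balance, valid_from, valid_to
          FROM account
         WHERE valid_to IS NULL;

    -- On UPDATE through the view, close out the old version and insert
    -- a new one whose valid_from matches the old version's valid_to.
    CREATE FUNCTION account_versioning() RETURNS trigger AS $$
    DECLARE
        ts timestamptz := now();
    BEGIN
        UPDATE account
           SET valid_to = ts
         WHERE id = OLD.id
           AND valid_to IS NULL;

        INSERT INTO account (id, balance, valid_from)
        VALUES (OLD.id, NEW.balance, ts);

        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER account_versioning
        INSTEAD OF UPDATE ON account_current
        FOR EACH ROW EXECUTE FUNCTION account_versioning();

    -- Usage: the closed-out 100 row stays behind, only 150 is "live".
    INSERT INTO account (id, balance) VALUES (1, 100);
    UPDATE account_current SET balance = 150 WHERE id = 1;

The live/archive partition split mentioned above could be layered on by partitioning on an expression over valid_to, though that brings its own restrictions (e.g. on unique constraints), so this sketch keeps a single table.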
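
And a similarly hand-waved sketch of the hash-chain idea, attached here to the hypothetical account table from the previous sketch: an append-only log where each entry stores a digest computed over the previous entry's digest plus the old row image. The names (audit_log, log_account_change) are again invented, and the "read the latest hash" step is racy under concurrent writers, so treat this purely as an illustration of the chaining rather than as something tamper-proof against an administrator:

    CREATE TABLE audit_log (
        seq        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        logged_at  timestamptz NOT NULL DEFAULT now(),
        row_data   jsonb       NOT NULL,
        chain_hash bytea       NOT NULL
    );

    CREATE FUNCTION log_account_change() RETURNS trigger AS $$
    DECLARE
        prev bytea;
    BEGIN
        -- Fetch the most recent link in the chain (racy under concurrency).
        SELECT chain_hash INTO prev
          FROM audit_log
         ORDER BY seq DESC
         LIMIT 1;

        -- Hash the previous digest together with the old row image.
        INSERT INTO audit_log (row_data, chain_hash)
        VALUES (to_jsonb(OLD),
                sha256(coalesce(prev, '\x'::bytea) ||
                       convert_to(to_jsonb(OLD)::text, 'UTF8')));

        RETURN NULL;   -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER account_audit
        AFTER UPDATE OR DELETE ON account
        FOR EACH ROW EXECUTE FUNCTION log_account_change();

Verification would then mean recomputing the chain from the first row and comparing against the stored digests; the weakness is exactly the point in the bullets above, namely that anyone with enough privilege to rewrite the table can also rewrite the chain.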