Large table, *mostly* insert+select only, either a sequence or a timestamp
column set to current time. This is an ideal usecase for a BRIN index.

But then consider the same table. Except rows are typically updated once or
twice when they are new, and *then* go read only. And we also have a
process that at some point deletes *some* old rows (but not all - in fact,
only a small portion).

In this case, the next INSERT once VACUUM has run is likely to stick a
"new" row somewhere very "far back" in the table, since there is now free
space there. This more or less completely ruins the BRIN index usability,
as the "old" blocks will now contain a single row from a "new" series.

For a scenario like this, would it make sense to have an option that could
be set on an individual table making it physical append only? Basically
VACUUM would run as normal and clean up the old space when rows are deleted
back in history, but when new space is needed for a row the system would
never look at the old blocks, and only append to the end.

Yes, this wastes space in the database. But not as much as trying to track
the deleted rows somewhere else and do an anti-join or something like that.
And it would obviously not be on by default.

Eventually the lost space might grow to a point where re-CLUSTERing the
table might be worthwhile. But given the cost of that, I can see many
scenarios where people are just willing to pay that extra overhead on large
tables that are more or less never deleted from.

I've run into a number of cases recently where this would've made the BRIN
indexes on huge tables *much* more efficient. At least, it seems to me they
would :)

Or am I missing something that would make this not work?

 Magnus Hagander

Reply via email to