Jim Nasby <[EMAIL PROTECTED]> writes:

> Even if we stopped right there it would still be a huge win in many  (most?)
> cases. How often do the indexes on a table comprise even 50%  of the table's
> size? 

I would say they're usually roughly comparable actually. It depends on how
wide your table is of course but the wider your table rows the more indexes
you're likely to have on the table too.

> Even in the  50% case, you've gone from 1.5X to .6X

Sure, and a 3x speedup is nothing to sneeze at, that would be a great
improvement to vacuum. But it's still just a linear speedup and doesn't
address the algorithmic problem. 

The fundamental problem is we have a process that's O(m) where m is the total
space taken by a table and its indexes. The actual amount of space it has to
reclaim is n. Other than n<m there's basically no relationship between these
figures. As long as that's the case vacuum may as well be O(n^2) or O(n!).

We frequently assume -- and often it's a valid assumption -- that these
figures are roughly proportional. Hence all the talk about databases reaching
a "steady-state" where the amount of dead space is constantly being reclaimed
at more or less the same speed it's being generated. But there are also plenty
of use cases where a complete vacuum pass takes thousands of times longer than
the i/o it took to generate those dead tuples. Currently Postgres just isn't
that great a tool for those use cases.

Unfortunately while I'm convinced of the problem I'm equally unconvinced of
the solution. I tried to solve online index builds using retail index lookups
in a very similar way to what's being discussed here. And ran into the same
problems. I eventually decided that while it could be made to work that way it
would be far too much code, far too unsafe, and far too invasive in the index
access methods to be the right approach.

Our existing method works with minimal help from the index access methods
which allows for an enormous degree of freedom in the index design.. To be
able to support retail vacuum you would have to force index method
implementors to keep information in a way that allowed them to look up a
particular value/tid efficiently which would limit the kinds of indexes you
could support drastically.


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to