On Tue, Sep 6, 2016 at 2:39 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> I could attempt that, but I don't see the difference between
>> vacuum and create index in this case. Both could allocate a huge chunk
>> of the virtual address space if maintainance work mem says so, both
>> proportional to the size of the table. I can't see how that could take
>> any DBA by surprise.
> Really? CREATE INDEX isn't going to allocate more storage space than
> the size of the data actually being sorted, because tuplesort.c is
> smart about that kind of thing. But VACUUM will very happily allocate
> vastly more memory than the number of dead tuples. It is thankfully
> smart enough not to allocate more storage than the number of line
> pointers that could theoretically exist in a relation of the given
> size, but that only helps for very small relations. In a large
> relation that divergence between the amount of storage space that
> could theoretically be needed and the amount that is actually needed
> is likely to be extremely high. 1 TB relation = 2^27 blocks, each of
> which can contain MaxHeapTuplesPerPage dead line pointers. On my
> system, MaxHeapTuplesPerPage is 291, so that's 291 * 2^27 possible
> dead line pointers, which at 6 bytes each is 291 * 6 * 2^27 = ~218GB,
> but the expected number of dead line pointers is much less than that.
> Even if this is a vacuum triggered by autovacuum_vacuum_scale_factor
> and you're using the default of 0.2 (probably too high for such a
> large table), assuming there are about 60 tuples for page (which is
> what I get with pgbench -i) the table would have about 2^27 * 60 = 7.7
> billion tuples of which 1.5 billion would be dead, meaning we need
> about 9-10GB of space to store all of those dead tuples. Allocating
> as much as 218GB when we need 9-10GB is going to sting, and I don't
> see how you will get a comparable distortion with CREATE INDEX. I
> might be missing something, though.
CREATE INDEX could also allocate 218GB, you just need to index enough
columns and you'll get that.
Aside from the fact that CREATE INDEX will only allocate what is going
to be used and VACUUM will overallocate, the potential to fully
allocate the amount given is still there for both cases.
> There's no real issue when there's only one process running on the
> system at a time. If the user set maintenance_work_mem to an amount
> of memory that he can't afford to pay even once, then that's simple
> misconfiguration and it's not really our problem. The issue is that
> when there are 3 or potentially more VACUUM processes running plus a
> CREATE INDEX or two at the same time. If you set maintenance_work_mem
> to a value that is large enough to make the CREATE INDEX run fast, now
> with your patch that is also going to cause each VACUUM process to
> gobble up lots of extra memory that it probably doesn't need, and now
> you may well start to get failures. I've seen this happen even with
> the current 1GB limit, though you need a pretty small system - e.g.
> 8GB RAM - for it to be a problem. I think it is really really likely
> to cause big problems for us if we dramatically increase that limit
> without making the allocation algorithm smarter.
Ok, a pity it will invalidate all the testing already done though (I
was almost done with the testing).
I guess I'll send the results anyway.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: