On Wed, Sep 7, 2016 at 2:39 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 10:28 PM, Claudio Freire <klaussfre...@gmail.com>
>>> The problem with this is that we allocate the entire amount of
>>> maintenance_work_mem even when the number of actual dead tuples turns
>>> out to be very small. That's not so bad if the amount of memory we're
>>> potentially wasting is limited to ~1 GB, but it seems pretty dangerous
>>> to remove the 1 GB limit, because somebody might have
>>> maintenance_work_mem set to tens or hundreds of gigabytes to speed
>>> index creation, and allocating that much space for a VACUUM that
>>> encounters 1 dead tuple does not seem like a good plan.
>>> What I think we need to do is make some provision to initially
>>> allocate only a small amount of memory and then grow the allocation
>>> later if needed. For example, instead of having
>>> vacrelstats->dead_tuples be declared as ItemPointer, declare it as
>>> ItemPointer * and allocate the array progressively in segments. I'd
>>> actually argue that the segment size should be substantially smaller
>>> than 1 GB, like say 64MB; there are still some people running systems
>>> which are small enough that allocating 1 GB when we may need only 6
>>> bytes can drive the system into OOM.
>> This would however incur the cost of having to copy the whole GB-sized
>> chunk every time it's expanded. It woudln't be cheap.
> No, I don't want to end up copying the whole array; that's what I
> meant by allocating it progressively in segments. Something like what
> you go on to propose.
>> I've monitored the vacuum as it runs and the OS doesn't map the whole
>> block unless it's touched, which it isn't until dead tuples are found.
>> Surely, if overcommit is disabled (as it should), it could exhaust the
>> virtual address space if set very high, but it wouldn't really use the
>> memory unless it's needed, it would merely reserve it.
> Yeah, but I've seen actual breakage from exactly this issue on
> customer systems even with the 1GB limit, and when we start allowing
> 100GB it's going to get a whole lot worse.
>> To fix that, rather than repalloc the whole thing, dead_tuples would
>> have to be an ItemPointer** of sorted chunks. That'd be a
>> significantly more complex patch, but at least it wouldn't incur the
> Right, this is what I had in mind. I don't think this is actually
> very complicated, because the way we use this array is really simple.
> We basically just keep appending to the array until we run out of
> space, and that's not very hard to implement with an array-of-arrays.
> The chunks are, in some sense, sorted, as you say, but you don't need
> to do qsort() or anything like that. You're just replacing a single
> flat array with a data structure that can be grown incrementally in
> fixed-size chunks.
If we replaced dead_tuples with an array-of-array, isn't there
negative performance impact for lazy_tid_reap()?
As chunk is added, that performance would be decrease.
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: