On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> On 14.07.2011 18:57, Pavan Deolasee wrote:
>
>> On Thu, Jul 14, 2011 at 11:46 AM, Simon Riggs<si...@2ndquadrant.com>
>>  wrote:
>>
>>> I'd say that seems way too complex for such a small use case and we've
>>>
>>> only just fixed the bugs from 8.4 vacuum map complexity. The code's
>>> looking very robust now and I'm uneasy that such changes are really
>>> worth it.
>>>
>>>  Thanks Simon for looking at the patch.
>>
>> I am not sure if the use case is really narrow. Today, we dirty the pages
>> in
>> both the passes and also emit WAL records. Just the heap scan can take a
>> very long time for large tables, blocking the autovacuum worker threads
>> from
>> doing useful work on other tables. If I am not wrong, we use ring buffers
>> for vacuum which would most-likely force those buffers to be written/read
>> twice to the disk.
>>
>
> Seems worthwhile to me. What bothers me a bit is the need for the new
> 64-bit LSN value on each heap page. Also, note that temporary tables are not
> WAL-logged, so there's no LSNs.
>
>
Yeah, the good thing is we store it only when its needed. Temp and unlogged
tables don't need any special handling because we don't rely on the WAL
logging for the table itself. As you have noted down the thread, any counter
which is guaranteed to not wrap-around would have worked. LSN is just very
handy for the purpose (let me think more if we can just do with a flag).


> How does this interact with the visibility map? If you set the visibility
> map bit after vacuuming indexes, a subsequent vacuum will not visit the
> page. The second vacuum will update relindxvacxlogid/off, but it will not
> clean up the dead line pointers left behind by the first vacuum. Now the LSN
> on the page differs from the one stored in pg_class, so subsequent pruning
> will not remove the dead line pointers either. I think you can sidestep that
> if you check that the page's vacuum LSN <= vacuum LSN in pg_class, instead
> of equality.
>
>
Hmm. We need to carefully see that. As the patch stands, we don't set the
visibility map bit when there are any dead line pointers on the page. I'm
not sure if its clear, but the ItemIdIsDead() test accounts for dead as well
as dead-vacuum line pointers, so the test in lazy_heap_scan() won't let VM
bit to be set early. The next vacuum cycle would still look at the page and
set the bit if the page appears all-visible at that time. Note that in the
next vacuum cycle, we would first clear the dead-vacuum line pointers if its
not already done by some intermediate HOT-prune operation.



> Ignoring the issue stated in previous paragraph, I think you wouldn't
> actually need an 64-bit LSN. A smaller counter is enough, as wrap-around
> doesn't matter. In fact, a single bit would be enough. After a successful
> vacuum, the counter on each heap page (with dead line pointers) is N, and
> the value in pg_class is N. There are no other values on the heap, because
> vacuum will have cleaned them up. When you begin the next vacuum, it will
> stamp pages with N+1. So at any stage, there is only one of two values on
> any page, so a single bit is enough. (But as I said, that doesn't hold if
> vacuum skips some pages thanks to the visibility map)
>
>
I am not sure if that can take care of aborted vacuum case with a single bit
or a counter that can wrap-around. Let me think more about it.



> Is there something in place to make sure that pruning uses an up-to-date
> relindxvacxlogid/off value? I guess it doesn't matter if it's out-of-date,
> you'll just miss the opportunity to remove some dead tuples.
>
>
Yeah, exactly. We just rely on the value that was read when the pg_class
tuple was last read. If we don't have the up-to-date value, we might miss
some opportunity to clean up those dead-vauum line pointers.


> Seems odd to store relindxvacxlogid/off as two int32 columns. Store it in
> one uint64 column, or invent a new datatype for LSNs, or store it as text in
> %X/%X format.
>
>
Yeah. I don't remember what issues I faced with uint64 column, may be did
not get around the case where uint64 is not natively supported on the
platform. But %X/%X looks very reasonable. Will change to that.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Reply via email to