On Wed, Jan 10, 2018 at 11:28 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Sun, Jan 7, 2018 at 8:40 AM, Peter Geoghegan <p...@bowt.ie> wrote:
>> On Sat, Jan 6, 2018 at 2:20 PM, Stephen Frost <sfr...@snowman.net> wrote:
>>>> > IIRC the patches that makes the cleanup scan skip has a problem
>>>> > pointed by Peter[1], that is that we stash an XID when a btree page is
>>>> > deleted, which is used to determine when it's finally safe to recycle
>>>> > the page. Yura's patch doesn't have that problem?
>>>> >
>>>> > [1] 
>>>> > https://www.postgresql.org/message-id/CAH2-Wz%3D1%3Dt5fcGGfarQGcAWBqaCh%2BdLMjpYCYHpEyzK8Qg6OrQ%40mail.gmail.com
>>
>>> Masahiko Sawada, if this patch isn't viable or requires serious rework
>>> to be acceptable, then perhaps we should change its status to 'returned
>>> with feedback' and you can post a new patch for the next commitfest..?
>>
>> I believe that the problem that I pointed out with freezing/wraparound
>> is a solvable problem. If we think about it carefully, we will come up
>> with a good solution. I have tried to get the ball rolling with my
>> pd_prune_xid suggestion. I think it's important to not lose sight of
>> the fact that the page deletion/recycling XID thing is just one detail
>> that we need to think about some more.
>>
>> I cannot fault Sawada-san for waiting to hear other people's views
>> before proceeding. It really needs to be properly discussed.
>>
>
> Thank you for commenting.
>
> IIUC we have two approaches: one idea is based on Peter's suggestion.
> We can use pd_prune_xid to store epoch of xid of which the page is
> deleted. That way, we can correctly mark deleted pages as recyclable
> without breaking on-disk format.
>
> Another idea is suggested by  Sokolov Yura. His original patch makes
> btree have a flag in btpo_flags that implies the btree has deleted but
> not recyclable page or not. I'd rather want to store it as bool in
> BTMetaPageData. Currently btm_version is defined as uint32 but I think
> we won't use all of them. If we store it in part of btm_version we
> don't break on-disk format. However, we're now assuming that the
> vacuum on btree index always scans whole btree rather than a part, and
> this approach will nurture it more. It might be possible that it will
> become a restriction in the future.
>

On third thought, it's similar to what we are doing for heaps but we
can store the oldest btpo.xact of a btree index to somewhere(metapage
or pg_class.relfrozenxid?) after vacuumed. We can skip cleanup
vacuuming as long as the xid is not more than a certain threshold old
(say vacuum_index_cleanup_age). Combining with new parameter I
proposed before, the condition of doing cleanup index vacuum will
become as follows.

if (there is garbage on heap)
   Vacuum index, and update oldest btpo.xact
else /* there is no garbage on heap */
{
    if (# of scanned pages > (nblocks *
vacuum_cleanup_index_scale_factor) OR oldest btpo.xact is more than
vacuum_index_cleanup_age old?))
        Cleanup vacuum index, and update oldest btpo.xact
}

In current index vacuuming, it scans whole index pages if there is
even one garbage on heap(I'd like to improve it though someday), which
it also lead to update the oldest btpo.xact at the time. If
vacuum_cleanup_index_slace_factor is enough high, we can skip the
scanning whole index as long as either the table isn't modified for 2
billion transactions or the oldest btpo.xact isn't more than
vacuum_index_cleanup_age transactions old. But if there is a opened
transaction for a very long time, we might not be able to mark deleted
page as recyclable. Some tricks might be required. Feedback is
welcome.

Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply via email to